I want to parse an externally defined (and undocumented) file format in Python. It looks somewhat similar to TOML, but with different text styles, and no quoting. For example:
[Schedule_Step122]
m_nMaxCurrent=0
m_szAddIn=Relay OFF
m_szLabel=06 - End Charge
m_uLimitNum=2
[Schedule_Step122_Limit0]
Equation0_szCompareSign=>=
Equation0_szRight=F_05_Charge_Capacity
Equation0_szLeft=PV_CHAN_Charge_Capacity
m_bStepLimit=1
m_szGotoStep=End Test
[Schedule_Step122_Limit1]
Equation0_szCompareSign=>=
Equation0_szLeft=PV_CHAN_Voltage
Equation0_szRight=3
m_bStepLimit=1
m_szGotoStep=End Test
(This is Arbin's test schedule format.)
I would like the parsed structure to be something like:
"steps": [
    {
        "max_current": 0,
        "add_in": RELAY_OFF,
        "label": "06 - End Charge",
        "limits": [
            {
                "equations": [
                    {
                        "left": PV_CHAN_CHARGE_CAPACITY,
                        "compare_sign": ">=",
                        "right": F_05_CHARGE_CAPACITY
                    }
                ],
                "step_limit": 1,
                "goto_step": END_TEST
            },
            {
                "equations": [
                    {
                        "left": PV_CHAN_VOLTAGE,
                        "compare_sign": ">=",
                        "right": 3
                    }
                ],
                "step_limit": 1,
                "goto_step": END_TEST
            }
        ]
    }
]
The format seems superficially similar to TOML, including some of the nesting, but the string handling is different. I would also like to capture certain values as named constants.
I was also looking into defining a context-free grammar and using a lexer/parser like ANTLR, PLY, pyparsing, or Lark. I'm familiar with reading grammars in documentation, but haven't written or used one with a parser before. In particular, I don't know how one would represent the nesting structure (such as Schedule_Step122_Limit0 being a member of Schedule_Step122) or the lack of guaranteed order among related keys (like Equation0_szCompareSign, Equation0_szLeft, etc.).
Is there a generic parsing tool I could write a definition for, which would give me the parsed/structured output? Or is the best approach here to write custom parsing logic?
Tools like ANTLR, PLY, pyparsing, or Lark will give you almost no help with this problem. configparser might help a little, but I suspect it'd be more bother than it's worth.
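To make the configparser remark concrete, here is a minimal sketch of what the standard-library module gives you on a shortened copy of the example. The `SAMPLE` string and the `flat` variable are names made up for this illustration, and the settings shown are assumptions about the format (only `=` separates key from value, and `%` has no special meaning).

```python
import configparser

# Shortened copy of the example input (hypothetical test data).
SAMPLE = """\
[Schedule_Step122]
m_nMaxCurrent=0
m_szLabel=06 - End Charge
[Schedule_Step122_Limit0]
Equation0_szCompareSign=>=
Equation0_szLeft=PV_CHAN_Charge_Capacity
"""

# interpolation=None: treat '%' literally.
# delimiters=("=",): split only on '=', never on ':'.
parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
parser.optionxform = str  # keep key case (the default lowercases keys)
parser.read_string(SAMPLE)

# One flat dict of strings per section; no nesting, no type conversion.
flat = {name: dict(parser[name]) for name in parser.sections()}
```

The key is split from the value at the first `=`, so `flat["Schedule_Step122_Limit0"]["Equation0_szCompareSign"]` is `">="`. What you get back is still a flat mapping of section names to string values: attaching the `_Limit0` section to its step, grouping the `Equation0_*` keys, and converting types all remain manual work. In its default strict mode configparser also raises an error on duplicate sections or keys, which may or may not suit real schedule files.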
The following code is close to what you want. You'll need to tweak it based on what you discover about the input format and what you'd like for the output structure.
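As a rough illustration of this kind of custom parsing logic, here is a minimal sketch under stated assumptions; it is not necessarily the exact code referred to above. It assumes section names always look like `Schedule_StepN` or `Schedule_StepN_LimitM`, that equation keys look like `EquationN_szName`, and that the `m_` prefixes are limited to the handful seen in the example. The names `parse_schedule`, `snake`, `convert`, and `SAMPLE` are invented for the illustration, and values are left as plain strings or ints rather than mapped to named constants.

```python
import re

SECTION_RE = re.compile(r"^\[Schedule_Step(\d+)(?:_Limit(\d+))?\]$")
EQUATION_RE = re.compile(r"^Equation(\d+)_sz(\w+)$")
# Assumed set of Hungarian-style prefixes; extend as you discover more.
PREFIX_RE = re.compile(r"^m_(?:sz|n|b|u|f)(?=[A-Z])")


def snake(name):
    """CamelCase -> snake_case, e.g. 'CompareSign' -> 'compare_sign'."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def convert(value):
    """Return an int for plain integer text, otherwise the raw string."""
    return int(value) if re.fullmatch(r"-?\d+", value) else value


def parse_schedule(text):
    steps = {}      # step number -> step dict (limits keyed by index for now)
    current = None  # dict that receives the following key=value lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            m = SECTION_RE.match(line)
            if m is None:
                current = None  # unrecognised section: skip its keys
                continue
            step = steps.setdefault(int(m.group(1)), {"limits": {}})
            if m.group(2) is None:
                current = step
            else:
                current = step["limits"].setdefault(
                    int(m.group(2)), {"equations": {}})
            continue
        if current is None or "=" not in line:
            continue
        key, value = line.split("=", 1)  # first '=' only, so '>=' survives
        eq = EQUATION_RE.match(key)
        if eq and "equations" in current:
            # Group EquationN_* keys by index, whatever order they arrive in.
            equation = current["equations"].setdefault(int(eq.group(1)), {})
            equation[snake(eq.group(2))] = convert(value)
        else:
            current[snake(PREFIX_RE.sub("", key))] = convert(value)

    # Turn the index-keyed dicts into lists ordered by index.
    result = []
    for _, step in sorted(steps.items()):
        limits = []
        for _, limit in sorted(step["limits"].items()):
            limit["equations"] = [
                e for _, e in sorted(limit["equations"].items())]
            limits.append(limit)
        step["limits"] = limits
        result.append(step)
    return {"steps": result}


SAMPLE = """\
[Schedule_Step122]
m_nMaxCurrent=0
m_szAddIn=Relay OFF
m_szLabel=06 - End Charge
m_uLimitNum=2
[Schedule_Step122_Limit0]
Equation0_szCompareSign=>=
Equation0_szRight=F_05_Charge_Capacity
Equation0_szLeft=PV_CHAN_Charge_Capacity
m_bStepLimit=1
m_szGotoStep=End Test
"""

parsed = parse_schedule(SAMPLE)
```

Nesting is handled by the section-name regex rather than by a grammar, and key order does not matter because every `EquationN_*` key is filed into a dict by its index before the lists are built.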
For the example input, it prints: