I am trying to parse nested column type definitions such as
1 string
2 struct<col_1:string,col_2:int>
3 row(col_1 string,array(col_2 string),col_3 boolean)
4 array<struct<col_1:string,col_2:int>,col_3:boolean>
5 array<struct<col_1:string,col2:int>>
Using nestedExpr works as expected for cases 1-4, but throws a parse error on case 5. Adding a space between double closing brackets like "> >" seems work, and might be explained by this quote from the author.
By default, nestedExpr will look for space-delimited words of printables https://sourceforge.net/p/pyparsing/bugs/107/
I'm mostly looking for alternatives to pre and post processing the input string
type_str = type_str.replace(">", "> ")
# parse string here
type_str = type_str.replace("> ", ">")
I've tried using the infix_notation but I haven't been able to figure out how to use it in this situation. I'm probably just using this the wrong way...
Code snippet
array_keyword = pp.Keyword('array')
row_keyword = pp.Keyword('row')
struct_keyword = pp.Keyword('struct')
nest_open = pp.Word('<([')
nest_close = pp.Word('>)]')
col_name = pp.Word(pp.alphanums + '_')
col_type = pp.Forward()
col_type_delimiter = pp.Word(':') | pp.White(' ')
column = col_name('name') + col_type_delimiter + col_type('type')
col_list = pp.delimitedList(pp.Group(column))
struct_type = pp.nestedExpr(
opener=struct_keyword + nest_open, closer=nest_close, content=col_list | col_type, ignoreExpr=None
)
row_type = pp.locatedExpr(pp.nestedExpr(
opener=row_keyword + nest_open, closer=nest_close, content=col_list | col_type, ignoreExpr=None
))
array_type = pp.nestedExpr(
opener=array_keyword + nest_open, closer=nest_close, content=col_type, ignoreExpr=None
)
col_type <<= struct_type('children') | array_type('children') | row_type('children') | scalar_type('type')
nestedExprandinfixNotationare not really appropriate for this project.nestedExpris generally a short-cut expression for stuff you don't really want to go into details parsing, you just want to detect and step over some chunk of text that happens to have some nesting in opening and closing punctuation.infixNotationis intended for parsing expressions with unary and binary operators, usually some kind of arithmetic. You might be able to treat the punctuation in your grammar as operators, but it is a stretch, and definitely doing things the hard way.For your project, you will really need to define the different elements, and it will be a recursive grammar (since the array and struct types will themselves be defined in terms of other types, which could also be arrays or structs).
I took a stab at a BNF, for a subset of your grammar using scalar types
int,float,boolean, andstring, and compound typesarrayandstruct, with just the '<' and '>' nesting punctuation. An array will take a single type argument, to define the type of the elements in the array. A struct will take one or more struct fields, where each field is anidentifier:typepair.(If you later want to add a row definition also, think about what the row is supposed to look like, and how its elements would be defined, and then add it to this BNF.)
You look pretty comfortable with the basics of pyparsing, so I'll just start you off with some intro pieces, and then let you fill in the rest.
Once you have that finished, try it out with some tests:
And you should get something like:
Once you have that, you can play around with extending it to other types, such as your row type, maybe unions, or arrays that take multiple types (if that was your intention in your posted example). Always start by updating the BNF - then the changes you'll need to make in the code will generally follow.