I'm trying to split a string like aaa:bbb(123) into tokens using Pyparsing.
I can do this with a regular expression, but I need to do it with Pyparsing.
With re, the solution looks like this:
>>> import re
>>> string = 'aaa:bbb(123)'
>>> regex = r'(\S+):(\S+)\((\d+)\)'
>>> re.match(regex, string).groups()
('aaa', 'bbb', '123')
This is clear and simple enough. The key point here is \S+, which means "one or more non-whitespace characters".
Now I'll try to do this with Pyparsing:
>>> from pyparsing import Word, Suppress, nums, printables
>>> expr = (
... Word(printables, excludeChars=':')
... + Suppress(':')
... + Word(printables, excludeChars='(')
... + Suppress('(')
... + Word(nums)
... + Suppress(')')
... )
>>> expr.parseString(string).asList()
['aaa', 'bbb', '123']
Okay, we get the same result, but it doesn't look good. We set excludeChars to make the Pyparsing expressions stop where we need them to, but this isn't robust. If the source string contains the "excluded" characters, the same regex still works fine:
>>> string = 'a:aa:b(bb(123)'
>>> re.match(regex, string).groups()
('a:aa', 'b(bb', '123')
while the Pyparsing expression obviously breaks:
>>> expr.parseString(string).asList()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/long/path/to/pyparsing.py", line 1111, in parseString
    raise exc
ParseException: Expected W:(0123...) (at char 7), (line:1, col:8)
So the question is: how can we implement the needed logic with Pyparsing?
Use a regex with a look-ahead assertion:
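Here is a minimal sketch of that idea, assuming pyparsing's Regex class is acceptable in place of Word(printables, excludeChars=...). The two look-ahead patterns and the names `name` and `label` are my own illustration, so adjust them to your real grammar:

```python
from pyparsing import Regex, Suppress, Word, nums

# Illustrative patterns (an assumption, not the only way to write them).
# \S+ is greedy and then backtracks until the look-ahead succeeds, so each
# token ends at the last position that is still followed by the delimiter.
name = Regex(r'\S+(?=:)')           # non-whitespace run followed by ':'
label = Regex(r'\S+(?=\(\d+\))')    # non-whitespace run followed by '(digits)'

expr = (
    name
    + Suppress(':')
    + label
    + Suppress('(')
    + Word(nums)
    + Suppress(')')
)

simple = expr.parseString('aaa:bbb(123)').asList()
tricky = expr.parseString('a:aa:b(bb(123)').asList()
print(simple)
print(tricky)
```

Both strings from the question should now parse to the same tokens the plain regex produced. Keep in mind that the backtracking happens only inside each Regex: pyparsing will not go back and retry an earlier element once it has matched, so each look-ahead has to describe enough of what follows.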