Greedy expressions in Pyparsing

Question

Asked by oblalex on 22 November 2014 at 20:26 (593 views)

I'm trying to split a string like aaa:bbb(123) into tokens using Pyparsing.

I can do this with a regular expression, but I need to do it via Pyparsing.

With re the solution will look like:

>>> import re
>>> string = 'aaa:bbb(123)'
>>> regex = r'(\S+):(\S+)\((\d+)\)'
>>> re.match(regex, string).groups()
('aaa', 'bbb', '123')

This is clear and simple enough. The key point here is \S+, which matches one or more non-whitespace characters.
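To make the role of \S+ concrete, here is a small sketch using only the standard re module. It compares what \S+ matches on its own with what it captures inside the full pattern, where the engine has to give characters back so that ':' and '(' can still match. The variable names alone and groups are just for illustration.

```python
import re

string = 'aaa:bbb(123)'

# On its own, \S+ is greedy and swallows the whole string.
alone = re.match(r'\S+', string).group()

# Inside the full pattern, the engine backtracks until ':' and '(' can match.
groups = re.match(r'(\S+):(\S+)\((\d+)\)', string).groups()

print(alone)   # aaa:bbb(123)
print(groups)  # ('aaa', 'bbb', '123')
```

So the regex works because of backtracking, not because \S+ knows where to stop.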

Now I'll try to do this with Pyparsing:

>>> from pyparsing import Word, Suppress, nums, printables
>>> expr = (
...     Word(printables, excludeChars=':')
...     + Suppress(':')
...     + Word(printables, excludeChars='(')
...     + Suppress('(')
...     + Word(nums)
...     + Suppress(')')
... )
>>> expr.parseString(string).asList()
['aaa', 'bbb', '123']

Okay, we've got the same result, but this does not look good. We set excludeChars to make the Pyparsing expressions stop where we need them to, but this doesn't look robust. If the "excluded" chars appear in the source string, the same regex still works fine:

>>> string = 'a:aa:b(bb(123)'
>>> re.match(regex, string).groups()
('a:aa', 'b(bb', '123')

while the Pyparsing expression will obviously break:

>>> expr.parseString(string).asList()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/long/path/to/pyparsing.py", line 1111, in parseString
    raise exc
ParseException: Expected W:(0123...) (at char 7), (line:1, col:8)

So, the question is: how can we implement the needed logic with Pyparsing?

There are 2 best solutions below

**blaze** · Answer 1 · 2014-11-23T00:02:55.083000

Use a regex with a look-ahead assertion:

from pyparsing import Word, Suppress, Regex, nums, printables

expr = (
    Word(printables, excludeChars=':')
    + Suppress(':')
    + Regex(r'\S+[^\(](?=\()')
    + Suppress('(')
    + Word(nums)
    + Suppress(')')
)
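To see what the look-ahead pattern does in isolation, here is a quick check with the standard re module (no Pyparsing involved, and the sample inputs are my own). It is only a sketch of the pattern's behaviour, not of the whole expression above.

```python
import re

# The same pattern as in the Regex(...) element above.
lookahead = re.compile(r'\S+[^\(](?=\()')

simple = lookahead.match('bbb(123)').group()
nested = lookahead.match('b(bb(123)').group()
short = lookahead.match('b(1)')

print(simple)  # bbb
print(nested)  # b(bb
print(short)   # None
```

The look-ahead stops the match just before the last usable '(' without consuming it. Note that, as written, the pattern needs at least two characters before that '(', so a one-character name such as 'b(1)' does not match.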

**PaulMcG** · Answer 2 · 2014-11-23T02:20:49.457000

Unlike regex, pyparsing is purely left-to-right seeking, with no implicit lookahead.
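A minimal sketch of what "no implicit lookahead" means in practice, assuming pyparsing is installed. The expression named greedy and the sample input are made up for illustration: Word(printables) consumes the colon too, and pyparsing does not go back and retry with a shorter match.

```python
from pyparsing import Word, Suppress, printables, ParseException

# Word(printables) happily consumes ':' as well, leaving nothing for Suppress(':').
greedy = Word(printables) + Suppress(':') + Word(printables)

try:
    greedy.parseString('aaa:bbb')
    backtracked = True
except ParseException:
    # No backtracking: the first Word took 'aaa:bbb', then ':' was not found.
    backtracked = False

print(backtracked)  # False
```

A regex with the equivalent pattern would match here, because the engine gives characters back until the ':' fits.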

If you want regex's lookahead and backtracking, you could just use a Regex containing your original re:

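from pyparsing import Regex

string = 'aaa:b(bb(123)'  # the input that produces the output shown below
expr = Regex(r"(\S+):(\S+)\((\d+)\)")
print(expr.parseString(string).dump())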

['aaa:b(bb(123)']

However, I see that this returns just the whole match as a single string. If you want to be able to access the individual groups, you'll have to define them as named groups:

expr = Regex(r"(?P<field1>\S+):(?P<field2>\S+)\((?P<field3>\d+)\)")
print(expr.parseString(string).dump())

['aaa:b(bb(123)']
- field1: aaa
- field2: b(bb
- field3: 123

This suggests to me that a good enhancement would be to add a constructor arg to Regex to return the results as a list of all the re groups rather than the string.
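In the meantime, one way to get a plain list of the groups is to attach a parse action that replaces the whole-match token with the named group values. This is only a sketch, assuming pyparsing is installed and reusing the field1/field2/field3 names from above; it is not a built-in Regex option.

```python
from pyparsing import Regex

# Named groups become results names on the Regex token.
expr = Regex(r"(?P<field1>\S+):(?P<field2>\S+)\((?P<field3>\d+)\)")

# Parse action: swap the single whole-match token for the three group values.
expr.setParseAction(
    lambda tokens: [tokens["field1"], tokens["field2"], tokens["field3"]]
)

groups = expr.parseString('aaa:b(bb(123)').asList()
print(groups)  # ['aaa', 'b(bb', '123']
```

The returned list replaces the original token, so the result has the same shape as the Word/Suppress version in the question.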
