Problem statement
Suppose I have two parsers p and q and I concatenate them like this:
r = try p *> q
In Parsec, the behavior of this is:
- If
pfails without consuming input, thenrfails without consuming input. - If
pfails after consuming input, thenrfails without consuming input. - if
qfails without consuming input, thenrfails after consumingp. - if
qfails after consuming input, thenrfails after consumingpand parts ofq.
However, the behavior I'm looking for is a bit unusual:
- If
pfails without consuming input, thenrshould fail without consuming input. - If
pfails after consuming input, thenrshould fail without consuming input. - if
qfails without consuming input, thenrshould fail without consuming input. - if
qfails after consuming input, thenrshould fail after consuming some input.
I can't seem to think of a clean way to do this.
Rationale
The reason is that I have a parser like this:
s = (:) <$> q <*> many r
The q parser, embedded inside the r parser, needs a way to signal either: invalid input (which occurs when q consumes input but fails), or end of the many loop (which occurs when q doesn't consume anything and fails). If the input is invalid, it should just fail the parser entirely and report the problem to the user. If there is no more input to consume, then it should end the many loop (without reporting a parser error to the user). The trouble is that it's possible that the input ends with a p but without any more valid q's to consume, in which case q will fail but without consuming any input.
So I was wondering if anyone have an elegant way to solve this problem? Thanks.
Addendum: Example
p = string "P"
q = (++) <$> try (string "xy") <*> string "z"
Test input on (hypothetical) parser s, had it worked the way I wanted:
xyz(accept)xyzP(accept;Premains unparsed)xyzPx(accept;Pxremains unparsed;qfailed but did not consume any input)xyzPxy(reject; parserqconsumedxybut failed)xyzPxyz(accept)
In the form r = try p *> q, s will actually reject both case #2 and case #3 above. Of course, it's possible to achieve the above behavior by writing:
r = (++) <$> try (string "P" *> string "xy") <*> string "z"
but this isn't a general solution that works for any parsers p and q. (Perhaps a general solution doesn't exist?)
I believe I found a solution. It's not especially nice, but seems to works. At least something to start with:
This is the
rcombinator you're asking for. The idea is that we first execute the parsers in a "test" run (usinglookAhead . try) and if any of them fails without consuming input, we record it asNothinginsideMaybeT. This is accomplished byopt, it converts a failure toNothingand wraps it intoMaybeT. Thanks toMaybeT, ifopt preturnsNothing,opt qis skipped.If both
pandqsucceed, thetest ..part returnsJust (). And if one of them consumes input, the wholetest ..fails. This way, we distinguish the 3 possibilities:porq.After
<|> pure (Just ())both 1. and 3. result inJust (), while 2. results in Nothing. Finally, themaybepart convertsNothinginto a non-consuming failure, andJust ()into running the parsers again, now without any protection. This means that 1. fails again with consuming some input, and 3. succeeds.Testing: