Isn't Parsec's try supposed to backtrack when it encounters failure?
For instance, if I have the code
import Control.Applicative ((<|>))
import Debug.Trace
import Text.Parsec (try)
import Text.Parsec.Combinator (eof)
import Text.Parsec.String (Parser)
import qualified Data.Char as Char
import qualified Text.Parsec as Parsec
data Func = Add | Sub deriving (Show)
add :: Parser Func
add = do
Parsec.string "+"
Parsec.notFollowedBy $ Parsec.satisfy (\_-> True)
return Add
sub :: Parser Func
sub = do
Parsec.string "-"
Parsec.notFollowedBy $ Parsec.satisfy (\_-> True)
return Sub
func :: Parser Func
func = add <|> sub
data AST =
Primitive String
| BinOp Func AST AST
deriving (Show)
primitive :: Parser AST
primitive = do
str <- Parsec.many1 $ Parsec.satisfy $ not . Char.isSpace
trace "primitive returning" $ return $ Primitive str
binOp :: Parser AST
binOp = do
lhs <- parser
Parsec.spaces
operation <- func
Parsec.spaces
rhs <- parser
return $ BinOp operation lhs rhs
parser :: Parser AST
parser = try primitive <|> binOp
sample :: String
sample = "+#-! + -$!&+"
main :: IO ()
main = print $ Parsec.parse (parser <* eof) "failure" sample
I get the output:
primitive returning
Left "failure" (line 1, column 5):
unexpected ' '
expecting end of input
However, from my understanding, when the parser encounters try primitive <|> binOp and primitive fails, it should go to the binOp option.
Desired parse is
BinOp Add (Primitive "+#-!") (Primitive "-$!&+")
The documentation for
trydoesn't actually use the word "backtrack". What it says is:This may make little sense until you match it up with the documentation for the
(<|>)operator which states:Note the emphasized phrase -- that's emphasis in the original documentation, not emphasis I've added.
Here's how this leads to "backtracking". Consider a simplified version of your parser that allows no spaces, only single-letter primitives, and addition of two primitives as the only operation:
This parser doesn't work. It parses
"x+y"okay:but it fails to parse
"x"alone:The problem in this second case is that the
<|>operator triesbinOpfirst. ThebinOpparser consumes a primitive, namely"x", and then fails to parseadd(because there's no"+"character). As a result,binOpfails after consuming input. The<|>doesn't try its right-hand side alternative parser (primitive) becausep <|> qonly triesqifpfails without consuming input, and that's not what happened here. In other words, by default, after the left-hand side of the alternative consumes some input and then fails, there is no "backtracking" to put that consumed input back.The
trycombinator changes this behavior by "undoing" the consumption of input in case of failure. If we fix the parser:then it works fine on the input
"x":As before, the
<|>operator tries thetry binOpparser first. Thetry binOpparser acts just likebinOp-- it consumes a primitive and then fails to parseadd, resulting in a "failure after consuming input". Buttry"backtracks" by replacing this with a "failure without consuming any input" (essentially, putting the input back). This satisfies the requirement for<|>to try the alternative parserprimitivewhich, of course, successfully parses"x", and that's the final result of the parse.How is this different from your situation? Well, you've got the equivalent of:
This parser is broken, too. It works on
"x":and it "works" on
"x+y"but returns the wrong answer:The problem is that the
<|>first tries thetry primitiveparser, which acts just likeprimitive. Even in the second test case, it succeeds after consuming the primitive"x"(leaving"+y"yet to be parsed). Note thattryhas no effect, because the only thingtrydoes is take "failure after consuming input" and replace it with "failure without consuming any input", butprimitivedoesn't fail! It succeeds! So, there's no backtracking because there's no failure to backtrack from. The result ofparserisPrimitive "x", and now there's some unparseable garbage"+y"left that will trigger a failure in some other parser that followsparser, likeeof.So, to start to fix your parser, you could replace the definition of
parserwith:Here, the
try binOpwill start to parse abinOp, and if it fails part way through (e.g., after parsing a primitive but not finding an operator), thentrywill trigger backtracking so theprimitiveparser can be tried instead.Unfortunately, this will cause an infinite loop in your parser because
parserstarts by tryingbinOpwhich starts by invokingparser(for the left-hand side) which starts by tryingbinOp, etc., etc., and neither parser makes any progress.There are a few solutions to this problem. There is machinery in
Text.Parsec.Exprfor handling expressions in general, and there are functions likesepByorchainl1for correctly parsing a sequence of terms separated by operators (without creating an infinite loop).If you want to do it manually, you'll end up having to do something like:
This is super ugly, which is why
chainl1was invented. The above can be replaced with:With either of these versions of
parser, your parser still won't work because of two more problems. First, yourprimitivedoesn't know when to stop:Here, the only way to end a primitive is with a space. That means that the expression
"x+y+z"is a valid single primitive. If you try to parse this withparser, the initiallhs <- primitivewill eat the whole string, and you'll get backPrimitive "x+y+z". If you don't mind spaces being mandatory, then you'll be able to get it to parse"x + y + z", but if you'd like spaces to be optional, you'll need to somehow restrict the allowed primitives. (There's a reason most languages doesn't allow completely arbitrary identifiers. Ifx+yor -- worse --+#-!-- is a valid identifier, it's hard to parse expressions with infix operators.)If you allow only alphanumeric strings as primitives, that'll get you a little closer:
However, your parser still won't work because
addandsubare misusingnotFollowedBy. Consideradd:This works by first parsing a
"+"character. Then, it tries to parse another character. If that character satisfies the predicate (which it does because it's alwaysTrue), then the whole parse will fail after consuming input. That's becausenotFollowedByis designed to fail if its parser succeeds. So, except in the case where a"+"occurs at the end of the string, thisaddparser always fails after consuming input, aborting the entire parse.For now, at least, it's probably best to just delete those
notFollowedBylines.Anyway, here's a modified version of your parser that more or less works, but only accepts alphanumeric primitives. It actually doesn't need to backtrack, so it doesn't use
tryanywhere: