I am attempting to develop a lambda calculus interpreter, which supports definitions of terms. For that, I'm using the Megaparsec library in Haskell.
I have the following grammar:
term := var | lambda var . term | term term
def := var = term
instruction := :e t | def
My parser should return a list of instructions, where :e t will evaluate to whatever t is reduced, and def will allow me to have named definitions.
So far, I've managed to parse a very similar grammar, except I'm using semicolons to separate each instruction:
...
instruction := :e t; | def;
I'd like to remove the need for those semicolons, but I'm having no luck with that. My code is currently:
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "--") (L.skipBlockComment "{-" "-}")
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc
symbol :: Text -> Parser Text
symbol = L.symbol sc
semicolon :: Parser Text
semicolon = symbol ";"
lowercaseWord :: Parser [Char]
lowercaseWord = some lowerChar
variable :: Parser Term
variable = lexeme lowercaseWord <&> Var
lamAbs :: Parser Term
lamAbs = do
symbol "lambda"
space
v <- lexeme lowercaseWord
space
symbol "."
space
t <- term
space
return $ Abs v t
term :: Parser Term
term = foldl1 App <$> sepBy1 (parens term <|> try lamAbs <|> variable) sc
where parens = between (symbol "(") (symbol ")")
definition :: Parser (Instruction Term)
definition = do
n <- lexeme lowercaseWord
space
symbol "="
space
t <- term
return $ IDef n t
commandEval :: Parser (Instruction Term)
commandEval = do
symbol ":e"
t <- term
optional newline
return $ IEval t
program :: Parser [Instruction Term]
program = sc *> sepEndBy (definition <|> commandEval) semicolon <* eof
parse :: ProgramParser (Instruction Term)
parse text = case runParser program "" text of
Left err -> Left $ errorBundlePretty err
Right ast -> Right ast
When I attempt to remove the need for semicolons by changing the definition of program to:
program = sc *> some (definition <|> commandEval) <* eof
the parser will not be able to figure out that it needs to stop parsing a term once it finds something like var = ..., to continue parsing the next definition.
For example, the following input:
a = (lambda x . x)
x
:e b a
b = lambda y . y
returns the following error:
6:3:
|
6 | b = lambda y . y
| ^
unexpected '='
expecting ":e", "lambda", '(', end of input, lowercase letter, or newline
I tried to change the definition of term to make it explicit that it should not be followed by a lowercase word and a =. But that didn't work either:
term = foldl1 App <$> sepBy1 ((parens term <|> try lamAbs <|> variable) <* notFollowedBy newdef) sc
where parens = between (symbol "(") (symbol ")")
newdef = (lexeme lowercaseWord *> space) *> symbol "="
How can I make it so that I don't have the need for semicolons, and my parser automatically stops parsing a term application once it finds var = term?
Thank you in advance for your help.
As noted by @Poselsky, the point of using
Text.Megaparsec.Char.Lexeris that you don't need (and shouldn't have) to deal with white space in your higher-level parsers. So, I'd start by deleting every use ofspaceinlamAbsanddefinition, deleting theoptional newlinefromcommandEvaland replacingsepBy1 xxx scwithsome xxxinterm. (In all these cases, these parsers perform no function. They are run on the input stream after the previous lexeme parser has already absorbed all trailing whitespace.)I'd also move the
lexemecall intolowercaseWord:Then, except for your primitive parsers (now,
symbolandlowercaseWord), the only place you should need to deal with whitespace is thescat the start of your top-level parserprogram. Every other parser should be assumed to start at non-whitespace and absorb trailing whitespace, and those parsers can be combined in sequence to create another parser that follows the same rule, so no strayspaceorscornewlinecalls are needed.That will simplify your code, though it won't fix your problem.
Your attempted fix is the right idea. The problem is that it can't parse the last variable before a
newdefbecause thatvariableis followed by anewdef.You don't actually need to try to parse a whole definition. All you really need to do is make sure a
variableisn't followed by asymbol "=", and you need to wrap it in atryblock to backtrack on the name parsed byvariable:Do be warned that there are still some problems with your parser. It accepts the following input, for example, which you might not expect:
(Note that this has nothing to do with removing the
spacecalls -- your original parser parsed this, too.)The problem is that
symbol "lambda"can match the prefix of a larger word, so"lambdalicious"is parsed as"lambda licious". There's a similar issue withsymbol ":e", if you don't want":eval"to parse as":e val".The usual solution is to define:
and then the following work as intended:
Here's my full test program with all the above revisions: