I've written an LALR1 parser which seems to work. It can be parameterised for the type of symbols it receives. The only binding is Eq.
module LALR.Parser where
data Rule symbolT = Rule
{
lhs :: symbolT,
rhs :: Int
}
data Action symbolT = Accept
| Goto (State symbolT)
| Shift (State symbolT)
| Reduce (Rule symbolT)
data State symbolT = State
{
transitions :: [ (symbolT, Action symbolT) ]
}
data Parser symbolT = Parser
{
states :: [State symbolT],
input :: [symbolT]
}
data Result symbolT = UnexpectedSymbol symbolT
| Accepted
| Pending (Parser symbolT)
| EmptyInput
| EmptyStack
build :: State symbolT -> [symbolT] -> Parser symbolT
build state input = Parser
{
states = [state],
input = input
}
step :: Eq symbolT => Parser symbolT -> Result symbolT
step (Parser [] _) = EmptyStack
step (Parser _ []) = EmptyInput
step parser@(Parser states@(state : statesTail) input@(lookAhead : inputTail) ) = case (action) of
Nothing -> UnexpectedSymbol lookAhead
Just Accept -> Accepted
Just (Shift target) -> Pending (Parser (target : states) inputTail)
Just (Reduce rule@(Rule lhs rhs) ) -> Pending (Parser (drop rhs states) (lhs : input) )
Just (Goto target) -> Pending (Parser (target : states) inputTail )
where
action = lookup lookAhead (transitions state)
run :: Eq symbolT => Parser symbolT -> Result symbolT
run parser = case (step parser) of
Pending parser -> run parser
result -> result
However, now I want to transform this parser into a compiler, which not only accepts or rejects a given input, but actually converts it into something useful. Basically, I am thinking to add a handler which can handle the different actions taken. Now my question is the following:
The compiler will require not only to receive mere Symbols, but Symbols with bells and whistles, e.g. a payload (whatever the lexer lexed), or a line number, a file name, or something else.
In an object oriented language I would recur to inheritance and derive a class from Symbol. But what is the way to go in Haskell?
Should I define a class Token which requires a function symbol :: Token -> Symbol and then I would create some data structure that instances that class? Changing the bindings from Eq symbolT to Token symbolT and then calling symbol when I need to lookup the symbol?
(I am trying to study the language limiting myself mostly to the Prelude.)
UPDATE:
Your entire code is parameterised by symbolT. Why just not assume that symbolT is a symbol with a payload? Do you want different types of payload for different symbols?
Yes, I want different payloads. The Lexer produces a [Token].
data Token = Assignment | Name String | Number Int
lex :: String -> Maybe [Token]
I could derive Eq so that it ignores the payload, e.g.
instance Eq Token where
(==) (Name _) (Name _) = True
-- etc
but then in the definition of the transitions of the states I would need to give it a dummy token e.g. [ (Number 0, Accept), (Name "", Shift state3) ] which looks kind of wrong.
I was thinking about something like:
class Tkn a symbolT where
symbol :: a -> symbolT
and then when I lookup the transitions, I would call that method to extract the symbol.
I am not sure if I am explaining myself or if any of this makes any sense.