How to avoid backtracking parsing SQL with FParsec?

83 Views Asked by At

I'm parsing SQL (Sqlite) using FParsec, and I'm running into issues with backtracking when parsing constructs like START_KEYWORD element_0, …, element_N END_KEYWORD. Since the parser for recognizing identifiers also recognizes keywords, I have to make sure that I try to parse the keyword first and only if that fails then I continue with the identifier. For that, I'm using the parser combinator many1Till, but it still seems like I cannot escape backtracking.

Here's an example of how I'm currently parsing an ORDER BY clause:

let sep1CommaEnd p endP =
    let commaOrEnd = symbol S.Comma <|> followedBy endP
    many1Till (p .>> commaOrEnd) (followedBy endP)

let orderBy =
    parse {
        do! keyword K.Order >>. keyword K.By
        let! xs = sep1CommaEnd ident (keyword K.Asc <|> keyword K.Desc)
        let! asc = (keyword K.Asc >>. preturn true <|> (keyword K.Desc >>. preturn false) <|> preturn true )
        return { columns = xs; asc = asc }
    }

This implementation has a few issues:

  • It uses followedBy twice in sep1CommaEnd, which can cause backtracking.
  • It repeats parsing the keyword used for ending the construct, which is inefficient.

To solve these issues, I came up with the idea of creating a parser combinator called sepBy1Cont that allows me to:

  • End right after the keyword used for ending is detected.
  • Accumulate all results, including the one returned by the parser of the keyword used for ending. Here's my implementation of sepBy1Cont:
type Either<'a, 'b> =
    | Left of 'a
    | Right of 'b

let sepBy1Cont (p: Parser<'a, unit>) (sepEnd: Parser<Either<'b, 'c>, unit>) =
    let rec loop (acc: 'a list) =
        parse {
            let! x = p
            let! d = sepEnd

            match d with
            | Left _ -> return! loop (x :: acc)
            | Right next ->
                let xs = List.rev (x :: acc)
                return (xs, next)
        }

    loop []

My questions are:

  • Is there a better way to solve this problem in FParsec that I'm missing?
  • If sepBy1Cont is a good solution, is there a better way to implement it? Should I learn about the internals of FParsec to improve it?
1

There are 1 best solutions below

1
RonaldSchlenker On

Answering your 1st question (in general): There are many builtin string parsers in FParsec that can / should be used for clarity and performance.

Given a pseudo-grammar that should be parsed:

START_KEYWORD element_0, …, element_N END_KEYWORD

Here's an idea:

let sigStart = "START"
let sigEnd = "END"

let pDemo : Parser<_,unit> = 
    let field = many1Chars2 letter (letter <|> digit <|> pchar '_')
    let separator = pchar ',' .>> spaces
    let fields = sepBy1 field separator
    between (pstring sigStart .>> spaces1) (spaces1 >>. pstring sigEnd) fields

let run p s =
    match runParserOnString p () "" s with
    | Success (result, _, _) -> printfn "Success: %A" result
    | Failure (msg, _, _) -> printfn "Failure: %s" msg

$"{sigStart} field_0, field_1, field_N {sigEnd}"
|> run pDemo

// Success: ["field_0"; "field_1"; "field_N"]