I'm trying to use instaparse on a DIMACS file less than 700k in size, with the following grammar:
<file>=<comment*> <problem?> clause+
comment=#'c.*'
problem=#'p\s+cnf\s+\d+\s+\d+\s*'
clause=literal* <'0'>
<literal>=#'[1-9]\d*'|#'-\d+'
calling it like so:
(def parser
(insta/parser (clojure.java.io/resource "dimacs.bnf") :auto-whitespace :standard))
...
(time (parser (slurp filename)))
and it's taking about a hundred seconds. That's three orders of magnitude slower than I was hoping for. Is there some way to speed it up, some way to tweak the grammar or some option I'm missing?
The grammar is wrong. It can't be satisfied.
- file ends with a clause.
- clause ends with a '0'.
- literal in the clause, being a greedy reg-exp, will eat the final '0'.

Conclusion: No clause will ever be found.

For example ...

We can parse a literal ... but not a clause.

Why is it so slow?
If there is a comment:
- ... 'c' character into successive comments;
- ... 'c'.

This implies that every tail has to be presented to the rest of the grammar, which includes a reg-exp for literal that Instaparse can't see inside. Hence all have to be tried, and all will ultimately fail. No wonder it's slow.

I suspect that this file is actually divided into lines, and that your problems arise from trying to conflate newlines with other forms of white-space.
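If the file really is line-oriented, one option is to skip the grammar for the bulk of the work and handle each line directly. The following is only a sketch using plain Clojure rather than instaparse; `parse-dimacs-lines` is a hypothetical helper, and it assumes that every clause sits on a single line terminated by 0 and that comment and problem lines start with "c" and "p" respectively.

```clojure
(require '[clojure.string :as str])

(defn parse-dimacs-lines
  "Hypothetical helper: returns a vector of clauses, each a vector of longs.
   Assumes one clause per line, terminated by 0."
  [text]
  (->> (str/split-lines text)
       (map str/trim)
       ;; drop blank lines, comment lines ("c ...") and the problem line ("p cnf ...")
       (remove #(or (str/blank? %)
                    (str/starts-with? % "c")
                    (str/starts-with? % "p")))
       (mapv (fn [line]
               (->> (str/split line #"\s+")
                    (map (fn [tok] (Long/parseLong tok)))
                    ;; the trailing 0 ends the clause and is not a literal
                    (take-while (complement zero?))
                    vec)))))

(parse-dimacs-lines "c demo\np cnf 3 2\n1 -3 0\n2 3 -1 0\n")
;; => [[1 -3] [2 3 -1]]
```

Clauses that span several lines would need extra handling, so treat this as a starting point rather than a full DIMACS reader.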
May I gently point out that playing with a few tiny examples - which is all I've done - might have saved you a deal of trouble.
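As a rough illustration of that kind of experiment, the sketch below inlines the grammar from the question and tries tiny inputs against one rule at a time using instaparse's `:start` option. The name `try-rule` and the sample inputs are made up for illustration, and note that backslashes must be doubled when the grammar lives in a Clojure string rather than in a .bnf file.

```clojure
(require '[instaparse.core :as insta])

(def dimacs
  (insta/parser
   "<file>=<comment*> <problem?> clause+
    comment=#'c.*'
    problem=#'p\\s+cnf\\s+\\d+\\s+\\d+\\s*'
    clause=literal* <'0'>
    <literal>=#'[1-9]\\d*'|#'-\\d+'"
   :auto-whitespace :standard))

(defn try-rule
  "Hypothetical helper: parse `text` starting from `rule` and
   report whether it succeeded, along with the raw result."
  [rule text]
  (let [result (dimacs text :start rule)]
    {:rule rule
     :input text
     :ok? (not (insta/failure? result))
     :result result}))

;; Work upwards from the smallest rule, inspecting each result at the REPL.
(try-rule :literal "12")
(try-rule :clause "1 -2 0")
(try-rule :comment "c a comment")
```

Running each rule on inputs of a few characters shows quickly which rule misbehaves, long before a 700k file is involved.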