Any way to speed up instaparse?

Question

Any way to speed up instaparse?

412 Views Asked by rwallace At 12 April 2016 at 12:16

I'm trying to use instaparse on a dimacs file less than 700k in size, with the following grammar

<file>=<comment*> <problem?> clause+
comment=#'c.*'
problem=#'p\s+cnf\s+\d+\s+\d+\s*'
clause=literal* <'0'>
<literal>=#'[1-9]\d*'|#'-\d+'

calling like so

(def parser
  (insta/parser (clojure.java.io/resource "dimacs.bnf") :auto-whitespace :standard))
...
(time (parser (slurp filename)))

and it's taking about a hundred seconds. That's three orders of magnitude slower than I was hoping for. Is there some way to speed it up, some way to tweak the grammar or some option I'm missing?

Original Q&A

There are 2 best solutions below

carocad On 25 April 2016 at 11:15

I think that your extensive use of * is causing the problem. Your grammar is too ambiguous/ambitious (I guess). I would check two things:

;;run it as
 (insta/parses grammar input)
;; with a small input

That will show you how much ambiguity is in your grammar definition: check "ambiguous grammar".

Read Engelberg performance notes, it would help understand your own problem and probably find out what fits best for you.

**Thumbnail** · Accepted Answer · 2016-05-01T21:26:22.547000

The grammar is wrong. It can't be satisfied.

Every file ends with a clause.
Every clause ends with a '0'.
The literal in the clause, being a greedy reg-exp,will eat the final '0'.

Conclusion: No clause will ever be found.

For example ...

=> (parser "60")
Parse error at line 1, column 3:
60
  ^
Expected one of:
"0"
#"\s+"
#"-\d+"
#"[1-9]\d*"

We can parse a literal

=> (parser "60" :start :literal)
("60")

... but not a clause

=> (parser "60" :start :clause)
Parse error at line 1, column 3:
60
  ^
Expected one of:
"0" (followed by end-of-string)
#"\s+"
#"-\d+"
#"[1-9]\d*"

Why is it so slow?

If there is a comment:

it can swallow the whole file;
or be broken at any 'c' character into successive comments;
or terminate at any point after the initial 'c'.

This implies that every tail has to be presented to the rest of the grammar, which includes a reg-exp for literal that Instaparse can't see inside. Hence all have to be tried, and all will ultimately fail. No wonder it's slow.

I suspect that this file is actually divided into lines. And that your problems arise from trying to conflate newlines with other forms of white-space.

May I gently point out that playing with a few tiny examples - which is all I've done - might have saved you a deal of trouble.

Any way to speed up instaparse?

There are 2 best solutions below

Related Questions in CLOJURE

Related Questions in INSTAPARSE

Trending Questions

Popular # Hahtags

Popular Questions