Instaparse series of numbers or letters as one leaf?


So I've been messing around with instaparse and it's been great. However, I've been trying to avoid using regexes as a crutch, and that has made my grammars a bit more verbose. For the sake of keeping this readable, let's just say #'[A-z]' is actually written in the 'A' | 'B' | etc. format.

(require 'instaparse.core)

(def myprsr (instaparse.core/parser
  "word = (ltr | num)+;
   <ltr> = #'[A-z]';
   <num> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"))
(myprsr "foo123") ;; -> [:word "f" "o" "o" "1" "2" "3"]

Is there any way, without resorting to #'[A-z]+' and #'[0-9]+', to get leaves out like [:word "foo123"] or [:number "123"] (if I had made number a top-level rule), so that I don't have to concatenate them as part of post-parse processing?

Best answer, by aengelberg:

There's currently no way (besides regexes) to automatically merge those strings during the parse. I would recommend doing this concatenation in the insta/transform map.
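As a minimal sketch of the transform-map approach, assuming the grammar from the question (with the letter class narrowed to [A-Za-z] for illustration) and instaparse required under the alias insta, the single-character leaves can be joined after the parse. The helper name parse-word is hypothetical and not part of instaparse.

```clojure
(require '[instaparse.core :as insta])

;; Same shape as the grammar in the question. The letter class is written
;; as [A-Za-z] here (an assumption; the question uses [A-z] as shorthand).
(def myprsr
  (insta/parser
    "word = (ltr | num)+;
     <ltr> = #'[A-Za-z]';
     <num> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';"))

;; Hypothetical helper: parse, then rebuild each :word node with its
;; single-character children concatenated into one string.
(defn parse-word [s]
  (insta/transform
    {:word (fn [& chars] [:word (apply str chars)])}
    (myprsr s)))

(parse-word "foo123") ;; => [:word "foo123"]
```

The transform function receives the children of each :word node as arguments, so the grammar itself stays regex-free and the merging happens in one place.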

There's also nothing wrong with using regexes in a case this simple. We know there isn't a possible parse we're missing out on by greedily parsing all the letters or all the numbers. Therefore regexes are acceptable (and more performant).
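For comparison, a sketch of the regex version under the same assumptions (instaparse aliased as insta; the name regex-prsr is made up for this example). Because the regex consumes the whole run of letters and digits greedily, the word comes out as a single leaf with no post-processing.

```clojure
(require '[instaparse.core :as insta])

;; One regex terminal matches the entire run of letters and digits,
;; so the :word node has exactly one string child.
(def regex-prsr
  (insta/parser
    "word = #'[A-Za-z0-9]+'"))

(regex-prsr "foo123") ;; => [:word "foo123"]
```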