ANTLR on a noisy data stream Part 2


Following a very interesting discussion with Bart Kiers on parsing a noisy data stream with ANTLR, I have run into another problem.

The aim is still the same: extract only the useful information, using the following grammar:

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT'|'DOG'|'BIRD'; 
INDIRECT_OBJECT : 'CAR'| 'SOFA';  
ANY             : . {skip();};

parse 
  :  sentenceParts+ EOF 
  ;

sentenceParts  
  :  SUBJECT VERB INDIRECT_OBJECT  
  ;    

A sentence like "it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV." will produce the following:

[image]

This is perfect and does exactly what I want: from a long sentence, I extract only the words that are meaningful to me. But then I found the following error. If somewhere in the text I introduce a word that begins exactly like a token, I end up with a MismatchedTokenException or a NoViableAltException. For example,


    it's 10PM and the Lazy CAT is currently SLEEPING heavily, 
    with a DOGGY bag, on the SOFA in front of the TV.

produces an error:

[image]

DOGGY is interpreted as the beginning of DOG, which is one of the alternatives of the SUBJECT token, and the lexer gets lost. How can I avoid this without defining DOGGY as a special token? I would like the lexer to treat DOGGY as a word in itself.

Best answer

Well, it seems that adding the following lexer rule solves my problem:

ANY2 : 'A'..'Z'+ {skip();};
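For reference, here is a minimal sketch of the full grammar with that rule in place. The grammar name Noise and the position of ANY2 (after the keyword tokens, before ANY) are assumptions, not something stated above. The idea is that the keyword rules keep priority for exact matches such as DOG, while a longer run of uppercase letters such as DOGGY is matched as a whole by ANY2 and skipped.

```antlr
grammar Noise; // hypothetical grammar name

parse
  :  sentenceParts+ EOF
  ;

sentenceParts
  :  SUBJECT VERB INDIRECT_OBJECT
  ;

// Keyword tokens come first so they are preferred when the input is exactly a keyword.
VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';

// Any other run of uppercase letters (DOGGY, TV, PM, ...) is consumed whole and dropped.
// Assumed placement: after the keywords, before the single-character fallback.
ANY2            : 'A'..'Z'+ {skip();};

// Fallback: drop every remaining single character (lowercase letters, digits, punctuation).
ANY             : . {skip();};
```

Note that ANY2 only covers uppercase letters, so a mixed-case word such as DOGgy would likely still begin with a SUBJECT token. Widening the character range in ANY2 is one possible way to handle that case.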