Here is my ANTLR grammar.
It is divided into two sections: parameters and constraints.
The parameters section consists of many rows. Each row represents a parameter and its values. A parameter and its values are separated by : and the individual values are separated by , .
The grammar of the constraints section is given in PICT's GitHub repository; I converted it into ANTLR grammar format.
grammar Pict;
model:parameters? constraints?;
//The part of Parameters and Values of Parameters
parameters:parameterRow+ '\n'*;
parameterRow: ' '* parameterName SEMI parameterValue (',' ' '* parameterValue)* '\n'*;
parameterName: Value ;
parameterValue:NUMBER|Value;
//The part of submodel
//submodel:;
//The part of constraints
constraints: constraint+ '\n'*;
constraint:(predicate ';'? '\n'*)|((IF|IFNOT) predicate THEN predicate (ELSE predicate)?) ';'? '\n'*;
predicate:
clause
|(clause LogicalOperator predicate)
;
clause:term
|'(' ' '* predicate ' '* ')'
|NOT predicate
;
term:
'['parameterName']' ' '* IN ' '* '{' ' '* (String|NUMBER) ' '* (',' ' '* (NUMBER|String))* ' '* '}' #inStatment
|'['parameterName']' ' '* Relation ' '* (NUMBER|String) #relationValueStatement
| '['parameterName']' ' '* LIKE' '* (NUMBER|String) #likeStatement
|'['parameterName']' ' '* Relation ' '* '['parameterName']'#relationParaStatement
;
SEMI:[ ]*':'[ ]* {setText(getText().trim());};
IN: ([ ]* 'in' [ ]* | [ ]* 'IN' [ ]*) {setText(getText().trim());};
LIKE:([ ]* ('LIKE'|'like') [ ]*) {setText(getText().trim());};
Relation: ('='|'<>'|'>'|'>='|'<'|'<=' ) {setText(getText().trim());};
IF:[ '\n']* ('IF'|'if') [ '\n']*;
IFNOT:[ '\n']* ('IF NOT'|'if not') [ '\n']*;
THEN:[ '\n']* ('THEN'|'then') [ '\n']*;
ELSE:[ '\n']* ('ELSE'|'else') [ '\n']*;
NOT:[ '\n']* ('NOT'|'not') [ '\n']*;
LogicalOperator:([ '\n']* ('and'|'AND') [ '\n']*)|([ '\n']* ('OR'|'or') [ '\n']*) {setText(getText().trim());};
NUMBER
: '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5
| '-'? INT EXP // 1e10 -3e4
| '-'? INT // -3, 45
;
Value:LETTERNoWhiteSpace[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*(' ')?[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*{setText(getText().trim());};
String:('"' .*? '"') {setText(getText().trim());};
WS:[ \t\r\n]+ -> skip ;
COMMENT: '#' .*? '\n' ->skip;
fragment INT : '0' | '1'..'9' '0'..'9'* ; // no leading zeros
fragment EXP : [Ee] [+\-]? INT ; // \- since - means "range" inside [...]
fragment
LETTERNoWhiteSpace:[-a-zA-Z\u4e00-\u9fa5_0-9];
For the lexer rule Value, I need it to match all English and Chinese characters, as well as English and Chinese punctuation, so I used Unicode escapes (starting with \u) to do it.
My input is:
Size: 1, 2, 3, 4, 5
Value: a, b, c, d
IF [Size] > 3 THEN [Value] > "b";
and ANTLR reports that:
line 4:12 no viable alternative at input '[Size] > 3 THEN'

I found that 3 THEN is matched as a single token by the lexer rule Value, but I want 3 to be matched by the NUMBER (or String) rule, as in my grammar above. THEN is a keyword, so it should not be matched as part of a Value.
How can I change my grammar to solve this problem? Thanks!
It's probably going to help to clean things up a bit (it will make things easier to digest).
Most obvious:
You have a WS rule with a skip action, so you can drop all of the [ ]* (and similar) stuff. This also means you don't need the {setText(getText().trim());} stuff.
You can use options { caseInsensitive = true; } to avoid things like IF: ('IF' | 'if');
A | in a set ([abd|c]) is the actual | character, not an or operator, so you don't want stuff like \uff0c|\u3001|\uff1b|\uff1a (it should be \uff0c\u3001\uff1b\uff1a).
This gives you:
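As a rough illustration of those three points (a sketch under assumptions, not necessarily the exact grammar this answer arrived at), the keyword and punctuation rules might reduce to something like the following. The grammar name PictLexerSketch and the fragment name CJK_PUNCT are made up for the example, only a handful of the punctuation code points are listed, and the caseInsensitive option assumes a reasonably recent ANTLR 4 release (4.10 or later, as far as I know).

```antlr
lexer grammar PictLexerSketch;   // hypothetical name, for illustration only

// One spelling per keyword is enough: 'if' also matches IF, If, iF.
options { caseInsensitive = true; }

// No [ ]* padding and no setText(getText().trim()) actions:
// the skipped WS rule at the bottom already discards the whitespace.
IF              : 'if' ;
THEN            : 'then' ;
ELSE            : 'else' ;
NOT             : 'not' ;
IN              : 'in' ;
LIKE            : 'like' ;
LogicalOperator : 'and' | 'or' ;
Relation        : '=' | '<>' | '>' | '>=' | '<' | '<=' ;
COLON           : ':' ;

// Inside [...] every character is literal, so the code points are listed
// back to back. Writing \uff0c|\u3001 would also admit a literal '|'.
fragment CJK_PUNCT : [\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a] ;

WS      : [ \t\r\n]+ -> skip ;
COMMENT : '#' ~[\r\n]* -> skip ;
```

With whitespace skipped, a separate IFNOT token is arguably no longer needed either, since the parser can simply ask for IF followed by NOT.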
With the following errors for your input...
so we can see that your Value rule doesn't recognize single-letter values. If you modify it to:
(Note: This rule is quite complex and, by allowing embedded spaces, is likely to cause some problems with tokenization in more complex examples than yours, but it works fine for your sample input.)
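If the embedded-space concern in that note becomes a problem, one option is to drop spaces from Value altogether. The following is a hypothetical simplification, not the rule used in this answer: the names PictValueSketch and VALUE_CHAR are invented, NUMBER is cut down to plain decimals, and only the CJK ideograph range from the original rule is carried over. It relies on two ANTLR lexer behaviours: the longest match wins, and when two rules match the same length the one defined first wins.

```antlr
lexer grammar PictValueSketch;   // hypothetical name, for illustration only

options { caseInsensitive = true; }

// Keywords and NUMBER are defined before Value, so when "then" or "3"
// matches several rules with the same length, the earlier rule is chosen.
IF     : 'if' ;
THEN   : 'then' ;
ELSE   : 'else' ;
NUMBER : '-'? [0-9]+ ('.' [0-9]+)? ;   // simplified; no exponent part

// No embedded spaces: a value ends at the first whitespace character,
// so "3 THEN" can no longer be swallowed as one Value token.
Value : VALUE_CHAR+ ;

// \- is an escaped literal hyphen; '.' '?' '!' are literal inside a set.
fragment VALUE_CHAR : [a-z0-9_\u4e00-\u9fa5\-.?!] ;

String : '"' .*? '"' ;
WS     : [ \t\r\n]+ -> skip ;
```

The trade-off is that a parameter value containing a space would now arrive as several tokens, so it would have to be quoted as a String or reassembled in the parser.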
Then there are no errors and you get the following tree: