I'm trying to extend the SQL language of SQLite at one point (file parse.y). I have a parsing conflict; however, the Lemon parser does not show anything besides a random "1 parsing conflicts." error message.
The problem seems to be located where create_table can begin with either "CREATE" or "CREATE OR REPLACE", which is followed by temp, which can itself be reduced from an empty token.
cmd ::= create_table create_table_args table_properties_args.
create_table ::= createorreplace(C) temp(T) TABLE ifnotexists(E) nm(X) dbnm(Y). {
// ...
}
%type createorreplace {int}
createorreplace(A) ::= CREATE. {disableLookaside(pParse); A = 0;}
createorreplace(A) ::= CREATE OR REPLACE. {disableLookaside(pParse); A = 1;}
%type temp {int}
temp(A) ::= TEMP. {A = pParse->db->init.busy==0;}
temp(A) ::= . {A = 0;}
How can I make "OR REPLACE" optional, while preserving that it may be followed by TEMP?
Since I can only guess how and where you might have changed SQLite's SQL grammar, this answer is necessarily somewhat tentative. But it might be useful anyway.
The original SQL grammar contains the following productions (I left out the actions since they are never relevant in diagnosing conflicts):
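For reference, those productions look roughly like the following in SQLite's parse.y. This is an abridged sketch with actions and symbol labels stripped, assuming a reasonably recent SQLite. The exact right-hand sides vary between versions (for example, the eidlist_opt in the VIEW rule is not present in older releases), so check it against your own copy of the file.

```
cmd ::= create_table create_table_args.
create_table ::= createkw temp TABLE ifnotexists nm dbnm.
createkw ::= CREATE.
temp ::= TEMP.
temp ::= .

// One of several other cmd productions that also start with createkw.
cmd ::= createkw temp VIEW ifnotexists nm dbnm eidlist_opt AS select.
```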
You seem to have modified create_table to instead read:

create_table ::= createorreplace temp TABLE ifnotexists nm dbnm.

That change indeed creates a conflict, but it has nothing to do with temp being nullable. In fact, it has very little to do with the non-terminal temp at all. You could replace temp with TEMP (thereby making it obligatory rather than optional) and you would still have a reduce-reduce conflict.

The conflict occurs for inputs which start CREATE TEMP. That input could be the start of either of these:

CREATE TEMP TABLE ...
CREATE TEMP VIEW ...

Those are obviously different syntaxes, and there is no ambiguity. But when the terminal CREATE has just been read and the terminal TEMP is the lookahead token, both of those possibilities are still available. That's not necessarily a problem; a bottom-up parser does not need to resolve which possible production will be used until it gets to the end of the production. So the original grammar works fine, without conflicts.

But note that the original grammar does not have a cmd production which starts with the terminal CREATE. What it has are several cmd productions which start with the non-terminal createkw. But there is no possibility of confusion there, either. The terminal CREATE is reduced to createkw in both cmd productions (and other cmd productions I didn't list, which also start with createkw).

However, in your modified grammar, the two productions do not both start with createkw. One of them was changed to start with createorreplace.

Inputs which do not include the optional keyword TEMP still parse without any problem. If TEMP is not present, the lookahead token will be TABLE in the create_table command, and the lookahead token will be VIEW in the create view command. Since the lookahead tokens differ, the parser has no trouble deciding whether to reduce to createkw or to reduce to createorreplace. Similarly, if the input were actually CREATE OR REPLACE ..., the lookahead token would be OR, which unambiguously selects createorreplace.

But the problematic input, as shown above, starts CREATE TEMP. Now, the parser must decide, without seeing anything which follows the terminal TEMP, whether to reduce CREATE to createkw or to reduce it to createorreplace. Since that determination cannot be made, a conflict is reported. (And you'll find a lot more information about that conflict by looking through the Lemon report file, parse.out.)

The solution (if my guess about your grammar modifications was correct) is to avoid forcing the parser to make an unnecessary decision. That requires a little bit of grammar duplication: keep the original create_table production which starts with createkw, add a second create_table production which starts with createorreplace, and let createorreplace match only CREATE OR REPLACE.
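A sketch of that duplication, assuming the guess above about your modification is right. Actions are left out again, and this has not been checked against your full parse.y. The non-terminal names come from your question (createorreplace, temp, ifnotexists, nm, dbnm, table_properties_args) and from the original grammar (createkw).

```
cmd ::= create_table create_table_args table_properties_args.

// Unchanged from the original grammar: plain CREATE goes through createkw.
create_table ::= createkw temp TABLE ifnotexists nm dbnm.

// New, duplicated production for the OR REPLACE form.
create_table ::= createorreplace temp TABLE ifnotexists nm dbnm.

createkw ::= CREATE.

// Note: there is no longer a "createorreplace ::= CREATE." alternative.
createorreplace ::= CREATE OR REPLACE.
```

Since each create_table production gets its own action, the first can pass 0 for the replace flag and the second can pass 1, so createorreplace probably no longer needs to carry an int value. It should still keep the disableLookaside(pParse) call from your original action.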
Now, the terminal CREATE not followed by OR REPLACE is always reduced to createkw, while the sequence CREATE OR REPLACE is always reduced to createorreplace. This works because there is no possible parse for a cmd starting CREATE OR, other than CREATE OR REPLACE.