I am writing a compiler with the ANTLR4 C++ runtime. When I run the compiler on big programs, it throws std::bad_alloc.
I have reduced it to a minimal test case that only performs loop unrolling of a for loop. Here is the C++ code for the compiler:
#include <fstream>
#include <sstream>
#include <string>

#include "antlr4-runtime.h"
#include "qasm3Lexer.h"
#include "qasm3Parser.h"
#include "ForUnrollPass.h" // my listener pass (header name illustrative)

using namespace antlr4;
using namespace antlr4::tree;
using namespace std;

int main(int argc, const char* argv[]) {
    ANTLRInputStream input("");
    qasm3Lexer lexer(&input);
    CommonTokenStream tokens(&lexer);
    qasm3Parser parser(&tokens);

    // Read the whole source file into a string.
    fstream input_stream;
    ofstream output_stream;
    input_stream.open(argv[1]);
    stringstream string_stream;
    string_stream << input_stream.rdbuf();
    string compiled_text = string_stream.str();

    // The pass queues rewrite operations over the token stream.
    ForUnrollPass for_unroll_pass(&tokens);

    // Point the pipeline at the real input and parse it.
    input.reset();
    input.load(compiled_text);
    lexer.setInputStream(&input);
    tokens.setTokenSource(&lexer);
    parser.setTokenStream(&tokens);
    ParseTreeWalker::DEFAULT.walk(&for_unroll_pass, parser.program());

    // getText() is where the queued rewrite operations are applied.
    compiled_text = for_unroll_pass.getText();
    output_stream.open(argv[2]);
    output_stream << compiled_text;
    output_stream.close();
    return 0;
}
After tracking it down with the debugger: my ForUnrollPass queues 1700 rewrite operations (I am currently rewriting every token, which is very inefficient). ANTLR4 applies the rewrites lazily, so the operations are not executed until getText() is called. Inside getText() it applies 1602 operations and then crashes on this instruction:
rop->text = iop->text + (!rop->text.empty() ? rop->text : "");
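
For reference, the pattern my pass uses looks roughly like this. This is a simplified sketch with hypothetical names (the real pass unrolls for loops), but like the real one it queues one replace operation per token through a TokenStreamRewriter:

// Simplified sketch (names hypothetical): a listener that queues one
// rewrite per token, the same pattern my real ForUnrollPass uses.
#include "antlr4-runtime.h"
#include "qasm3ParserBaseListener.h"

class RewriteEveryTokenPass : public qasm3ParserBaseListener {
public:
    explicit RewriteEveryTokenPass(antlr4::CommonTokenStream* tokens)
        : rewriter(tokens) {}

    void visitTerminal(antlr4::tree::TerminalNode* node) override {
        antlr4::Token* tok = node->getSymbol();
        // This only queues a replace operation; nothing is applied here.
        rewriter.replace(tok, tok->getText());
    }

    // All queued operations are applied inside this call.
    std::string getText() { return rewriter.getText(); }

private:
    antlr4::TokenStreamRewriter rewriter;
};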
After profiling with valgrind, I see about 800,000 bytes of allocated memory at the crash point, although that does not seem to be the problem, as the minimal case fails at the same point.
ANTLR4 uses an augmented transition network (ATN) under the hood that may be consuming memory, but I do not fully understand how it works, or whether 800,000 bytes is something to worry about.
ulimit -a reports an 8192 KB stack size limit on Linux; maybe I'm running out of stack space?
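
To check that, my plan is something like the following Linux-specific sketch (getrlimit/setrlimit are POSIX calls; I have not actually ruled this out yet): print the soft stack limit at startup, raise it to the hard cap, and see whether the crash moves.

// Linux-specific sketch: print and raise the soft stack limit to test
// whether the crash is stack-related.
#include <sys/resource.h>
#include <cstdio>

void raise_stack_limit() {
    rlimit rl{};
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        printf("stack soft limit: %llu bytes (hard: %llu)\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        rl.rlim_cur = rl.rlim_max; // raise soft limit to the hard cap
        setrlimit(RLIMIT_STACK, &rl);
    }
}

As far as I know, though, stack exhaustion usually shows up as a SIGSEGV rather than a bad_alloc, which makes me suspect the heap instead.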
I am not an expert in memory management, so I'm not sure where the problem is.
EDIT
After analyzing the call stack at the operation that throws the bad_alloc, iop->text appears to reside at a corrupted memory address.
