I am playing with parsing a Newick tree format using Boost Spirit x3, and I fail at parsing a complete tree.
Minimal reproducible example
This is my attempted solution:
namespace quetzal::newick::parser
{
namespace x3 = boost::spirit::x3;
using x3::alpha;
using x3::alnum;
using x3::double_;
using x3::rule;
using x3::lit;
rule<struct branch> branch{"branch"};
auto name = alpha >> *alnum; // to be improved later
auto length = ':' >> double_;
auto leaf = -name;
auto internal= '(' >> (branch % ',') >> ')' >> -name;
auto subtree = leaf | internal;
auto tree = subtree >> ';';
auto const branch_def = subtree >> -length;
BOOST_SPIRIT_DEFINE(branch);
}
The tests for parsing the internal grammar seems to work
BOOST_AUTO_TEST_CASE(internal_grammar)
{
std::vector<std::string> inputs =
{
"(,)",
"(A,B)F",
"(A:10,B:10)F"
};
for(const auto& input : inputs)
{
auto iter = input.begin();
auto iter_end = input.end();
bool r = phrase_parse(iter, iter_end, quetzal::newick::parser::internal, x3::space );
BOOST_CHECK(r && iter == iter_end);
}
}
But the full parser tree fails to parse all but the first tree, and I don't understand why:
BOOST_AUTO_TEST_CASE(full_grammar)
{
std::vector<std::string> inputs =
{
";",
"(,);",
"(,,(,));",
"(A,B,(C,D));",
"(A,B,(C,D)E)F;",
"(:0.1,:0.2,(:0.3,:0.4):0.5);",
"(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;",
"(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);",
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;",
"((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;"
};
for(const auto& input : inputs)
{
auto iter = input.begin();
auto iter_end = input.end();
bool r = phrase_parse(iter, iter_end, quetzal::newick::parser::tree, x3::space );
BOOST_CHECK(r && iter == iter_end);
}
}
Possible shortcomings
- I was thinking it may be because of my misuse/nonuse of
x3::lit, but this question seems to clear it - The parser possibly fails are detecting the recursion, but I don't know what in my grammar definition would be faulty for this. I am aware that we should use
autoonly for non-recursive rules (from cppcon presentation by Michael Caisse but I was hoping I made a proper use ofx3::rulehere for the recursive rule. - Last possible caveat: maybe that the grammar fails to detect empty nodes. I am aware of null parser but it's unclear to me if I should use it here (optional and a possibly empty list should do the trick, right?).
I made it self-contained: Live On Coliru
Now, when you want to understand the X3 grammar - beyond mentally debugging - you can enable
This debugs rules. Consider adding some debug-only rules for more detailed info:
Now the output will be e.g.: Live
You can "trace" the rule invocations and results. Now, let's look at the first fail:
You can see it tries subtree, which tries leaf, which succeeds because
leafis optional by definition:A parser shaped
-pwill always succeed. So ina|bwhena = -p, the alternativebwill never be invoked. Either makenameless optional, or reorder your branches, so aninternalwill get a chance before deciding an emptyleafwas matched:Now we get:
Looking at the one remaining failing parse:
Looking at the end clearly indicates the problem is that the length (":0.0") is encountered outside the last parentheses, where it is not expected. Perhaps you forgot that you were using
treeas the rule, notbranch? Anyways, you can probably take it from here.Side Notes
You are using a skipper which will probably make your life unless you make some rules lexeme (like
name). I'd also suggest coding the skipper into your grammar:Note that
spaceincludes newlines, so maybe you really wantblankinstead. Finally, you can incorporate thef == literator check into the grammar by appending>> eoi:Full Listing
Also addressing the side notes and removing the debug/expositional stuff:
Live On Coliru
Prints