Parse Html with Boost Spirit X3

Question

633 Views Asked by Hackman Lo At 30 December 2022 at 16:43

I'm trying to write an HTML parser with Boost Spirit X3, and I wrote the parsers below.

The problem is that this code doesn't compile. The error is:

fatal error C1202: recursive type or function dependency context too complex

I know this error occurs because my parser html_element_ references tag_block_, and tag_block_ references html_element_, but I don't know how to make it work.

#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3/support/ast/position_tagged.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <iostream>
using namespace boost::spirit::x3;
struct tag_name{};
struct html_tag;
struct html_comment;
struct attribute_data : boost::spirit::x3::position_tagged {
  std::string name;
  boost::optional<std::string> value;
};


struct tag_header :  boost::spirit::x3::position_tagged {
  std::string name;
  std::vector<attribute_data> attributes;
};

struct self_tag: boost::spirit::x3::position_tagged {
  tag_header header;
};

struct html_element : boost::spirit::x3::position_tagged, boost::spirit::x3::variant< std::string, self_tag, boost::recursive_wrapper<html_tag>>{
  using base_type::base_type;
  using base_type::operator=;
};



struct html_tag: boost::spirit::x3::position_tagged {
  tag_header header;
  std::vector<html_element> children;
};

BOOST_FUSION_ADAPT_STRUCT(attribute_data, name, value);
BOOST_FUSION_ADAPT_STRUCT(tag_header, name, attributes);
BOOST_FUSION_ADAPT_STRUCT(self_tag, header);
BOOST_FUSION_ADAPT_STRUCT(html_tag,header,children);

// These are the attributes parser, seems fine
struct attribute_parser_id;
auto attribute_identifier_= rule<attribute_parser_id, std::string>{"AttributeIdentifier"} = lexeme[+(char_ - char_(" /=>"))];
auto attribute_value_= rule<attribute_parser_id, std::string>{"AttributeValue"} =
                           lexeme["\"" > +(char_ - char_("\"")) > "\""]|lexeme["'" > +(char_ - char_("'")) > "'"]|
                           lexeme[+(char_ - char_(" />"))];
auto single_attribute_ = rule<attribute_parser_id, attribute_data>{"SingleAttribute"} = attribute_identifier_ > -("=">  attribute_value_);
auto attributes_ = rule<attribute_parser_id, std::vector<attribute_data>>{"Attributes"} = (*single_attribute_);


struct tag_parser_id;


auto tag_name_begin_func = [](auto &ctx){
  get<tag_name>(ctx) = _attr(ctx).name;
  //_val(ctx).header.name = _attr(ctx);
  std::cout << typeid(_val(ctx)).name() << std::endl;

};
auto tag_name_end_func = [](auto &ctx){
  _pass(ctx) = get<tag_name>(ctx) == _attr(ctx);
};

auto self_tag_name_action = [](auto &ctx){
  _val(ctx).header.name = _attr(ctx);
};
auto self_tag_attribute_action = [](auto &ctx){
  _val(ctx).header.attributes = _attr(ctx);
};

auto inner_text = lexeme[+(char_-'<')];
auto tag_name_ = rule<tag_parser_id, std::string>{"HtmlTagName"} = lexeme[*(char_ - char_(" />"))];
auto self_tag_ = rule<tag_parser_id, self_tag>{"HtmlSelfTag"} = '<' > tag_name_[self_tag_name_action] > attributes_[self_tag_attribute_action] > "/>";
auto tag_header_ = rule<tag_parser_id, tag_header>{"HtmlTagBlockHeader"} = '<' > tag_name_ > attributes_ > '>';

rule<tag_parser_id, html_tag> tag_block_;

rule<tag_parser_id, html_element> html_element_ = "HtmlElement";

auto tag_block__def = with<tag_name>(std::string())[tag_header_[tag_name_begin_func] > (*html_element_) > "</" > omit[tag_name_[tag_name_end_func]] > '>'];
auto html_element__def = inner_text | self_tag_ | tag_block_ ;

BOOST_SPIRIT_DEFINE(tag_block_, html_element_);
int main()
{
  std::string source = "<div data-src=\"https://www.google.com\" id='hello world'></div>";
  html_element result;
  auto const parser = html_element_;
  auto parse_result = phrase_parse(source.begin(), source.end(), parser, ascii::space, result);
}

I read the boost::spirit::qi example in the official documentation as well as the official X3 documentation. The Qi example only parses tags, not attributes. The example in the X3 documentation is different, and I think my case is harder.

Original Q&A

There are 2 best solutions below

**sehe** · Answer 1 · 2022-12-30T19:51:51.530000

On reading, the first thing I notice is that self_tag_ uses expectation points. That won't fly because it is ordered before other things that can legally start with <, like tag_block_:

auto html_element__def = inner_text | self_tag_ | tag_block_ ;

And due to the expectation points it will never backtrack to reach that.

Many places use operator+ where operator* is required, like:

auto inner_text = lexeme[*(char_-'<')];

All those charset differences can be phrased as inverse sets:

auto inner_text = lexeme[*~char_('<')];
//
    = lexeme[*~char_(" />")];

This is aside from the fact that XML has specific valid charsets for e.g. element names; I'm assuming you expressly want to avoid writing a conformant parser. Even so, you really need to exclude '<', '>', '\r', '\t' etc. from your attribute name/value rules.

One smell is the re-use of parser rule tags. This should, as far as my understanding goes, be fine for immediately-defined rules, but certainly not for those that are defined through their tag type, with BOOST_SPIRIT_DEFINE.

Cleanup Exercism

First, a cleanup. This gets past the hurdle of template instantiation depth by commenting out *html_element_ inside tag_block__def. But first let's see what works then:

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <iomanip>
#include <iostream>

//// Unused mixin disabled for simplicity
// #include <boost/spirit/home/x3/support/ast/position_tagged.hpp>

namespace x3 = boost::spirit::x3;
using namespace std::string_literals;

namespace Ast {
    struct tag_name {};
    struct html_tag;
    struct html_comment;

    // using mixin = x3::position_tagged;
    struct mixin {};

    struct attribute_data : mixin {
        std::string                  name;
        boost::optional<std::string> value;
    };
    using attribute_datas = std::vector<attribute_data>;

    struct tag_header : mixin {
        std::string     name;
        attribute_datas attributes;
    };

    struct self_tag : mixin {
        tag_header header;
    };

    using element_base =
        x3::variant<std::string, self_tag, boost::recursive_wrapper<html_tag>>;

    struct html_element : mixin , element_base {
        using element_base::element_base;
        using element_base::operator=;
    };

    using html_elements = std::vector<html_element>;

    struct html_tag : mixin {
        tag_header    header;
        html_elements children;
    };
} // namespace Ast

BOOST_FUSION_ADAPT_STRUCT(Ast::attribute_data, name, value)
BOOST_FUSION_ADAPT_STRUCT(Ast::tag_header, name, attributes)
BOOST_FUSION_ADAPT_STRUCT(Ast::self_tag, header)
BOOST_FUSION_ADAPT_STRUCT(Ast::html_tag, header, children)

namespace Parser {
    auto attribute_identifier_                                                         //
        = x3::rule<struct AttributeIdentifier_tag, std::string>{"AttributeIdentifier"} //
        = x3::lexeme[+~x3::char_(" /=>")];

    auto attribute_value_                                                    //
        = x3::rule<struct AttributeValue_tag, std::string>{"AttributeValue"} //
    = x3::lexeme                                                             //
        [('"' > *~x3::char_('"') > '"')                                      //
         | ("'" > *~x3::char_("'") > "'")                                    //
         | *~x3::char_(" />")                                                //
    ];

    auto single_attribute_ =
        x3::rule<struct attribute_identifier__tag, Ast::attribute_data>{"SingleAttribute"} //
        = attribute_identifier_ >> -("=" >> attribute_value_);

    auto attributes_                                                              //
        = x3::rule<struct attribute_data_tag, Ast::attribute_datas>{"Attributes"} //
        = *single_attribute_;

    [[maybe_unused]] static auto& header_of(x3::unused_type) {
        thread_local Ast::tag_header s_dummy;
        return s_dummy;
    }
    [[maybe_unused]] static auto& header_of(Ast::html_tag& ht) {
        return ht.header;
    }

    auto tag_name_begin_func = [](auto &ctx){
        get<Ast::tag_name>(ctx) = _attr(ctx).name;
        // header_of(_val(ctx)).name = _attr(ctx);
        // std::cout << typeid(_val(ctx)).name() << std::endl;
    };

    auto tag_name_end_func         = [](auto& ctx){ _pass(ctx) = (get<Ast::tag_name>(ctx) == _attr(ctx)); };
    auto self_tag_name_action      = [](auto &ctx){ header_of(_val(ctx)).name = _attr(ctx); };
    auto self_tag_attribute_action = [](auto& ctx) { header_of(_val(ctx)).attributes = _attr(ctx); };

    auto tag_name_                                                     //
        = x3::rule<struct HtmlTagName_tag, std::string>{"HtmlTagName"} //
        = x3::lexeme[*~x3::char_(" />")];

    auto self_tag_                                                       //
        = x3::rule<struct HtmlSelfTag_tag, Ast::self_tag>{"HtmlSelfTag"} //
        = '<' >> tag_name_[self_tag_name_action] >> attributes_[self_tag_attribute_action] >> "/>";

    auto tag_header_                                                                     //
        = x3::rule<struct HtmlTagBlockHeader_tag, Ast::tag_header>{"HtmlTagBlockHeader"} //
        = '<' >> tag_name_ >> attributes_ >> '>';

    x3::rule<struct tag_block__tag, Ast::html_tag>        tag_block_    = "TagBlock";
    x3::rule<struct html_element__tag, Ast::html_element> html_element_ = "HtmlElement";

    auto tag_block__def = x3::with<Ast::tag_name>(""s)                        //
        [                                                                     //
            tag_header_[tag_name_begin_func] >> /**html_element_ >>*/ "</" >> //
            x3::omit[tag_name_[tag_name_end_func]] >> '>'                     //
        ];

    auto inner_text        = x3::lexeme[*~x3::char_('<')];
    auto html_element__def = inner_text | self_tag_ | tag_block_;

    BOOST_SPIRIT_DEFINE(tag_block_, html_element_)
}

namespace unit_tests {
    template <bool ShouldSucceed = true, typename P>
    void test(P const& rule, std::initializer_list<std::string_view> cases) {
        for (auto input : cases) {
            if constexpr (ShouldSucceed) {
                typename x3::traits::attribute_of<P, x3::unused_type>::type result;

                auto ok = phrase_parse(input.begin(), input.end(), rule, x3::space, result);
                std::cout << quoted(input) << " -> " << (ok ? "Ok" : "FAILED") << std::endl;
            } else {
                auto ok = phrase_parse(input.begin(), input.end(), rule, x3::space);
                if (!ok)
                    std::cout << "Fails as expected: " << quoted(input) << std::endl;
                else
                    std::cout << "SHOULD HAVE FAILED: " << quoted(input) << std::endl;
            }
        }
    }
}

int main() {
    unit_tests::test(Parser::self_tag_,
                     {
                         R"(<simple foo="" bar='' value-less qux=bareword/>)",
                         R"(<div />)",
                         R"(<div/>)",
                         R"(< div/>)",
                     });

    unit_tests::test(Parser::html_element_,
                     {
                         R"(<simple foo="" bar='' value-less qux=bareword></simple>)",
                         R"(<div ></div>)",
                         R"(<div></div>)",
                         R"(< div></div>)",
                         R"(< div ></div>)",
                         R"(<div data-src="https://www.google.com" id='hello world'></div>)",

                         R"(<div></ div>)",
                         R"(<div></ div >)",
                     });

    unit_tests::test<false>(Parser::self_tag_,
                            {
                                R"(<div/ >)",
                                R"(<div>< /div>)",
                                R"(<div></dov>)",
                            });
}

Outputs

"<simple foo=\"\" bar='' value-less qux=bareword/>" -> Ok   
"<div />" -> Ok
"<div/>" -> Ok
"< div/>" -> Ok
"<simple foo=\"\" bar='' value-less qux=bareword></simple>" -> Ok
"<div ></div>" -> Ok
"<div></div>" -> Ok
"< div></div>" -> Ok
"< div ></div>" -> Ok
"<div data-src=\"https://www.google.com\" id='hello world'></div>" -> Ok
"<div></ div>" -> Ok
"<div></ div >" -> Ok
Fails as expected: "<div/ >"
Fails as expected: "<div>< /div>"
Fails as expected: "<div></dov>"

What Is The Trouble

As you can deduce from my hunch to comment-out the recursion *html_element_, this is causing problems.

The real reason is that with<> extends the context. This means that each level of recursion adds more data to the context type, causing new template instantiations.
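A rough way to picture this, without any Spirit code: the sketch below uses a hypothetical, heavily simplified `context` template (it is not X3's actual class) to show that wrapping the context inside a recursive rule yields a new type at every nesting level. With unbounded recursion the compiler would have to keep instantiating, which is presumably what the C1202 error is complaining about.

```cpp
#include <type_traits>

// Hypothetical, simplified stand-ins: a context is a chain of tagged layers.
struct root_context {};
struct tag_name {};

template <typename Tag, typename Next>
struct context {};

// Counts how many layers a context type carries.
template <typename Ctx>
struct depth : std::integral_constant<int, 0> {};

template <typename Tag, typename Next>
struct depth<context<Tag, Next>>
    : std::integral_constant<int, 1 + depth<Next>::value> {};

// A directive used inside the recursive rule wraps whatever context it receives.
template <typename Ctx>
using inside_recursion = context<tag_name, Ctx>;

using level1 = inside_recursion<root_context>;
using level2 = inside_recursion<level1>;
using level3 = inside_recursion<level2>;

static_assert(!std::is_same<level1, level2>::value, "each level is a distinct type");
static_assert(depth<level3>::value == 3, "the type grows with nesting depth");
```

Wrapping once, outside the recursion, keeps a single context type for every nesting level.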

The simplest trick is to move with<> up outside the recursion:

auto tag_block__def =                                             //
    tag_header_[tag_name_begin_func] >> *html_element_ >> "</" >> //
    x3::omit[tag_name_[tag_name_end_func]] >> '>'                 //
    ;

auto inner_text        = x3::lexeme[*~x3::char_('<')];
auto html_element__def = inner_text | self_tag_ | tag_block_;
auto start             = x3::with<Ast::tag_name>(""s)[html_element_];

However this highlights the problem that elements can nest, and it's useless when inner tags overwrite the context data for tag_name. So, instead of string we could make it stack<string>:

auto start = x3::with<tag_stack>(std::stack<std::string>{})[html_element_];

And then amend the actions to match:

auto tag_name_begin_func = [](auto& ctx) { get<tag_stack>(ctx).push(_attr(ctx).name); };

auto tag_name_end_func = [](auto& ctx) {
    auto& s    = get<tag_stack>(ctx);
    _pass(ctx) = (s.top() == _attr(ctx));
    s.pop();
};
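Stripped of all parsing, the difference between the single string and the stack can be sketched in plain C++. The `TagEvent` type and both function names are made up for this illustration; they only mimic what the semantic actions do with the context data.

```cpp
#include <stack>
#include <string>
#include <vector>

// Hypothetical event type: one entry per opening or closing tag, in input order.
struct TagEvent {
    bool        opening;
    std::string name;
};

// One shared slot, like with<tag_name>(std::string): every opening tag overwrites it.
bool matches_with_single_slot(std::vector<TagEvent> const& events) {
    std::string current;
    for (auto const& e : events) {
        if (e.opening)
            current = e.name;
        else if (current != e.name)
            return false; // the outer name was lost when the inner tag opened
    }
    return true;
}

// A stack: push on open, compare against the top and pop on close.
bool matches_with_stack(std::vector<TagEvent> const& events) {
    std::stack<std::string> open;
    for (auto const& e : events) {
        if (e.opening) {
            open.push(e.name);
            continue;
        }
        if (open.empty() || open.top() != e.name)
            return false;
        open.pop();
    }
    return open.empty();
}
```

For the sequence open div, open nest, close nest, close div, the single slot still holds "nest" when the closing div arrives and rejects it, while the stack version accepts it.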

See it Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <iomanip>
#include <iostream>
#include <stack>

//// Unused mixin disabled for simplicity
// #include <boost/spirit/home/x3/support/ast/position_tagged.hpp>

namespace x3 = boost::spirit::x3;
using namespace std::string_literals;

namespace Ast {
    struct html_tag;
    struct html_comment;

    // using mixin = x3::position_tagged;
    struct mixin {};

    struct attribute_data : mixin {
        std::string                  name;
        boost::optional<std::string> value;
    };
    using attribute_datas = std::vector<attribute_data>;

    struct tag_header : mixin {
        std::string     name;
        attribute_datas attributes;
    };

    struct self_tag : mixin {
        tag_header header;
    };

    using element_base =
        x3::variant<std::string, self_tag, boost::recursive_wrapper<html_tag>>;

    struct html_element : mixin , element_base {
        using element_base::element_base;
        using element_base::operator=;
    };

    using html_elements = std::vector<html_element>;

    struct html_tag : mixin {
        tag_header    header;
        html_elements children;
    };
} // namespace Ast

BOOST_FUSION_ADAPT_STRUCT(Ast::attribute_data, name, value)
BOOST_FUSION_ADAPT_STRUCT(Ast::tag_header, name, attributes)
BOOST_FUSION_ADAPT_STRUCT(Ast::self_tag, header)
BOOST_FUSION_ADAPT_STRUCT(Ast::html_tag, header, children)

namespace Parser {
    struct tag_stack final {};

    auto attribute_identifier_                                                         //
        = x3::rule<struct AttributeIdentifier_tag, std::string>{"AttributeIdentifier"} //
        = x3::lexeme[+~x3::char_(" /=>")];

    auto attribute_value_                                                    //
        = x3::rule<struct AttributeValue_tag, std::string>{"AttributeValue"} //
    = x3::lexeme                                                             //
        [('"' > *~x3::char_('"') > '"')                                      //
         | ("'" > *~x3::char_("'") > "'")                                    //
         | *~x3::char_(" />")                                                //
    ];

    auto single_attribute_ =
        x3::rule<struct attribute_identifier__tag, Ast::attribute_data>{"SingleAttribute"} //
        = attribute_identifier_ >> -("=" >> attribute_value_);

    auto attributes_                                                              //
        = x3::rule<struct attribute_data_tag, Ast::attribute_datas>{"Attributes"} //
        = *single_attribute_;

    [[maybe_unused]] static auto& header_of(x3::unused_type) {
        thread_local Ast::tag_header s_dummy;
        return s_dummy;
    }
    [[maybe_unused]] static auto& header_of(Ast::html_tag& ht) {
        return ht.header;
    }

    auto tag_name_begin_func = [](auto& ctx) { get<tag_stack>(ctx).push(_attr(ctx).name); };

    auto tag_name_end_func = [](auto& ctx) {
        auto& s    = get<tag_stack>(ctx);
        _pass(ctx) = (s.top() == _attr(ctx));
        s.pop();
    };
    auto assign_name  = [](auto& ctx) { header_of(_val(ctx)).name = _attr(ctx); };
    auto assign_attrs = [](auto& ctx) { header_of(_val(ctx)).attributes = _attr(ctx); };
    auto tag_name_                                                     //
        = x3::rule<struct HtmlTagName_tag, std::string>{"HtmlTagName"} //
        = x3::lexeme[*~x3::char_(" />")];

    auto self_tag_                                                       //
        = x3::rule<struct HtmlSelfTag_tag, Ast::self_tag>{"HtmlSelfTag"} //
        = '<' >> tag_name_[assign_name] >> attributes_[assign_attrs] >> "/>";

    auto tag_header_                                                                     //
        = x3::rule<struct HtmlTagBlockHeader_tag, Ast::tag_header>{"HtmlTagBlockHeader"} //
        = '<' >> tag_name_ >> attributes_ >> '>';

    x3::rule<struct tag_block__tag, Ast::html_tag>        tag_block_    = "TagBlock";
    x3::rule<struct html_element__tag, Ast::html_element> html_element_ = "HtmlElement";

    auto tag_block__def =                                             //
        tag_header_[tag_name_begin_func] >> *html_element_ >> "</" >> //
        x3::omit[tag_name_[tag_name_end_func]] >> '>'                 //
        ;

    auto inner_text        = x3::lexeme[*~x3::char_('<')];
    auto html_element__def = inner_text | self_tag_ | tag_block_;
    auto start             = x3::with<tag_stack>(std::stack<std::string>{})[html_element_];

    BOOST_SPIRIT_DEFINE(tag_block_, html_element_)
}

namespace unit_tests {
    template <bool ShouldSucceed = true, typename P>
    void test(P const& rule, std::initializer_list<std::string_view> cases) {
        for (auto input : cases) {
            if constexpr (ShouldSucceed) {
                typename x3::traits::attribute_of<P, x3::unused_type>::type result;

                auto ok = phrase_parse(input.begin(), input.end(), rule, x3::space, result);
                std::cout << quoted(input) << " -> " << (ok ? "Ok" : "FAILED") << std::endl;
            } else {
                auto ok = phrase_parse(input.begin(), input.end(), rule, x3::space);
                if (!ok)
                    std::cout << "Fails as expected: " << quoted(input) << std::endl;
                else
                    std::cout << "SHOULD HAVE FAILED: " << quoted(input) << std::endl;
            }
        }
    }
}

int main() {
    unit_tests::test(Parser::self_tag_,
                     {
                         R"(<simple foo="" bar='' value-less qux=bareword/>)",
                         R"(<div />)",
                         R"(<div/>)",
                         R"(< div/>)",
                     });

    unit_tests::test(Parser::start,
                     {
                         R"(<simple foo="" bar='' value-less qux=bareword></simple>)",
                         R"(<div ></div>)",
                         R"(<div></div>)",
                         R"(< div></div>)",
                         R"(< div ></div>)",
                         R"(<div data-src="https://www.google.com" id='hello world'></div>)",

                         R"(<div></ div>)",
                         R"(<div></ div >)",

                         R"(<div><nest/><nest some="more">yay</nest></div>)",
                     });

    unit_tests::test<false>(Parser::self_tag_,
                            {
                                R"(<div/ >)",
                                R"(<div>< /div>)",
                                R"(<div></dov>)",
                            });
}

Printing

"<simple foo=\"\" bar='' value-less qux=bareword/>" -> Ok
"<div />" -> Ok
"<div/>" -> Ok
"< div/>" -> Ok
"<simple foo=\"\" bar='' value-less qux=bareword></simple>" -> Ok
"<div ></div>" -> Ok
"<div></div>" -> Ok
"< div></div>" -> Ok
"< div ></div>" -> Ok
"<div data-src=\"https://www.google.com\" id='hello world'></div>" -> Ok
"<div></ div>" -> Ok
"<div></ div >" -> Ok
"<div><nest/><nest some=\"more\">yay</nest></div>" -> Ok
Fails as expected: "<div/ >"
Fails as expected: "<div>< /div>"
Fails as expected: "<div></dov>"

CLOSING THOUGHTS

I'm answering this assuming you are just doing this to learn X3. Otherwise the only recommendation is: do not do this. Use a library.

Not only does your grammar do a pretty poor job of parsing XML, it will utterly fail on HTML in the wild. Closing tags are not a given in HTML ("quirks mode"). Scripts, CDATA, entity references, Unicode, escapes will all f*ck your parser up.

Oh, have you noticed how you mostly broke attribute propagation by introducing some semantic actions? I could show you how to fix it, but I think I'd rather leave it for the moment.

Just use a library.

**Larry Evans** · Answer 2 · 2023-01-08T23:50:26.070000

This initial solution to the problem of, among other things, matching begin/end tags, is greatly simplified here. The simplification focuses solely on the "matching begin/end tags" subpart of the problem. It makes no attempt at parsing strings; instead it simply parses x3::uint_. This is sufficient to illustrate a solution to the subpart, because the essence of the subpart is matching begin tags with end tags. More specifically, the problem of inferring that the attribute of this expression:

      auto 
    tag_header_
      = 
      (  '<' 
      >> tag_name_
      >> '>'
      )
      #ifdef USE_SEMANTIC_ACTIONS
      [tag_name_begin_func]
      #endif
      ;

is the same as the attribute of this expression:

      auto 
    tag_footer_
      = 
      (  "</"
      >> tag_name_ 
      >> '>'
      )
      #ifdef USE_SEMANTIC_ACTIONS
      [tag_name_end_func]
      #endif
      ;

is much visually simpler than inferring that the attribute of this expression:

    auto tag_name_                                                     //
        = x3::rule<struct HtmlTagName_tag, std::string>{"HtmlTagName"} //
        = x3::lexeme[*~x3::char_(" />")];

is the same as the attribute of this expression:

        "</" >> //
        x3::omit[tag_name_[tag_name_end_func]] >> '>'                 //

The latter two, visually complicated, expressions were copied and pasted from here.

Furthermore, tag_name_ and inner_text are also much simpler. The original:

   auto tag_name_                                                     //
       = x3::rule<struct HtmlTagName_tag, std::string>{"HtmlTagName"} //
       = x3::lexeme[*~x3::char_(" />")];
   auto inner_text        = x3::lexeme[*~x3::char_('<')];

is obviously and distractingly more complicated than the simplified solution:

    auto tag_name_
        = x3::uint_;
    auto inner_text        = x3::uint_;

Now, the reader may note that the original solution contained several statements which Seth called "immediately-defined rules". The "immediately-defined rule" pattern may be abstracted as:

    auto RuleDef
      = x3::rule<struct RuleTag, RuleAttribute>{"RuleName"}
      = RuleRhs;

In this abstraction, the camel-case identifiers are pattern parameters that are replaced to create an actual instance of an immediately-defined rule, somewhat like a template being instantiated. In the tag_name_ instance above, the following replacements were made:

  RuleDef -> tag_name_
  RuleTag -> HtmlTagName_tag
  RuleAttribute -> std::string
  RuleName -> HtmlTagName
  RuleRhs -> x3::lexeme[*~x3::char_(" />")]

But what's the purpose of an immediately-defined rule? Well, one reason is to convert the attribute of the RuleRhs to the RuleAttribute, as shown here. (The example may be a bit hard to understand because the immediately-defined rule is obscured by being within the expression forming the parser argument to the parse function.)

However, there's no need for such conversions in the simplification; hence, all the immediately-defined rules were removed as a further simplification.

Furthermore, the claim that inner_text requires * instead of + is wrong. Using * results in html_element_ always choosing inner_text, which succeeds without consuming any input when the input starts with '<'. Inside *html_element_ this results in an infinite loop.
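The mechanism can be mimicked without X3. The sketch below uses hand-rolled stand-ins (`text_star`, `text_plus` and `repeat` are hypothetical names, not Spirit parsers) for the two inner_text variants and for the surrounding Kleene star. The iteration cap exists only so the demonstration terminates.

```cpp
#include <cstddef>
#include <string>

// Stand-in for *~char_('<'): consumes up to the next '<' and always succeeds.
bool text_star(std::string const& s, std::size_t& pos) {
    while (pos < s.size() && s[pos] != '<') ++pos;
    return true; // success even when nothing was consumed
}

// Stand-in for +~char_('<'): fails unless at least one character was consumed.
bool text_plus(std::string const& s, std::size_t& pos) {
    std::size_t const start = pos;
    text_star(s, pos);
    return pos != start;
}

// Stand-in for *element: repeat until the element fails.
// The cap is an artificial guard for this demo; it returns the number of rounds run.
template <typename Element>
std::size_t repeat(Element element, std::string const& s, std::size_t cap) {
    std::size_t pos = 0, rounds = 0;
    while (rounds < cap && element(s, pos)) ++rounds;
    return rounds;
}
```

With the input `<b>`, `repeat(text_star, ...)` runs until it hits the cap because every round succeeds at the same position, whereas `repeat(text_plus, ...)` stops immediately and leaves the '<' for the next alternative.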

Instead of using the existing namespace unit_tests, use

namespace unit_tests 
{
      template 
      < bool ShouldSucceed = true
      , typename P
      >
      void 
    test
      ( P const& start
      , std::initializer_list<std::string> cases
      ) 
      {
          std::cout<<__func__ <<":ShouldSucceed="<<ShouldSucceed<<";\n";
          using aof_parser=typename x3::traits::attribute_of<P, x3::unused_type>::type;
          for (auto input : cases) 
          {
              std::cout<<":input.begin="<<input<<";\n";
              auto first=input.begin();
              auto const last=input.end();
                aof_parser 
              attr_actual;
              auto ok = phrase_parse(first, last, start, x3::space, attr_actual);
              std::string input_end(first,last);
              std::cout<<":input.end="<<input_end<<";\n";
              auto at_end=input_end.empty();
              std::cout<<":ok="<<ok<<":at_end="<<at_end<<";\n";
              bool success=ok && at_end;
              if constexpr (ShouldSucceed) 
              {
                  if (success)
                  {
                      std::cout << ":Yes Ok,succeeded and should have succeeded.";
                      std::cout<<":attr_actual=\n";
                      boost::spirit::x3::traits::print_attribute(std::cout,attr_actual);
                      std::cout<<";\n";
                  }
                  else
                  {
                      std::cout << ":Not Ok,failed but should have succeeded!";
                  }
                  std::cout << std::endl;
              } 
              else 
              {
                  if (!ok)
                      std::cout << ":Yes Ok,failed and should have failed.";
                  else
                      std::cout << ":Not Ok,succeeded but should have failed! ";
                  std::cout << std::endl;
              }
          }//for
      }//test
    template <bool ShouldSucceed = true, typename P>
    void test_with_stack
    ( P const& rule
    , std::initializer_list<std::string> cases
    ) 
    {
        auto start=
          x3::with<Parser::tag_stack>(std::stack<std::string>{})
          [
            rule
          ]
          ;
        test<ShouldSucceed>(start,cases);
    }//test_with_stack              
      template
      < bool ShouldSucceed=true
      , typename Parser
      >
      auto
    tester
      ( Parser const& parser
      , std::initializer_list<std::string> cases
      )
      {
      ; test_with_stack
        < ShouldSucceed
        >
        ( parser
        , cases
        )
      ;}
}//unit_tests

and, instead of:

unit_tests::test(Parser::start,

use:

unit_tests::tester(Parser::tag_block_,

to see the problem. But be prepared to kill the program because, when using * in inner_text, you'll get an infinite loop.

#define BOOST_SPIRIT_X3_DEBUG to actually see some output, which clearly shows the same inner_text parser repeating infinitely.
