I've written a function that strips all comments and a few other elements from PHP code. It works fine, but as I do not deeply understand the code, I have some doubts:
- Am I using the latest technology to parse a grammar in Boost? A few years ago I used only Spirit, but I didn't use qi.
- Is this the right approach with Spirit?
- What is the reason for putting the grammar inside a block of code?

    #include <boost/spirit/include/qi.hpp>

    namespace qi = boost::spirit::qi;
    using namespace std;

    string non_comments_php_code(const string &contents) {
        string non_comments_code;
        using Iterator = string::const_iterator;
        Iterator begin = contents.cbegin(), end = contents.cend();
        using Skipper = qi::rule<Iterator>;
        auto identifier = qi::standard_wide::char_;
        Skipper block_comment, single_line_comment, skipper, php_tag, php_comment, php_namespace, php_use;
        {
            using namespace qi;
            single_line_comment = "//" >> *(standard_wide::char_ - eol) >> (eol|eoi);
            block_comment = ("/*" >> *(block_comment | standard_wide::char_ - "*/")) > ("*/"|eoi);
            php_tag = lit("<?php") | lit("?>");
            php_comment = '#' >> *(standard_wide::char_ - eol) >> (eol|eoi);
            php_namespace = lit("namespace ") >> *(standard_wide::char_ - (eol|';')) >> (eol|';');
            php_use = lit("use ") >> *(standard_wide::char_ - (eol|';')) >> (eol|';');
            skipper = space | single_line_comment | block_comment | php_tag | php_namespace | php_use | php_comment;
        }
        bool ok = phrase_parse(begin, end, skipper, skipper);
        if ( begin != end) {
            while( begin != end && *begin != '\n') {
                non_comments_code += *begin++;
            }
        }
        return non_comments_code;
    }

EDIT:
The goal of the function is to return any code in the php file that is neither a comment nor a (use|namespace) statement nor the tags <?php .. ?>
I am using templates to autogenerate PHP code, and once the code is created, I can add custom code. Before calling this function I delete all the automatically generated code, and then this function tells me whether I have added any custom code to the PHP file.
That is why I say it is working: I don't care about the code itself, I just want to know if there is any custom code at all.
EDIT 2:
Example of input string:
<?php
namespace tests;
use codeception/tests;
/* This unit test tests something */
class Tester {
/// @group debug
public function testsFeature(/*AcceptanceTester*/ $I) {
$I->assertTrue($this->testsAll());
}
}
?>
And the required output:
classTester{publicfunctiontestsFeature($I){$I->assertTrue($this->testsAll());}}
In fact, the result itself is of no use; I just need to know whether it is empty.
There are other approaches to solve the whole problem, like regenerating the template in a temp file and diffing it to get the added changes, but 1) that would be far more expensive, and 2) I really want to learn to use Boost grammar parsers.
Oh I see, the whole thing was a bit inside-out. You are "parsing" the stuff that you want to "skip" and "skipping" the stuff you need "outside the parser".
It seems a lot more straightforward to have a parser and skipper in their designated roles. Let's create a StripCommentParser: it declares the output std::string, which we will use to collect the desired output. I'd put all the bits together in a single grammar struct.
Notes:
- no phrase_parse (as you should definitely not be able to change the skipper)
- rules combined into a grammar struct for encapsulation and re-use. Just slam static onto the parser and you insta-optimized your code:

    std::string non_comments_php_code(std::string const& contents) {
        std::string non_comments_code;
        static const StripCommentParser scp;
        parse(begin(contents), end(contents), scp, non_comments_code);
        return non_comments_code;
    }

Observations
What's good
I like how your code pays attention to when the rules should match at qi::eoi. This is oft neglected, and it shows you understand PEG grammar productions well.
What's bad
As I commented before, there are a lot of other weird things about this code:
- You're also skipping qi::space?! That seems very unhelpful if the output should be useful for anything (other than counting code size ignoring significant whitespace?)
- Your inside-out parse driver had extra logic to randomly also skip '\n'. That's odd, because
- You're randomly using standard_wide. This is a bad idea because your input AND output are not wide-character. Also, I expect PHP is UTF8 by definition/convention.
- Your patterns arbitrarily assume certain space use. E.g. "namespace\t" will not be matched.
If you care to explain what the real goal of the code is, I can tell you what I'd write.
Questions
> Am I using the latest technology to parse a grammar in boost?
That's interesting. "Years ago" is when I'd use Qi. Nowadays I still recommend Qi, but you have the option of using C++14 Spirit X3 (going C++17 now).
If all you're doing is squeezing ignorable input then I'd say X3 is a better choice. However there are areas where I think X3 isn't as mature (e.g. attribute propagation/handling).
> Is this the right approach with spirit?
Yes and no. Yes in the sense that you create rules. No in the sense that you used the skipper as the grammar (and as the skipper too), and wrote your own parser around the skipper. I think the above example is what you want.
> What is the reason for putting the grammar inside a block of code?
Scope. That's what blocks do. In this case it limits the scope of all the detail rules, as well as the using namespace directive. The struct has the same goal, but packages it up in a reusable instance.
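The scoping effect is plain C++ and does not depend on Spirit. A minimal illustration, using the standard string literals as a stand-in for `qi` (the function name `build_greeting` is made up for the example):

```cpp
#include <string>

std::string build_greeting() {
    std::string result;
    {
        // The using-directive and the helper variable are visible
        // only until the closing brace of this block.
        using namespace std::string_literals;
        auto const prefix = "hello, "s;
        result = prefix + "world"s;
    }
    // Out here neither `prefix` nor the ""s literal operator is in scope,
    // so short names pulled in by the directive cannot clash with later code.
    return result;
}
```

The `{ using namespace qi; ... }` block in the question works the same way: names like `eol`, `eoi` and `lit` can be used unqualified inside the braces without leaking into the rest of the function.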