I'm using Boost Spirit to parse a Fortran-like text file filled with fixed-width numbers:
1234 0.000000000000D+001234
1234 7.654321000000D+001234
1234                   1234
1234-7.654321000000D+001234
There are parsers for signed and unsigned integers, but I cannot find a parser for fixed-width real numbers. Can someone help with it?
Here's what I have (Live On Coliru):
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;

struct RECORD {
    uint16_t a{};
    double   b{};
    uint16_t c{};
};
BOOST_FUSION_ADAPT_STRUCT(RECORD, a, b, c)

int main() {
    using It = std::string::const_iterator;

    // exactly four decimal digits
    qi::uint_parser<uint16_t, 10, 4, 4> i4;
    // a stock double, or a blank 19-column field that is read as 0.0
    qi::rule<It, double()> X19 = qi::double_ //
        | qi::repeat(19)[' '] >> qi::attr(0.0);

    for (std::string const str : {
             "1234 0.000000000000D+001234",
             "1234 7.654321000000D+001234",
             "1234                   1234", // blank 19-column field
             "1234-7.654321000000D+001234",
         }) {
        It f = str.cbegin(), l = str.cend();
        RECORD rec;
        if (qi::parse(f, l, (i4 >> X19 >> i4), rec)) {
            std::cout << "{a:" << rec.a << ", b:" << rec.b << ", c:" << rec.c
                      << "}\n";
        } else {
            std::cout << "Parse fail (" << std::quoted(str) << ")\n";
        }
    }
}
Which obviously doesn't parse most records:
Parse fail ("1234 0.000000000000D+001234")
Parse fail ("1234 7.654321000000D+001234")
{a:1234, b:0, c:1234}
Parse fail ("1234-7.654321000000D+001234")
The mechanism exists, but it's hidden more deeply because there are many more details to parsing floating point numbers than integers.
qi::double_ (and float_) are actually instances of qi::real_parser<double, qi::real_policies<double> >. The policies are the key. They govern all the details of what format is accepted.
Here are the RealPolicies Expression Requirements:
RP::allow_leading_dot: allow a leading dot.
RP::allow_trailing_dot: allow a trailing dot.
RP::expect_dot: require a dot.
RP::parse_sign(f, l): true if successful, otherwise false.
RP::parse_n(f, l, n): true if successful, otherwise false. If successful, place the result into n.
RP::parse_dot(f, l): true if successful, otherwise false.
RP::parse_frac_n(f, l, n, d): true if successful, otherwise false. If successful, place the result into n and the number of digits into d.
RP::parse_exp(f, l): true if successful, otherwise false.
RP::parse_exp_n(f, l, n): true if successful, otherwise false. If successful, place the result into n.
RP::parse_nan(f, l, n): true if successful, otherwise false. If successful, place the result into n.
RP::parse_inf(f, l, n): true if successful, otherwise false. If successful, place the result into n.
Let's implement your policies:
Note:
- The policy derives from strict_ureal_policies to reduce the effort. The base class doesn't support signs and requires a mandatory decimal separator ('.'), which makes it "strict": it rejects purely integral numbers.
- The digit counts are template parameters (IDigits, FDigits, EDigits).
Let's go through our overrides one by one:
bool parse_sign(f, l)
The format is fixed-width, so we want to accept
- ' ' or '+' for positive
- '-' for negative
That way the sign always takes one input character.
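The sign rule can be sketched as a free function with the same shape as the policy hook. This version uses only the standard library and assumes, as Spirit's stock policies do, that the return value reports whether the number is negative rather than whether a character was consumed.

```cpp
#include <string>

// Same signature as the policy hook: advances `f` past the sign column
// and returns true only for a negative number.
template <typename It>
bool parse_sign(It& f, It const& l) {
    if (f == l)
        return false;
    switch (*f) {
    case ' ':
    case '+': ++f; return false; // positive, one column consumed
    case '-': ++f; return true;  // negative, one column consumed
    default:  return false;      // anything else: input left untouched
    }
}
```

Because a blank counts as a sign, every well-formed field spends exactly one column on it, which keeps the total width constant.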
bool parse_n(f, l, Attr& a)
The simplest part: we allow only a single-digit (IDigits) unsigned integer part before the separator. Luckily, integer parsing is relatively common and trivial.
bool parse_exp(f, l)
Also trivial: we always require a 'D'.
bool parse_exp_n(f, l, int& a)
As for the exponent, we want it to be fixed-width, meaning that the sign is mandatory. So, before extracting the signed integer of width 2 (EDigits), we make sure a sign is present.
bool parse_frac_n(f, l, Attr&, int& a)
The meat of the problem, and also the reason to build on the existing parsers. The fractional digits could be considered integral, but there are issues: leading zeros are significant, and the total number of digits might exceed the capacity of any integral type we choose.
So we do a "trick": we parse an unsigned integer, but ignore any excess precision that doesn't fit. In fact, we only care about the number of digits. We then check that this number is as expected: FDigits.
Then, we hand off to the base class implementation to actually compute the resulting value correctly, for any generic number type T (that satisfies the minimum requirements).
Summary
You can see, by standing on the shoulders of existing, tested code we're already done and good to parse our numbers:
Now your code runs as expected: Live On Coliru
Prints
Decimals
Now, it's possible to instantiate this parser with precisions that exceed the precision of double. And there are always issues with the conversion from decimal numbers to inexact binary representation. To showcase how the choice for generic T already caters for this, let's instantiate with a decimal type that allows 64 significant decimal fractional digits (Live On Coliru):
Prints
Bonus Take: Optionals
In the current RECORD, missing doubles are silently taken to be 0.0. That's maybe not the best:
Now the output is (Live On Coliru):
Summary / Add Unit Tests!
That's a lot, but possibly not all you need.
Keep in mind that you still need proper unit tests for e.g. X19_type. Think of all edge cases you may encounter, want to accept, or want to reject: " 3.141 ", " .999999999999D+0 ", etc.?
All these are pretty simple changes to the policies, but, as you know, code without tests is broken.