I am trying to split the <Description> text by bit number and put each part into the element for that bit. Here is the file I am parsing:
<Register>
<Name>abc</Name>
<Abstract></Abstract>
<Description>Bit 6 random description
Bit 5 msg octet 2
Bit 4-1
Bit 0 msg octet 4
These registers containpart of the Upstream Message.
They should be written only after the cleared by hardware.
</Description>
<Field>
<Name>qwe</Name>
<Description></Description>
<BitFieldOffset>6</BitFieldOffset>
<Size>1</Size>
<AccessMode>Read/Write</AccessMode>
</Field>
<Field>
<Name>qwe</Name>
<Description></Description>
<BitFieldOffset>5</BitFieldOffset>
<Size>1</Size>
<AccessMode>Read/Write</AccessMode>
</Field>
<Field>
....
</Field>
</Register>
<Register>
<Name>xyz</Name>
<Abstract></Abstract>
<Description>Bit 3 msg octet 1
Bit 2 msg octet 2
Bit 1 msg octet 3
Bit 0 msg octet 4
These registers.
They should be written only after the cleared by hardware.
</Description>
<Field>
....
</Field>
<Field>
....
</Field>
</Register>
The expected output would be:
<Register>
<long_description>
These registers containpart of the Upstream Message.
They should be written only after the cleared by hardware.
</long_description>
<bit_field position="6" width=" 1">
<long_description>
<p> random description</p>
</long_description>
<bit_field position="5" width=" 1">
<long_description>
<p>...</p>
</long_description>
<bit_field position="1" width=" 4">
<long_description>
<p>...</p>
</long_description>
</Register>
<Register>
.
.
.
</Register>
I am using the XML::Twig package to parse this file, but I got stuck on the splitting:
foreach my $register ( $twig->get_xpath('//Register') ) # get each <Register>
{
    my $reg_description = $register->first_child('Description')->text;
    .
    .
    .
    foreach my $xml_field ( $register->get_xpath('Field') )
    {
        .
        .
        my @matched = split ('Bit\s+[0-9]', $reg_description);
        .
        .
    }
}
I do not know how to create the <bit_field> elements accordingly while keeping all text other than the Bit lines in the <long_description> of the <Register>. Can anyone please help?
Edit:
A Bit entry in <Description> can span multiple lines. For example, in the following, the description of Bit 10-9 runs up to the start of Bit 8:
<Description>Bit 11 GOOF
Bit 10-9 Clk Selection:
00 : 8 MHz
01 : 4 MHz
10 : 2 MHz
11 : 1 MHz
Bit 8 Clk Enable : 1 = Enable CLK
</Description>
If I got everything right, you could look at the whole text block line by line.
Use a regular expression to check whether a line matches the pattern for a bit, and capture the relevant parts. Collect the bits one by one in an array of hashes, each hash storing the details of one bit.
Buffer the lines that don't contain the bit pattern. If a line containing a bit pattern follows, the buffer must belong to the most recent bit, so append it there. All other lines must be part of the overall description. Note: this cannot tell additional description lines of the last bit apart from the overall description. If the last bit has such lines, they will end up at the beginning of the overall description. (But you said such things aren't in your data.)
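The steps above could be sketched roughly like this. This is only a minimal sketch, not the original proof of concept: the helper name split_description and the hash keys position, width and lines are made up for illustration, and it assumes a range such as "Bit 4-1" means position 1 with width 4, as in the expected output from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): splits a description
# into per-bit entries and the leftover overall description.
sub split_description {
    my ($text) = @_;
    my (@bits, @overall, @buffer);

    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;      # trim both ends
        next unless length $line;     # skip blank lines

        if ($line =~ /^Bit\s+(\d+)(?:-(\d+))?\s*(.*)$/) {
            my ($high, $low, $desc) = ($1, $2 // $1, $3);

            # Lines buffered before this bit line belong to the previous bit,
            # or to the overall description if no bit was seen yet.
            if (@bits) { push @{ $bits[-1]{lines} }, @buffer }
            else       { push @overall, @buffer }
            @buffer = ();

            push @bits, {
                position => $low,                 # assumption: lowest bit of the range
                width    => $high - $low + 1,
                lines    => [ length($desc) ? $desc : () ],
            };
        }
        else {
            push @buffer, $line;      # not a bit line: decide later where it goes
        }
    }

    # Whatever is still buffered after the last bit line is treated as overall text.
    push @overall, @buffer;
    return (\@bits, \@overall);
}

my $demo = <<'END';
Bit 11 GOOF
Bit 10-9 Clk Selection:
00 : 8 MHz
01 : 4 MHz
10 : 2 MHz
11 : 1 MHz
Bit 8 Clk Enable : 1 = Enable CLK
They should be written only after the cleared by hardware.
END

my ($bits, $overall) = split_description($demo);
for my $bit (@$bits) {
    printf "position=%d width=%d: %s\n",
        $bit->{position}, $bit->{width}, join(' | ', @{ $bit->{lines} });
}
print "overall: @$overall\n";
```

The limitation mentioned above applies here too: continuation lines of the last bit cannot be told apart from the overall description.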
Proof of concept:
Prints:
I wrote it as a standalone script so that I could test it. You'll have to adapt it to your script.
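As a starting point for that adaptation, here is a rough sketch of how one collected bit could be turned into a <bit_field> element. It uses plain string formatting rather than XML::Twig so that it runs on its own. The names xml_escape and render_bit_field and the hash keys position, width and lines are hypothetical, and the closing </bit_field> tag is an assumption, since the expected output in the question doesn't show one.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical renderer: one <bit_field> element per collected bit,
# with one <p> per description line.
sub render_bit_field {
    my ($bit) = @_;
    my $paragraphs = join '',
        map { "    <p>" . xml_escape($_) . "</p>\n" } @{ $bit->{lines} };
    return sprintf(
        qq{<bit_field position="%d" width="%d">\n  <long_description>\n%s  </long_description>\n</bit_field>\n},
        $bit->{position}, $bit->{width}, $paragraphs
    );
}

print render_bit_field({ position => 6, width => 1, lines => ['random description'] });
```

Inside the real script it would probably be cleaner to build the elements with XML::Twig itself and paste them into the <Register>, but that depends on how the rest of your script is organised.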
You may also want to post-process the overall description to eliminate those long runs of whitespace.
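A minimal sketch of such a clean-up, assuming it is fine to collapse line breaks as well (the function name squeeze_whitespace is made up):

```perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of whitespace into a single space.
sub squeeze_whitespace {
    my ($text) = @_;
    $text =~ s/\s+/ /g;     # spaces, tabs and newlines all become one space
    $text =~ s/^ | $//g;    # drop the single space left at either end
    return $text;
}

print squeeze_whitespace("  They should be written   only\n     after the cleared by hardware.  \n"), "\n";
```

If the line breaks in the overall description matter, the first substitution could use [ \t]+ instead of \s+ so that newlines are kept.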
First I tried using a continuing pattern (while ($x =~ m/^...$/gc)), but that somehow ate the line endings, so that only every second line matched. Lookarounds, to keep the line endings out of the actual match, didn't work either (Perl said it wasn't implemented; I guess I'll have to check the Perl on this computer), so the explicit splitting into lines is a workaround.
It might also be possible to shorten it using grep()s, map()s or the like. But the verbose version demonstrates the ideas better, I think, so I didn't even look into that.
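For what it's worth, a global match over the whole text block can work without splitting into lines first. The following is only a sketch of one way that might behave as intended, not a diagnosis of the attempt described above. It relies on the /m modifier so that ^ and $ match at every line boundary, and it uses [ \t] instead of \s so the pattern cannot run across a newline into the next line. The sample text and the @seen array are made up for illustration.

```perl
use strict;
use warnings;

my $text = "Bit 6 random description\nBit 5 msg octet 2\nBit 4-1\nSome overall text";

# /m: ^ and $ match at each line boundary; /g: each iteration resumes
# where the previous match ended.
my @seen;
while ($text =~ /^Bit[ \t]+(\d+)(?:-(\d+))?[ \t]*(.*)$/mg) {
    push @seen, [ $1, $2 // $1, $3 ];   # high bit, low bit, rest of the line
}

printf "bits %d..%d: '%s'\n", @$_ for @seen;
```

With \s* in place of [ \t]*, the empty description of "Bit 4-1" would let the match continue into the following line, which is one way a pattern like this can appear to swallow line endings.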