I must be going crazy here... Here is my XML file, called Original.xml:
<root>
<metadata>Trying to change this</metadata>
<body>
<salad>Greek Caesar</salad>
</body>
</root>
I am trying to modify the text inside the <metadata> element.
Here is my complete code, which WORKS:
#include <fstream>
#include <iostream>
#include <string>
#include <rapidxml/rapidxml_print.hpp>
#include <rapidxml/rapidxml_utils.hpp>
int main()
{
// Open 'Original.xml' to read from
rapidxml::file<> xmlFile("Original.xml");
rapidxml::xml_document<> doc;
doc.parse<0>(xmlFile.data());
// Get to <metadata> tag
// <root> <metadata> ???
rapidxml::xml_node<>* metadataNode = doc.first_node()->first_node()->first_node();
// Always correctly prints: 'Trying to change this'
std::cout << "Before: " << metadataNode->value() << std::endl;
// Modify the contents within <metadata>
const std::string newMetadataValue = "Did the changing";
metadataNode->value(newMetadataValue.c_str());
// Always correctly prints: 'Did the changing'
std::cout << "After: " << metadataNode->value() << std::endl;
// Save output to 'New.xml'
std::ofstream newXmlFile("New.xml");
newXmlFile << doc;
newXmlFile.close();
doc.clear();
return 0;
}
New.xml will now look like this:
<root>
<metadata>Did the changing</metadata>
<body>
<salad>Greek Caesar</salad>
</body>
</root>
That's the behavior I want.
What I don't understand is why I need a third first_node() call to SAVE the information inside metadata.
If I remove the third first_node() call, which is marked by the ??? above, New.xml will keep the old <metadata> string: "Trying to change this".
Yet in that scenario, both std::cout calls on metadataNode->value() still print the expected strings: the first prints "Trying to change this" and the second prints "Did the changing".
Why in the world do I need n+1 calls to first_node() to SAVE the new value, where n is the number of nodes traversed (from the root) to reach the desired node? And why, with only n first_node() calls, does the modification succeed in RAM but never make it into the saved file?
Possible bug? On whose end?
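In the XML tree model, text is made of nodes as well (rapidxml calls them data nodes, of type node_data). This makes sense when you have mixed-content elements such as <a>some<b/>text<c/>nodes</a>, where <a> has five children: three text nodes and two elements.
Basically, doc.first_node() gets you <root>, the second call gets you <metadata>, and the third gets the text node inside it. While parsing, rapidxml also sets the element's own value() to its first text (unless you pass parse_no_element_values), which is why both std::cout lines look right. When printing, however, an element that has child nodes is written out from those children, and its own value is ignored. So editing the element only changes that unprinted value, whereas editing the text node changes what gets saved.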
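To see why the extra call matters without depending on rapidxml itself, here is a deliberately simplified, hypothetical model of its tree. Node, NodeType, print, parse_sample and the two save_* functions are invented for this sketch and are not rapidxml API. It assumes two behaviours of rapidxml with default parse flags: the parser stores an element's first text both in a child data node and in the element's own value(), and the printer ignores an element's own value whenever that element has children.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for rapidxml's tree (NOT the rapidxml API).
enum class NodeType { Element, Text };  // roughly node_element / node_data

struct Node {
    Node(NodeType t, std::string n, std::string v)
        : type(t), name(std::move(n)), value(std::move(v)) {}

    NodeType type;
    std::string name;   // element name; empty for text nodes
    std::string value;  // element: first text as set by the parser; text node: its text
    std::vector<std::unique_ptr<Node>> children;

    // Like first_node(): the first child, or nullptr if there is none.
    Node* first_node() const { return children.empty() ? nullptr : children.front().get(); }
};

// Printing rule modelled on rapidxml_print.hpp: an element with children is
// written from those children; only a childless element writes its own value.
std::string print(const Node& n) {
    if (n.type == NodeType::Text) return n.value;
    std::string out = "<" + n.name + ">";
    if (n.children.empty()) {
        out += n.value;
    } else {
        for (const auto& child : n.children) out += print(*child);
    }
    return out + "</" + n.name + ">";
}

// Builds <root><metadata>Trying to change this</metadata></root> the way the
// parser would: the text lives in a child text node AND in metadata's value.
std::unique_ptr<Node> parse_sample() {
    auto metadata = std::make_unique<Node>(NodeType::Element, "metadata", "Trying to change this");
    metadata->children.push_back(
        std::make_unique<Node>(NodeType::Text, "", "Trying to change this"));
    auto root = std::make_unique<Node>(NodeType::Element, "root", "");
    root->children.push_back(std::move(metadata));
    return root;
}

// "n calls": edit the <metadata> element itself, then save.
std::string save_after_editing_element() {
    auto root = parse_sample();
    Node* metadata = root->first_node();
    metadata->value = "Did the changing";  // reading it back shows the new text...
    return print(*root);                   // ...but the printer never looks at it
}

// "n + 1 calls": edit the text node inside <metadata>, then save.
std::string save_after_editing_text_node() {
    auto root = parse_sample();
    Node* text = root->first_node()->first_node();
    assert(text != nullptr && text->type == NodeType::Text);
    text->value = "Did the changing";      // this is what gets printed
    return print(*root);
}
```

In this model, save_after_editing_element() still yields the old text, while save_after_editing_text_node() yields "Did the changing", matching what the question observes in New.xml.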
But wait, there's more
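Sadly, hard-coding that third first_node() call is fragile: it only works if your input has exactly the expected shape.
Assuming input shaped like Original.xml above, the third call lands on the text node inside <metadata>, and all is well.
If the <metadata> element were empty (<metadata/> or <metadata></metadata>), it would have no child node at all, so the third first_node() would return a null pointer and calling value() on it would crash.
Indeed, this triggers AddressSanitizer (ASan) failures.
If there's a surprise, such as a child element inside <metadata>, it will... do surprising things!
The third first_node() then returns that child element instead of a text node, so the new text erroneously ends up inside that element rather than replacing the metadata text.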
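Using a similar hypothetical, simplified model of the tree (defined in full so it stands alone; none of these names are rapidxml API), the sketch below reports what a hard-coded "one more first_node()" lands on for differently shaped <metadata> elements, and what the naive edit then prints. The <surprise/> element used in the usage note is a made-up example of unexpected content.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for rapidxml's tree (NOT the rapidxml API).
enum class NodeType { Element, Text };

struct Node {
    Node(NodeType t, std::string n, std::string v)
        : type(t), name(std::move(n)), value(std::move(v)) {}

    NodeType type;
    std::string name, value;
    std::vector<std::unique_ptr<Node>> children;

    Node* first_node() const { return children.empty() ? nullptr : children.front().get(); }

    // Appends a child and returns a pointer to it (helper for building samples).
    Node* add(NodeType t, std::string n, std::string v) {
        children.push_back(std::make_unique<Node>(t, std::move(n), std::move(v)));
        return children.back().get();
    }
};

// Simplified printing rule: an element with children prints its children,
// a childless element prints its own value.
std::string print(const Node& n) {
    if (n.type == NodeType::Text) return n.value;
    std::string out = "<" + n.name + ">";
    if (n.children.empty()) {
        out += n.value;
    } else {
        for (const auto& child : n.children) out += print(*child);
    }
    return out + "</" + n.name + ">";
}

// Reports what metadata->first_node() (the "third" call) actually points at.
std::string third_call_lands_on(const Node& metadata) {
    const Node* child = metadata.first_node();
    if (child == nullptr) return "null pointer";  // calling value() here: undefined behaviour
    if (child->type == NodeType::Text) return "text node";
    return "element <" + child->name + ">";
}

// The question's approach: blindly take the first child of <metadata> and edit it.
// Only the null case is skipped here, so that the sketch itself does not crash.
std::string naive_edit(Node& root) {
    Node* metadata = root.first_node();
    assert(metadata != nullptr);  // assumes <root> has at least one child
    Node* target = metadata->first_node();
    if (target != nullptr) target->value = "Did the changing";
    return print(root);
}
```

For <metadata>Trying to change this</metadata>, third_call_lands_on() reports a text node; for an empty <metadata>, a null pointer; and for <metadata><surprise/></metadata>, the element <surprise>, in which case naive_edit() prints <root><metadata><surprise>Did the changing</surprise></metadata></root>.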
And that's not the end of it:
Prints
HOW TO MAKE IT ROBUST?
Firstly, use queries (such as XPath) to find your target. Sadly, rapidxml doesn't support this; see What XML parser should I use in C++?
Secondly, check the node type before editing.
Thirdly, replace the entire node if you can; that makes you independent of whatever was there before.
Lastly, be sure to allocate your new node, and any string it points to, from the document, so you don't get lifetime issues: rapidxml nodes store pointers to their names and values, not copies.
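Applied to a hypothetical, simplified model of the tree (invented names, not rapidxml API; defined in full so the block stands alone), those steps could look like the sketch below. find_element is a hand-written depth-first search standing in for a real query, since rapidxml has none, and set_text replaces everything inside the element with one fresh text node. In real rapidxml the corresponding pieces would be first_node(name), checking type() against node_element / node_data, remove_all_nodes(), and append_node() with a node from doc.allocate_node(node_data, 0, doc.allocate_string(text)). The model's owning std::string sidesteps the lifetime issue that allocate_string() addresses.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for rapidxml's tree (NOT the rapidxml API).
enum class NodeType { Element, Text };

struct Node {
    Node(NodeType t, std::string n, std::string v)
        : type(t), name(std::move(n)), value(std::move(v)) {}

    NodeType type;
    std::string name, value;
    std::vector<std::unique_ptr<Node>> children;

    Node* first_node() const { return children.empty() ? nullptr : children.front().get(); }

    Node* add(NodeType t, std::string n, std::string v) {
        children.push_back(std::make_unique<Node>(t, std::move(n), std::move(v)));
        return children.back().get();
    }
};

// Simplified printing rule: an element with children prints its children.
std::string print(const Node& n) {
    if (n.type == NodeType::Text) return n.value;
    std::string out = "<" + n.name + ">";
    if (n.children.empty()) {
        out += n.value;
    } else {
        for (const auto& child : n.children) out += print(*child);
    }
    return out + "</" + n.name + ">";
}

// Step 1 (stand-in for a query): depth-first search for the first ELEMENT named `name`.
Node* find_element(Node& n, const std::string& name) {
    if (n.type == NodeType::Element && n.name == name) return &n;  // step 2: type check
    for (auto& child : n.children)
        if (Node* hit = find_element(*child, name)) return hit;
    return nullptr;
}

// Steps 3-4: replace whatever was inside the element with one new text node
// owned by the tree. Returns false instead of crashing when the target is missing.
bool set_text(Node& root, const std::string& name, const std::string& text) {
    Node* element = find_element(root, name);
    if (element == nullptr) return false;
    element->children.clear();                 // like remove_all_nodes()
    element->add(NodeType::Text, "", text);    // like append_node(allocate_node(node_data, ...))
    element->value = text;                     // keep the element's own value consistent
    assert(element->first_node()->type == NodeType::Text);
    return true;
}
```

Because the old children are discarded, the result no longer depends on whether <metadata> was empty, held text, or held an unexpected element.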
PUTTING IT ALL TOGETHER:
Prints
SUMMARY
XML is extensible. It's Markup. It's Language. It's not simple :)