Here's some fantastic example XML:
<root>
<section>Here is some text<mightbe>a tag</mightbe>might <not attribute="be" />. Things are just<label>a mess</label>but I have to parse it because that's what needs to be done and I can't <font stupid="true">control</font> the source. <p>Why are there p tags here?</p>Who knows, but there may or may not be spaces around them so that's awesome. The point here is, there's node soup inside the section node and no definition for the document.</section>
</root>
I'd like to just grab the text from the section node and all sub nodes as strings. BUT, note that there may or may not be spaces around the sub-nodes, so I want to pad the sub notes and append a space.
Here's a more precise example of what input might look like, and what I'd like output to be:
<root>
<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>
</root>
I'd like the output to be:
A good story is the Hitchhikers Guide to the Galaxy. It was published a long time ago. I usually read at 9pm.
Note that the child nodes don't have spaces around them, so I need to pad them otherwise the words run together.
I was attempting to use this sample code:
XDocument doc = XDocument.Parse(xml);
foreach(var node in doc.Root.Elements("section"))
{
output += String.Join(" ", node.Nodes().Select(x => x.ToString()).ToArray()) + " ";
}
But the output includes the child tags, and is not going to work out.
Any suggestions here?
TL;DR: Was given node soup xml and want to stringify it with padding around child nodes.