Poorly defined XML, get node and contents of all child nodes as string concat with spaces?

Question

603 Views Asked by somedev At 26 January 2015 at 21:15

Here's some fantastic example XML:

<root>
    <section>Here is some text<mightbe>a tag</mightbe>might <not attribute="be" />. Things are just<label>a mess</label>but I have to parse it because that's what needs to be done and I can't <font stupid="true">control</font> the source. <p>Why are there p tags here?</p>Who knows, but there may or may not be spaces around them so that's awesome. The point here is, there's node soup inside the section node and no definition for the document.</section>
</root>

I'd like to just grab the text from the section node and all of its sub-nodes as a single string. BUT, note that there may or may not be spaces around the sub-nodes, so I want to pad the text of each sub-node with a space on either side.

Here's a more precise example of what input might look like, and what I'd like output to be:

<root>
    <sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>
</root>

I'd like the output to be:

A good story is the Hitchhikers Guide to the Galaxy. It was published a long time ago. I usually read at 9pm.

Note that the child nodes don't have spaces around them, so I need to pad them otherwise the words run together.

I was attempting to use this sample code:

XDocument doc = XDocument.Parse(xml);
foreach(var node in doc.Root.Elements("section"))
{
    output += String.Join(" ", node.Nodes().Select(x => x.ToString()).ToArray()) + " ";
}

But the output includes the child tags, and is not going to work out.

Any suggestions here?

TL;DR: Was given node soup xml and want to stringify it with padding around child nodes.

**AudioBubble** · Answer 1 · 2015-01-26T21:38:58.377000

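You could try using XPath to extract what you need:

// Requires: using System.IO; using System.Linq; using System.Xml.XPath;
// XPathDocument takes a URI, stream, or reader, so wrap the XML string in a StringReader.
var docNav = new XPathDocument(new StringReader(xml));

// Create a navigator to query with XPath.
var nav = docNav.CreateNavigator();

// Find the text of every element under the root node
var expression = "/root//*/text()";

// Execute the XPath expression; Select returns an iterator over the matching text nodes
XPathNodeIterator results = nav.Select(expression);

// Join the text nodes, in document order, with a space between them
var resultString = string.Join(" ", results.Cast<XPathNavigator>().Select(n => n.Value));

// Do some stuff with resultString
....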

References: Querying XML, XPath syntax

**Javier** · Answer 2 · 2015-01-26T22:13:47.537000

Here is a possible solution following your initial code:

private string extractSectionContents(XElement section)
{
    string output = "";
    foreach(var node in section.Nodes())
    {
        if(node.NodeType == System.Xml.XmlNodeType.Text)
        {
            // Use Value rather than ToString(), which would return XML-escaped text (e.g. &amp;).
            output += ((XText)node).Value;
        }
        else if(node.NodeType == System.Xml.XmlNodeType.Element)
        {
            output += string.Format(" {0} ", ((XElement)node).Value);
        }
    }

    return output;
}

One problem with this padding approach is that a period placed right after an element ends up preceded by a space.
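The stray space before punctuation can be cleaned up after the pieces are joined. Below is a minimal sketch of that idea, assuming C# 7 or later; the class name `MixedContentText`, the method name `Flatten`, and the choice of punctuation characters are illustrative assumptions, not part of the answer above. It joins the direct child nodes with spaces, collapses runs of whitespace, and then removes any whitespace that sits directly before a punctuation mark.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class MixedContentText
{
    // Hypothetical helper: flattens a mixed-content element into one string.
    public static string Flatten(XElement section)
    {
        var parts = section.Nodes().Select(node =>
        {
            // Text (and CDATA) nodes contribute their unescaped text.
            if (node is XText text) return text.Value;
            // Child elements contribute the concatenated text of all their descendants.
            if (node is XElement element) return element.Value;
            // Comments and processing instructions contribute nothing.
            return string.Empty;
        });

        // Pad every piece with a space, then collapse repeated whitespace.
        string joined = Regex.Replace(string.Join(" ", parts), @"\s+", " ").Trim();

        // Remove whitespace that ended up directly before punctuation.
        // The character set here is an assumption; adjust it to the data.
        return Regex.Replace(joined, @"\s+([.,;:!?])", "$1");
    }
}
```

Called on the `sample` element from the question, for example `MixedContentText.Flatten(XDocument.Parse(xml).Root.Element("sample"))`, this should give the sentence without the extra spaces before the periods. Note that `XElement.Value` does not pad elements nested inside a child element, so deeply nested tags would still need a recursive approach.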

**Alexei Levenkov** · Answer 3 · 2015-01-26T22:20:03.450000

You are looking at "mixed content" nodes. There is nothing particularly special about them: just get all child nodes (text nodes are nodes too) and join their values with a space.

Something like

var result = String.Join(" ", 
  root.Nodes().Select(x => x is XText ? ((XText)x).Value : ((XElement)x).Value));

**steve16351** · Answer 4 · 2015-01-26T22:35:45.140000

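In case you have tags nested to an unknown level (e.g. <date>a <i>long</i> time ago</date>), you might also want to recurse so that the formatting is applied consistently throughout. For example:

private static string Parse(XElement root)
{
    return root
        .Nodes()
        // Only text and element nodes are handled; other node types (e.g. comments) are skipped.
        .Where(a => a is XText || a is XElement)
        // Keep text as-is and recurse into child elements.
        .Select(a => a is XText ? ((XText)a).Value : Parse((XElement)a))
        // DefaultIfEmpty keeps Aggregate from throwing on an empty element such as <not attribute="be" />.
        .DefaultIfEmpty(String.Empty)
        .Aggregate((a, b) => String.Concat(a.Trim(), b.StartsWith(".") ? String.Empty : " ", b.Trim()));
}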
