Can't access XML node using XPath (namespace issue?)

I have a small part of an XML export from a CMS called Tridion, and I would like to parse this information using PHP.

I tried using DOMDocument and DOMXPath to access the data, but I fail to retrieve the required information.

For example, when I try to access the title node from my example data, I don't get any result.

$xmlDoc = new DOMDocument();
$xmlDoc->load($xmlFilePath);

$xpath = new DOMXPath($xmlDoc);
$xpath->registerNamespace('tcm', 'http://www.tridion.com/ContentManager/5.0');
$xpath->registerNamespace('xmlns', 'http://www.w3.org/1999/xlink');
$result = $xpath->query('title');

I believe this is a namespace issue but I don't really understand how to handle it.

This is what the export files look like (somewhat shortened for readability):

<PackageItem xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.sdltridion.com/ContentManager/ImportExport/Package/2013">
  <PrimaryBlueprintParentUrl>/webdav/Content%20%28en%29/Content/120_external%20Links/Services/EL_www%2some-domin%2Ecom.xml</PrimaryBlueprintParentUrl>
  <Data>
    <tcm:Data>
      <tcm:Title>EL_www.some-domain.com</tcm:Title>
      <tcm:Type>Normal</tcm:Type>
      <tcm:Schema xlink:type="simple" xlink:title="External Link (EL)" xlink:href="/webdav/Content/System/Schemas/Common/External%20Link%20%28EL%29.xsd" IsMandatory="false" />
      <tcm:Content>
        <externallink xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8">
          <title>www.some-domain.com</title>
          <url xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.some-domain.com" />
        </externallink>
      </tcm:Content>
    </tcm:Data>
  </Data>
</PackageItem>

Answer by Nigel Ren (best answer)

The <externallink> element just above <title> declares a default namespace, xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8", which applies to <externallink> itself and to its <title> child. So if you register this namespace under a prefix (I just use a dummy one, d) and then use that prefix in your expression...

$xpath->registerNamespace('d', "uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8");
$result = $xpath->query('//d:title');
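A self-contained version of this approach, as a minimal sketch: it assumes a trimmed copy of the sample export is loaded from a string with loadXML() instead of from $xmlFilePath, and the prefix d is an arbitrary choice.

```php
<?php
// Trimmed copy of the sample export, inlined so the sketch runs on its own
// (real code would call $xmlDoc->load($xmlFilePath) instead).
$xml = <<<'XML'
<PackageItem xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.sdltridion.com/ContentManager/ImportExport/Package/2013">
  <Data>
    <tcm:Data>
      <tcm:Title>EL_www.some-domain.com</tcm:Title>
      <tcm:Content>
        <externallink xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8">
          <title>www.some-domain.com</title>
          <url xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.some-domain.com" />
        </externallink>
      </tcm:Content>
    </tcm:Data>
  </Data>
</PackageItem>
XML;

$xmlDoc = new DOMDocument();
if (!$xmlDoc->loadXML($xml)) {
    throw new RuntimeException('Could not parse the XML');
}

$xpath = new DOMXPath($xmlDoc);
// Any prefix works; only the namespace URI has to match the document.
$xpath->registerNamespace('d', 'uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8');

$result = $xpath->query('//d:title');
// textContent is the text inside <title>, or null if nothing matched.
$title = $result->length > 0 ? $result->item(0)->textContent : null;
echo $title, PHP_EOL; // prints www.some-domain.com
```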

Update...

For the url...

$result = $xpath->query('//d:url');

echo $xmlDoc->saveXML($result[0]);

Gives...

<url xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.some-domain.com"/>

As this element has no text content as such (I've just output the XML of the first node found), I'm not sure what you need from it.

If you just want the href attribute...

echo $result[0]->getAttribute("xlink:href");
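If you would rather not depend on the prefix the export happens to use, DOMElement::getAttributeNS() looks the attribute up by namespace URI and local name instead. A minimal sketch, again assuming the trimmed sample is inlined as a string and using the arbitrary prefix d:

```php
<?php
// Trimmed copy of the sample export, inlined so the sketch is self-contained.
$xml = <<<'XML'
<PackageItem xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.sdltridion.com/ContentManager/ImportExport/Package/2013">
  <Data>
    <tcm:Data>
      <tcm:Title>EL_www.some-domain.com</tcm:Title>
      <tcm:Content>
        <externallink xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8">
          <title>www.some-domain.com</title>
          <url xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.some-domain.com" />
        </externallink>
      </tcm:Content>
    </tcm:Data>
  </Data>
</PackageItem>
XML;

$xmlDoc = new DOMDocument();
if (!$xmlDoc->loadXML($xml)) {
    throw new RuntimeException('Could not parse the XML');
}

$xpath = new DOMXPath($xmlDoc);
$xpath->registerNamespace('d', 'uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8');

$url = $xpath->query('//d:url')->item(0);
if ($url === null) {
    throw new RuntimeException('No <url> element found');
}

// Match on namespace URI + local name, so the prefix used in the file
// ("xlink" here) does not matter.
$href = $url->getAttributeNS('http://www.w3.org/1999/xlink', 'href');
echo $href, PHP_EOL; // prints http://www.some-domain.com
```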

Answer by ThW

You didn't register aliases for the right namespaces. The externallink element carries a namespace definition for the namespace uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8. The XML parser reads that node as {uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8}externallink and its title child element as {uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8}title. The following 3 examples all resolve to a title node like that:

  • <title xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8"/>
  • <t:title xmlns:t="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8"/>
  • <el:title xmlns:el="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8"/>

By registering aliases on the DOMXPath instance, you allow it to do the same for the expression.

$xpath->registerNamespace('e', 'uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8');

e:title -> {uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8}title
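The claim that the three forms in the list above are the same element can be checked by parsing each one and building its name in the {namespace}localname notation. A small sketch; the $variants array simply holds the three list items:

```php
<?php
// The three serializations from the list above.
$variants = [
    '<title xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8"/>',
    '<t:title xmlns:t="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8"/>',
    '<el:title xmlns:el="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8"/>',
];

$names = [];
foreach ($variants as $source) {
    $doc = new DOMDocument();
    if (!$doc->loadXML($source)) {
        throw new RuntimeException("Could not parse: $source");
    }
    $root = $doc->documentElement;
    // Build {namespace-uri}local-name; the prefix (none, t, el) is not part of it.
    $names[] = '{' . $root->namespaceURI . '}' . $root->localName;
}

// All three variants collapse to a single distinct name.
print_r(array_unique($names));
```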

XPath 1.0 has no default namespace: an unprefixed name such as title only matches elements in no namespace. So you have to register an alias for every namespace you want to use in the expression.
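To see the missing default namespace in action, this sketch (same trimmed sample inlined as a string, alias e) counts matches for an unprefixed and a prefixed name:

```php
<?php
// Trimmed copy of the sample export, inlined so the sketch is self-contained.
$xml = <<<'XML'
<PackageItem xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.sdltridion.com/ContentManager/ImportExport/Package/2013">
  <Data>
    <tcm:Data>
      <tcm:Title>EL_www.some-domain.com</tcm:Title>
      <tcm:Content>
        <externallink xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8">
          <title>www.some-domain.com</title>
          <url xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.some-domain.com" />
        </externallink>
      </tcm:Content>
    </tcm:Data>
  </Data>
</PackageItem>
XML;

$xmlDoc = new DOMDocument();
if (!$xmlDoc->loadXML($xml)) {
    throw new RuntimeException('Could not parse the XML');
}

$xpath = new DOMXPath($xmlDoc);
$xpath->registerNamespace('e', 'uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8');

// Unprefixed name: looks for <title> in *no* namespace, so nothing matches.
$unprefixed = $xpath->query('//title')->length;
// Prefixed name: resolved through the registered alias, so the element matches.
$prefixed = $xpath->query('//e:title')->length;

printf("//title: %d, //e:title: %d\n", $unprefixed, $prefixed); // prints //title: 0, //e:title: 1
```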

However, e:title would only match child nodes of the context node. To look at any node in the document, use //e:title. The leading / anchors the expression to the document itself (not the current context node). The second / switches from the child axis to the descendant axis (// is short for /descendant-or-self::node()/). Use string() to cast the first matched node into a string and return it:

$xpath = new DOMXPath($xmlDoc);
$xpath->registerNamespace('e', 'uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8');
var_dump($xpath->evaluate('string(//e:title)'));

Output:

string(19) "www.some-domain.com"
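Relative expressions like e:title become useful once you pass a context node as the second argument of DOMXPath::evaluate(). A sketch, assuming the same trimmed sample inlined as a string and the alias e: first select <externallink>, then read its child from there.

```php
<?php
// Trimmed copy of the sample export, inlined so the sketch is self-contained.
$xml = <<<'XML'
<PackageItem xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.sdltridion.com/ContentManager/ImportExport/Package/2013">
  <Data>
    <tcm:Data>
      <tcm:Title>EL_www.some-domain.com</tcm:Title>
      <tcm:Content>
        <externallink xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8">
          <title>www.some-domain.com</title>
          <url xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.some-domain.com" />
        </externallink>
      </tcm:Content>
    </tcm:Data>
  </Data>
</PackageItem>
XML;

$xmlDoc = new DOMDocument();
if (!$xmlDoc->loadXML($xml)) {
    throw new RuntimeException('Could not parse the XML');
}

$xpath = new DOMXPath($xmlDoc);
$xpath->registerNamespace('e', 'uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8');

$externallink = $xpath->query('//e:externallink')->item(0);
// Relative step: the <e:title> child of the <externallink> context node.
$title = $xpath->evaluate('string(e:title)', $externallink);
// Same relative step without a context node: it starts at the top of the
// document, where <title> is not a direct child, so the result is "".
$noContext = $xpath->evaluate('string(e:title)');

var_dump($title, $noContext);
```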

DOMXPath::query() can only return node lists; DOMXPath::evaluate() can return scalar values as well.
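The difference in return types is easy to see with var_dump(). A sketch on the same trimmed, inlined sample (alias e assumed); note that count() yields an XPath number, which PHP returns as a float:

```php
<?php
// Trimmed copy of the sample export, inlined so the sketch is self-contained.
$xml = <<<'XML'
<PackageItem xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.sdltridion.com/ContentManager/ImportExport/Package/2013">
  <Data>
    <tcm:Data>
      <tcm:Title>EL_www.some-domain.com</tcm:Title>
      <tcm:Content>
        <externallink xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8">
          <title>www.some-domain.com</title>
          <url xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.some-domain.com" />
        </externallink>
      </tcm:Content>
    </tcm:Data>
  </Data>
</PackageItem>
XML;

$xmlDoc = new DOMDocument();
if (!$xmlDoc->loadXML($xml)) {
    throw new RuntimeException('Could not parse the XML');
}

$xpath = new DOMXPath($xmlDoc);
$xpath->registerNamespace('e', 'uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8');

$nodes = $xpath->query('//e:title');            // node-set -> DOMNodeList
$text  = $xpath->evaluate('string(//e:title)'); // XPath string -> PHP string
$count = $xpath->evaluate('count(//e:title)');  // XPath number -> PHP float
$list  = $xpath->evaluate('//e:title');         // node-set -> DOMNodeList, like query()

var_dump(get_class($nodes), $text, $count, get_class($list));
```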

For the xlink:href attribute you need to register that namespace as well:

$xpath = new DOMXPath($xmlDoc);
$xpath->registerNamespace('e', 'uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8');
$xpath->registerNamespace('xlink', 'http://www.w3.org/1999/xlink');
var_dump($xpath->evaluate('string(//e:url/@xlink:href)'));

Output:

string(26) "http://www.some-domain.com"
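Putting the pieces together for the whole sample: a sketch that also reads tcm:Title, which needs an alias for the package's own default namespace (http://www.sdltridion.com/ContentManager/ImportExport/Package/2013) on the path down to it. The prefixes p, e, tcm and xlink are arbitrary choices, the array keys are hypothetical names for illustration, and the sample is again a trimmed copy inlined as a string.

```php
<?php
// Trimmed copy of the sample export, inlined so the sketch is self-contained.
$xml = <<<'XML'
<PackageItem xmlns:tcm="http://www.tridion.com/ContentManager/5.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.sdltridion.com/ContentManager/ImportExport/Package/2013">
  <Data>
    <tcm:Data>
      <tcm:Title>EL_www.some-domain.com</tcm:Title>
      <tcm:Content>
        <externallink xmlns="uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8">
          <title>www.some-domain.com</title>
          <url xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.some-domain.com" />
        </externallink>
      </tcm:Content>
    </tcm:Data>
  </Data>
</PackageItem>
XML;

$xmlDoc = new DOMDocument();
if (!$xmlDoc->loadXML($xml)) {
    throw new RuntimeException('Could not parse the XML');
}

$xpath = new DOMXPath($xmlDoc);
// One alias per namespace used in the expressions below.
$xpath->registerNamespace('p', 'http://www.sdltridion.com/ContentManager/ImportExport/Package/2013');
$xpath->registerNamespace('tcm', 'http://www.tridion.com/ContentManager/5.0');
$xpath->registerNamespace('e', 'uuid:D612E2C9-CD2E-4CD8-9FAE-3826311343A8');
$xpath->registerNamespace('xlink', 'http://www.w3.org/1999/xlink');

// Hypothetical field names; each value is a plain string from evaluate().
$item = [
    'cmsTitle' => $xpath->evaluate('string(/p:PackageItem/p:Data/tcm:Data/tcm:Title)'),
    'title'    => $xpath->evaluate('string(//e:title)'),
    'url'      => $xpath->evaluate('string(//e:url/@xlink:href)'),
];

print_r($item);
```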