Use PHP to get the values of custom internal XML Entities


I have an XML file with the following Document Type Declaration:

<!DOCTYPE Credits SYSTEM "http://www.mrinitialman.com/././DTD-XSD/./mrinitialman.dtd" [
    <!ENTITY fa "http://www.furaffinity.net/user/">
    <!ENTITY da ".deviantart.com/">
    <!ENTITY weasyl "https://www.weasyl.com/~">
    <!ENTITY tweet "https://twitter.com/">
    <!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
    <!ENTITY hymnarybio "https://hymnary.org/person/">
    <!ENTITY kerkliedwiki "https://kerkliedwiki.nl/">
    <!ENTITY wiki-nl "https://nl.wikipedia.org/wiki/">
]>

My PHP is simply ignoring all entity references. For example, instead of reading the attribute url="&wiki-en;Anne_Steele" and passing along "https://en.wikipedia.org/wiki/Anne_Steele", the code passes along only "Anne_Steele", which breaks my hyperlinks.

So, I'd like to try a workaround in which I have PHP read all the Entities and get their text values, and go from there.

The problem is, I'm not getting anything.

For example, if I try the following:

$entityarray = array();
$xml = new DOMDocument();
// load() reads from a file path; loadXML() expects the XML itself as a string
$xml->load('xmlfile.xml');
$entities = $xml->doctype->entities;
for($entity_num = 0; $entity_num < $entities->length; $entity_num++){
    $entity = $entities->item($entity_num);
    $entity_name = $entity->nodeName;
    $entityarray[$entity_name] = $xml->SaveXML($entity);
}

I get an associative array of the whole entity declarations (for example, <!ENTITY wiki-en "https://en.wikipedia.org/wiki/">).

If, however, I try this:

for($entity_num = 0; $entity_num < $entities->length; $entity_num++){
    $entity = $entities->item($entity_num);
    $entity_name = $entity->nodeName;
    $entityarray[$entity_name] = $entity->nodeValue;
}

or

for($entity_num = 0; $entity_num < $entities->length; $entity_num++){
    $entity = $entities->item($entity_num);
    $entity_name = $entity->nodeName;
    $entityarray[$entity_name] = $entity->textContent;
}

I get an associative array of empty strings. Do I need to trim the entity declaration down to the part I want? Is there something else I should do? Or am I just wasting my time struggling with entity references?
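A minimal sketch of the "trim it down" idea, building on the observation above that saveXML() on an entity node returns the full declaration. The helper name internalEntityValues() and the regular expression are assumptions made for illustration, not part of the DOM API. The pattern only matches internal entities whose value is a quoted literal, and it returns that literal as written, without expanding any references nested inside it.

```php
<?php
// Hypothetical helper: maps entity name => literal value for every internal
// entity declared in the document's DOCTYPE.
function internalEntityValues(DOMDocument $document): array
{
    $values = [];
    if ($document->doctype === null) {
        return $values; // no DOCTYPE, so no declared entities
    }
    foreach ($document->doctype->entities as $entity) {
        // Serialized form looks like: <!ENTITY name "value">
        $declaration = $document->saveXML($entity);
        // Capture the quoted literal; SYSTEM/PUBLIC entities do not match.
        if (preg_match('/^<!ENTITY\s+\S+\s+(["\'])(.*)\1\s*>\s*$/s', $declaration, $match)) {
            $values[$entity->nodeName] = $match[2];
        }
    }
    return $values;
}

$xml = <<<'XML'
<!DOCTYPE Credits [
    <!ENTITY da ".deviantart.com/">
    <!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
]>
<Credits/>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$entityarray = internalEntityValues($document);
print_r($entityarray);
```

The resulting array can then be used to expand a prefix by hand, for example $entityarray['wiki-en'] . 'Anne_Steele'.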


Addendum:

I think I found out why I was having trouble: it involves DOMDocument::importNode() and DOMNode::appendChild(). Here is the full code:

$xml = <<<'XML'
<!DOCTYPE Credits [
     <!ENTITY da ".deviantart.com/">
     <!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
     <!ENTITY wiki-nl "https://nl.wikipedia.org/wiki/">
]> <root attribute="&wiki-en;?test">&da;</root>
XML;

$xml2 = <<<'XML'
<!DOCTYPE check [
     <!ENTITY wiki-nl "https://nl.wikipedia.org/wiki/">
]>
<check>&wiki-nl;</check>
XML;

$document = new DOMDocument(); 
$document->substituteEntities=true;
$document->loadXML($xml); 

$document2 = new DOMDocument();
$document2->substituteEntities=true;
$document2->loadXML($xml2); 
$document->documentElement->appendChild($document->importNode($document2->documentElement));


var_dump($document->documentElement->textContent);
var_dump($document->documentElement->getAttribute('attribute'));
var_dump($document2->documentElement->textContent);
var_dump($document->getElementsByTagName('check')->item(0)->textContent);

Output:

string(16) ".deviantart.com/"
string(35) "https://en.wikipedia.org/wiki/?test"
string(30) "https://nl.wikipedia.org/wiki/"
string(0) ""

Note: The content of the imported node is blank.
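One thing that may explain the blank content: DOMDocument::importNode() takes a second argument, $deep, which defaults to false, so only the element itself is copied and its child nodes are left behind. The sketch below compares a shallow and a deep import. It assumes this is the cause here; the variable names $source and $target are illustrative.

```php
<?php
$xml2 = <<<'XML'
<!DOCTYPE check [
    <!ENTITY wiki-nl "https://nl.wikipedia.org/wiki/">
]>
<check>&wiki-nl;</check>
XML;

// Source document: let the parser replace entity references with text.
$source = new DOMDocument();
$source->substituteEntities = true;
$source->loadXML($xml2);

// Target document that receives the imported element.
$target = new DOMDocument();
$target->loadXML('<root/>');

// Shallow import (default): the <check> element without its children.
$shallow = $target->importNode($source->documentElement);
// Deep import: the element together with its text content.
$deep = $target->importNode($source->documentElement, true);
$target->documentElement->appendChild($deep);

var_dump($shallow->textContent); // string(0) ""
var_dump($deep->textContent);    // string(30) "https://nl.wikipedia.org/wiki/"
```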

There is 1 best solution below

ThW On

If you read string values from a DOM (DOMNode::$textContent, DOMElement::getAttribute()), the entities will be resolved. Here is a small example:

$xml = <<<'XML'
<!DOCTYPE Credits [
    <!ENTITY da ".deviantart.com/">
    <!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
]>
<root attribute="&wiki-en;?test">&da;</root>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

// reading string values - the entities are resolved
var_dump($document->documentElement->textContent);
var_dump($document->documentElement->getAttribute('attribute'));

Output:

string(16) ".deviantart.com/"
string(35) "https://en.wikipedia.org/wiki/?test"

Setting DOMDocument::$substituteEntities to true before loading makes the parser replace all entity references during parsing:

$document = new DOMDocument();
// let the parser replace the entities
$document->substituteEntities = true;
$document->loadXML($xml);

echo $document->saveXML();

Output:

<?xml version="1.0"?>
<!DOCTYPE Credits [
<!ENTITY da ".deviantart.com/">
<!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
]>
<root attribute="https://en.wikipedia.org/wiki/?test">.deviantart.com/</root>
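As a variation, the same substitution can be requested per call by passing the LIBXML_NOENT option to loadXML() (or load()) instead of setting the property. A short sketch under that assumption, reusing the sample document from this answer; $url and $serialized are illustrative variable names.

```php
<?php
$xml = <<<'XML'
<!DOCTYPE Credits [
    <!ENTITY da ".deviantart.com/">
    <!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
]>
<root attribute="&wiki-en;?test">&da;</root>
XML;

$document = new DOMDocument();
// LIBXML_NOENT asks libxml to substitute entity references while parsing.
$document->loadXML($xml, LIBXML_NOENT);

// The attribute value is already expanded in the tree.
$url = $document->documentElement->getAttribute('attribute');
// Serializing only the root element shows the expanded values as well.
$serialized = $document->saveXML($document->documentElement);

echo $url, "\n", $serialized, "\n";
```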