I have an XML file with the following Document Type Declaration:
<!DOCTYPE Credits SYSTEM "http://www.mrinitialman.com/././DTD-XSD/./mrinitialman.dtd" [
<!ENTITY fa "http://www.furaffinity.net/user/">
<!ENTITY da ".deviantart.com/">
<!ENTITY weasyl "https://www.weasyl.com/~">
<!ENTITY tweet "https://twitter.com/">
<!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
<!ENTITY hymnarybio "https://hymnary.org/person/">
<!ENTITY kerkliedwiki "https://kerkliedwiki.nl/">
<!ENTITY wiki-nl "https://nl.wikipedia.org/wiki/">
]>
My PHP code is simply ignoring all entity references. For example, instead of reading the attribute url="&wiki-en;Anne_Steele" and passing along "https://en.wikipedia.org/wiki/Anne_Steele", the code passes along just "Anne_Steele", which breaks my hyperlinks.
So, I'd like to try a workaround in which I have PHP read all the Entities and get their text values, and go from there.
The problem is, I'm not getting anything.
For example, if I try the following:
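$entityarray = array();
$xml = new DOMDocument();
// load() reads from a file path; loadXML() expects the XML source as a string
$xml->load('xmlfile.xml');
$entities = $xml->doctype->entities;
for ($entity_num = 0; $entity_num < $entities->length; $entity_num++) {
    $entity = $entities->item($entity_num);
    $entity_name = $entity->nodeName;
    // saveXML() serializes the whole declaration, not just the replacement text
    $entityarray[$entity_name] = $xml->saveXML($entity);
}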
I get an associative array of the whole entity declarations (for example, <!ENTITY wiki-en "https://en.wikipedia.org/wiki/">).
If, however, I try this:
for ($entity_num = 0; $entity_num < $entities->length; $entity_num++) {
    $entity = $entities->item($entity_num);
    $entity_name = $entity->nodeName;
    $entityarray[$entity_name] = $entity->nodeValue;
}
or
for ($entity_num = 0; $entity_num < $entities->length; $entity_num++) {
    $entity = $entities->item($entity_num);
    $entity_name = $entity->nodeName;
    $entityarray[$entity_name] = $entity->textContent;
}
I get an associative array of empty strings. Do I need to trim the entity declaration down to the part I want? Is there something else I should do? Or am I just wasting my time struggling with entity references?
Addendum:
I think I found out why I was having trouble: DOMDocument::importNode() and DOMNode::appendChild(). Here is the full code:
$xml = <<<'XML'
<!DOCTYPE Credits [
<!ENTITY da ".deviantart.com/">
<!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
<!ENTITY wiki-nl "https://nl.wikipedia.org/wiki/">
]> <root attribute="&wiki-en;?test">&da;</root>
XML;
$xml2 = <<<'XML'
<!DOCTYPE check [
<!ENTITY wiki-nl "https://nl.wikipedia.org/wiki/">
]>
<check>&wiki-nl;</check>
XML;
$document = new DOMDocument();
$document->substituteEntities=true;
$document->loadXML($xml);
$document2 = new DOMDocument();
$document2->substituteEntities=true;
$document2->loadXML($xml2);
$document->documentElement->appendChild($document->importNode($document2->documentElement));
var_dump($document->documentElement->textContent);
var_dump($document->documentElement->getAttribute('attribute'));
var_dump($document2->documentElement->textContent);
var_dump($document->getElementsByTagName('check')->item(0)->textContent);
Output:
string(16) ".deviantart.com/"
string(35) "https://en.wikipedia.org/wiki/?test"
string(30) "https://nl.wikipedia.org/wiki/"
string(0) ""
Note: The content of the imported node is blank.
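A likely explanation for the blank content: DOMDocument::importNode() makes a shallow copy unless its second argument ($deep) is true, so the element is imported without its child text node. The following is only a minimal sketch of that behaviour, assuming substituteEntities is enabled on the source document so the entity has already been replaced by plain text before the import. The variable names ($source, $target, $shallow, $deep) are made up for illustration.

```php
<?php
$xml2 = <<<'XML'
<!DOCTYPE check [
<!ENTITY wiki-nl "https://nl.wikipedia.org/wiki/">
]>
<check>&wiki-nl;</check>
XML;

// Source document: entities are replaced by their text while parsing
$source = new DOMDocument();
$source->substituteEntities = true;
$source->loadXML($xml2);

// Target document that receives the imported node
$target = new DOMDocument();
$target->loadXML('<root/>');

// Shallow import (the default): the element is copied without its children
$shallow = $target->importNode($source->documentElement);

// Deep import: the element is copied together with its child nodes
$deep = $target->importNode($source->documentElement, true);
$target->documentElement->appendChild($deep);

$shallowText = $shallow->textContent;
$deepText = $target->getElementsByTagName('check')->item(0)->textContent;

var_dump($shallowText);
var_dump($deepText);
```

If the source document is parsed without substituteEntities, the imported subtree may still contain entity references that the target document's DTD does not declare, so the result can differ.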
If you're reading string values from a DOM ($textContent, getAttribute()), the entities will be resolved.
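As a rough illustration of the point above, the sketch below parses a small document with an internal DTD subset and reads the text content and an attribute as strings. The XML sample is a cut-down version of the one in the question; the url attribute on the root element and the variable names are assumptions made for the example.

```php
<?php
$xml = <<<'XML'
<!DOCTYPE Credits [
<!ENTITY da ".deviantart.com/">
<!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
]>
<Credits url="&wiki-en;Anne_Steele">&da;</Credits>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

// Reading string values expands the entity references
$text = $document->documentElement->textContent;
$url = $document->documentElement->getAttribute('url');

var_dump($text);
var_dump($url);
```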
Setting DOMDocument::$substituteEntities to true will replace all entities during parsing.
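A minimal sketch of that setting, reusing a cut-down sample like the one in the question (the url attribute and variable names are assumptions for illustration): with substitution enabled before loading, the serialized element should contain the replacement text instead of the entity references.

```php
<?php
$xml = <<<'XML'
<!DOCTYPE Credits [
<!ENTITY da ".deviantart.com/">
<!ENTITY wiki-en "https://en.wikipedia.org/wiki/">
]>
<Credits url="&wiki-en;Anne_Steele">&da;</Credits>
XML;

$document = new DOMDocument();
// Must be set before loading: it controls how the parser treats entity references
$document->substituteEntities = true;
$document->loadXML($xml);

// Serialize only the root element to inspect what the parser stored
$serialized = $document->saveXML($document->documentElement);
echo $serialized, "\n";
```

Because the references are replaced at parse time, nodes copied out of this document later carry plain text and no longer depend on the DTD.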