From the documentation of XML::Simple:
The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.
The major problems with this module are the large number of options and the arbitrary ways in which these options interact - often with unexpected results.
Can someone clarify for me what the key reasons for this are?
The real problem is that what `XML::Simple` primarily tries to do is take XML and represent it as a Perl data structure.
As you'll no doubt be aware from `perldata`, the two key data structures you have available are the `hash` and the `array`.
And XML doesn't do either really. It has elements which are:
And these things don't map directly to the available perl data structures - at a simplistic level, a nested hash of hashes might fit - but it can't cope with elements with duplicated names. Nor can you differentiate easily between attributes and child nodes.
So `XML::Simple` tries to guess based on the XML content, takes 'hints' from the various option settings, and then, when you try to output the content, it (tries to) apply the same process in reverse.
As a result, for anything other than the most simple XML, it becomes unwieldy at best, or loses data at worst.
Consider:
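A minimal sketch of such a document, together with a default `XMLin` parse, might look like the following. The element names `parent`, `child`, `another_node` and `another_child` and the attribute name `different_att` come from the discussion here; the root name `xml`, the other attribute names and all the values are assumptions made up for illustration.

```perl
use strict;
use warnings;
use XML::Simple;    # exports XMLin
use Data::Dumper;

# Hypothetical sample document; any name not mentioned in the text is an assumption.
my $xml_text = <<'XML';
<xml>
  <parent>
    <child att="some_att">content</child>
  </parent>
  <another_node>
    <another_child some_att="a value" />
    <another_child different_att="different_value">more content</another_child>
  </another_node>
</xml>
XML

# Default options: no ForceArray, default KeyAttr, root element discarded.
my $data = XMLin($xml_text);

$Data::Dumper::Sortkeys = 1;    # stable key order, purely for readability
print Dumper($data);
```

Running it and reading the dump is the quickest way to see which levels came out as hashes and which as arrays.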
This - when parsed through `XML::Simple` - gives you:
Note - now you have under `parent` just anonymous hashes, but under `another_node` you have an array of anonymous hashes.
So in order to access the content of `child`:
Note how you've got a 'child' node, with a 'content' node beneath it, which isn't because it's ... content.
But to access the content beneath the first `another_child` element:
Note how - because of having multiple `<another_child>` elements - the XML has been parsed into an array, where it wasn't with a single one. (If you did have an element called `content` beneath it, then you end up with something else yet.) You can change this by using `ForceArray`, but then you end up with a hash of arrays of hashes of arrays of hashes of arrays - although it is at least consistent in its handling of child elements.
Edit: Note, following discussion - this is a bad default, rather than a flaw with `XML::Simple`.
You should set:
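As a sketch, one combination that gives this consistency is `ForceArray => 1` together with an empty `KeyAttr`. `ForceArray` is named above; `KeyAttr => []` is an assumption added here to switch off attribute-based folding, and the sample document (root name, most attribute names, all values) is likewise made up for illustration.

```perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Hypothetical sample document; any name not mentioned in the text is an assumption.
my $xml_text = <<'XML';
<xml>
  <parent>
    <child att="some_att">content</child>
  </parent>
  <another_node>
    <another_child some_att="a value" />
    <another_child different_att="different_value">more content</another_child>
  </another_node>
</xml>
XML

my $data = XMLin(
    $xml_text,
    ForceArray => 1,     # every nested element becomes an array ref, even a lone one
    KeyAttr    => [],    # assumption: no folding on name/key/id attributes
);

$Data::Dumper::Sortkeys = 1;    # stable key order, purely for readability
print Dumper($data);

# Repeated and single elements are now both reached through an array index.
print $data->{another_node}[0]{another_child}[1]{content}, "\n";
```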
If you apply this to the XML as above, you get instead:
This will give you consistency, because single-node elements will no longer be handled differently from multi-node ones.
But you still:
E.g.:
You still have `content` and `child` hash elements treated as if they were attributes, and because hashes are unordered, you simply cannot reconstruct the input. So basically, you have to parse it, then run it through `Dumper` to figure out where you need to look.
But with an `xpath` query, you get at that node with:
What you don't get in `XML::Simple` that you do in `XML::Twig` (and I presume `XML::LibXML`, but I know it less well):
- `xpath` support. `xpath` is an XML way of expressing a path to a node. So you can 'find' a node in the above with `get_xpath('//child')`. You can even use attributes in the `xpath` - like `get_xpath('//another_child[@different_att]')`, which will select exactly the one you wanted. (You can iterate on matches too.)
- `cut` and `paste` to move elements around.
- `parsefile_inplace` to allow you to modify `XML` with an in-place edit.
- `pretty_print` options, to format `XML`.
- `twig_handlers` and `purge`, which allow you to process really big XML without having to load it all in memory.
- `simplify`, if you really must make it backwards compatible with `XML::Simple`.
It's also widely available - easy to download from `CPAN`, and distributed as an installable package on many operating systems. (Sadly it's not a default install. Yet.)
See: XML::Twig quick reference
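To make a few of those `XML::Twig` features concrete, here is a small sketch. The sample document (root name, attribute names other than `different_att`, and all values) is an assumption for illustration, and moving `<child>` under `<another_node>` is just an arbitrary demonstration of `cut` and `paste`.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document; any name not mentioned in the text is an assumption.
my $xml_text = <<'XML';
<xml>
  <parent>
    <child att="some_att">content</child>
  </parent>
  <another_node>
    <another_child some_att="a value" />
    <another_child different_att="different_value">more content</another_child>
  </another_node>
</xml>
XML

# twig_handlers + purge: handle each element as it is parsed, then free
# what has been parsed so far, so a large file never sits fully in memory.
my @seen;
XML::Twig->new(
    twig_handlers => {
        another_child => sub {
            my ( $t, $elt ) = @_;
            push @seen, $elt->text;
            $t->purge;
        },
    },
)->parse($xml_text);

# Whole-document parse for xpath queries and editing.
my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse($xml_text);

# get_xpath returns a list of matching elements, so you can iterate.
for my $elt ( $twig->get_xpath('//another_child') ) {
    print $elt->tag, ': ', $elt->text, "\n";
}

# An attribute test narrows the match to a single element.
my ($wanted) = $twig->get_xpath('//another_child[@different_att]');
print $wanted->att('different_att'), "\n";

# cut and paste: move <child> to be the last child of <another_node>.
my ($child)  = $twig->get_xpath('//child');
my ($target) = $twig->get_xpath('//another_node');
$child->cut;
$child->paste( last_child => $target );

$twig->print;    # pretty-printed according to the option given to new()
```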
For the sake of comparison:
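A sketch of the `XML::Simple` side, fetching the text of `child`. The `ForceArray => 1, KeyAttr => []` settings and the sample document (root name, most attribute names, all values) are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample document; any name not mentioned in the text is an assumption.
my $xml_text = <<'XML';
<xml>
  <parent>
    <child att="some_att">content</child>
  </parent>
  <another_node>
    <another_child some_att="a value" />
    <another_child different_att="different_value">more content</another_child>
  </another_node>
</xml>
XML

my $data = XMLin( $xml_text, ForceArray => 1, KeyAttr => [] );

# Each level alternates between a hash key and an array index,
# and the text lives under a synthetic 'content' key.
my $content = $data->{parent}[0]{child}[0]{content};
print "$content\n";
```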
Vs.
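And a sketch of the `XML::Twig` side, fetching the same text either with an `xpath` expression or by walking the tree. As before, the sample document (root name, most attribute names, all values) is an assumption for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document; any name not mentioned in the text is an assumption.
my $xml_text = <<'XML';
<xml>
  <parent>
    <child att="some_att">content</child>
  </parent>
  <another_node>
    <another_child some_att="a value" />
    <another_child different_att="different_value">more content</another_child>
  </another_node>
</xml>
XML

my $twig = XML::Twig->new;
$twig->parse($xml_text);

# Via xpath; the second argument picks a single match by index.
my $by_xpath = $twig->get_xpath( '/xml/parent/child', 0 )->text;

# Or by navigating from the root element.
my $by_nav = $twig->root->first_child('parent')->first_child_text('child');

print "$by_xpath\n$by_nav\n";
```

Either way, the path reads like the document itself rather than like the parser's internal layout.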