Update: The context is MuleSoft. Could any libraries be used to solve scenarios like this?
I have an unusual requirement: I need to accept 'incorrect XML' within an API implementation and correctly escape any special characters that appear where they should not, i.e. in attribute values or in element data. They can occur anywhere.
This is needed to prevent APIKit/schema validation errors initially, and to avoid failures in later DW transforms that expect valid XML.
I have tried to portray a simple example below:
<CARS>
<CAR>
<MODEL ALIAS="City & Co">alpha city</MODEL>
<YEAR>1992</YEAR>
<MANAFACTURER>Penguin</MANAFACTURER>
<OTHER>Made in UK & US</OTHER>
</CAR>
<CAR>
<MODEL ALIAS="City & Co" MAKE="BMW">venturi city</MODEL>
<YEAR>1994</YEAR>
<MANAFACTURER>Penguin</MANAFACTURER>
<OTHER>BHP > 1000</OTHER>
</CAR>
</CARS>
Is there an easy way to parse this XML in DW, or with an external library, and correctly escape special characters like &, < and >?
Whatever is generating that is not generating XML, but a string formatted similarly to XML. No standards-compliant parser will parse invalid XML like the example provided. You can try to hack it with string manipulation in DataWeave, Groovy, Java, etc., but you cannot process it as XML until the special characters are correctly escaped. It is difficult to cover all possible cases that way. It might be easier to enclose each value in a CDATA section, although that only helps for element content, since CDATA sections are not allowed inside attribute values.
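As a rough illustration of the string-manipulation approach, here is a minimal Java sketch that could run before validation (for example from a Java component in the flow). The class and method names are hypothetical, not part of MuleSoft or any library. It assumes the only problems are stray & characters and < characters that clearly do not start a tag, and the < rule is only a heuristic: text such as "a <b" would still be read as the start of a tag. Note that a bare > in element content, as in "BHP > 1000", is already allowed by XML 1.0 (it only has to be escaped in the sequence ]]>), so the sketch leaves it alone.

```java
import java.util.regex.Pattern;

// Hypothetical helper: pre-cleans "almost XML" text so a real parser can read it.
public final class LooseXmlEscaper {

    // An '&' that does NOT begin a predefined entity or a numeric character reference.
    private static final Pattern STRAY_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Heuristic: a '<' that is not followed by something that can start
    // a tag, closing tag, comment, CDATA section or processing instruction.
    private static final Pattern STRAY_LT =
            Pattern.compile("<(?![A-Za-z_:/!?])");

    private LooseXmlEscaper() {
    }

    // Replaces stray ampersands with &amp; and leaves existing references untouched.
    public static String escapeStrayAmpersands(String raw) {
        return STRAY_AMP.matcher(raw).replaceAll("&amp;");
    }

    // Replaces '<' characters that cannot start markup with &lt;.
    public static String escapeStrayLessThan(String raw) {
        return STRAY_LT.matcher(raw).replaceAll("&lt;");
    }

    // Applies both fixes; ampersands go first so the result is easy to reason about.
    public static String fix(String raw) {
        return escapeStrayLessThan(escapeStrayAmpersands(raw));
    }
}
```

For example, LooseXmlEscaper.fix("<OTHER>Made in UK & US</OTHER>") returns "<OTHER>Made in UK &amp; US</OTHER>", and the same call turns ALIAS="City & Co" into ALIAS="City &amp; Co". The output should still be handed to a real XML parser afterwards, because this does not detect other kinds of malformed input.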
The real solution would be to generate valid XML at the source.