Load entities in XSLT 2?


I'm using an XSLT 2.0 stylesheet to process some MathML documents. Those documents contain entity references such as &ApplyFunction; and &InvisibleTimes;, which give me "entity not defined" errors. Is there a way to process documents with these entities without loading the MathML schema? (Saxon-HE cannot use xsl:import-schema.)

And just to be clear, I don't need to use the entities in my XSLT; I need to process XML documents that contain them.

There's an entity file for MathML like this:

<!ENTITY AElig            "&#x000C6;" ><!--LATIN CAPITAL LETTER AE -->
<!ENTITY AMP              "&#38;#38;" ><!--AMPERSAND -->
<!ENTITY Aacute           "&#x000C1;" ><!--LATIN CAPITAL LETTER A WITH ACUTE -->
...

Maybe I can somehow make use of that?

UPDATE: multiple people have mentioned that the input documents should have the correct DTD. So here's a minimal example:

The XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:m="http://www.w3.org/1998/Math/MathML">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:text>aaa</xsl:text>
  </xsl:template>
</xsl:stylesheet>

The MathML with DTD declaration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN"
    "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow> 
    <mi> sin </mi> 
    <mo> &ApplyFunction; </mo> 
    <mi> x </mi> 
  </mrow> 
</math>

Now Saxon gives me this error:

I/O error reported by XML parser processing file:/path/to/mathml.xml: unknown protocol: classpath

There are 2 best solutions below

Answer by Eiríkr Útlendi:

I've had success in the past by declaring the entities in the XSL file. For example:

<!DOCTYPE stylesheet [
<!ENTITY lsquo "<xsl:text disable-output-escaping='yes'>&amp;#x2018;</xsl:text>">
<!ENTITY rsquo "<xsl:text disable-output-escaping='yes'>&amp;#x2019;</xsl:text>">
<!ENTITY ldquo "<xsl:text disable-output-escaping='yes'>&amp;#x201C;</xsl:text>">
<!ENTITY rdquo "<xsl:text disable-output-escaping='yes'>&amp;#x201D;</xsl:text>">
]>

... added at the top of the file, just after the <?xml?> declaration and just before the <xsl:stylesheet> element. I suspect a similar approach would help in your case.

Answer by Michael Kay:

Just to reinforce the other answers and comments: entity expansion is the responsibility of the XML parser and has nothing to do with the XSLT processor. For the XML to be well-formed, the entities must be declared, which means you need an (internal or external) DTD that declares them. In other words, the source document must have a suitable DOCTYPE declaration.
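As a minimal sketch of the internal-DTD option, assuming the source document only uses the two entities mentioned in the question, the declarations can sit in an internal subset of the document itself. The code points U+2061 (FUNCTION APPLICATION) and U+2062 (INVISIBLE TIMES) are the Unicode characters these MathML entity names stand for; a real document may need more declarations than shown here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Internal DTD subset: declares only the entities this document uses.
     Nothing is fetched from the network or from an external DTD. -->
<!DOCTYPE math [
  <!ENTITY ApplyFunction  "&#x2061;">  <!-- FUNCTION APPLICATION -->
  <!ENTITY InvisibleTimes "&#x2062;">  <!-- INVISIBLE TIMES -->
]>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi> sin </mi>
    <mo> &ApplyFunction; </mo>
    <mi> x </mi>
  </mrow>
</math>
```

With this form the parser expands the references before the XSLT processor ever sees the document, so the stylesheet needs no changes.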

The only contribution Saxon makes is that it supplies its own EntityResolver to the XML parser. The term "EntityResolver" is a bit of a misnomer, because it doesn't actually expand entity references such as &InvisibleTimes;. All it does is locate external DTD files to satisfy the system IDs and public IDs that appear in your DOCTYPE declaration.
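One common way to control where those external DTD files are found is an OASIS XML catalog. The sketch below assumes you keep a complete local copy of the MathML 2 DTD, including the entity files it pulls in, under a hypothetical dtd/mathml2/ directory next to the catalog file; the public and system IDs are the ones from the DOCTYPE in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- catalog.xml: maps the identifiers in the DOCTYPE to a local copy.
     The dtd/mathml2/ path is an assumption; adjust it to your layout. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD MathML 2.0//EN"
          uri="dtd/mathml2/mathml2.dtd"/>
  <system systemId="http://www.w3.org/Math/DTD/mathml2/mathml2.dtd"
          uri="dtd/mathml2/mathml2.dtd"/>
</catalog>
```

Saxon's command-line interface has a -catalog option for supplying such a file, but how it is wired up (and which resolver library has to be on the classpath) differs between releases, so check the documentation for the version you are running.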