Is there a way to disregard a referenced DTD when running an XSLT transformation?

When I run the following templates using Saxon in Oxygen:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    expand-text="yes"
    version="3.0">    
    <xsl:output indent="yes" method="xml" omit-xml-declaration="no" encoding="utf-8"/>
    
    <xsl:template match="/">   
        <xsl:text>&#xa;</xsl:text>
        <xsl:apply-templates select="*"/>
    </xsl:template>
    
    <xsl:template match="*">
<!-- On Martin's suggestion I should use node-name instead of name, so I have changed this, but the result is the same. -->
        <xsl:value-of select="node-name()"/><xsl:text>&#xa;</xsl:text>
        <xsl:apply-templates select="*"/>
    </xsl:template>
</xsl:stylesheet>

On this xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ddn PUBLIC "-//S1000D//DTD Data Dispatch Note 20050501//EN//XML" "http://www.s1000d.org/s1000d_2-2/xml_dtd/ddn/dtd/ddn.dtd">
<ddn>
    <ddnc>
        <modelic>ABC</modelic>
        <sendid>AASSD</sendid>
        <recvid>VVBBN</recvid>
        <diyear>2024</diyear>
        <seqnum>00001</seqnum>
    </ddnc>
</ddn>

I get the following output:

<?xml version="1.0" encoding="utf-8"?>
ddn
ddnc
modelic
sendid
recvid
diyear
seqnum

So clearly (to me at least), the transform knows the element names.

If I change the template matching the elements to:

<xsl:template match="ddn">
     <xsl:value-of select="node-name()"/><xsl:text>&#xa;</xsl:text>
     <xsl:apply-templates select="*"/>
</xsl:template>

I get the following, which doesn't include any element names:

<?xml version="1.0" encoding="utf-8"?>
        ABC
        AASSD
        VVBBN
        2024
        00001

If I remove the doctype declaration and run the same transformation I get:

<?xml version="1.0" encoding="utf-8"?>
ddn
        ABC
        AASSD
        VVBBN
        2024
        00001

So the root ddn is now found. My conclusion is that the DTD is used by the transformation.

I would rather disregard the DTD than try to correct something in it, since the DTD isn't mine to begin with. I just need to transform the content of the file I received. The XML included in this question is only a small fragment of an actual file, but the problem is the same no matter the content.

So how can I get around this problem? Do I need to add some namespace to my rules (although the name function didn't produce anything like that), or can I tell Saxon to disregard the DTD? It looks as if this is the default in the settings, but I suspect there is something else I am missing here.

I have tried the same transform using XMLSpy with its built-in XSLT engine, and it behaves in the same way.

If I use * as the namespace prefix and change the template matching the elements to:

<xsl:template match="*:ddn">
     <xsl:value-of select="node-name()"/><xsl:text>&#xa;</xsl:text>
     <xsl:apply-templates select="*"/>
</xsl:template>

I get:

<?xml version="1.0" encoding="utf-8"?>
ddn
        ABC
        AASSD
        VVBBN
        2024
        00001

So this works, but why?!?

Suggestions?

Answer by Martin Honnen (best answer):

I downloaded a copy of the DTD, and it contains:

<!ELEMENT ddn  (rdf:Description?,ddnc,issdate,security,datarest?,
                dispto,dispfrom,authrtn,mediaid?,remarks?,delivlst?) >
<!ATTLIST ddn
      id            ID      #IMPLIED
      xmlns         CDATA   #FIXED  "http://www.s1000d.org/ddn"
          %RDFDCATT; >

So, based on that, I would expect that declaring xpath-default-namespace="http://www.s1000d.org/ddn" in the XSLT allows you to select and/or match any non-prefixed elements such as ddn, ddnc, issdate, etc.
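
As a minimal sketch of that suggestion, the question's stylesheet could be adapted as follows. It assumes that the DTD the asker's parser actually resolves declares the same #FIXED namespace as the copy quoted above; if the URI differs, the attribute value has to change accordingly.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.s1000d.org/ddn"
    version="3.0">
    <xsl:output indent="yes" method="xml" omit-xml-declaration="no" encoding="utf-8"/>

    <xsl:template match="/">
        <xsl:text>&#xa;</xsl:text>
        <xsl:apply-templates select="*"/>
    </xsl:template>

    <!-- With xpath-default-namespace set, the unprefixed name "ddn" in a
         pattern or XPath expression means the ddn element in the
         http://www.s1000d.org/ddn namespace. -->
    <xsl:template match="ddn">
        <xsl:value-of select="local-name()"/><xsl:text>&#xa;</xsl:text>
        <xsl:apply-templates select="*"/>
    </xsl:template>
</xsl:stylesheet>
```

Note that this version would stop matching ddn if the same file were processed without the DTD, because the elements would then be in no namespace. A pattern such as *:ddn matches in both cases, since it tests only the local name.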

Answer by Michael Kay:

An XSLT processor will always ask the parser to read the DTD because it is needed to resolve entity references - even though there might not be any. Although the DTD is being read, Saxon won't ask the parser to validate against the DTD unless you specifically request this.

This particular DTD does something I consider fairly horrible: it injects a namespace into the document in the form of a default attribute value. That has the effect of changing the namespace of all the unqualified elements in the document, so transforming the document with the DTD will behave completely differently from the same transformation without it.
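
To make the effect concrete: once the parser has applied the #FIXED xmlns default from the ATTLIST quoted in the other answer, the stylesheet sees a document roughly equivalent to the sketch below. The namespace URI is taken from that quoted DTD fragment; any other attribute defaults the DTD may supply are left out here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Effective document after DTD attribute defaulting: the xmlns
     declaration is not in the file, it is supplied by the DTD. -->
<ddn xmlns="http://www.s1000d.org/ddn">
    <ddnc>
        <modelic>ABC</modelic>
        <sendid>AASSD</sendid>
        <recvid>VVBBN</recvid>
        <diyear>2024</diyear>
        <seqnum>00001</seqnum>
    </ddnc>
</ddn>
```

This is why match="ddn" (which means ddn in no namespace) finds nothing, while match="*:ddn" succeeds, and why node-name() still prints plain ddn: the elements have a namespace but no prefix.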

You can't ask Saxon (or the underlying XML parser) to ignore the DOCTYPE; however, you can substitute a harmless DTD for the real one by nominating an EntityResolver.
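
Rather than writing an EntityResolver in code, one declarative way to get the same substitution is an OASIS XML catalog that maps the DOCTYPE's identifiers to a local stub. The following is a sketch under assumptions, not a tested configuration: stub/empty.dtd is a hypothetical file (an empty file is enough), and the catalog still has to be registered with the tool, for example through Saxon's -catalog command-line option or Oxygen's XML catalog preferences.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <!-- Both identifiers are copied from the DOCTYPE in the question.
         stub/empty.dtd is a hypothetical empty file, resolved relative
         to this catalog. -->
    <public publicId="-//S1000D//DTD Data Dispatch Note 20050501//EN//XML"
            uri="stub/empty.dtd"/>
    <system systemId="http://www.s1000d.org/s1000d_2-2/xml_dtd/ddn/dtd/ddn.dtd"
            uri="stub/empty.dtd"/>
</catalog>
```

With the stub in place no namespace is injected, so match="ddn" should behave as it does when the DOCTYPE is removed. The trade-off is that any entities or other attribute defaults declared in the real DTD are lost as well.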