entity translation to customized entity

Question

54 Views Asked by Rahul At 06 July 2023 at 06:16

There are some user-defined entities in the XML data. In order to unescape those entities, we are using the code below:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="no" use-character-maps="mdash"/>
  <!-- Map single characters to entity references during serialization -->
  <xsl:character-map name="mdash">
    <xsl:output-character character="&#x2014;" string="&amp;mdash;"/>
    <xsl:output-character character="&amp;" string="&amp;amp;"/>
    <xsl:output-character character="&quot;" string="&amp;quot;"/>
    <xsl:output-character character="&apos;" string="&amp;apos;"/>
    <xsl:output-character character="&#167;" string="&amp;sect;"/>
    <xsl:output-character character="&#36;" string="&amp;dollar;"/>
    <xsl:output-character character="&#47;" string="&amp;sol;"/>
    <xsl:output-character character="&#45;" string="&amp;hyphen;"/>
  </xsl:character-map>
  <!-- Identity template: copy every attribute and node unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

But there is a special case where § appears twice in a row in the data, for example:

Ex- The number §§ 1234

The above example should be converted to a special user-defined entity, i.e.

Output- The number &multisect; 1234

That is, every §§ should be converted to &multisect;.

Michael Kay On 06 July 2023 at 08:44

You can't achieve this directly in the serializer, as you can with single characters. You will either have to recognise "§§" in the transformation proper (perhaps converting it to some private-use-area character, which is then picked up by xsl:output-character), or you could do it by post-processing the output at the character-stream level.
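
As a rough illustration of the post-processing route, here is a minimal sketch that rewrites the already-serialized output as a string. Python is used only for illustration, and the function name `collapse_double_sect` is hypothetical. It assumes the serializer has written each § either as the raw character or as `&sect;` (as the character map in the question would do). It is a plain text substitution, so it would also rewrite matches inside comments or CDATA sections, and a run of three § would become `&multisect;` followed by one leftover §.

```python
import re

# Exactly two consecutive section signs, each either the raw character
# or the entity reference the character map in the question emits.
_DOUBLE_SECT = re.compile(r"(?:§|&sect;){2}")


def collapse_double_sect(serialized: str) -> str:
    """Replace each pair of section signs with the &multisect; reference."""
    return _DOUBLE_SECT.sub("&multisect;", serialized)


if __name__ == "__main__":
    print(collapse_double_sect("<text>The number &sect;&sect; 1234</text>"))
```

A single `&sect;` is left untouched, so the existing single-character mapping keeps working.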

Martin Honnen (Accepted Answer) On 06 July 2023 at 13:02

If you want to use a character map, you would first need to process text nodes where you expect the two sect characters to be present and replace them with a single character you don't expect to be used elsewhere; that character could then be converted by the map to the string &multisect; e.g. the stylesheet

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="#all"
    expand-text="yes"
    version="3.0">
  
  <xsl:param name="multisect-sub" static="yes" as="xs:string" select="'«'"/>
  
  <xsl:character-map name="sub">
    <xsl:output-character _character="{$multisect-sub}" string="&amp;multisect;"/>
  </xsl:character-map>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output method="xml" indent="yes" use-character-maps="sub"/>
  
  <xsl:template match="text()">
    <xsl:apply-templates mode="analyze" select="analyze-string(., '&#xA7;&#xA7;')"/>
  </xsl:template>
  
  <xsl:template mode="analyze" match="fn:match">
    <xsl:text>{$multisect-sub}</xsl:text>
  </xsl:template>

</xsl:stylesheet>

transforms the input

<!DOCTYPE text [
  <!ENTITY sect "&#xA7;">
]>
<text>&sect;&sect; 1234</text>

into the output

<?xml version="1.0" encoding="UTF-8"?>
<text>&multisect; 1234</text>

Note that I used '«' only as an example; you might want to use a private-use character or some other character you are sure doesn't occur in your input/output data.
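
One way to pick such a character is to scan the input for a private-use code point that does not appear in it. The following is a minimal Python sketch, assuming the input fits in memory as a string. `pick_unused_private_char` is a hypothetical helper name, and how the result is handed to the stylesheet as the `multisect-sub` parameter depends on your processor.

```python
def pick_unused_private_char(text: str) -> str:
    """Return the first BMP private-use character (U+E000..U+F8FF) absent from text."""
    used = set(text)
    for code_point in range(0xE000, 0xF900):
        candidate = chr(code_point)
        if candidate not in used:
            return candidate
    raise ValueError("every BMP private-use character already occurs in the input")


if __name__ == "__main__":
    sample = "<text>\u00a7\u00a7 1234</text>"
    # Print the code point rather than the character, which has no visible glyph.
    print(f"U+{ord(pick_unused_private_char(sample)):04X}")
```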

If you want the result to be well-formed, you would also need to add a doctype to the output, e.g. with xsl:output doctype-system="some.dtd", where you ensure that some.dtd declares the entity, e.g. <!ENTITY multisect "§§">
