XSLT: Delete certain characters from string with nested elements


I have a sequence of transformations whose penultimate output contains a line with the following structure:

<hi rendition="#rf-Lyricist">
   [Lyricist: <anchor xml:id="w1s"/>Some Name]<anchor xml:id="w1e"/>
</hi>

Transforming the tei:hi element into tei:note is not a problem. The square brackets surrounding the text should be transformed into a nested tei:supplied, so the desired result is:

<note type="lyricist">
   <supplied reason="provided-by-editor" cert="1" resp="#NN">
      Lyricist: <anchor xml:id="w1s"/>Some Name<anchor xml:id="w1e"/>
    </supplied>
</note>

I tried several ways of achieving this, including:

<xsl:template match="tei:body/tei:div[1]/tei:lg/tei:l/tei:hi[(@rendition='#rf-Lyricist')]">
    <!-- This method deletes all tei:anchor within the lyricist line. -->
    <xsl:analyze-string select="." regex="\[.*\]">
        <xsl:matching-substring>
            <xsl:element name="note" namespace="http://www.tei-c.org/ns/1.0">
                <xsl:attribute name="type">
                    <xsl:value-of select="'lyricist'"/>
                </xsl:attribute>
                <xsl:element name="supplied" namespace="http://www.tei-c.org/ns/1.0">
                    <xsl:attribute name="reason">
                        <xsl:value-of select="'provided-by-editor'"/>
                    </xsl:attribute>
                    <xsl:attribute name="cert">
                        <xsl:value-of select="'1'"/>
                    </xsl:attribute>
                    <xsl:attribute name="resp">
                        <xsl:value-of select="'#ND'"/>
                    </xsl:attribute>
                    <xsl:value-of select="translate(., '[]', '')"/>
                </xsl:element>
            </xsl:element>
        </xsl:matching-substring>
    </xsl:analyze-string>
</xsl:template>

I always get a result in which not only the square brackets but also the tei:anchor elements are deleted, such as:

<note type="lyricist">
   <supplied reason="provided-by-editor" cert="1" resp="#ND">
      Lyricist: Some Name
   </supplied>
</note>

The same method for replacing square brackets works perfectly well within a text node that has no further elements inside.

Answered by Martin Honnen

To implement "The square brackets surrounding the text should be transformed into a nested [..]supplied" I would use e.g.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="hi">
    <note type="lyricist">
      <xsl:variable name="content">
        <xsl:apply-templates mode="brackets-to-elements"/>
      </xsl:variable>
      <xsl:for-each-group select="$content/node()" group-starting-with="osb">
        <xsl:choose>
          <xsl:when test="self::osb">
              <xsl:for-each-group select="tail(current-group())" group-ending-with="csb">
                <xsl:choose>
                  <xsl:when test="position() eq 1">
                    <supplied>
                      <xsl:apply-templates select="current-group()[not(position() = last())]"/>
                    </supplied>
                  </xsl:when>
                  <xsl:otherwise>
                    <xsl:apply-templates select="current-group()"/>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each-group>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </note>
  </xsl:template>
  
  <xsl:mode name="brackets-to-elements" on-no-match="shallow-copy"/>
  
  <xsl:template match="hi//text()" mode="brackets-to-elements">
    <xsl:apply-templates select="analyze-string(., '\[|\]')" mode="analyze"/>
  </xsl:template>
  
  <xsl:template match="fn:match[. = '[']" mode="analyze">
    <osb/>
  </xsl:template>
  
  <xsl:template match="fn:match[. = ']']" mode="analyze">
    <csb/>
  </xsl:template>
  
</xsl:stylesheet>

This is the approach outlined in my comment: a first pass finds the square brackets in text nodes inside hi elements and converts them to marker elements such as osb and csb, and a second pass over that intermediate tree uses the classic pair of for-each-group group-starting-with="osb" / group-ending-with="csb" to establish the wrapper element.
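To make the two passes easier to follow, here is roughly what the intermediate tree held in the content variable should contain for the sample hi element from the question. This is a hand-traced sketch, not processor output, and the whitespace-only text before the opening bracket and after the last anchor is left out:

```xml
<!-- Children of $content after the brackets-to-elements pass:
     "[" has become <osb/>, "]" has become <csb/>, and the anchor
     elements are still in place because they were shallow-copied. -->
<osb/>Lyricist: <anchor xml:id="w1s"/>Some Name<csb/><anchor xml:id="w1e"/>
```

The outer grouping then starts a group at osb, and the inner grouping ends its first group at csb, so everything between the two markers ends up inside supplied while the trailing anchor stays outside.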

Output is e.g.

<note type="lyricist">
   <supplied>Lyricist: <anchor xml:id="w1s"/>Some Name</supplied><anchor xml:id="w1e"/>
</note>
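The stylesheet above matches elements in no namespace and creates a bare supplied element. A minimal sketch of how it might be adapted to the setup described in the question follows, assuming the input really is in the TEI namespace (the question uses the tei: prefix), that only hi elements with rendition="#rf-Lyricist" should be converted, and that the attribute values on supplied are the ones from the desired result. I have only traced it by hand against the sample:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    xmlns="http://www.tei-c.org/ns/1.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="#all"
    version="3.0">

  <!-- Identity transformation for everything not handled explicitly -->
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:mode name="brackets-to-elements" on-no-match="shallow-copy"/>

  <!-- Assumption: only the lyricist line becomes a note -->
  <xsl:template match="hi[@rendition = '#rf-Lyricist']">
    <note type="lyricist">
      <!-- Pass 1: copy the content, turning [ and ] into osb/csb markers -->
      <xsl:variable name="content">
        <xsl:apply-templates mode="brackets-to-elements"/>
      </xsl:variable>
      <!-- Pass 2: wrap whatever sits between osb and csb in supplied -->
      <xsl:for-each-group select="$content/node()" group-starting-with="osb">
        <xsl:choose>
          <xsl:when test="self::osb">
            <xsl:for-each-group select="tail(current-group())" group-ending-with="csb">
              <xsl:choose>
                <xsl:when test="position() eq 1">
                  <!-- Attribute values copied from the desired result in the question -->
                  <supplied reason="provided-by-editor" cert="1" resp="#NN">
                    <xsl:apply-templates select="current-group()[not(position() = last())]"/>
                  </supplied>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each-group>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </note>
  </xsl:template>

  <!-- Split each text node at the brackets; elements are left untouched -->
  <xsl:template match="hi//text()" mode="brackets-to-elements">
    <xsl:apply-templates select="analyze-string(., '\[|\]')" mode="analyze"/>
  </xsl:template>

  <xsl:template match="fn:match[. = '[']" mode="analyze">
    <osb/>
  </xsl:template>

  <xsl:template match="fn:match[. = ']']" mode="analyze">
    <csb/>
  </xsl:template>

</xsl:stylesheet>
```

Because xpath-default-namespace and the default namespace for literal result elements are both set to the TEI namespace, the osb and csb markers are created and matched in the same namespace. As in the output above, the closing anchor stays outside supplied, since it follows the closing bracket in the source. The attempt in the question loses the anchors because xsl:analyze-string select="." works on the string value of the hi element, whereas here each text node is analyzed separately and the elements between them are copied.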