XSLT 2/3: nested grouping with for-each-group while maintaining input order


There are similar questions on Stack Overflow, but the answers sort or group the matching items, which puts them out of sequence relative to the input order.

I have the following data:

<doc>
<paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
<paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>
<paragraph indent="0" stylename="list" SDgroup="list">AAA. Para 1.</paragraph>
<paragraph indent="1" stylename="list" SDgroup="list">BBB. Para 2</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
<paragraph indent="2" stylename="list" SDgroup="list">CCC. Para 3a</paragraph>
<paragraph indent="2" stylename="list" SDgroup="list">DDD. Para 3b/4</paragraph>
<paragraph indent="1" stylename="list" SDgroup="list">EEE. Para 5</paragraph>
<paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
<paragraph indent="0" stylename="list" SDgroup="list">FFF. Para 6</paragraph>
</doc>

I need to nest each paragraph[@stylename='list'] and the paragraph[@stylename='continued'] elements that follow it, based on list/@indent = x and continued/@indent = x + 1.

The output should look like:

<doc>
  <paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
  <paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>

  <List indent="0" maxItems="2">
    <paragraph indent="0" stylename="list_unordered" SDgroup="list">AAA. Para 1.</paragraph>
    <List indent="1" maxItems="1">
      <paragraph indent="1" stylename="list_unordered" SDgroup="list">BBB. Para 2</paragraph>
      <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
      <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
      <List indent="2" maxItems="2">
        <paragraph indent="2" stylename="list_unordered" SDgroup="list">CCC. Para 3a</paragraph>
        <paragraph indent="2" stylename="list_unordered" SDgroup="list">DDD. Para 3b/4</paragraph>
      </List>
      <paragraph indent="1" stylename="list_unordered" SDgroup="list">EEE. Para 5</paragraph>
      <paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
    </List>
    <paragraph indent="0" stylename="list_unordered" SDgroup="list">FFF. Para 6</paragraph>
  </List>
</doc>

I have tried many, many versions of xsl:for-each-group with group-by/group-adjacent, as well as plain xsl:for-each, but I end up with out-of-sequence output and/or correctly nested output with duplicated data (i.e., an item is correctly nested and then, at some point, the same item is output again). I believe this happens with multiply nested for-each loops.

I'm 99.9% sure this is doable, but I can't find a solution that produces what I need. Once again, I'm asking for help, which will of course be greatly, greatly appreciated.

FYI: I have no control over the input data and no means to change it, unless it's within XSLT.


There are 2 answers below.

Best answer, by al.truisme

Following your comment, your output is a little easier to understand; however, I think your sample output doesn't match your description: the maxItems for the second List element (just after AAA) should be 2, not 1, since that List element has two list_unordered paragraphs as immediate children (BBB and EEE).

I've reworked my example implementation:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:l="local:functions"
                exclude-result-prefixes="#all">

    <xsl:output indent="yes" omit-xml-declaration="yes" />

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:template match="/doc" >
        <xsl:copy>
            <!-- '-1' sits below every real indent, so the first list paragraph always opens a List -->
            <xsl:sequence select="l:processParagraphs('-1', *)" />
        </xsl:copy>
    </xsl:template>

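    <!-- Walks $paras in input order. A list paragraph indented deeper than $indentLevel
         opens a List holding it and every following paragraph indented at least as deep;
         processing then resumes after that run, so no paragraph is emitted twice. -->
    <xsl:function name="l:processParagraphs" as="element()*">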
        <xsl:param name="indentLevel" as="xs:string"           />
        <xsl:param name="paras"       as="element(paragraph)*" />

        <xsl:choose>
            <xsl:when test="head($paras)/@stylename eq 'list'
                            and
                            xs:integer(head($paras)/@indent) gt xs:integer($indentLevel)">
                <xsl:variable name="itemsUnderList" as="element(paragraph)*"
                              select="l:nextInList(head($paras)/@indent, $paras)" />
                <xsl:variable name="processedInner" as="element()*"
                              select="l:processInner(head($paras)/@indent, $itemsUnderList)" />
                <List indent="{head($paras)/@indent}" maxItems="{count($processedInner[@stylename eq 'list_unordered'])}">
                    <xsl:sequence select="$processedInner" />
                </List>
                <xsl:sequence select="l:processParagraphs($indentLevel, $paras except $itemsUnderList)" />
            </xsl:when>
            <xsl:otherwise>
                <xsl:sequence select="l:processInner($indentLevel, $paras)" />
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>

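    <!-- Copies the first paragraph (renaming its stylename) and hands the rest back
         to l:processParagraphs at the same indent level. -->
    <xsl:function name="l:processInner" as="element()*">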
        <xsl:param name="indentLevel" as="xs:string"           />
        <xsl:param name="paras"       as="element(paragraph)*" />

        <xsl:apply-templates select="head($paras)" /> <!-- to replace list stylename -->
        <xsl:if test="tail($paras)" >
            <xsl:sequence select="l:processParagraphs($indentLevel, tail($paras))" />
        </xsl:if>
    </xsl:function>

    <xsl:template match="paragraph/@stylename[. eq 'list']">
        <xsl:attribute name="stylename" select="'list_unordered'" />
    </xsl:template>

    <!-- Returns the leading run of $paras whose indent is >= $indentLevel. -->
    <xsl:function name="l:nextInList" as="element(paragraph)*" >
        <xsl:param name="indentLevel" as="xs:string"           />
        <xsl:param name="paras"       as="element(paragraph)*" />

        <xsl:if test="xs:integer(head($paras)/@indent) ge xs:integer($indentLevel)" >
            <xsl:sequence select="(head($paras),
                               l:nextInList($indentLevel, tail($paras)))" />
        </xsl:if>
    </xsl:function>

</xsl:stylesheet>

which produces:

<doc>
   <paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
   <paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>
   <List indent="0" maxItems="2">
      <paragraph indent="0" stylename="list_unordered" SDgroup="list">AAA. Para 1.</paragraph>
      <List indent="1" maxItems="2">
         <paragraph indent="1" stylename="list_unordered" SDgroup="list">BBB. Para 2</paragraph>
         <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
         <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
         <List indent="2" maxItems="2">
            <paragraph indent="2" stylename="list_unordered" SDgroup="list">CCC. Para 3a</paragraph>
            <paragraph indent="2" stylename="list_unordered" SDgroup="list">DDD. Para 3b/4</paragraph>
         </List>
         <paragraph indent="1" stylename="list_unordered" SDgroup="list">EEE. Para 5</paragraph>
         <paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
      </List>
      <paragraph indent="0" stylename="list_unordered" SDgroup="list">FFF. Para 6</paragraph>
   </List>
</doc>
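Since the question title asks about xsl:for-each-group, here is a minimal sketch of the same nesting done with recursive group-starting-with grouping instead of head()/tail() recursion. It is not part of either answer, and the function name l:list is hypothetical. It assumes every run of SDgroup="list" paragraphs begins with a list item, and that inside a list item the deeper items never step back to an indent between the item's own level and that of its first nested item. It sticks to XSLT 2.0 features (an identity template instead of xsl:mode, subsequence() instead of tail()), so it should also run on an XSLT 3.0 processor; for the sample input it is meant to produce the same output as the stylesheet above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:l="local:functions"
                exclude-result-prefixes="#all">

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <!-- Identity template (XSLT 2.0 stand-in for xsl:mode on-no-match="shallow-copy") -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rename the list style on output, as both answers do -->
  <xsl:template match="paragraph/@stylename[. = 'list']">
    <xsl:attribute name="stylename" select="'list_unordered'"/>
  </xsl:template>

  <xsl:template match="/doc">
    <xsl:copy>
      <!-- Adjacent runs: list paragraphs vs. everything else (the headings) -->
      <xsl:for-each-group select="paragraph" group-adjacent="@SDgroup = 'list'">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <xsl:sequence select="l:list(current-group())"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: builds one List at the indent of its first paragraph.
       Each list item at that indent starts a group; the deeper paragraphs after it
       (up to the next item at the same indent) belong to that item. -->
  <xsl:function name="l:list" as="element(List)">
    <xsl:param name="paras" as="element(paragraph)+"/>
    <xsl:variable name="level" as="xs:integer" select="xs:integer($paras[1]/@indent)"/>
    <List indent="{$level}"
          maxItems="{count($paras[@stylename = 'list'][xs:integer(@indent) = $level])}">
      <xsl:for-each-group select="$paras"
          group-starting-with="paragraph[@stylename = 'list'][xs:integer(@indent) = $level]">
        <!-- the list item itself -->
        <xsl:apply-templates select="."/>
        <xsl:variable name="rest" as="element(paragraph)*"
                      select="subsequence(current-group(), 2)"/>
        <!-- the first deeper list item, if any, opens the nested List -->
        <xsl:variable name="firstNested" as="element(paragraph)?"
                      select="$rest[@stylename = 'list'][1]"/>
        <!-- continued paragraphs before that point stay direct children -->
        <xsl:apply-templates select="$rest[empty($firstNested) or . &lt;&lt; $firstNested]"/>
        <xsl:if test="exists($firstNested)">
          <xsl:sequence select="l:list($rest[. is $firstNested or . &gt;&gt; $firstNested])"/>
        </xsl:if>
      </xsl:for-each-group>
    </List>
  </xsl:function>

</xsl:stylesheet>
```

Because xsl:for-each-group without xsl:sort handles groups in order of first appearance, and each paragraph falls in exactly one group at each level, nothing should be reordered or emitted twice.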
Answer by al.truisme

I was trying to figure out what your example was doing, but I haven't quite nailed it. In case it helps, I've taken a recursive-function approach.

One thing I'm not seeing in your example is a series of increasing/decreasing/increasing indentation levels, so I have nothing to check against, but the method I'm submitting should handle that.
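A hypothetical input with that up/down/up pattern (invented here, not taken from the question) could serve as such a check. With the best answer's stylesheet above, it should come out as one List indent="0" containing AAA, a List indent="1" (BBB and its continued paragraph), CCC, and a second List indent="1" that holds DDD, a List indent="2" (EEE) and FFF.

```xml
<doc>
  <!-- hypothetical test data: indent goes up (0, 1), back down (0), then up again (1, 2, 1) -->
  <paragraph indent="0" stylename="list" SDgroup="list">AAA. Para 1</paragraph>
  <paragraph indent="1" stylename="list" SDgroup="list">BBB. Para 2</paragraph>
  <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para</paragraph>
  <paragraph indent="0" stylename="list" SDgroup="list">CCC. Para 3</paragraph>
  <paragraph indent="1" stylename="list" SDgroup="list">DDD. Para 4</paragraph>
  <paragraph indent="2" stylename="list" SDgroup="list">EEE. Para 5</paragraph>
  <paragraph indent="1" stylename="list" SDgroup="list">FFF. Para 6</paragraph>
</doc>
```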

As @martin-honnen mentioned in his question, there is some ambiguity about how the list type should be handled with respect to indentation.

Also, I have no idea what the rule might be for the maxItems attribute.

For what it's worth:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:l="local:functions"
  exclude-result-prefixes="#all">
  
  <xsl:output indent="yes" omit-xml-declaration="yes" />

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="/doc" >
    <xsl:copy>
      <xsl:sequence select="l:processParagraphs(head(*)/@indent, *)" />
    </xsl:copy>
  </xsl:template>
  
  <xsl:function name="l:processParagraphs" as="element()*">
    <xsl:param name="indentLevel" as="xs:string"           />
    <xsl:param name="paras"       as="element(paragraph)*" />

    <xsl:choose>
      <xsl:when test="head($paras)/@stylename eq 'list'
                      and
                      xs:integer(head($paras)/@indent) ge xs:integer($indentLevel)">
        <List indent="{head($paras)/@indent}" maxItems="???">
          <xsl:variable name="itemsUnderList" as="element(paragraph)*"
                        select="l:nextInList(head($paras)/@indent, $paras)" />
          <xsl:sequence select="l:processInner(head($paras)/@indent, $itemsUnderList)" />
          <xsl:sequence select="l:processParagraphs($indentLevel, $paras except $itemsUnderList)" />
        </List>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="l:processInner($indentLevel, $paras)" />
      </xsl:otherwise>
    </xsl:choose>    
  </xsl:function>
  
  <xsl:function name="l:processInner" as="element()*">
    <xsl:param name="indentLevel" as="xs:string"           />
    <xsl:param name="paras"       as="element(paragraph)*" />
    
    <xsl:apply-templates select="head($paras)" /> <!-- to replace list stylename -->
    <xsl:if test="tail($paras)" >
      <xsl:sequence select="l:processParagraphs(head($paras)/@indent, tail($paras))" />
    </xsl:if>
  </xsl:function>

  <xsl:template match="paragraph/@stylename[. eq 'list']">
    <xsl:attribute name="stylename" select="'list_unordered'" />
  </xsl:template>
  
  <xsl:function name="l:nextInList" as="element(paragraph)*" >
    <xsl:param name="indentLevel" as="xs:string"           />
    <xsl:param name="paras"       as="element(paragraph)*" />
    
    <xsl:if test="xs:integer(head($paras)/@indent) ge xs:integer($indentLevel)" >
      <xsl:sequence select="(head($paras), 
                             l:nextInList($indentLevel, tail($paras)))" />
    </xsl:if>
  </xsl:function>
  
</xsl:stylesheet>

produces (tested in https://martin-honnen.github.io/xslt3fiddle/):

<doc>
   <paragraph indent="0" stylename="heading_l1_toc" SDgroup="heading">heading-l1</paragraph>
   <paragraph indent="0" stylename="heading_l2_toc" SDgroup="heading">heading-l2</paragraph>
   <List indent="0" maxItems="???">
      <paragraph indent="0" stylename="list_unordered" SDgroup="list">AAA. Para 1.</paragraph>
      <List indent="1" maxItems="???">
         <paragraph indent="1" stylename="list_unordered" SDgroup="list">BBB. Para 2</paragraph>
         <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para1</paragraph>
         <paragraph indent="2" stylename="continued" SDgroup="list">BBB. Continued para2</paragraph>
         <List indent="2" maxItems="???">
            <paragraph indent="2" stylename="list_unordered" SDgroup="list">CCC. Para 3a</paragraph>
            <List indent="2" maxItems="???">
               <paragraph indent="2" stylename="list_unordered" SDgroup="list">DDD. Para 3b/4</paragraph>
            </List>
            <paragraph indent="1" stylename="list_unordered" SDgroup="list">EEE. Para 5</paragraph>
            <paragraph indent="2" stylename="continued" SDgroup="list">EEE. Continued para</paragraph>
         </List>
         <List indent="0" maxItems="???">
            <paragraph indent="0" stylename="list_unordered" SDgroup="list">FFF. Para 6</paragraph>
         </List>
      </List>
   </List>
</doc>