The new version of the question (2023-10-01)
General overview
I am trying to build the table of contents (TOC) of a document by keeping only its title nodes, such as h1, h2, … h9 (h[0-9]), and deleting all the other nodes.
I tried to use the matches() function, which is only available in XSLT 2.0; that is why I use Saxon.
For the moment, I have the following MWE:
Minimal working example (MWE)
document.xml
<?xml version="1.0" encoding="UTF-8"?>
<document>
<h1>Lorem ipsum dolor</h1>
<h2>Lorem ipsum dolor</h2>
<p>
Sed ut <i>perspiciatis</i> unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
</p>
<h1>sit amet et consectetur</h1>
<h2>Quia adipit</h2>
<p>
Sed ut <i>perspiciatis</i> unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
</p>
</document>
maketoc.xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="text"/>
<xsl:template match="*[not(*) and not(matches(name(), '^h[0-9]$'))]">
</xsl:template>
</xsl:stylesheet>
The conversion command
saxon-xslt -o output.txt document.xml maketoc.xslt
The rendering when I execute the command
Lorem ipsum dolor
Lorem ipsum dolor
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
sit amet et consectetur
Quia adipit
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
The problem
As you can see from the rendering, all the nodes remain. I am unable to delete the nodes that do not match h[0-9].
In the match attribute, I used *[not(matches(name(), '^h[0-9]$'))] and, as suggested by Michael Kay, *[not(*) and not(matches(name(), '^h[0-9]$'))], with the same result.
An intermediate solution
I finally arrived at the following maketoc.xslt, which isolates just the title nodes:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"/>
<!-- Template to match and output title nodes (h1 to h9) -->
<xsl:template match="h1 | h2 | h3 | h4 | h5 | h6 | h7 | h8 | h9">
<xsl:value-of select="."/>
<xsl:text>&#10;</xsl:text> <!-- Add a newline after each title -->
<xsl:apply-templates select="node()"/> <!-- Process child nodes if needed -->
</xsl:template>
<!-- Template to skip all other nodes -->
<xsl:template match="node()">
<xsl:apply-templates select="node()"/> <!-- Process child nodes recursively -->
</xsl:template>
</xsl:stylesheet>
But, as you can see, it does not really use a regex like h[0-9]. I have to list each possibility (h1, h2, … h9) explicitly, whereas the goal is to match them with a regex.
The question
So, how can I delete all the nodes that do not match the h[0-9] regex?
The old version of the question
In order to make a TOC for a document, I want to keep only the h[0-9] nodes and delete all the other nodes, which have no place in the TOC.
So, in XSLT 2.0, I wrote the following lines:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="text"/>
<xsl:template match="*[not(matches(name(), '^h[0-9]$'))]"> <!-- This is the relevant line -->
<!-- Left empty in order to delete the matched content -->
</xsl:template>
</xsl:stylesheet>
I run it using saxon-xslt -o output.txt example.xml maketoc.xslt
Unfortunately, the not(matches()) template does not affect the targeted lines; in fact, it affects no lines at all.
So, how can I delete all the nodes that do not match the h[0-9] regex?
You haven't shown your source document, but the most likely explanation is that the outermost element of the source document has a name that doesn't match h[0-9], which means that the element will be deleted and its children will not be processed.
Perhaps you should add a rule to ensure that processing continues to the children of such an element.
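The rule itself is not shown here. A minimal sketch of what such a rule might look like, assuming the headings are direct children of the outermost element, as in document.xml; the `/*` pattern and the explicit `priority` are assumptions, not taken from the answer:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <!-- Assumed rule: keep descending from the outermost element.
       The explicit priority avoids a conflict with the rule below,
       because both patterns have a default priority of 0.5. -->
  <xsl:template match="/*" priority="1">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Rule from the question: elements whose name is not h0..h9 produce nothing. -->
  <xsl:template match="*[not(matches(name(), '^h[0-9]$'))]"/>
</xsl:stylesheet>
```

Headings nested inside another non-heading element would still be dropped together with their parent, so this sketch only suits flat documents. Whitespace-only text nodes between the children of the outermost element are still copied by the built-in text rule, so the headings come out separated by whatever line breaks the source contains.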
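Or you could change the pattern for elements that you want to delete to *[not(*) and not(matches(name(), '^h[0-9]$'))], so that only elements without child elements are deleted.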
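A sketch of how a pattern restricted to childless elements could sit in a complete stylesheet. The heading rule and the `text()` rule are additions assumed here, not part of the suggestion above. They are included because an element with mixed content, such as `<p>` in document.xml, has a child `<i>` and therefore escapes the `not(*)` test, so its text would otherwise still reach the output through the built-in rules.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <!-- Pattern under discussion: drop childless elements whose name is not h0..h9.
       Once text() is suppressed below, this rule is redundant; it is kept
       only to show the pattern in place. -->
  <xsl:template match="*[not(*) and not(matches(name(), '^h[0-9]$'))]"/>

  <!-- Assumed addition: print each heading's text followed by a newline. -->
  <xsl:template match="*[matches(name(), '^h[0-9]$')]">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Assumed addition: suppress text reached through the built-in rules,
       for example the text directly inside <p>. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Applied to document.xml with the same saxon-xslt command, this should print only the four headings, one per line. Matching the headings positively and silencing everything else tends to be easier to reason about than trying to enumerate what to delete.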