We want to comment out each <xref> element whose @href does not match any section/@id, whether that section is in an XML file in the same folder or in an XML file in another folder:

Each folder has one or more XML files, and <xref> elements and section/@id attributes can appear in any of them.

  1. For example, the <xref href="aag-dep1_1.01">some text here</xref> element is in the "aat-fti" folder, but the matching section is in the eat-rw folder. If xref/@href matches a section/@id, keep the xref element as it is; if it does not match, comment out the xref tags and keep the text as plain text.
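As an illustration of the matching rule (not part of the original question), here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. The function name unmatched_hrefs and the two sample strings are hypothetical. It collects the section/@id values from all documents first and only then checks each xref/@href, so a reference in one folder can be satisfied by a section in another folder.

```python
import xml.etree.ElementTree as ET


def unmatched_hrefs(xml_texts):
    """Return the xref/@href values that match no section/@id in any document.

    xml_texts is a list of XML strings, one per file (hypothetical input shape).
    """
    roots = [ET.fromstring(text) for text in xml_texts]
    # Collect ids from all documents first, so a reference in one folder
    # can be satisfied by a section defined in another folder.
    section_ids = {sec.get("id") for root in roots for sec in root.iter("section")}
    # Keep every href that has no matching id anywhere.
    return [
        xref.get("href")
        for root in roots
        for xref in root.iter("xref")
        if xref.get("href") not in section_ids
    ]


doc_a = '<book><section id="aag-dep1_01"><para><xref href="aag-dep1_1.01">t</xref></para></section></book>'
doc_b = '<book><section id="aag-dep1_1.01"><para><xref href="fir_56_10">t</xref></para></section></book>'
print(unmatched_hrefs([doc_a, doc_b]))  # ['fir_56_10']
```

Checked against doc_a alone, the same href aag-dep1_1.01 would be reported as unmatched, which is why the ids have to be gathered across all files.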

Any help or suggestions would be appreciated. Thanks.

Folder structure of the input (screenshot): each folder contains XML files, and xref/@href and section/@id can occur in any of these XML files.
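Because the ids can live in any file of any folder, the first step is to gather them from the whole folder tree. A minimal Python sketch, assuming all the folders sit under one common root directory; collect_section_ids and the doc.xml files are hypothetical:

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def collect_section_ids(root_dir):
    """Map each section/@id to the XML files (in any subfolder) that define it."""
    ids = {}
    # rglob walks root_dir recursively, so every folder's XML files are included.
    for path in sorted(Path(root_dir).rglob("*.xml")):
        tree = ET.parse(path)
        for sec in tree.iter("section"):
            rel = path.relative_to(root_dir).as_posix()
            ids.setdefault(sec.get("id"), []).append(rel)
    return ids


# Build a throwaway two-folder layout to try the function on.
with tempfile.TemporaryDirectory() as tmp:
    for folder, sec_id in [("aa-fti", "aag-dep1_01"), ("eat-rw", "aag-dep1_1.01")]:
        (Path(tmp) / folder).mkdir()
        (Path(tmp) / folder / "doc.xml").write_text(f'<book><section id="{sec_id}"/></book>')
    print(collect_section_ids(tmp))
    # {'aag-dep1_01': ['aa-fti/doc.xml'], 'aag-dep1_1.01': ['eat-rw/doc.xml']}
```

Mapping each id to the files that define it, rather than keeping a bare set, also makes duplicate ids across folders easy to spot.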

Input XML file with section/@id in the aa-fti folder:

<?xml version="1.0" encoding="UTF-8"?>
<book id="book_id">
    <title>Generally Accepted Accounting Principles</title>
    <chapter id="chapter_id" role="ls_level_2">
        <chapterinfo>
            <titleabbrev>Chapter_abb</titleabbrev>
            <title>Chapter_title</title>
        </chapterinfo>
        <section id="aag-dep1_01">
            <para>text here</para>
            <para>text here</para>
            <para>containing auditing <xref href="fir_56_10">some text here</xref> guidance related to generally accepted auditing standard</para>
            <para> the effective dates for FASB ASU No. 2018 <xref href="aag-dep1_1.01">some text here</xref></para>
            <section id="aag-dep1_1.01">
                <para>text here <xref href="fot_79_ut">some text here</xref></para>
                <para>text here</para>
                <para>text here<xref href="aag-dep1_01">some text here</xref></para>
            </section>
            <section id="aag-dep1_2.01">
                <para>text here</para>
                <para>text here</para>
                <para>text here <xref href="aag-dep1_02">some text here</xref></para>
            </section>
        </section>
        <section id="aag-dep1_02">
            <para>text here</para>
            <para>text here</para>
            <para>ces, including engagements for entities in specia</para>
            <para>example, a large calendar-year public insurance en</para>
            <section id="aag-dep1_1.02">
                <para>text here</para>
                <para>text here</para>
                <para>text here <xref href="tih52_23">some text here</xref></para>
            </section>
        </section>
        <section id="aag-dep1_regulation_and_oversight">
            <para>text <xref href="aag-dep1_1.02">some text here</xref> here</para>
            <para>text here</para>
            <para>early application may do so as of the beginning</para>
            <para>Other auditing publications have no authoritative status;</para>
            <section id="aag-dep1_08">
                <para>text here <xref href="aag-dep1_regulation_and_oversight">some text here</xref></para>
                <para>text <xref href="nov1_22">some text here</xref> here</para>
                <para>text here</para>
            </section>
        </section>
    </chapter>
</book>

Expected output for the XML file with xref/@href in the eat-rw folder (each xref whose @href matches no section/@id is commented out):

<?xml version="1.0" encoding="UTF-8"?>
<book id="book_id">
    <title>Generally Accepted Accounting Principles</title>
    <chapter id="chapter_id" role="ls_level_2">
        <chapterinfo>
            <titleabbrev>Chapter_abb</titleabbrev>
            <title>Chapter_title</title>
        </chapterinfo>
        <section id="aag-dep1_01">
            <para>text here</para>
            <para>text here</para>
            <para>containing auditing <!--<xref href="fir_56_10">-->some text here<!--</xref>--> guidance related to generally accepted auditing standard</para>
            <para> the effective dates for FASB ASU No. 2018 <xref href="aag-dep1_1.01">some text here</xref></para>
            <section id="aag-dep1_1.01">
                <para>text <!--<xref href="fot_79_ut">-->some text here<!--</xref>--> here</para>
                <para>text here</para>
                <para>text here<xref href="aag-dep1_01">some text here</xref></para>
            </section>
            <section id="aag-dep1_2.01">
                <para>text here</para>
                <para>text here</para>
                <para>text here <xref href="aag-dep1_02">some text here</xref></para>
            </section>
        </section>
        <section id="aag-dep1_02">
            <para>text here</para>
            <para>text here</para>
            <para>ces, including engagements for entities in specia</para>
            <para>example, a large calendar-year public insurance en</para>
            <section id="aag-dep1_1.02">
                <para>text here</para>
                <para>text here</para>
                <para>text here <!--<xref href="tih52_23">-->some text here<!--</xref>--></para>
            </section>
        </section>
        <section id="aag-dep1_regulation_and_oversight">
            <para>text <xref href="aag-dep1_1.02">some text here</xref> here</para>
            <para>text here</para>
            <para>early application may do so as of the beginning</para>
            <para>Other auditing publications have no authoritative status;</para>
            <section id="aag-dep1_08">
                <para>text here <xref href="aag-dep1_regulation_and_oversight">some text here</xref></para>
                <para>text <!--<xref href="nov1_22">-->some text here<!--</xref>--> here</para>
                <para>text here</para>
            </section>
        </section>
    </chapter>
</book>

Answer by Martin Honnen:

Using XSLT 3 with Saxon 9.9: if you put the following XSLT in the parent folder of all the subfolders you have shown, it processes all *.xml files recursively and writes each transformed result, under the same file name, to a folder named after the original subfolder plus -output (e.g. subfoldername-output):

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">
  
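  <!-- all *.xml files in the folder of this stylesheet and its subfolders (Saxon collection URI) -->
  <xsl:param name="collection-uri" select="'?select=*.xml;recurse=yes'"/>
  
  <xsl:param name="collection-docs" select="collection($collection-uri)"/>
  
  <!-- index the sections of each document by @id -->
  <xsl:key name="section" match="section" use="@id"/>

  <!-- copy everything unchanged unless a more specific template matches -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- entry point: start with the initial template and no source document -->
  <xsl:template name="xsl:initial-template">
    <xsl:apply-templates select="$collection-docs"/>
  </xsl:template>
  
  <!-- write each result to the same file name in a sibling folder named "<subfolder>-output" -->
  <xsl:template match="/">
    <xsl:variable name="result-uri" select="let $uri-tokens := tokenize(base-uri(), '/') return string-join(($uri-tokens[position() lt last() - 1], $uri-tokens[last() - 1] || '-output', $uri-tokens[last()]), '/')"/>
    <xsl:result-document href="{$result-uri}">
      <xsl:apply-templates/>
    </xsl:result-document>
  </xsl:template>

  <!-- xref whose @href matches no section/@id in any collected document:
       comment out its start and end tags and keep its content as plain text -->
  <xsl:template match="xref[not(some $doc in $collection-docs satisfies key('section', @href, $doc))]">
    <xsl:comment>&lt;xref href="<xsl:value-of select="@href"/>"&gt;</xsl:comment><xsl:apply-templates/><xsl:comment>&lt;/xref&gt;</xsl:comment>
  </xsl:template>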
  
</xsl:stylesheet>
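For comparison, the xref template can be approximated outside XSLT. The following is a hypothetical Python/ElementTree sketch, not the answerer's code: because ElementTree keeps character data in .text and .tail instead of separate text nodes, each unmatched xref is replaced by a start comment, its own content, and an end comment, with the text moved onto the comments' tails. Elements are processed deepest first so nested xref elements are handled too.

```python
import xml.etree.ElementTree as ET


def comment_out_unmatched(root, section_ids):
    """Replace each xref whose @href is not in section_ids by
    <!--<xref href="...">-->content<!--</xref>-->, keeping the content as text."""
    # Reverse document order visits descendants before their ancestors.
    for parent in reversed(list(root.iter())):
        for child in list(parent):
            if child.tag != "xref" or child.get("href") in section_ids:
                continue
            start = ET.Comment(f'<xref href="{child.get("href")}">')
            start.tail = child.text  # the xref's leading text becomes plain text
            end = ET.Comment("</xref>")
            end.tail = child.tail  # text that followed the xref stays after it
            index = list(parent).index(child)
            # Splice: start comment, the xref's child elements, end comment.
            parent[index:index + 1] = [start, *child, end]


para = ET.fromstring('<para>containing auditing <xref href="fir_56_10">some text here</xref> guidance</para>')
comment_out_unmatched(para, {"aag-dep1_1.01"})
print(ET.tostring(para, encoding="unicode"))
# <para>containing auditing <!--<xref href="fir_56_10">-->some text here<!--</xref>--> guidance</para>
```

If you go this route, note that ElementTree's default parser drops existing comments, and @href is written into the comment verbatim, so a value containing -- would produce an invalid comment.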

Start the processing with the initial template and no input file. I think the approach should work with a simple folder structure (e.g. the parent folder with the XSLT contains one level of subfolders with the XML documents to be processed), but test it carefully on some sample data before using it on your full folder set.
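The output location comes from the result-uri expression, which replaces only the name of the folder that directly contains each file. A small Python sketch of the same mapping (output_path and the example paths are hypothetical) shows why the answer is aimed at one level of subfolders:

```python
from pathlib import PurePosixPath


def output_path(input_path):
    """Mirror the result-uri expression: append '-output' to the name of the
    folder that directly contains the file and keep the file name."""
    p = PurePosixPath(input_path)
    return str(p.parent.with_name(p.parent.name + "-output") / p.name)


print(output_path("docs/aa-fti/book1.xml"))      # docs/aa-fti-output/book1.xml
print(output_path("docs/aa-fti/sub/book2.xml"))  # docs/aa-fti/sub-output/book2.xml
```

With deeper nesting, the -output folders end up inside the original tree. Also, because the collection uses recurse=yes, a second run would read the *-output folders written by the first run as input, so move or delete them before rerunning.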