I have several large XML archives. I need to split each main node into its own file and use the node's title attribute as the file name, e.g.:

<book title="ABC123" year="2000">
  <description>Some sentences...</description>
  <img src="image/cover_ABC123" />
</book>

This should be exported as ABC123.xml.

I found a script that partially solves my problem, but it follows a numbered sequence and exports files as File1.xml, File2.xml, etc. I need to adapt it to my case, but I can't figure out how:

(source: https://stackoverflow.com/a/56889282/17486393)

#!/usr/bin/env bash
xmlfile=file.xml

n=$(xmlstarlet sel -t -v 'count(//ORDER)' file.xml)
for i in $(seq 1 $n); do
   xmlstarlet sel -t -m "//ORDER[${i}]" -c . $xmlfile > "File${i}.xml"
done

I tried adding the option -e "title" to also extract the name:

#!/usr/bin/env bash
xmlfile=file.xml

n=$(xmlstarlet sel -t -v 'count(//book -e "title")' file.xml)
for i in $(seq 1 $n); do
   xmlstarlet sel -t -m "//ORDER book -e "title"[${i}]" -c . $xmlfile > "File${i}.xml"
done

But I get:

xsl:for-each : could not compile select expression '//product -e title [1]

I tried this one instead, but I didn't understand it either:

(https://stackoverflow.com/a/36156617/17486393)

$ for ((i=1; i<=`xmlstarlet sel -t -v 'count(/root/row)'  1.xml`; i++)); do \
          echo '<?xml version="1.0" encoding="UTF-8"?><root>' > NAME.xml;
          NAME=$(xmlstarlet sel -t -m '/root/row[position()='$i']' -v './NAME' 1.xml); \
          xmlstarlet sel -t -m '/root/row[position()='$i']' -c . -n 1.xml >> $NAME.xml; \
          echo '</root>' >> NAME.xml
       done

And I changed it to:

$ for ((i=1; i<=`xmlstarlet sel -t -v 'count(/root/book)'  books01.xml`; i++)); do \
          NAME=$(xmlstarlet sel -t -m '/root/book[position()='$i']' -v './NAME' books01.xml); \
          xmlstarlet sel -t -m '/root/book[position()='$i']' -c . -n books01.xml >> $NAME.xml;
       done

It doesn't produce any files...

If possible I'd like to use xmlstarlet.

Michael Kay (best answer)

In XSLT 2.0 or later:

<xsl:transform version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:template match="/">
      <xsl:for-each select="//book">
         <xsl:result-document href="{@title}.xml">
            <xsl:copy-of select="."/>
         </xsl:result-document>
      </xsl:for-each>
   </xsl:template>

</xsl:transform>

To execute this with SaxonJ from the command line:

java -jar dir/SaxonHE12-3J/saxon-he-12.3.jar \
  -s:input.xml -xsl:stylesheet.xsl \
  -o:out/output.xml -t

The resulting output.xml file will be essentially empty; the multiple files produced by xsl:result-document will be in the same directory as output.xml. The -t option logs each output file as it is written.

If you prefer a GUI tool, many popular XML editors have Saxon (or another XSLT 2.0 processor) integrated.

urznow

"If possible I'd like to use xmlstarlet"

xmlstarlet supports the EXSLT exsl:document element, which creates multiple result documents in an existing directory. For example:

# shellcheck  shell=sh
cat <<'HERE' | xmlstarlet transform \
  /dev/stdin -s outDir="${TMPDIR:-/tmp}" input.xml
<xsl:transform version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:exsl="http://exslt.org/common"
  extension-element-prefixes="exsl"
>
  <xsl:param name="outDir" select="'/tmp'"/>
  <xsl:template match="/">
    <xsl:for-each select="//book">
      <exsl:document 
        href="{concat($outDir,'/',translate(@title,'/','_'),'.xml')}" 
        method="xml" 
        omit-xml-declaration="yes"
        indent="no"
      >
        <xsl:copy-of select="."/>
      </exsl:document>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
HERE

where

  • xmlstarlet transform is an XSLT processor
  • the XSLT stylesheet is read from stdin
  • the output directory can be passed on the command line, e.g. ${TMPDIR} or file://${TMPDIR}
  • any / (slash) characters in the book title are replaced with _ (underscore)
  • if book titles are not unique only the last will survive
  • the XPath 1.0 functions concat and translate are documented in the XPath 1.0 specification, section 4.2 (String Functions)
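
Since duplicate titles overwrite each other without warning, it may be worth checking for them before running the transform. This is only a minimal sketch: dup_titles is a hypothetical helper name, and it assumes the elements are named book and carry a title attribute, as in the question.

```shell
# shellcheck shell=sh
# dup_titles (hypothetical helper): print each title attribute value
# that occurs on more than one <book> element in the given file.
dup_titles() {
  # -T requests text output so titles are printed unescaped, one per line.
  xmlstarlet select -T -t -m '//book' -v '@title' -n "$1" | sort | uniq -d
}
```

Call it as dup_titles input.xml; no output means every title is unique.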

Or, if you're confident that lines in the input XML file are short enough to be handled by standard text tools (hint: getconf LINE_MAX), you could have xmlstarlet select emit a delimiter line before each <book> element and use awk to split the output into separate files:

# shellcheck  shell=sh
delim=$(uuidgen)
xmlstarlet select -t \
  -m '//book' \
    -o "${delim}$(printf '\t')${TMPDIR:-/tmp}/" -v 'translate(@title,"/","_")' -o '.xml' -n \
    -c '.' -n \
input.xml |
awk -F'[\t\n]' -v sep="${delim}" '$1==sep{close(out);out=$2;next;}{print>out;}'
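
For completeness, the loop from the question can also be made to work. The second attempt there selects ./NAME, which is a child element called NAME, whereas title is an attribute and has to be addressed as @title. The following is a sketch under assumptions, not a tuned solution: split_books is a hypothetical helper name, the output directory must already exist, and //book is used so the name of the root element does not matter.

```shell
# shellcheck shell=sh
# split_books (hypothetical helper): write each <book> of FILE to
# OUTDIR/<title>.xml, where OUTDIR defaults to the current directory.
split_books() {
  xmlfile=$1
  outdir=${2:-.}
  n=$(xmlstarlet select -T -t -v 'count(//book)' "$xmlfile")
  i=1
  while [ "$i" -le "${n:-0}" ]; do
    # (//book)[i] is the i-th book in document order; translate() replaces
    # slashes in the title so it can be used as a file name.
    name=$(xmlstarlet select -T -t \
      -v "translate((//book)[$i]/@title,'/','_')" "$xmlfile")
    xmlstarlet select -t -c "(//book)[$i]" -n "$xmlfile" > "${outdir}/${name}.xml"
    i=$((i + 1))
  done
}
```

Usage would be split_books books01.xml "${TMPDIR:-/tmp}". Note that this reparses the input twice per book, so for big archives the exsl:document stylesheet above is likely to be much faster.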