Select first two words in a text using xmlstarlet

I have a simple XML file, and I want to use xmlstarlet to select 1) the first two words and 2) the rest of the text in the element:

<fig> 
    <caption>Figure 1: Testing the figure</caption> 
</fig> 

The result should look like this:

Figure 1:

Testing the figure

I have tried this, but it does not work:

xmlstarlet ed  -s -v substring(//fig/caption,1,string-length(substring-before(//fig/caption," "))+string-length(substring-before(substring-after(//fig/caption," ")," "))+1) input.xml
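That attempt fails for two reasons. The XPath expression is not quoted, so the shell tries to parse the parentheses itself and stops with a syntax error before xmlstarlet even starts. Also, ed is xmlstarlet's edit command, where -s means "add a subnode" and expects -t, -n and -v arguments; printing a value is what sel -t -v does. The length arithmetic itself is sound: 6 characters for "Figure", 1 for the space and 2 for "1:" give 9, which is exactly "Figure 1:". Below is a minimal sketch that keeps the same space-counting idea, assuming each caption has at least two single spaces and no leading whitespace:

```shell
# Recreate the sample file from the question.
cat > input.xml <<'EOF'
<fig>
    <caption>Figure 1: Testing the figure</caption>
</fig>
EOF

# Use sel (not ed) to print values, and quote each XPath so the shell leaves it alone.
# -m loops over every caption; "." is the caption currently being processed.
# First -v: length of word 1 + 1 space + length of word 2.
# Second -v: everything after the second space.
xmlstarlet sel -T -t -m "//fig/caption" \
  -v "substring(., 1, string-length(substring-before(., ' ')) + 1 + string-length(substring-before(substring-after(., ' '), ' ')))" \
  -n -n \
  -v "substring-after(substring-after(., ' '), ' ')" \
  -n input.xml
```

If a caption can contain extra spaces or line breaks, applying normalize-space() to it before counting (as the XSLT answer does) would make this more robust.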

Thanks in advance.

Ofuuzo

Daniel Haley

Will the caption always contain a colon? If so, I'd suggest splitting on it with substring-before() and substring-after():

xmlstarlet sel -T -t -m "//fig/caption" -v "concat(normalize-space(substring-before(.,':')),':')" -n -n -v "normalize-space(substring-after(.,':'))" input.xml

This outputs:

Figure 1:

Testing the figure
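The colon matters because substring-before() and substring-after() both return an empty string when the separator is missing. A small sketch on a hypothetical file whose second caption has no colon shows the effect; the file name captions.xml, the second caption and the trailing -n (added only to separate one caption from the next) are assumptions for illustration. The first caption prints as before, while the second collapses to a lone ":" and its text is lost:

```shell
# Hypothetical input: the second caption has no colon.
cat > captions.xml <<'EOF'
<figs>
    <fig>
        <caption>Figure 1: Testing the figure</caption>
    </fig>
    <fig>
        <caption>Testing without a colon</caption>
    </fig>
</figs>
EOF

# Same two expressions as the answer; the final -n only separates captions.
xmlstarlet sel -T -t -m "//fig/caption" \
  -v "concat(normalize-space(substring-before(.,':')),':')" -n -n \
  -v "normalize-space(substring-after(.,':'))" -n \
  captions.xml
```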

If the caption does not always contain a colon, I'd suggest using XSLT and splitting on spaces instead. Here's an example:

XSLT 1.0 (test.xsl):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
        <xsl:for-each select="//fig/caption">
            <xsl:call-template name="split_string">
                <xsl:with-param name="input" select="normalize-space()"/>
                <xsl:with-param name="split_at" select="2"/>
            </xsl:call-template>            
        </xsl:for-each>
    </xsl:template>
    
    <xsl:template name="split_string">
        <xsl:param name="input"/>
        <xsl:param name="sep" select="' '"/>
        <xsl:param name="split_at"/>
        <xsl:param name="curr_token" select="1"/>
        <xsl:choose>
            <xsl:when test="$split_at >= $curr_token">
                <xsl:choose>
                    <xsl:when test="substring-after($input,$sep)">
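                        <!-- Emit the current word; add the separator only between the leading words,
                             so the first line does not end with a trailing space. -->
                        <xsl:value-of select="substring-before($input,$sep)"/>
                        <xsl:if test="$split_at > $curr_token">
                            <xsl:value-of select="$sep"/>
                        </xsl:if>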
                        <xsl:call-template name="split_string">
                            <xsl:with-param name="input" select="substring-after($input,$sep)"/>
                            <xsl:with-param name="split_at" select="$split_at"/>
                            <xsl:with-param name="curr_token" select="$curr_token + 1"/>
                        </xsl:call-template>        
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="$input"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="concat('&#xA;&#xA;',$input)"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

xmlstarlet command line:

xmlstarlet tr test.xsl input.xml

This produces the same output as above.