Issue creating a LF output from a CRLF input in an xml file for Excel using xslt

102 Views Asked by At

I have some XML that I want to transform into XML that Excel will open in a table.
But, some of my source data uses the string 
 to denote the CRLF and create a new line.
I understand that Excel only responds to the string 
 which is a LF only, to create a new line within a cell.
I have edited the result file to insert the string 
 and I get the desired effect in Excel, so I know its possible to do this.

I thought it would be simple to use a function like translate to swap the string [CRLF] to [LF]. But, I have had no success so far.

I'm limited to XSLT 1.0

Using some simple sample .xml borrowed from W3:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <cd>
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <country>USA</country>
        <company>Columbia</company>
        <price>10.90</price>
        <year>1985</year>
        <comment>Text on one line</comment>
    </cd>
    <cd>
        <title>Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <country>UK</country>
        <company>CBS Records</company>
        <price>9.90</price>
        <year>1988</year>
    </cd>
    <cd>
        <title>Greatest Hits</title>
        <artist>Dolly Parton</artist>
        <country>USA</country>
        <company>RCA</company>
        <price>9.90</price>
        <year>1982</year>
    </cd>
    <cd>
        <title>Still got the blues</title>
        <artist>Gary Moore</artist>
        <country>UK</country>
        <company>Virgin records</company>
        <price>10.20</price>
        <year>1990</year>
        <comment>Text with a break&#13; here (crlf)</comment>
    </cd>

We can see that within a <comment> the string &#13; is used.
In the .xsl itself I've used this:

<xsl:for-each select="catalog/cd">
    <!-- TABLE CONTENTS -->
    <Row>
        <!-- Title -->
        <Cell>
            <Data ss:Type="String">
                <xsl:value-of select="title"/>
            </Data>
        </Cell>
        <!-- Artist -->
        <Cell>
            <Data ss:Type="String">
                <xsl:value-of select="artist"/>
            </Data>
        </Cell>
        ...
        <!-- Comment -->
        <Cell>
            <Data ss:Type="String">
                <xsl:value-of disable-output-escaping="yes" select="translate(comment,'&#13;','&#10;')"/>
            </Data>
        </Cell>
    </Row>
</xsl:for-each>

The header of my XSL looks like this:

?xml version="1.0" encoding = "UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">

Currently the result is like this: transformed xml viewed in Notepad++ transformed xml viewed in Excel

EDIT Settings in Excel: Wrap text enabled

Where a [CRLF] symbol has been used rather than the string &#10; which would work in Excel and display the cell with the break inside it. If I edit the result to insert the text string &#10; Excel will break the text within the cell in the desired way:

transformed xml in Notepad++ viewed in Excel

I'm stuck with XSLT 1.0, am I missing something simple here?

Many thanks in advance.

edit OS is WIN, and I'm using the XML tools plugin in Notepad++ to perform the transform while testing.

I just used some webpage to test another engine: webpagetransform I can see there is a break within the field, but its not the test string &#10; and I don't believe that Excel will use it to start a new line within a cell

3

There are 3 best solutions below

2
MrD at KookerellaLtd On BEST ANSWER

What you want is the entry for the 1st cell and not the 2nd?

<root>
   <cell>this is&#10;a line feed</cell>
   <cell>this is
a line feed</cell>
</root>

it may be helpful to look at this

if your data looks like this

<data>
    <item>
        <test>A&#13;B</test>
    </item>
</data>

this stylesheet produces the output below

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

    <xsl:variable name="lineFeed">
        <xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
    </xsl:variable>
    <xsl:template match="/">
        <Root>
            <Cell>
                <xsl:value-of select="substring-before(data/item/test, '&#13;')"/>
                <xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
                <xsl:value-of select="substring-after(data/item/test, '&#13;')"/>
            </Cell>
            <Cell>
                <xsl:value-of select="translate(data/item/test,'&#13;',$lineFeed)"/>
            </Cell>
            <Cell>
                <xsl:call-template name="convertToLineFeed">
                    <xsl:with-param name="string" select="data/item/test"/>
                </xsl:call-template>
            </Cell>
        </Root>
    </xsl:template>

    <xsl:template name="convertToLineFeed">
        <xsl:param name="string"/>
        <xsl:variable name="before" select="substring-before($string, '&#13;')"/>
        <xsl:choose>
            <xsl:when test="$before">
                <xsl:value-of select="$before"/>
                <xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
                <xsl:call-template name="convertToLineFeed">
                    <xsl:with-param name="string" select="substring-after($string, '&#13;')"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$string"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

which outputs this

<Root>
  <Cell>A&#10;B</Cell>
  <Cell>A&amp;B</Cell>
  <Cell>A&#10;B</Cell>
</Root>

i.e. note that the 1st line does do what you want the 2nd one doesnt

I don't understand Michael Kay's answer, not because its wrong but because I don't have insight into the workings of the XSLT engine, so the answer is too technical for me.

In my simplistic terms, the 1st line works because its writing the line feed directly to the output, so it sidesteps some of this "normalisation" stuff, the 2nd line the linefeed is encoded in a raw text fragment and gets normalised before its written to the output (something like that).

So I think you CAN do what you want by writing a recursive template to do what the 1st cell example does, just recursively (I think), I'll do it now.


I've now added the recursive solution as line 3.

2
michael.hor257k On

This not (yet?) an answer, but I need the space for some code.

Please run the following test:

XML

<rows>
    <row>
        <name>no line breaks</name>
        <text>one line only</text>
    </row>
    <row>
        <name>line feed</name>
        <text>first line&#10;second line</text>
    </row>
    <row>
        <name>carriage return</name>
        <text>first line&#13;second line</text>
    </row>
    <row>
        <name>CRLF</name>
        <text>first line&#13;&#10;second line</text>
    </row>
</rows>

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="text">
    <xsl:copy>
        <xsl:value-of select="translate(., '&#13;', '&#10;')"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

You should see a result that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<rows>
    <row>
        <name>no line breaks</name>
        <text>one line only</text>
    </row>
    <row>
        <name>line feed</name>
        <text>first line
second line</text>
    </row>
    <row>
        <name>carriage return</name>
        <text>first line
second line</text>
    </row>
    <row>
        <name>CRLF</name>
        <text>first line

second line</text>
    </row>
</rows>

And if you are able to look at the hex dump of the result file, you should see this sequence at the line break of the two middle rows:

65 0A 73

representing the characters e, LF, s. And in the last row it should be:

65 0A 0A 73
3
Michael Kay On

An XML parser will always normalise line endings, so that an actual CRLF sequence in the input is equivalent to an NL (aka LF) character. But if the newlines are written as entity references, for example &#13;, this does not happen.

(Note that &#13; is CR, not CRLF as you seem to be suggesting.)

If the newline characters bypass normalization in this way, the XSLT processor will treat them as ordinary characters like any other. The serializer (which turns the XSLT result tree back into lexical XML) will turn a CR character into &#13; or some equivalent to ensure that it round-trips when re-parsed. But an NL character survives round-tripping so this special treatment is not needed.

You can convert CR characters to LF using translate(., '&#13;', '&#10;').

Converting CRLF to LF is a bit more difficult in XSLT 1.0. If you know that you only have CRLF sequences, and not CR on its own, then you can use translate(., '&#13;', '').