I have strings inside XML that contain low-order ASCII. Specifically, the EDI contains special characters like char(28), char(29), char(30) (aka File Separator/Group Separator/Record Separator). The way the devs encode it in the XML string, these characters get changed to "", where the x1C is the hex representation (char(28), file separator). We then use a CLR to convert it back to the original text to store in a table, with the low-ascii characters in it (we don't store the XML, just the data within, as fields)
However, I need to do this in Azure SQL DB, which doesn't support CLR.
I'm unsure if the best/fastest way to do this is to write the XQuery to extract it in the correct form, or just use XQuery "value" and post-process it with T-SQL: nested REPLACE, CROSS/OUTER APPLY with REPLACE, or even possibly STUFF (TRANSLATE won't work because it's multiple characters).
Looking online, I can see where other versions of XQuery support fn:replace or bin:decode-string, but it doesn't appear that SQL Server's XQuery supports these methods, so I'm trying to figure out how I can do this ("replace value of", maybe?).
DECLARE @xml_edi XML = '<Bundle><RawData>ABCDE&#x1E;&#x1C;FGHIJK&#x1D;LMNOP</RawData></Bundle>'
SELECT x.y.value('./RawData[1]','varchar(max)')
from @xml_edi.nodes('/Bundle')x(y)
Expected results:
ABCDEFGHIJKLMNOP --note that this shows all 3 as the same character, that's a limitation of the browser/editor
(ABCDE, then char(30) & char(28), then FGHIJK, then the low-ascii character 29, then LMNOP)
Some links I found while writing this:
- Function to convert hex to ascii in xquery
- SQL Nested Replace
- https://www.sqlshack.com/overview-of-the-sql-replace-function/ (TRANSLATE)
- https://dba.stackexchange.com/questions/201851/replace-character-without-using-looped-replace-function (interesting use of STUFF)
- https://bertwagner.com/posts/how-to-eliminate-ugly-nested-replace-functions/ (CROSS APPLY)
This is probably far from efficient, and it also assumes that no other escape sequences can appear in the
varcharvalue obtained from the XML, such as having&amp;due to having a double escaped ampersand (&). If, however, that is true, then we can likely do some varbinary (ab)use.First, I get the value of the text from the XML. Then I split that string into a single character on each row, with use of
GENERATE_SERIESandSUBSTRINGand work out if the character is in an escape sequence by counting the number of amerpsand (&) and semicolon (;) characters with different windows.Then I check if there is a semicolon in that group; if there is, it's an escape group. Finally, I reaggregate the string, omitting certain characters in the escape sequences, however, I aggregate the string in a binary form; converting non-escape sequences to a binary value (such as
'41'for'a') while leaving escaped sequences as they were. I then convert that value to an actualvarbinaryand thenCONVERTit back to avarchar. Yuck, but it works:I intentionally have partition and groups on (what is in effect) the value of the
RawDataelement is so that if this needs to be used against a column in a table, it scales. Though, if 2 (or more) elements have the same value, then a different (unique) identifier would need to be used.