How to skip special characters in XML when reading XML using XMLPARSE in SQL

162 Views Asked by At

I'm trying to parse an XML for reading some special nodes. However one of the nodes has user comments which contain all types of special characters including &, <, >, & etc. Unfortunately I cannot edit the XML's.

While I am using REPLACE to remove some of them, the '<' and '>' is clashing with the node delimiters.

Any suggestions from experts here? Maybe some possible solutions. Please note I am a novice so detailed guidance will be appreciated.

  1. REPLACE all special characters including '<' and '>'
  2. Skip reading any special character
  3. Limit the data I read. The comment node comes quiet far down in the XML where as I just need the top of the XML CLOB. If there is a way to limit XMLPARSE to certain number of characters, that can work but its not an ideal solution
  4. Utilize another function instead of XMLPARSE.

Here's how my script looks like:

SELECT * 
FROM TRANSACTION_DATA_TABLE tdt, trxData,
XMLTABLE
('$d/*:Request/*:AppData' PASSING XMLPARSE(DOCUMENT REPLACE(tdt.TRAN_DATA, '&', '') AS "d"
COLUMNS
FNAME CHAR(20) PATH '*:FirstName',
LNAME CHAR(20) PATH '*:LastName'
)  

TIA.

1

There are 1 best solutions below

1
Adrian Maxwell On BEST ANSWER

It would really help to have DDL and DML provided so that the problem can be treated fully, but here is something that may assist. You might also add to the fiddle(s) as an addition to your question.

Here is a simple example of how to return wanted elements of XML and then strip some characters from them - for DB2:

CREATE TABLE TRANSACTION_DATA_TABLE (
  ID INTEGER,
  TRAN_DATA XML
);
INSERT INTO TRANSACTION_DATA_TABLE (ID, TRAN_DATA) VALUES (
  1,
  XMLPARSE(DOCUMENT '<TransactionData>
              <Request>
                <AppData>
                  <FirstName>John & Freddy</FirstName>
                  <LastName>Doe</LastName>
                </AppData>
              </Request>
            </TransactionData>' PRESERVE WHITESPACE)
);
SELECT tdt.ID,
       TRANSLATE(d.FirstName, '', '&<>') AS FirstName,
       TRANSLATE(d.LastName, '', '&<>') AS LastName
FROM TRANSACTION_DATA_TABLE tdt,
     XMLTABLE('/TransactionData/Request/AppData'
              PASSING tdt.TRAN_DATA
              COLUMNS
                FirstName VARCHAR(200) PATH 'FirstName',
                LastName VARCHAR(200) PATH 'LastName') as d;
ID FIRSTNAME LASTNAME
1 John Freddy Doe

DB2 fiddle

or. use regex_replace perhaps?

SELECT tdt.ID,
       REGEXP_REPLACE(d.FirstName, '[&<>]', '') AS FirstName,
       REGEXP_REPLACE(d.LastName, '[&<>]', '') AS LastName
FROM TRANSACTION_DATA_TABLE tdt,
     XMLTABLE('/TransactionData/Request/AppData'
              PASSING tdt.TRAN_DATA
              COLUMNS
                FirstName VARCHAR2(200) PATH 'FirstName',
                LastName VARCHAR2(200) PATH 'LastName') as d;

Oracle fiddle (as I wasn't sure which db this was for)