How to get the superset of all XML paths from an XML column in a table

53 Views Asked by At

I have a table with an XML column. I have no data dictionary for the column or definition for the XML structure. There are about 1.5 million records. From visual inspection of a small sample, the structure of the XML appears to be fairly similar. What I'd like to be able to do is to run some SQL that will give me the superset of XML structure across all the records.

The table is called Remittance and the column is called RemittInstr.

For example, if I have two test data records whose XML value in the RemittInstrcolumn is:

row 1:

<ri>
    <Msg Type="MT103">
        <AccountNo>12345678</AccountNo>
        <Description code="ORCSR">Ordering Customer<Description>
    </Msg>
</ri>

row 2:

<ri>
    <Msg Type="MT202">
        <BICFI>ABCD1234</BICFI>
        <Description code="FI">Financial Institution<Description>
    </Msg>
</ri>

How do I write a query that will return the following rows:

/ri
/ri/Msg
/ri/Msg/@Type
/ri/Msg/AccountNo
/ri/Msg/BICFI
/ri/Msg/Description
/ri/Msg/Description/@code

So that I can get a full picture of the structure of the XML across all the rows?

Edit: This is sufficiently different to the linked questions as the linked questions only deal with a single XML value. The question is about finding the structure of the XML across the rows in an entire table. The solution to doing this is sufficiently different. @Charlieface has provided a link to a good solution in his comments below.

1

There are 1 best solutions below

0
Charlieface On BEST ANSWER

You can use a recursive CTE for this. I would warn you that this is likely to be very slow on 1.5m rows.

It's a bit complicated to get attribute names, because it doesn't seem you can do .nodes('@*') to get all attributes.

Instead you need to CROSS APPLY a union of the node name and its atttributes.

WITH cte AS (
    SELECT
      xpath = r.RemittanceInstr.value('local-name(.)','nvarchar(max)'),
      child = r.RemittanceInstr
    FROM Remittance r

    UNION ALL

    SELECT
      xpath = CONCAT(cte.xpath, '/', v2.name),
      v2.child
    FROM cte
    CROSS APPLY cte.child.nodes('*') x(nd)
    CROSS APPLY (VALUES (x.nd.value('local-name(.)','nvarchar(max)') )) v(name)
    CROSS APPLY (
        SELECT
          v.name,
          x.nd.query('*')

        UNION ALL

        SELECT
          CONCAT(v.name, '/@', x2.attr.value('local-name(.)','nvarchar(max)')),
          NULL
        FROM x.nd.nodes('@*') x2(attr)
    ) v2(name, child)
)
SELECT DISTINCT
  xpath
FROM cte;

db<>fiddle