I wrote the following XML query to parse the XML file into my Snowflake database:
SELECT XMLGET( prodCstmsHdr.value, 'prodCd' )
FROM (SELECT XMLDOC
FROM RAW.WE_BREXIT_TRADE_TERMS
WHERE WE_BREXIT_TRADE_TERMS_UID =
(SELECT MAX(WE_BREXIT_TRADE_TERMS_UID)
FROM RAW.WE_BREXIT_TRADE_TERMS
WHERE WE_BREXIT_TRADE_TERMS_UID = 1623
)
) btt
INNER JOIN LATERAL FLATTEN(btt.xmldoc:"$") body
INNER JOIN LATERAL FLATTEN(body.VALUE:"$") prodCstmsHdr
INNER JOIN LATERAL FLATTEN(prodCstmsHdr.VALUE:"$") prodClsfnDtl
INNER JOIN LATERAL FLATTEN(prodClsfnDtl.VALUE:"$") prodClsfnItm
INNER JOIN LATERAL FLATTEN(prodClsfnItm.VALUE:"$") prodClsfnHTSGrp
LEFT JOIN LATERAL FLATTEN(prodClsfnItm.VALUE:"$") prodMatlCntnt
WHERE 1=1
AND GET(prodCstmsHdr.value, '@') = 'prodCstmsHdr'
AND GET(prodClsfnDtl.value, '@') = 'prodClsfnDtl'
AND GET(prodClsfnItm.value, '@') = 'prodClsfnItm'
AND GET(prodClsfnHTSGrp.value, '@') = 'prodClsfnHTSGrp'
**AND GET(prodMatlCntnt.value, '@') = 'prodMatlCntnt'**
AND XMLGET( prodCstmsHdr.value, 'prodCd' ):"$"::STRING IN ( '169831A', '5700123')
with the following XML file:
<?xml version="1.0"?>
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Body>
<ns2:getProductClassificationUpdatesResponse xmlns:ns2="http://xmlns.int.XXXXX.com/customs/globalproductdetails/v1" xmlns:ns3="http://customs.XXXX.com/webservice/soapfault">
<prodCstmsHdr>
<brand>Brand 1</brand>
<company>company name</company>
<divNm>20</divNm>
<divisionDescription>FOOTWEAR</divisionDescription>
<gndrAgeDesc>ADULT UNISEX</gndrAgeDesc>
<gndrAgeNm>01</gndrAgeNm>
<prodCd>169831A</prodCd>
<prodClsfnDtl>
<cmpndTypeNm>IND</cmpndTypeNm>
<imprtCtryCd>EU</imprtCtryCd>
<prodClsfnItm>
<lastUpdatedDate>2020-05-13T12:15:32.102-07:00</lastUpdatedDate>
<primFlag>Y</primFlag>
<prodClsfnHTSGrp>
<fromPrc>0</fromPrc>
<fromSz>0C</fromSz>
<hts>64039111111</hts>
<prcDirection>UP</prcDirection>
<szDirection>ABOVE</szDirection>
</prodClsfnHTSGrp>
<valPctg>100</valPctg>
<valTypeNm>BY PERCENTAGE</valTypeNm>
</prodClsfnItm>
<prodTypeNm>Inline</prodTypeNm>
</prodClsfnDtl>
<prodPoDtl>
<originCountries>US</originCountries>
<uom>PR</uom>
</prodPoDtl>
<seasonNm>Season 2020</seasonNm>
<silDesc>Sil description</silDesc>
<silNm>020</silNm>
<sportActyDesc>Football</sportActyDesc>
<sportActyNm>04</sportActyNm>
</prodCstmsHdr>
<prodCstmsHdr>
<brand>Brand 1</brand>
<company>Company name</company>
<divNm>20</divNm>
<divisionDescription>FOOTWEAR</divisionDescription>
<gndrAgeDesc>WOMENS</gndrAgeDesc>
<gndrAgeNm>22</gndrAgeNm>
<prodCd>5700123</prodCd>
<prodClsfnDtl>
<cmpndTypeNm>INDIVIDUAL</cmpndTypeNm>
<imprtCtryCd>EU</imprtCtryCd>
<prodClsfnItm>
<itmTypeNm>SHOE</itmTypeNm>
<lastUpdatedDate>2020-05-07T12:03:38.933-07:00</lastUpdatedDate>
<primFlag>Y</primFlag>
<prodClsfnHTSGrp>
<fromPrc>0</fromPrc>
<fromSz>0C</fromSz>
<hts>64041111000</hts>
<htsDesc>OTHER</htsDesc>
<prcDirection>UP</prcDirection>
<szDirection>ABOVE</szDirection>
</prodClsfnHTSGrp>
<prodMatlCntnt>
<athleticFootwear>N</athleticFootwear>
<constrCharacteristicDesc>Lined</constrCharacteristicDesc>
<constrProcessDesc>V</constrProcessDesc>
<coversAnkle>Y</coversAnkle>
<factoryCode>9K</factoryCode>
<fastenerTypeDesc>Lacing</fastenerTypeDesc>
<heelStrapFlag>N</heelStrapFlag>
<liningPrimary>
<fiber1>Cotton</fiber1>
<fiber1Percent>100</fiber1Percent>
<material>Textile</material>
</liningPrimary>
<liningSecondary>
<fiber1>Synthetic Leather</fiber1>
<material>Synthetic</material>
</liningSecondary>
<measurementSummary>
<coatedLeatherPct>0.0</coatedLeatherPct>
<leatherPct>0.0</leatherPct>
<otherPct>0.0</otherPct>
<summMethodCd>A</summMethodCd>
<syntheticPct>0.0</syntheticPct>
<textilePct>100.0</textilePct>
</measurementSummary>
<midsole>
<primaryMaterial>EVA</primaryMaterial>
</midsole>
<onePieceFtw>N</onePieceFtw>
<onePieceSole>N</onePieceSole>
<openHeelFlag>N</openHeelFlag>
<openToeFlag>N</openToeFlag>
<outsole>
<primaryMaterial>Rubber</primaryMaterial>
<primaryMaterialPercent>100</primaryMaterialPercent>
</outsole>
<plugsBottom>N</plugsBottom>
<plugsSide>N</plugsSide>
<primTechFeatureDesc>Low Density Polymer</primTechFeatureDesc>
<sockliner>
<primaryMaterial>Textile</primaryMaterial>
</sockliner>
<uprLthrNm/>
<uprMajority>Textile</uprMajority>
<uprOther/>
<uprSynthetic/>
<uprTextile>
<primaryMaterial>Polyester</primaryMaterial>
<primaryMaterialPercent>100</primaryMaterialPercent>
</uprTextile>
</prodMatlCntnt>
<valPctg>100</valPctg>
<valTypeNm>BY PERCENTAGE</valTypeNm>
</prodClsfnItm>
<prodTypeNm>Inline</prodTypeNm>
</prodClsfnDtl>
<prodPoDtl>
<originCountries>VN</originCountries>
<uom>PR</uom>
</prodPoDtl>
<seasonNm>Season 2020</seasonNm>
<silDesc>Sil description</silDesc>
<silNm>020</silNm>
<sportActyDesc>CASUAL/LEISURE</sportActyDesc>
<sportActyNm>29</sportActyNm>
</prodCstmsHdr>
</ns2:getProductClassificationUpdatesResponse>
</S:Body>
</S:Envelope>
Because product 169831A is missing the node "prodMatlCntnt" the data won't show up. Removing the condition "AND GET(prodMatlCntnt.value, '@') = 'prodMatlCntnt'" will cause duplicates. This is normally not an issue but I have more nodes in my file as you can see and the query takes forever to run.
The solution for the problem as described is to avoid doing all the inner joins that create all the duplicates.
In the problem as presented there's no need to do look at the inner values. Please rephrase the problem if any of those values are really required.
Solution:
Staging: