Retrieve data from XML when there is a type in the tag


The following is part of a larger XML. The XML consists of multiple entries like this one. I want to retrieve some data from every entry.

<entry>
   <title>S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718</title>
   <link href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/$value"/>
   <link rel="alternative" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/"/>
   <link rel="icon" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/Products('Quicklook')/$value"/>
   <id>c5525116-f3a4-4738-830e-c68e7e7f0c1c</id>
   <summary>Date: 2022-01-20T10:53:51.024Z, Instrument: MSI, Satellite: Sentinel-2, Size: 991.25 MB</summary>
   <ondemand>false</ondemand>
   <date name="generationdate">2022-01-20T13:17:18Z</date>
   <date name="beginposition">2022-01-20T10:53:51.024Z</date>
   <date name="endposition">2022-01-20T10:53:51.024Z</date>
   <date name="ingestiondate">2022-01-20T15:34:49.421Z</date>
   <int name="orbitnumber">34368</int>
   <int name="relativeorbitnumber">51</int>
   <double name="cloudcoverpercentage">25.768485</double>
   <str name="level1cpdiidentifier">S2A_OPER_MSI_L1C_TL_VGS2_20220120T125348_A034368_T31UFV_N03.01</str>
   <str name="gmlfootprint"><gml:Polygon srsName="http://www.opengis.net/gml/srs/epsg.xml#4326" xmlns:gml="http://www.opengis.net/gml"> <gml:outerBoundaryIs> <gml:LinearRing> 
   <gml:coordinates>54.138373318280884,4.530701256365736 54.10530483528773,6.209260084873382 53.119880621881606,6.135320369204087 53.15178549855093,4.495379581512258 54.138373318280884,4.530701256365736</gml:coordinates> </gml:LinearRing> 
   </gml:outerBoundaryIs> </gml:Polygon></str>
   <str name="footprint">MULTIPOLYGON (((6.135320369204087 53.119880621881606, 6.209260084873382 54.10530483528773, 4.530701256365736 54.138373318280884, 4.495379581512258 53.15178549855093, 6.135320369204087 53.119880621881606)))</str>
   <str name="format">SAFE</str>
   <str name="processingbaseline">03.01</str>
   <str name="platformname">Sentinel-2</str>
   <str name="filename">S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718.SAFE</str>
   <str name="producttype">S2MSI2A</str>
   <str name="platformidentifier">2015-028A</str>
   <str name="platformserialidentifier">Sentinel-2A</str>
   <str name="processinglevel">Level-2A</str>
   <str name="datastripidentifier">S2A_OPER_MSI_L2A_DS_VGS2_20220120T131718_S20220120T105350_N03.01</str>
   <str name="granuleidentifier">S2A_OPER_MSI_L2A_TL_VGS2_20220120T131718_A034368_T31UFV_N03.01</str>
   <str name="identifier">S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718</str>
   <str name="uuid">c5525116-f3a4-4738-830e-c68e7e7f0c1c</str>
</entry>

Using this code I'm able to retrieve some parts:

import xml.dom.minidom

doc = xml.dom.minidom.parse(xml_file)
nodes_id = doc.getElementsByTagName("id")

I know the list nodes_id consists of nodes (like <DOM Element: id at 0x13f7d92e5e8>), but I convert these to the actual data later.

I would also like to retrieve the cloudcoverpercentage and producttype. However, because of the str name= and double name= in the tags, I haven't been able to figure out how to do this. I tried the following, but neither call returns the elements.

nodes_cloudcover = doc.getElementsByTagName("cloudcoverpercentage")
nodes_cloudcover = doc.getElementsByTagName("double name='cloudcoverpercentage'") 

Does someone know how I can solve this problem? Thanks in advance!


There are 1 best solutions below

white On

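I'd suggest using a module that supports XPath, which makes this task much easier (xml.etree.ElementTree from the standard library supports a limited subset). In your XML, double and str are the tag names and name is an attribute, which is why getElementsByTagName("cloudcoverpercentage") finds nothing.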

import xml.etree.ElementTree as ET


xml_file = "xml_imp.xml"

tree = ET.parse(xml_file)
root = tree.getroot()
cloud = root.find(".//double[@name='cloudcoverpercentage']")
# this reads as: find the first element "double", anywhere below root, that has an attribute "name" whose value is "cloudcoverpercentage"
product = root.find(".//str[@name='producttype']")


print(cloud.text)
print(product.text)

Output:

25.768485
S2MSI2A
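
Since the question asks for data from every entry, and find() only returns the first match, here is a minimal sketch of looping over all entries instead. It assumes the entries carry no XML namespace, as in the snippet shown; if the real file declares a default namespace on its root, the tag names in the paths would need the namespace added. The root name "feed", the second entry and the helper name extract_entries are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample: the root name "feed" and the whole
# second entry are invented for this example, not taken from the real file.
SAMPLE = """
<feed>
  <entry>
    <id>c5525116-f3a4-4738-830e-c68e7e7f0c1c</id>
    <double name="cloudcoverpercentage">25.768485</double>
    <str name="producttype">S2MSI2A</str>
  </entry>
  <entry>
    <id>00000000-0000-0000-0000-000000000000</id>
    <double name="cloudcoverpercentage">3.5</double>
    <str name="producttype">S2MSI1C</str>
  </entry>
</feed>
"""


def extract_entries(root):
    """Return one dict per <entry> with id, cloud cover and product type."""
    results = []
    # iter("entry") walks the whole tree and yields every <entry> element.
    for entry in root.iter("entry"):
        results.append({
            # Paths are relative to the current entry, so no ".//" is needed.
            # findtext() returns None when the element is missing.
            "id": entry.findtext("id"),
            "cloudcoverpercentage": entry.findtext("double[@name='cloudcoverpercentage']"),
            "producttype": entry.findtext("str[@name='producttype']"),
        })
    return results


root = ET.fromstring(SAMPLE.strip())
entries = extract_entries(root)
for item in entries:
    print(item)
```

For a file on disk, replacing ET.fromstring(...) with ET.parse(xml_file).getroot() should work the same way. The values come back as strings, so the cloud cover still needs float() if you want to compare it numerically.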