The following is part of a larger XML file, which consists of multiple entries like this one. I want to retrieve some data from every entry.
<entry>
<title>S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718</title>
<link href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/$value"/>
<link rel="alternative" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/"/>
<link rel="icon" href="https://scihub.copernicus.eu/dhus/odata/v1/Products('c5525116-f3a4-4738-830e-c68e7e7f0c1c')/Products('Quicklook')/$value"/>
<id>c5525116-f3a4-4738-830e-c68e7e7f0c1c</id>
<summary>Date: 2022-01-20T10:53:51.024Z, Instrument: MSI, Satellite: Sentinel-2, Size: 991.25 MB</summary>
<ondemand>false</ondemand>
<date name="generationdate">2022-01-20T13:17:18Z</date>
<date name="beginposition">2022-01-20T10:53:51.024Z</date>
<date name="endposition">2022-01-20T10:53:51.024Z</date>
<date name="ingestiondate">2022-01-20T15:34:49.421Z</date>
<int name="orbitnumber">34368</int>
<int name="relativeorbitnumber">51</int>
<double name="cloudcoverpercentage">25.768485</double>
<str name="level1cpdiidentifier">S2A_OPER_MSI_L1C_TL_VGS2_20220120T125348_A034368_T31UFV_N03.01</str>
<str name="gmlfootprint"><gml:Polygon srsName="http://www.opengis.net/gml/srs/epsg.xml#4326" xmlns:gml="http://www.opengis.net/gml"> <gml:outerBoundaryIs> <gml:LinearRing>
<gml:coordinates>54.138373318280884,4.530701256365736 54.10530483528773,6.209260084873382 53.119880621881606,6.135320369204087 53.15178549855093,4.495379581512258 54.138373318280884,4.530701256365736</gml:coordinates> </gml:LinearRing>
</gml:outerBoundaryIs> </gml:Polygon></str>
<str name="footprint">MULTIPOLYGON (((6.135320369204087 53.119880621881606, 6.209260084873382 54.10530483528773, 4.530701256365736 54.138373318280884, 4.495379581512258 53.15178549855093, 6.135320369204087 53.119880621881606)))</str>
<str name="format">SAFE</str>
<str name="processingbaseline">03.01</str>
<str name="platformname">Sentinel-2</str>
<str name="filename">S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718.SAFE</str>
<str name="producttype">S2MSI2A</str>
<str name="platformidentifier">2015-028A</str>
<str name="platformserialidentifier">Sentinel-2A</str>
<str name="processinglevel">Level-2A</str>
<str name="datastripidentifier">S2A_OPER_MSI_L2A_DS_VGS2_20220120T131718_S20220120T105350_N03.01</str>
<str name="granuleidentifier">S2A_OPER_MSI_L2A_TL_VGS2_20220120T131718_A034368_T31UFV_N03.01</str>
<str name="identifier">S2A_MSIL2A_20220120T105351_N0301_R051_T31UFV_20220120T131718</str>
<str name="uuid">c5525116-f3a4-4738-830e-c68e7e7f0c1c</str>
</entry>
Using this code I'm able to retrieve some parts:
import xml.dom.minidom
doc = xml.dom.minidom.parse(xml_file)
nodes_id = doc.getElementsByTagName("id")
I know the list nodes_id consists of nodes (like <DOM Element: id at 0x13f7d92e5e8>), but I convert these to the actual data later.
I would also like to retrieve the cloudcoverpercentage and producttype values. However, because these are stored in tags of the form <double name="..."> and <str name="...">, I haven't been able to figure out how to do this. I tried the following, but neither call works:
nodes_cloudcover = doc.getElementsByTagName("cloudcoverpercentage")
nodes_cloudcover = doc.getElementsByTagName("double name='cloudcoverpercentage'")
Does someone know how I can solve this problem? Thanks in advance!
I'd suggest using a module that supports XPath, which makes this task much easier.
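As a minimal sketch of that idea, the example below uses the standard library's xml.etree.ElementTree, which supports a limited XPath subset that includes `[@attrib='value']` predicates (lxml would be an alternative if full XPath is needed). The element name is the tag (`double`, `str`), and the `name` attribute is matched by the predicate. `SAMPLE_XML`, the `<feed>` wrapper, the second entry, and the helper `extract_entries` are made up for illustration, and the sketch assumes the file declares no default XML namespace.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the real file (assumption: no XML namespaces).
# The <feed> wrapper and the second entry are hypothetical placeholders.
SAMPLE_XML = """<feed>
<entry>
<id>c5525116-f3a4-4738-830e-c68e7e7f0c1c</id>
<double name="cloudcoverpercentage">25.768485</double>
<str name="platformname">Sentinel-2</str>
<str name="producttype">S2MSI2A</str>
</entry>
<entry>
<id>00000000-0000-0000-0000-000000000000</id>
<double name="cloudcoverpercentage">3.5</double>
<str name="producttype">S2MSI1C</str>
</entry>
</feed>"""


def extract_entries(xml_text):
    """Return one dict per <entry> with its id, cloud cover and product type."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.iter("entry"):
        # The tag is "double"/"str"; the name attribute goes in the predicate.
        cloud = entry.findtext("double[@name='cloudcoverpercentage']")
        records.append({
            "id": entry.findtext("id"),
            "cloudcoverpercentage": float(cloud) if cloud is not None else None,
            "producttype": entry.findtext("str[@name='producttype']"),
        })
    return records


records = extract_entries(SAMPLE_XML)
for record in records:
    print(record)
```

To read from a file instead, `ET.parse(xml_file).getroot()` can replace `ET.fromstring(xml_text)`. If the root element of the full file declares a default namespace, the tag names in the paths have to be namespace-qualified (for example via the `namespaces` argument of `find`/`findtext`), otherwise nothing will match.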