I have an XML file in which I want to count the tags named 'neighbor'. To be more specific, I want to count only the neighbor tags that are direct children of a country tag.
Here are the contents of my XML file:
```xml
<?xml version="1.0"?>
<data>
    <country name="Austria">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Liechtenstein"/>
        <neighbor name="Switzerland"/>
        <neighbor name="Italy"/>
    </country>
    <country name="Iceland">
        <hasnoneighbors/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia"/>
        <someothertag>
            <neighbor name="Germany"/>
        </someothertag>
    </country>
    <neighbor name="Jupiter"/>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <neighbor name="Costa Rica"/>
        <neighbor name="Colombia"/>
        <country name="SubCountry">
            <rank>12</rank>
            <year>2023</year>
            <neighbor name="NeighborOfSubCountry"/>
        </country>
    </country>
</data>
```
The expected result is 7: of the 9 neighbor tags in total, Germany (nested inside someothertag rather than directly under a country) and Jupiter (a direct child of data) should be left out.
I've written the following piece of code:
```python
import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()
totalneighbors = 0
neighborlist = []

for country in root.iter('country'):
    print(f'Country {country.attrib["name"]} contains these neighbors:')
    for index, neighbor in enumerate(country.findall('neighbor')):
        neighborname = neighbor.attrib['name']
        print(f'neighbor no {index+1}, with name {neighbor.attrib["name"]}')
        neighborlist.append(neighbor.attrib['name'])
    print(f"total for this country is {index+1}\n")
    totalneighbors += index+1

print(f'total nr of neighbors in country-nodes is {totalneighbors} according to index-counting')
print(f"but the neighborlist says it's {len(neighborlist)}")
```
I wanted to count the tags with Python's `enumerate`, but it gives me the wrong result (10 instead of 7). I also added a second way of counting to the code: appending the `findall` results to a list and then taking the length of that list. This does give me the correct number.
After adding some print statements, I figured out where things go wrong: Iceland has no neighbors, but the print statement still reports a total of 3 (`index + 1`) for it. It looks as if the index from the previous loop was never reset and is simply reused, even though `findall` should find nothing.
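The behaviour can be reproduced without any XML. A `for` loop over an empty iterable never executes its body, so its loop variable is not rebound and keeps the value from the previous pass. In this minimal sketch, the hard-coded lists are hypothetical stand-ins for what `country.findall('neighbor')` returns for Austria and Iceland:

```python
# Hypothetical stand-ins for country.findall('neighbor') results.
findall_results = [
    ["Liechtenstein", "Switzerland", "Italy"],  # Austria: three neighbors
    [],                                         # Iceland: no neighbors
]

seen = []
for neighbors in findall_results:
    for index, name in enumerate(neighbors):
        pass  # the body runs once per neighbor, rebinding `index` each time
    # For Iceland the inner body never runs, so `index` is not rebound and
    # still holds 2, the last value it got while looping over Austria.
    seen.append(index)

print(seen)  # [2, 2]
```

So `enumerate` itself yields nothing for an empty list, as expected; it is the unchanged loop variable that carries the old value over.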
So my question is: What am I doing wrong? Why does 'enumerate' not give me 0 when 'findall' finds nothing? Am I using it wrong? Or is it just not possible when combined with an empty search result?
I hope someone can clarify what's going wrong here.
The problem lies in Iceland not having a neighbor, as you said. The first country has three neighbors, so `index` will have the value 2 after the first `for` loop runs. But the loop body won't execute for Iceland, because `findall` returns an empty list, so `index` still holds the value from the previous country.

You can set `index` to `-1` inside the country loop, right before the inner `for` loop. That way your code works fine, because `index + 1` is then `0`, so nothing is added to `totalneighbors` if the country has no neighbor.

But overall, I recommend using the `lxml` package and XPath. You can find the docs here: https://lxml.de/parsing.html. For your purpose, XPath is the best option. You can find more information here: https://www.w3schools.com/xml/xpath_intro.asp
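A minimal sketch of the `index = -1` fix, applied to the question's `root.iter('country')` loop. To keep it self-contained it parses a trimmed copy of the question's XML from a string (the `rank` and `year` elements are omitted; `XML` is a hypothetical name) instead of reading `test.xml`. It also shows a shorter alternative that just sums `len()` of each `findall` result, which is essentially what the question's `neighborlist` approach does:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (rank/year omitted); parsed from a
# string instead of ET.parse('test.xml') so the sketch is self-contained.
XML = """<data>
  <country name="Austria">
    <neighbor name="Liechtenstein"/>
    <neighbor name="Switzerland"/>
    <neighbor name="Italy"/>
  </country>
  <country name="Iceland">
    <hasnoneighbors/>
  </country>
  <country name="Singapore">
    <neighbor name="Malaysia"/>
    <someothertag>
      <neighbor name="Germany"/>
    </someothertag>
  </country>
  <neighbor name="Jupiter"/>
  <country name="Panama">
    <neighbor name="Costa Rica"/>
    <neighbor name="Colombia"/>
    <country name="SubCountry">
      <neighbor name="NeighborOfSubCountry"/>
    </country>
  </country>
</data>"""

root = ET.fromstring(XML)

totalneighbors = 0
for country in root.iter('country'):  # also visits the nested SubCountry
    # Reset before every inner loop: if findall() returns an empty list the
    # body never runs, index stays -1, and index + 1 adds 0 to the total.
    index = -1
    for index, _ in enumerate(country.findall('neighbor')):
        pass
    totalneighbors += index + 1

# Alternative without enumerate: count each country's direct <neighbor> children.
total_by_len = sum(len(country.findall('neighbor')) for country in root.iter('country'))

print(totalneighbors, total_by_len)  # 7 7
```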
The code using `lxml` would look something like this:

Hope this helps.
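As a sketch of the XPath idea, here is a version that uses the limited XPath subset built into the standard library's `xml.etree.ElementTree` instead of `lxml` (a swapped-in choice, so no third-party package is needed). The path `.//country/neighbor` selects `neighbor` elements that are direct children of a `country` at any nesting depth, whereas `.//neighbor` selects every `neighbor` in the document. The XML is again a trimmed copy of the question's file (rank/year omitted) parsed from a string under the hypothetical name `XML`:

```python
import xml.etree.ElementTree as ET

XML = """<data>
  <country name="Austria">
    <neighbor name="Liechtenstein"/>
    <neighbor name="Switzerland"/>
    <neighbor name="Italy"/>
  </country>
  <country name="Iceland">
    <hasnoneighbors/>
  </country>
  <country name="Singapore">
    <neighbor name="Malaysia"/>
    <someothertag>
      <neighbor name="Germany"/>
    </someothertag>
  </country>
  <neighbor name="Jupiter"/>
  <country name="Panama">
    <neighbor name="Costa Rica"/>
    <neighbor name="Colombia"/>
    <country name="SubCountry">
      <neighbor name="NeighborOfSubCountry"/>
    </country>
  </country>
</data>"""

root = ET.fromstring(XML)

# Every <neighbor> anywhere under <data>, including Germany and Jupiter.
all_neighbors = root.findall('.//neighbor')

# Only <neighbor> elements whose parent is a <country>, at any nesting depth.
country_neighbors = root.findall('.//country/neighbor')
names = [n.get('name') for n in country_neighbors]

print(len(all_neighbors), len(country_neighbors))  # 9 7
print(names)
```

With `lxml`, the same path can be evaluated by its full XPath engine, for example `tree.xpath('count(//country/neighbor)')`, which returns the count as a float (`7.0`).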