I am parsing a very large XML file, about 9 GB in size. I have tried the .iterparse method, which, from what I have gathered, is the recommended way to go about this task.
However, it seems to take too long, so I am now trying a multiprocessing approach where the elements of interest are parsed in separate processes.
I believe it used to be possible to call .iterparse('path_to_file.xml', events=("start", "end"), tag='some_tag') (possibly that was lxml's iterparse rather than the standard library's), but it does not look like the tag argument is supported anymore.
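For reference, what I mean is roughly the following, where the tag filter is emulated with an explicit check on element.tag (the file path and tag name are placeholders; a small in-memory document stands in for the real file here):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real 9 GB file.
xml_data = io.BytesIO(
    b"<root><other/><some_tag>a</some_tag><some_tag>b</some_tag></root>"
)

found = []
# The tag= filter can be emulated with a check on element.tag.
for event, element in ET.iterparse(xml_data, events=("start", "end")):
    if event == "end" and element.tag == "some_tag":
        found.append(element.text)

print(found)  # -> ['a', 'b']
```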
So, the approach I have come up with is this:

import xml.etree.ElementTree as ET

root = ET.parse('path_to_file.xml').getroot()
for element in root.iter('some_tag'):
    # do something with element
Is there a better way to go about this? From what I understand, this is a memory-intensive operation.
If there is no other way to do this, is there a way to free memory when using this approach, in the same way that we do element.clear() when using .iterparse?
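By element.clear() I mean the usual streaming idiom of freeing each element once it has been processed, roughly like this (again with a small in-memory document standing in for the real file):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real 9 GB file.
xml_data = io.BytesIO(
    b"<root><some_tag>a</some_tag><some_tag>b</some_tag></root>"
)

texts = []
# "end" fires once an element has been fully parsed.
for event, element in ET.iterparse(xml_data, events=("end",)):
    if element.tag == "some_tag":
        texts.append(element.text)
    element.clear()  # drop parsed children so memory stays bounded

print(texts)  # -> ['a', 'b']
```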