Getting the values of an attribute in XML using Beautiful Soup (Python)

I am using Beautiful Soup to traverse some TEI XML that I've written for Peanuts comic strips. I'm trying to isolate certain features that are recorded in the div using the @ana attribute.

<text>
   <body>
      <head><emph>Peanuts</emph>, <date when="1971-10-01">1 October 1971</date></head>

      <div type="panelGrp" xml:id="Peanuts1971-10-01" ana="#s-psych #s-outside">
         ...
      </div>
   </body>
</text>

I can isolate this particular div (the only one in each document) using the following.

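from bs4 import BeautifulSoup

def make_soup(xmlfile):
    with open(xmlfile) as xml_file:
        soup = BeautifulSoup(xml_file, 'lxml-xml')
        return soup

soup = make_soup('strip.xml')  # hypothetical filename, substitute your own TEI file
div = soup.find('div')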

Where I am stuck, however, is accessing the contents of @ana. In this case, the output should be #s-psych #s-outside.

There is 1 best solution below

Hermann12

I don't have your file, but I think you can take the answer from my mockup. A tag can be indexed like a dictionary, using the attribute name as the key:

from bs4 import BeautifulSoup

with open("Peanuts.html", 'r') as htm_file:
    soup = BeautifulSoup(htm_file, 'html.parser')
    #print(soup.prettify())
    print(soup.div['ana'])  # tag['attr'] returns the attribute value, which is what you are looking for

Output:

#s-psych #s-outside
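
Since @ana holds several space-separated pointers, it may be handy to get them as a list, and to avoid a KeyError on a div that has no @ana at all. Here is a minimal sketch of that, not part of the mockup above: the helper name ana_values and the inline SAMPLE string are made up for illustration, and it uses 'html.parser' as in the mockup. I would expect 'lxml-xml' to behave the same way for this attribute, but I have not checked that against your files.

```python
from bs4 import BeautifulSoup

# Inline copy of the div from the question, so the sketch needs no file on disk.
SAMPLE = (
    '<div type="panelGrp" xml:id="Peanuts1971-10-01" '
    'ana="#s-psych #s-outside">...</div>'
)


def ana_values(markup, parser="html.parser"):
    """Return the @ana pointers of the first div as a list ([] if none)."""
    soup = BeautifulSoup(markup, parser)
    div = soup.find("div")
    if div is None:
        return []
    # .get() returns None instead of raising KeyError when the attribute is missing
    raw = div.get("ana")
    # Split the single attribute string on whitespace to get one item per pointer
    return raw.split() if raw else []


print(ana_values(SAMPLE))  # ['#s-psych', '#s-outside']
```

If you only ever need the raw string, div['ana'] or div.get('ana') is enough. The split is only useful when you want to test for a single feature, for example '#s-outside' in ana_values(SAMPLE).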