NoneType error when trying to access .text attribute of an existent <a> element

78 Views Asked by Kai Garcia At 26 October 2023 at 05:48

I am using BeautifulSoup to scrape the first wikitable on the Wikipedia page "List of military engagements during the Russian invasion of Ukraine" to get the names of all 57 battles. I attached an image of the table's HTML for reference.

To grab all the <a> elements in the first column and get just the text (the battle names), I did the following:

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_military_engagements_during_the_Russian_invasion_of_Ukraine'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table = soup.find('table')
rows = table.find_all('tr')

battlenames = []
for row in rows:
    # Find the first <td> element within the row
    td_element = row.find('td')
    if td_element:
        # Find the first <a> element within the <td> element
        battlename = td_element.find('a')
        cleanname = battlename.text
        battlenames.append(cleanname)

for name in battlenames:
    print(name)

I ran this in both Spyder and Jupyter Notebook and got the following error:

AttributeError                            Traceback (most recent call last)
Cell In[6], line 18
     15     if td_element:
     16         # Find the first <a> element within the <td> element
     17         battlename = td_element.find('a')
---> 18         cleanname = battlename.text
     19         battlenames.append(cleanname)
     21 for name in battlenames:

AttributeError: 'NoneType' object has no attribute 'text'

This surprised me because the first <td> element of every row (<tr>) contains an <a> element with the battle name. That is, there are no empty cells in the table's first column that would cause a NoneType error. What could be the issue?


There is 1 best solution below

HedgeHog On 26 October 2023 at 06:00 BEST ANSWER

EDIT

To be more precise, based on a comment from @Ouroboros1: the issue is that some td elements do not contain an a.

The table contains one "sub" tr for "Battles of Voznesensk", whose first td holds "9 March 2022", a value belonging to the "Start date" column. This td has no a link, so td_element.find('a') returns None for that row.
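To see which row trips the error, it can help to list every first td that has no a inside it. The following is a minimal offline sketch, not the live page: the HTML string is a made-up miniature of the table, it assumes the "sub" row comes from a cell spanning two rows, and the helper name first_cells_without_link and the "Example battle" row are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up miniature of the wikitable (assumption: the name cell of
# "Battles of Voznesensk" spans two rows, so the next row starts with a date).
HTML = """
<table>
  <tr><th>Name</th><th>Start date</th></tr>
  <tr><td><a href="/wiki/Example">Example battle</a></td><td>1 March 2022</td></tr>
  <tr><td rowspan="2"><a href="/wiki/V">Battles of Voznesensk</a></td><td>2 March 2022</td></tr>
  <tr><td>9 March 2022</td></tr>
</table>
"""


def first_cells_without_link(table):
    """Return the text of every first <td> in a row that contains no <a>."""
    offenders = []
    for row in table.find_all("tr"):
        td = row.find("td")
        # Header rows have no <td>; skip them. Keep cells where find('a') is None.
        if td is not None and td.find("a") is None:
            offenders.append(td.get_text(strip=True))
    return offenders


soup = BeautifulSoup(HTML, "html.parser")
offenders = first_cells_without_link(soup.find("table"))
print(offenders)
```

Running the same helper against the table from the real page should point at the row that raises the AttributeError.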

So you also have to check that an a was found before calling .text:

if td_element:
    # Find the first <a> element within the <td> element
    battlename = td_element.find('a')
    # Check here that an <a> element was actually found
    if battlename:
        cleanname = battlename.text
        battlenames.append(cleanname)
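Put together as something that runs without network access, the guarded loop might look like the sketch below. The HTML is a made-up stand-in for the wikitable (assuming the link-less cell comes from a two-row span), and first_column_names and the "Example battle" row are hypothetical, not part of the original code.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the wikitable; the last row starts with a date cell
# because the name cell above it is assumed to span two rows.
HTML = """
<table>
  <tr><th>Name</th><th>Start date</th></tr>
  <tr><td><a href="/wiki/Example">Example battle</a></td><td>1 March 2022</td></tr>
  <tr><td rowspan="2"><a href="/wiki/V">Battles of Voznesensk</a></td><td>2 March 2022</td></tr>
  <tr><td>9 March 2022</td></tr>
</table>
"""


def first_column_names(html):
    """Collect link text from the first <td> of each row, skipping misses."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for row in soup.find("table").find_all("tr"):
        td = row.find("td")
        if td is None:
            continue  # header row: only <th> cells
        link = td.find("a")
        if link is None:
            continue  # first cell has no link, e.g. the date-only sub-row
        names.append(link.get_text(strip=True))
    return names


battle_names = first_column_names(HTML)
print(battle_names)
```

Comparing against None explicitly, rather than relying on truthiness, makes it clear that the check is about find() not matching anything.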

You could also change your selection strategy and use CSS selectors to select only the tr elements whose first td contains an a:

soup.table.select('tr:has(td:first-of-type a)')

or even select all a elements in the first td of each tr directly:

soup.table.select('tr td:first-of-type a')

Example with CSS selectors

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_military_engagements_during_the_Russian_invasion_of_Ukraine'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

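# Option A: select only rows whose first td contains an a
for row in soup.table.select('tr:has(td:first-of-type a)'):
    print(row.td.a.text)

# Option B: select the a elements in the first td of each row directly
for a in soup.table.select('tr td:first-of-type a'):
    print(a.text)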
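To try the two selectors without hitting Wikipedia, here is a small sketch on a made-up table. The HTML, the "Example battle" row, and the variable names are assumptions for illustration; the date-only row imitates the "Battles of Voznesensk" sub-row.

```python
from bs4 import BeautifulSoup

# Made-up miniature table; the last row's first <td> is a date with no link.
HTML = """
<table>
  <tr><th>Name</th><th>Start date</th></tr>
  <tr><td><a href="/wiki/Example">Example battle</a></td><td>1 March 2022</td></tr>
  <tr><td rowspan="2"><a href="/wiki/V">Battles of Voznesensk</a></td><td>2 March 2022</td></tr>
  <tr><td>9 March 2022</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Option A: rows whose first <td> contains a link, then read that link.
names_a = [
    row.td.a.get_text(strip=True)
    for row in soup.table.select("tr:has(td:first-of-type a)")
]

# Option B: the links inside the first <td> of each row, selected directly.
names_b = [
    a.get_text(strip=True)
    for a in soup.table.select("tr td:first-of-type a")
]

print(names_a)
print(names_b)
```

One caveat: td:first-of-type means the first td within each tr, not the first visual column. If the date cell in a sub-row happened to contain a link, both selectors would pick that link up as well, so it is worth checking the result against the expected number of battles.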
