I am using BeautifulSoup to scrape the first wikitable on the page List of military engagements during the Russian invasion of Ukraine to get the names of all 57 battles. I have attached an image of the table's HTML for reference: HTML of the wikitable.
To grab all the <a> elements in the first column and get just the text (the battle names), I did the following:
import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_military_engagements_during_the_Russian_invasion_of_Ukraine'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

table = soup.find('table')
rows = table.find_all('tr')

battlenames = []
for row in rows:
    # Find the first <td> element within the row
    td_element = row.find('td')
    if td_element:
        # Find the first <a> element within the <td> element
        battlename = td_element.find('a')
        cleanname = battlename.text
        battlenames.append(cleanname)

for name in battlenames:
    print(name)
I ran this in both Spyder and Jupyter Notebook and got the following error:
AttributeError                            Traceback (most recent call last)
Cell In[6], line 18
     15     if td_element:
     16         # Find the first <a> element within the <td> element
     17         battlename = td_element.find('a')
---> 18         cleanname = battlename.text
     19         battlenames.append(cleanname)
     21 for name in battlenames:

AttributeError: 'NoneType' object has no attribute 'text'
This surprised me because the first <td> element of every row (<tr>) contains an <a> element with the battle name. I.e., there are no empty boxes in the table's first column that would cause a NoneType error. What could be the issue?
EDIT
Based on a comment from @Ouroboros1, to be more precise: the issue is that some <td> elements do not contain an <a>, so td_element.find('a') returns None for those rows.

So you also have to check that an <a> exists before calling .text.

You could also change your selection strategy and use CSS selectors to select only the <tr> elements whose <td> contains an <a>, or even select directly every <a> in the first <td> of each <tr>.
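The two suggestions can be sketched roughly as follows. This is a minimal illustration run against a small made-up table rather than the live Wikipedia page: SAMPLE_HTML and the function names names_with_guard and names_with_selector are hypothetical, and html.parser is used here only so the snippet does not depend on lxml. The selector is one possible choice, not necessarily the best one for the real page.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the wikitable; the third row has no <a> in its first <td>.
SAMPLE_HTML = """
<table>
  <tr><th>Battle</th><th>Result</th></tr>
  <tr><td><a href="/a">Battle A</a></td><td>Ongoing</td></tr>
  <tr><td>Unlinked engagement</td><td><a href="/x">Note</a></td></tr>
  <tr><td><a href="/b">Battle B</a></td><td>Ended</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")


def names_with_guard(table):
    """Loop over rows and skip any row whose first <td> has no <a>."""
    names = []
    for row in table.find_all("tr"):
        td = row.find("td")
        if td is None:  # header rows contain only <th>
            continue
        link = td.find("a")
        if link is None:  # the case that raised the AttributeError
            continue
        names.append(link.get_text(strip=True))
    return names


def names_with_selector(table):
    """Let a CSS selector pick the <a> elements in the first <td> of each row.

    Unlike the loop above, this returns every <a> in that cell, not just the first.
    """
    return [a.get_text(strip=True) for a in table.select("tr td:first-of-type a")]


print(names_with_guard(table))
print(names_with_selector(table))
```

Both functions skip the row without a link instead of failing on it. On the real page, printing the rows where link is None is a quick way to see which entries lack an <a> in the first column.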