How do I scrape data from the web using Python's Beautiful Soup?


For my class we were tasked with scraping raw data and processing it. I want to scrape data on UFC fights from http://statleaders.ufc.com/en/career.

For example, in class we used a different website with weather data, and we used the line

table = bs.find_all("table")

However, that doesn't work for the UFC site, so I looked at the page source. The class seems to be called "results-table", so I tried:

raw_data = []

# Find all the tables in the web page that we have just parsed
table = bs.find_all("div", {"class": "results-table"})

for row in table:
    line = row.text
    raw_data.append(line)

print(raw_data)

but my raw data is empty. How do I scrape this data correctly?


1
chocolateimage On

The selector you provided works, so I think the data wasn't retrieved correctly. Here's an example of how I did it:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://statleaders.ufc.com/en/career")

bs = BeautifulSoup(r.text,"html.parser")

raw_data = []

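# Each row of a leaderboard is a div with the class "results-table--tr"
table = bs.find_all("div", {"class": "results-table--tr"})

for row in table:
    line = row.find_all("span")
    rank = line[0].text
    # Skip the header row, whose first cell reads "Rnk"
    if rank == "Rnk":
        continue
    # The fighter's name is inside a link in the second cell
    name = line[1].find_all("a")[0].text
    total = line[2].text
    print(rank,name,total)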
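
If you want to check the row-parsing logic without hitting the site, here is a minimal sketch that runs the same idea against an inline HTML string. The markup in SAMPLE_HTML is hypothetical: it is modeled only on the class names mentioned in this thread, and the real page may be structured differently. The function name parse_rows is also just an illustration, and the length check is an extra guard that is not in the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modeled only on the class names used in this thread;
# the real page may be structured differently.
SAMPLE_HTML = """
<article>
  <h3>Total Fights</h3>
  <div class="results-table">
    <div class="results-table--tr"><span>Rnk</span><span>Name</span><span>Total</span></div>
    <div class="results-table--tr"><span>1</span><span><a href="#">Jim Miller</a></span><span>41</span></div>
    <div class="results-table--tr"><span>2</span><span><a href="#">Andrei Arlovski</a></span><span>39</span></div>
  </div>
</article>
"""


def parse_rows(html):
    """Return (rank, name, total) tuples, skipping header and malformed rows."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.find_all("div", class_="results-table--tr"):
        cells = row.find_all("span")
        if len(cells) < 3:
            continue  # not a data row; avoids an IndexError below
        rank = cells[0].get_text(strip=True)
        if rank == "Rnk":
            continue  # header row
        name = cells[1].get_text(strip=True)
        total = cells[2].get_text(strip=True)
        rows.append((rank, name, total))
    return rows


print(parse_rows(SAMPLE_HTML))
```

Returning a list instead of printing makes it easier to inspect the result. If the list comes back empty for the live page, it is worth checking what r.text actually contains.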
0
HedgeHog On

You could iterate over all rows, use stripped_strings to extract the text, and build a list of dicts. You could also add the stat type to each row with .find_previous('h3').

Example

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("http://statleaders.ufc.com/en/career").text,"html.parser")

data = []

for e in soup.select('article .results-table div ~ div'):
    d = dict(zip(['rank','name','total'],list(e.stripped_strings)))
    d['type'] = e.find_previous('h3').text
    data.append(d)

data

Output

[{'rank': '1', 'name': 'Jim Miller', 'total': '41', 'type': 'Total Fights'},
 {'rank': '2',
  'name': 'Andrei Arlovski',
  'total': '39',
  'type': 'Total Fights'},
 {'rank': '3',
  'name': 'Donald Cerrone',
  'total': '38',
  'type': 'Total Fights'},
 {'rank': '4', 'name': 'Clay Guida', 'total': '34', 'type': 'Total Fights'},
 {'rank': '4',
  'name': 'Jeremy Stephens',
  'total': '34',
  'type': 'Total Fights'},
 {'rank': '6', 'name': 'Demian Maia', 'total': '33', 'type': 'Total Fights'},...]
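
Since the task also involves processing the data, here is a rough sketch of one possible next step: converting the totals to integers and grouping the records by type. It uses only the standard library. The sample records are copied from the output above. The function name group_by_type is hypothetical, and the code assumes every total is a whole number, which may not hold for other stat categories on the page, so records that fail the conversion are skipped.

```python
from collections import defaultdict

# Sample records copied from the output above; a real run returns many more.
data = [
    {'rank': '1', 'name': 'Jim Miller', 'total': '41', 'type': 'Total Fights'},
    {'rank': '2', 'name': 'Andrei Arlovski', 'total': '39', 'type': 'Total Fights'},
    {'rank': '3', 'name': 'Donald Cerrone', 'total': '38', 'type': 'Total Fights'},
]


def group_by_type(records):
    """Group (name, total) pairs by stat type, with totals converted to int."""
    grouped = defaultdict(list)
    for rec in records:
        try:
            total = int(rec['total'])
            stat_type = rec['type']
            name = rec['name']
        except (KeyError, ValueError):
            # Assumption: totals are whole numbers; skip anything else
            continue
        grouped[stat_type].append((name, total))
    return dict(grouped)


by_type = group_by_type(data)
print(by_type)
```

From here each group can be sorted, summed, or written out to a file, depending on what the assignment asks for.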