Web scraping for multiple classes using Python

111 Views Asked by Sushmitha Krishnan At 03 February 2023 at 09:35

I am trying to scrape the address from a 10-K filing document in HTML: https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm

The page has multiple nested div elements with different classes, and I want to scrape the address inside a span.

Expected output:

1600 Amphitheatre Parkway

I have tried a few things, like the code below:

from requests_html import HTMLSession

s = HTMLSession()
r = s.get('https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm')
r

# requests_html exposes find(), which takes a CSS selector; it has no find_all()
add1 = r.html.find('div')
add1

However, if you inspect the page, it has many nested layers. I am new to HTML and Python. Please help.


There is 1 best solution below

Vincent Lagache On 03 February 2023 at 09:52 BEST ANSWER

You could do it like this, but I'm not sure it's very robust, or applicable to many examples given how the ids look...

from requests_html import HTMLSession
from bs4 import BeautifulSoup

# Fetch the filing and parse the raw HTML with BeautifulSoup
session = HTMLSession()
page = session.get('https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm')
soup = BeautifulSoup(page.content, 'html.parser')

# Look up the element by its id, which appears to be generated for this filing
content = soup.find(id="d92517213e644-wk-Fact-0B11263160365DBABCF89969352EE602")
print(content.text)

Output:

1600 Amphitheatre Parkway
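Since the id above looks generated, a lookup by a more stable attribute may carry over better to other filings. The sketch below is only an illustration under assumptions: it supposes the filing is inline XBRL and that the street address is tagged with name="dei:EntityAddressAddressLine1", which is not confirmed anywhere in this thread. The names FactTextParser and find_fact_text are made up for the example, the sample HTML is invented, and only the standard library is used so it runs without extra installs.

```python
from html.parser import HTMLParser


class FactTextParser(HTMLParser):
    """Collect the text of the first element whose `name` attribute matches."""

    def __init__(self, fact_name):
        super().__init__()
        self.fact_name = fact_name
        self.target_tag = None  # tag name of the matched element, once found
        self.depth = 0          # nesting depth of same-named tags inside the match
        self.parts = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.target_tag is None:
            # Attribute values keep their case; tag and attribute names are lowercased.
            if dict(attrs).get("name") == self.fact_name:
                self.target_tag = tag
                self.depth = 1
        elif tag == self.target_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.target_tag is not None and not self.done and tag == self.target_tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        # Only keep text that sits inside the matched element.
        if self.target_tag is not None and not self.done:
            self.parts.append(data)


def find_fact_text(html, fact_name):
    """Return the stripped text of the first matching element, or None."""
    parser = FactTextParser(fact_name)
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    return text or None


# Invented sample markup, shaped like an inline XBRL fact (assumption).
sample = (
    '<div><span><ix:nonNumeric id="generated-id-123" '
    'name="dei:EntityAddressAddressLine1" contextRef="c1">'
    '1600 Amphitheatre Parkway</ix:nonNumeric></span></div>'
)
print(find_fact_text(sample, "dei:EntityAddressAddressLine1"))
```

With BeautifulSoup, the same idea would probably be soup.find(attrs={"name": "dei:EntityAddressAddressLine1"}), again assuming the filing really carries that tag.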

Edit: I didn't see @baduker's answer and I didn't know there was an API. He is right: use the API.
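The thread does not say which API is meant. One candidate is SEC's JSON submissions endpoint on data.sec.gov, so the sketch below assumes that endpoint and assumes the response nests the street under addresses → business → street1. The helper names and the sample payload are hypothetical and only illustrate the shape of the lookup; no real response is reproduced here.

```python
import json


def submissions_url(cik):
    """Build the submissions URL for a company; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def business_street(payload):
    """Return the first street line of the business address, or None if absent.

    Assumes the layout payload["addresses"]["business"]["street1"].
    """
    business = payload.get("addresses", {}).get("business") or {}
    return business.get("street1")


# Hypothetical payload, trimmed to the fields this sketch reads.
sample_payload = json.loads(
    '{"addresses": {"business": {"street1": "1600 AMPHITHEATRE PARKWAY"}}}'
)

print(submissions_url(1652044))
print(business_street(sample_payload))
```

To fetch the real document, something like requests.get(url, headers={"User-Agent": "Your Name you@example.com"}).json() should work; SEC asks automated clients to identify themselves with a User-Agent header.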
