I am trying to scrape the address from a 10-K filing document in HTML: https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm
It has many nested div elements, and I want to scrape the address that sits inside a span.
Expected output:
1600 Amphitheatre Parkway
I have tried a few things, like the following:
from requests_html import HTMLSession
s = HTMLSession()
r = s.get('https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm')
r
add1 = r.html.find('div')  # requests_html has no find_all(); find() returns every match by default
add1
However, if you inspect the page, it has many nested layers. I am new to HTML and Python. Please help.
You could do it like this, but I'm not sure it's very robust, or applicable to many examples given how the ids look...
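Here is a minimal sketch of one way to avoid depending on the generated ids: collect the text of every span and keep only the strings that look like a street address. The names `SpanTextCollector`, `find_addresses` and `ADDRESS_RE`, the list of street suffixes, and the sample markup are all assumptions made up for illustration. The real filing's markup may differ, so treat this as a starting point rather than a robust solution.

```python
import re
from html.parser import HTMLParser

# Assumption: an address line starts with a house number and ends with a
# common street suffix. This pattern is illustrative, not exhaustive.
ADDRESS_RE = re.compile(
    r"^\d{1,6}\s+[A-Za-z0-9 .'-]+?\s"
    r"(?:Parkway|Street|Avenue|Road|Boulevard|Drive|Way|Pkwy|St|Ave|Rd|Blvd|Dr)\.?$",
    re.IGNORECASE,
)


class SpanTextCollector(HTMLParser):
    """Collect the whitespace-normalised text of every outermost <span>."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # how many <span> elements are currently open
        self._buffer = []  # text pieces seen inside the current outermost span
        self.spans = []    # finished span texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            if self._depth == 0:
                self._buffer = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.spans.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def find_addresses(html):
    """Return the span texts in `html` that match ADDRESS_RE."""
    parser = SpanTextCollector()
    parser.feed(html)
    parser.close()
    return [text for text in parser.spans if ADDRESS_RE.match(text)]


# Hypothetical markup that imitates the nested div/span layout of the page.
sample = (
    "<div><div><span>Alphabet Inc.</span></div>"
    "<div><span>1600 Amphitheatre Parkway</span></div>"
    "<div><span>Mountain View, CA 94043</span></div></div>"
)
print(find_addresses(sample))
```

To try it on the real page, pass the downloaded HTML (for example `r.text` from the session in the question) to `find_addresses`. As far as I know, sec.gov may reject requests that do not send a descriptive `User-Agent` header, so you might need to set one on the request.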
output
Edit: I hadn't seen @baduker's answer, and I didn't know there was an API. He is right: use the API.