Scraping tiingo HTML with Beautiful Soup


I am looking to scrape financial data for various companies in the S&P 500 from their respective webpages on tiingo.com.

For example, take the following URL:

https://www.tiingo.com/f/b/aapl

which displays the most recent balance sheet data for Apple.

I am looking to extract the "Property, Plant & Equipment" amount for the most recent quarter, which is 25.45B in this particular instance. However, I'm having trouble writing the correct Beautiful Soup code to extract this text.

Inspecting the element, I see that the 25.45B number is in an element with the classes "ng-binding ng-scope", nested inside an element with the classes "col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope", which is itself nested inside an element with the classes "col-xs-7 col-sm-8 col-md-8 col-lg-9 no-padding-left no-padding-right".

However, I'm not sure how to accurately write the Beautiful Soup code to locate the correct element and then execute the element.getText() function.

I was thinking something like this:

import os, bs4, requests

res_bal = requests.get("https://www.tiingo.com/f/b/aapl")

res_bal.raise_for_status()

soup_bal = bs4.BeautifulSoup(res_bal.text, "html.parser")

elems_bal = soup_bal.select(".col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope")

elems_bal_2 = elems_bal.select(".ng-binding ng-scope")

joe = elems_bal_2.getText()

print(joe)

but so far I have not had success with this code. Any help would be much appreciated!


Best answer

The problem with your selectors

elems_bal = soup_bal.select(".col-xs-6 col-sm-3 col-md-3 col-lg-3 statement-field-data ng-scope")

elems_bal_2 = elems_bal.select(".ng-binding ng-scope")

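is that several elements on the page share the same classes, so these selectors do not single out the value you want. Also note that a space in a CSS selector means "descendant", so multiple classes on one element have to be chained with dots (for example .ng-binding.ng-scope), and that select() returns a list, so you cannot call select() or getText() directly on its result.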
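
To illustrate the selector syntax, here is a minimal sketch run against a small inline HTML snippet. The markup is hypothetical: it only imitates the class names from the question (the second value is a placeholder) and is not the real tiingo page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the class names from the question.
html = """
<div class="col-xs-6 col-sm-3 statement-field-data ng-scope">
  <a class="ng-binding ng-scope">25.45B</a>
</div>
<div class="col-xs-6 col-sm-3 statement-field-data ng-scope">
  <a class="ng-binding ng-scope">1.00B</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Spaces mean "descendant": this looks for an <ng-scope> tag inside a
# <statement-field-data> tag inside a <col-sm-3> tag, so it matches nothing.
broken = soup.select(".col-xs-6 col-sm-3 statement-field-data ng-scope")

# Chain the classes with dots to require all of them on the same element.
cells = soup.select(".statement-field-data.ng-scope")

# select() returns a list, so loop over it before selecting further.
values = [a.get_text(strip=True)
          for cell in cells
          for a in cell.select(".ng-binding.ng-scope")]

print(len(broken), values)
```

Even with the syntax fixed, both cells match, which is why a more specific hook than the class names is needed to pick out a single field.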

Note that if you use only requests and Beautiful Soup, the page source does not contain the data you want to scrape, because it is filled in by JavaScript after the page loads. You can get it by combining Selenium with Beautiful Soup. If you do not have Selenium installed, first run: pip install selenium

Here is working code:

from selenium import webdriver
import bs4
import time

driver = webdriver.Firefox()
driver.get("https://www.tiingo.com/f/b/aapl")
driver.maximize_window()
# Sleep so the page's JavaScript has time to populate the data
time.sleep(10)
page_source = driver.page_source
driver.quit()  # close the browser once the rendered HTML has been captured

soup = bs4.BeautifulSoup(page_source, "html.parser")

# Print the row label that mentions "Property"
labels = soup.find_all('div', {'class': 'col-xs-5 col-sm-4 col-md-4 col-lg-3 statement-field-name indent-2'})
for label in labels:
    if 'Property' in label.text.strip():
        print(label.text)

# The value is the text of the link whose ng-click attribute names the field
x = soup.find("a", {"ng-click": "toggleFundData('Property, Plant & Equipment',SDCol.restatedString==='restated',true)"})
print(x.text)

The output is:

Property, Plant & Equipment
25.45B
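
Since the question mentions pulling several fields for many companies, the ng-click lookup could be wrapped in a small helper. The following is only a sketch: field_value is a hypothetical name, the inline HTML is a simplified stand-in for driver.page_source rather than the real page, and the substring match assumes the field name appears inside the ng-click attribute as it does in the code above.

```python
import bs4

# Hypothetical, simplified stand-in for driver.page_source.
page_source = """
<div class="statement-field-name indent-2">Property, Plant &amp; Equipment</div>
<div class="statement-field-data">
  <a ng-click="toggleFundData('Property, Plant &amp; Equipment',true)">25.45B</a>
</div>
<div class="statement-field-name indent-2">Some Other Field</div>
<div class="statement-field-data">
  <a ng-click="toggleFundData('Some Other Field',true)">n/a</a>
</div>
"""


def field_value(soup, label):
    """Return the text of the first <a> whose ng-click mentions label, or None."""
    link = soup.find(
        "a",
        attrs={"ng-click": lambda value: value is not None and label in value},
    )
    return link.get_text(strip=True) if link else None


soup = bs4.BeautifulSoup(page_source, "html.parser")
print(field_value(soup, "Property, Plant & Equipment"))
```

Matching on a substring of the attribute avoids hard-coding the full ng-click string, at the cost of possibly matching more than one link if field names overlap.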