Web Scrape Returns Empty HTML Tag


I can't seem to scrape the artworks from this website. The data I get back contains the HTML tag, but the tag is empty.

I haven't used web scraping tools much, and I'm not sure what the problem is.

from bs4 import BeautifulSoup
import requests

url = "https://centerforbookarts.org/book-shop"

response = requests.get(url)

soup = BeautifulSoup(response.text, "lxml")
# soup = BeautifulSoup(response.text, "html.parser")
element = soup.find_all("section", {"class": "posts"})
print(element)

I also tried html.parser and Selenium, but I can't get the data I need. It always returns an empty tag, even though this tag clearly isn't empty, since it holds all the information I am looking for.



Answer by Yevhen Kuzmovych

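The information you are looking for is not present in the section tag of the HTML that the server sends. requests only downloads that HTML and does not run JavaScript; in the browser, the section is filled in afterwards from the data in <script> var posts = ... </script> (you can find it by searching for "posts" in the page's HTML).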
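One way to confirm this without a browser is to inspect the raw HTML yourself. The sketch below uses only Python's standard html.parser and reports whether <section class="posts"> contains any text and whether an inline script assigns var posts. PostsInspector, inspect_page and SAMPLE_HTML are hypothetical names, and SAMPLE_HTML is a made-up, heavily trimmed stand-in for the real page; on the actual site you would pass response.text instead.

```python
from html.parser import HTMLParser


class PostsInspector(HTMLParser):
    """Collects text inside <section class="posts"> and the bodies of inline <script> tags."""

    def __init__(self):
        super().__init__()
        self.section_depth = 0  # > 0 while inside section.posts (counts nested sections)
        self.in_script = False
        self.section_text = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes come through as None, hence the `or ""`
        classes = (dict(attrs).get("class") or "").split()
        if tag == "section" and (self.section_depth or "posts" in classes):
            self.section_depth += 1
        elif tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "section" and self.section_depth:
            self.section_depth -= 1
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # html.parser delivers <script> contents as raw data
        if self.in_script:
            self.scripts[-1] += data
        elif self.section_depth:
            self.section_text.append(data.strip())


def inspect_page(html):
    """Return (section_has_text, has_posts_script) for the given HTML string."""
    parser = PostsInspector()
    parser.feed(html)
    parser.close()
    section_has_text = any(parser.section_text)
    has_posts_script = any("var posts" in s for s in parser.scripts)
    return section_has_text, has_posts_script


# Made-up sample mimicking the structure described above
SAMPLE_HTML = """
<html><body>
  <section class="posts"></section>
  <script>var posts = [{"title": "Example book"}];</script>
</body></html>
"""

print(inspect_page(SAMPLE_HTML))  # (False, True): empty section, data in a script
```

A result of (False, True) on the real page would match the explanation: the section arrives empty and the data lives in the script.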

What you can do instead is find the script that defines posts and extract the data from it directly, since it is neatly stored as JSON:

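from bs4 import BeautifulSoup
import requests
import re
import json
from pprint import pprint

url = "https://centerforbookarts.org/book-shop"

response = requests.get(url)
response.raise_for_status()  # fail early on HTTP errors

soup = BeautifulSoup(response.text, "lxml")

# Find the inline <script> whose text mentions "posts"
script_tag = soup.find('script', string=re.compile('.*posts.*'))
if script_tag is None:
    raise RuntimeError("No <script> mentioning 'posts' was found")
script = str(script_tag)

# Capture the JSON array that ends with "];" (raw string, so "\[" is not an invalid escape)
posts = json.loads(re.findall(r'(\[.*\]);', script)[0])

pprint(posts[0])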
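The greedy pattern (\[.*\]); can over-match if another "];" appears later on the same line (for example a second array assignment after posts), which makes json.loads fail. A somewhat sturdier alternative is to locate the var posts = assignment and let json.JSONDecoder.raw_decode parse exactly one JSON value, ignoring whatever follows. This is a sketch assuming the value right after var posts = is valid JSON, as the answer describes; extract_posts and SCRIPT are hypothetical names and sample data, not taken from the real page.

```python
import json
import re

# Matches "var posts =" plus surrounding whitespace (assumed assignment format)
POSTS_ASSIGNMENT = re.compile(r"\bvar\s+posts\s*=\s*")


def extract_posts(script_text):
    """Return the JSON value assigned to `var posts` in a script's source text.

    raw_decode parses one JSON value and reports where it ended, so trailing
    code such as "; var other = [1];" does not break parsing.
    """
    match = POSTS_ASSIGNMENT.search(script_text)
    if match is None:
        raise ValueError("no 'var posts =' assignment found")
    value, _end = json.JSONDecoder().raw_decode(script_text[match.end():])
    return value


# Made-up script text with a second array after the posts assignment
SCRIPT = 'var posts = [{"title": "Example book"}]; var other = [1];'
posts = extract_posts(SCRIPT)
print(posts[0]["title"])  # Example book
```

With the scraping code above, extract_posts(script) could take the place of the json.loads(re.findall(...)) line, since script already holds the text of the matching tag.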