How can I scrape data from a script tag of type application/ld+json on a website?

I am trying to fetch product data from a website. I can see a script tag that has all the data I need for my project. However, when I fetch the data, some of the key-value pairs are missing.

Link - https://www.walmart.ca/en/ip/onn-55-4k-uhd-hdr-roku-smart-tv-100012586-ca-55-in/6000204596475

Data needed - [screenshot of the script tag contents, not reproduced here]

Below is my code so far -

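import json
from bs4 import BeautifulSoup
from requests_html import HTMLSession

URL = 'https://www.walmart.ca/en/ip/onn-55-4k-uhd-hdr-roku-smart-tv-100012586-ca-55-in/6000204596475'

# Assumption: session is a requests_html HTMLSession; headers and proxies are dicts defined elsewhere.
session = HTMLSession()
page = session.get(URL, headers=headers, proxies=proxies)
# render() is the synchronous call; arender() is a coroutine that only runs when awaited.
page.html.render(sleep=50, keep_page=True, scrolldown=10)
# Parse the rendered HTML rather than the raw response body in page.content.
soup1 = BeautifulSoup(page.html.html, "html.parser")
result = soup1.find("script", {"type": "application/ld+json"})
data = json.loads(result.get_text())
print(json.dumps(data, indent=4))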

The output I get is -

{
    "@context": "http://schema.org/",
    "@type": "Product",
    "name": "onn. 55\" 4K UHD HDR Roku Smart TV (100012586-CA)",
    "image": [
        "https://i5.walmartimages.ca/images/Large/977/188/6000204977188.jpg",
        "https://i5.walmartimages.ca/images/Enlarge/977/188/6000204977188.jpg"
    ],
    "description": "Binge on movies and TV episodes, news, sports, music and more! We insisted on 4K Ultra High Definition for this 55 in. LED TV, bringing out more lifelike colour, texture and detail. We also partnered with Roku to bring you the best possible content with thousands of channels to choose from, conveniently presented through your own customizable home screen. Watch via cable, satellite, HDTV antenna or just start streaming from your favourite app. Like the sound of your own voice? You can actually use it with the Roku mobile app to search for the title, artist, actor or director, or just go old-school with our handy remote. We handle all software updates too, automatically, so all you have to worry about is what to watch. Lose yourself in the ultimate viewing experience. Watch onn.",
    "sku": "6000204596476",
    "brand": {
        "@type": "Thing",
        "name": "onn."
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": 4.3299,
        "reviewCount": 2467
    }
}

I am not getting the following values in the output - Offers and Sellers, which are present in the tag as shown in the screenshot above.

I have also tried the extruct library, following https://hackersandslackers.com/scrape-metadata-json-ld/ , but I still get the same results. For my project I need the price values, both current and initial.
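One check that may help narrow this down: find() returns only the first matching script tag, so if the page carries more than one application/ld+json block, the one holding the offers could be skipped. The sketch below collects every such block and searches all of them for an "offers" key. It uses the standard library's HTMLParser so it runs without extra packages. The names JsonLdCollector, extract_json_ld, find_key and the SAMPLE_HTML snippet are hypothetical illustrations, not taken from the Walmart page, and whether that page really has a second block is an assumption to verify.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        # Script bodies may arrive in several pieces, so buffer them.
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.blocks.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_json_ld(html):
    """Return a list with one parsed object per JSON-LD block in the HTML."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


def find_key(node, key):
    """Recursively yield every value stored under `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hypothetical sample page with two JSON-LD blocks; only the second has offers.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "Product", "name": "Example TV"}</script>
<script type="application/ld+json">{"@type": "Product", "name": "Example TV",
 "offers": {"@type": "Offer", "price": "398.00", "priceCurrency": "CAD"}}</script>
</head></html>
"""

blocks = extract_json_ld(SAMPLE_HTML)
offers = list(find_key(blocks, "offers"))
print(len(blocks), offers)
```

With BeautifulSoup, the equivalent change would be to call find_all() instead of find() and loop over the results. If no block contains offers even in the rendered HTML, the price data is probably injected from a different source than these script tags.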

Please help me understand why these values are missing and how I can resolve this without using Selenium.

Thank you!
