Struggling to extract JSON from a web page

44 Views Asked by Sam Kirwan At 17 January 2024 at 17:03

I am trying to scrape window.PRELOADED_STATE from the following URL using requests, but I can't isolate the element I want so that I can use the json function on it.

I tried the below code first.

response = requests.get('https://www.racingpost.com/profile/horse/431262/ready-for-action-ii')

I successfully got a response from the server, and when viewing the text of the response I can see the data I want in the HTML, but I can't narrow it down to the window.PRELOADED_STATE element. Once I have that element, I want to use .json() on it to get the data into a dictionary.


There are 2 best solutions below

Barmar On 17 January 2024 at 17:39 BEST ANSWER

Use a regular expression to extract everything on the line between window.PRELOADED_STATE = and the final ;.

import re, requests, json

response = requests.get('https://www.racingpost.com/profile/horse/431262/ready-for-action-ii')
state_match = re.search(r'window\.PRELOADED_STATE\s*=\s*(.*);', response.text)
if state_match:
    preloaded_state = json.loads(state_match.group(1))
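If the JSON spans several lines or is followed by other statements on the same line, the regular expression above may capture too little or too much. A possible alternative, sketched here under the assumption that the page assigns a plain JSON object to window.PRELOADED_STATE, is to find the marker and let json.JSONDecoder.raw_decode parse exactly one JSON value starting there. The helper name extract_preloaded_state and the sample fragment are hypothetical, used only for illustration.

```python
import json

MARKER = "window.PRELOADED_STATE"  # variable name taken from the question


def extract_preloaded_state(html, marker=MARKER):
    """Return the JSON value assigned to `marker` in `html`, or None."""
    start = html.find(marker)
    if start == -1:
        return None
    # Find the "=" that follows the marker.
    eq = html.find("=", start + len(marker))
    if eq == -1:
        return None
    # raw_decode does not skip leading whitespace, so do it by hand.
    idx = eq + 1
    while idx < len(html) and html[idx].isspace():
        idx += 1
    try:
        # Parses one JSON value and ignores whatever follows it.
        value, _end = json.JSONDecoder().raw_decode(html, idx)
    except json.JSONDecodeError:
        return None
    return value


# Hypothetical page fragment, not the real site's markup.
sample = '<script>window.PRELOADED_STATE = {"profile": {"horseName": "Example"}};</script>'
state = extract_preloaded_state(sample)
```

With a real page you would pass response.text instead of sample. The function returns None when the marker is missing or the assigned value is not valid JSON.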

Nathan Rodet On 17 January 2024 at 17:41

Can you give more context?

It seems you cannot scrape the window.PRELOADED_STATE value from this URL because it is dynamic content, and scraping with requests only fetches the static content.

You might need to try a different tool, such as Selenium, which renders the page and executes the JavaScript, so that you can scrape this value.
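Whether a browser is needed depends on whether the value is already present in the HTML the server sends. A rough sketch of that decision follows: check the static text for the marker first, and only fall back to Selenium if it is missing. The function names are hypothetical, and the Selenium part assumes Chrome and a matching driver are installed.

```python
def state_is_in_static_html(html, marker="window.PRELOADED_STATE"):
    """True if the marker already appears in the HTML returned by the server."""
    return marker in html


def read_state_with_browser(url):
    """Load the page in a real browser and read the variable from JavaScript."""
    # Imported lazily so the static check works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome is available locally
    try:
        driver.get(url)
        # execute_script returns the JavaScript value converted to Python types.
        return driver.execute_script("return window.PRELOADED_STATE;")
    finally:
        driver.quit()
```

If state_is_in_static_html(response.text) is true, the data can be parsed directly from the response and no browser is required.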
