Struggling to extract JSON from a web page


I am trying to scrape window.PRELOADED_STATE from the following URL using requests and parse it with .json(), but I can't isolate the element I want so that I can call the JSON function on it.

I tried the code below first.

import requests
response = requests.get('https://www.racingpost.com/profile/horse/431262/ready-for-action-ii')

I successfully got a response from the server, and when I view the response text I can see the data I want in the HTML, but I can't narrow it down to the window.PRELOADED_STATE element. Once I have that element, I want to call .json() on it to get the data into a dictionary.

Barmar (best answer)

Use a regular expression to extract everything on the line between window.PRELOADED_STATE = and the final ;.

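import json
import re

import requests

response = requests.get('https://www.racingpost.com/profile/horse/431262/ready-for-action-ii')
response.raise_for_status()  # stop early on an HTTP error page
# Escape the dot, allow optional whitespace around '=', and capture up to the last ';' on the line
state_match = re.search(r'window\.PRELOADED_STATE\s*=\s*(.*);', response.text)
if state_match:
    preloaded_state = json.loads(state_match.group(1))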
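A regex that captures up to the last ; on the line fails with an "Extra data" error if the same line holds another statement after the assignment (for example ...};window.other = 1;). A minimal sketch of an alternative, assuming the assigned value is strict JSON: match only the assignment prefix, then let the standard library's json.JSONDecoder.raw_decode parse exactly one JSON value and ignore whatever follows. The function name extract_preloaded_state and the sample HTML are made up for illustration.

```python
import json
import re

# Hypothetical helper pattern: matches "window.PRELOADED_STATE =" with optional
# whitespace, so the match ends right where the JSON value begins.
ASSIGNMENT = re.compile(r"window\.PRELOADED_STATE\s*=\s*")


def extract_preloaded_state(html):
    """Return the object assigned to window.PRELOADED_STATE, or None if absent."""
    match = ASSIGNMENT.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value at the start of the string and returns
    # (value, end_index); the trailing ';', later statements and '</script>'
    # are left unparsed instead of causing an "Extra data" error.
    # It raises json.JSONDecodeError if the value is a JS literal, not strict JSON.
    value, _end = json.JSONDecoder().raw_decode(html[match.end():])
    return value


if __name__ == "__main__":
    # Made-up sample; with the live page, pass response.text instead.
    sample = ('<script>window.PRELOADED_STATE = {"a": {"b": 1}, "c": [1, 2]};'
              'window.other = 1;</script>')
    print(extract_preloaded_state(sample))  # {'a': {'b': 1}, 'c': [1, 2]}
```

With the request from the question, this would be called as preloaded_state = extract_preloaded_state(response.text).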
Nathan Rodet

Can you give more context?

It seems you cannot scrape the window.PRELOADED_STATE value from this URL because it is dynamic content, and scraping with requests only fetches the static content.

You might need to try different tools, such as Selenium, which render the page and execute its JavaScript, to be able to scrape this value.
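A minimal sketch of the Selenium route, assuming Selenium 4 and a local Chrome installation (recent Selenium 4 releases can locate a matching driver automatically). WebDriver's execute_script runs JavaScript in the loaded page and converts a returned object into a Python dict, so no regex is needed. The function name fetch_preloaded_state_with_browser is hypothetical.

```python
def fetch_preloaded_state_with_browser(url):
    """Load url in headless Chrome and return window.PRELOADED_STATE (None if unset)."""
    # Imported inside the function so this module loads even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # with the default page load strategy, waits for the load event
        # A returned JavaScript object arrives as a Python dict; undefined becomes None.
        return driver.execute_script("return window.PRELOADED_STATE;")
    finally:
        driver.quit()  # always close the browser, even if the script call fails
```

Usage: state = fetch_preloaded_state_with_browser('https://www.racingpost.com/profile/horse/431262/ready-for-action-ii'). Starting a browser is much slower than a plain requests call, so it is worth checking first whether window.PRELOADED_STATE already appears in response.text.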