I've been fooling around with pandas and dataframes for a while now, and decided I'd start a project using it. For the project I need to scrape betting odds from a sportsbook.
I've successfully retrieved data from an API and made a dataframe from it. But the column 'betOffers' contains over a thousand characters, of which I only want to print out the odds of the game. My code currently looks like this:
import pandas
import requests
api = 'https://eu-offering-api.kambicdn.com/offering/v2018/betcitynl/listView/table_tennis/czech_republic/czech_liga_pro/all/matches.json?lang=nl_NL&market=NL&client_id=2&channel_id=1&ncid=1711057956050&category=20141&useCombined=true&useCombinedLive=true'
response = requests.get(api)
responseData = response.json()
df = pandas.json_normalize(responseData, 'events')
odds = df['betOffers'].str.split('odds')
file_format = 'csv'
file_name = 'betcity.' + file_format
odds.to_csv(file_name)
By using the split method I hoped I would get the odds, but the output is now just ''. When I use the strip method instead (trying to get rid of everything but the odds), it also outputs just ''.
pandas.Series.str.split does not do what you think it does. One approach you can use is to analyze the structure of the JSON and then use the parameters available for pandas.json_normalize to extract the data you want. You're reading a JSON with nested data, which contains strings, numbers, objects and arrays, so you need to take that into account.
It looks something like this (some closing brackets and most of the data omitted for brevity):
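To make the nesting concrete, here is a minimal sketch built on a hand-written stand-in for the response. Only the events → betOffers → outcomes → odds nesting is taken from the answer; the other keys (event, id, name, label) and all values are invented for illustration. It also suggests why the split attempt comes back empty: after normalizing on events, each betOffers cell holds a Python list rather than a string.

```python
import pandas as pd

# Hypothetical, trimmed-down payload: the values are made up, only the nesting matters.
sample = {
    "events": [
        {
            "event": {"id": 1, "name": "Player A - Player B"},
            "betOffers": [
                {
                    "id": 10,
                    "outcomes": [
                        {"id": 100, "label": "1", "odds": 1850},
                        {"id": 101, "label": "2", "odds": 1900},
                    ],
                }
            ],
        }
    ]
}

# Same call shape as in the question: one row per entry in "events".
df = pd.json_normalize(sample, "events")

# Each cell in 'betOffers' is a list of dicts, not a string.
cell_type = type(df["betOffers"].iloc[0])
print(cell_type)

# The .str methods work on strings; cells that are not strings come back as NaN.
split_result = df["betOffers"].str.split("odds")
print(split_result.isna().all())
```

Written to a CSV, those NaN values show up as empty fields, which would match the '' output described in the question.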
So if you want the values for odds, you can add an argument record_path=["events", "betOffers", "outcomes"] to your json_normalize call. Pandas will then read the specified records as rows into your dataframe, with the attribute names as columns.
This will then result in a dataframe like this:
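A minimal sketch of that call, run against a small hand-written payload instead of the live API. The payload is an assumption: only the events → betOffers → outcomes → odds nesting comes from the answer, while the id and label keys and every value are invented, so the real columns will differ.

```python
import pandas as pd

# Hypothetical stand-in for response.json(); values are invented for illustration.
sample = {
    "events": [
        {
            "event": {"id": 1, "name": "Player A - Player B"},
            "betOffers": [
                {
                    "id": 10,
                    "outcomes": [
                        {"id": 100, "label": "1", "odds": 1850},
                        {"id": 101, "label": "2", "odds": 1900},
                    ],
                }
            ],
        },
        {
            "event": {"id": 2, "name": "Player C - Player D"},
            "betOffers": [
                {
                    "id": 11,
                    "outcomes": [
                        {"id": 102, "label": "1", "odds": 1400},
                        {"id": 103, "label": "2", "odds": 2750},
                    ],
                }
            ],
        },
    ]
}

# Walk events -> betOffers -> outcomes; every outcome becomes one row,
# and the keys of each outcome dict become the columns.
outcomes = pd.json_normalize(
    sample,
    record_path=["events", "betOffers", "outcomes"],
)

# The odds are now an ordinary column.
odds = outcomes["odds"]
print(outcomes)
```

With the real response, responseData would take the place of sample. From there, something like outcomes[["label", "odds"]].to_csv("betcity.csv", index=False) should write only the columns of interest, assuming those column names exist in the actual data.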