Pytrends - how to get historical interest by region by day


Each week, an analyst on our team uses Google Trends to download the past five years of history for a search topic in every DMA we cover. It is painstaking and takes an hour or two each week. I tried to write a Python script to automate this, but I have limited Python experience and am struggling. I feel I am close, but the output is still incorrect.

Google Trends provides a "relative" score: if you pull five years at once, each date in the downloaded table gets an index based on how its value compares to every other day in that five-year window. With my script, it looks as if the index is being computed separately for each day, or over some other window, rather than across the whole period.
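To make the "relative" behaviour concrete, here is a minimal sketch of peak-based scaling. It is a simplification: the `relative_index` helper and the sample volumes are made up for illustration, and Google's real normalization also involves sampling and adjusting for total search volume. It only shows why the same day can get a different index depending on the window requested.

```python
def relative_index(volumes):
    """Scale values so the largest one in this window becomes 100."""
    peak = max(volumes)
    if peak == 0:
        # No interest anywhere in the window: everything stays at 0.
        return [0 for _ in volumes]
    return [round(100 * v / peak) for v in volumes]


# Hypothetical raw search volumes for four consecutive days.
daily_volumes = [20, 40, 80, 16]

# Indexed over the whole window, the peak day (80) defines 100.
full_window = relative_index(daily_volumes)

# Indexed over only the first two days, the peak is 40 instead.
short_window = relative_index(daily_volumes[:2])

print(full_window)   # [25, 50, 100, 20]
print(short_window)  # [50, 100]
```

The first day scores 25 in the long window but 50 in the short one, so scores pulled from separate requests are not directly comparable.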

I wrote the script below, which goes back only 3 years to limit the data pull, and this is where I am stuck. Ultimately I want to pull the past five years of history by region/DMA. The output I am hoping for has the DMA, the date, and the score/index that Google provides, for every day or week going back five years in each DMA.

import pandas as pd
from pytrends.request import TrendReq
from datetime import date, timedelta

# set up the trend request object
pytrends = TrendReq(hl='en-US', tz=360)

# define the keyword and other parameters
keyword = '/m/07xn3v'
geo = 'US'

# define the date range
start_date = date(date.today().year - 3, 1, 1)
end_date = date.today() - timedelta(days=1)

# loop through the date range and retrieve the interest by region data for each day
dfs = []
for single_date in pd.date_range(start_date, end_date):
    print(f"Retrieving data for {single_date}")
    start_date_str = single_date.strftime("%Y-%m-%d")
    end_date_str = (single_date + timedelta(days=365)).strftime("%Y-%m-%d")
    timeframe_str = f"{start_date_str} {end_date_str}"
    
    # build the payload
    # kw_list must be a list of keywords, even when there is only one
    pytrends.build_payload(kw_list=[keyword], geo=geo, timeframe=timeframe_str)

    # get the interest by region data
    interest_by_region_df = pytrends.interest_by_region(resolution='DMA', inc_low_vol=True, inc_geo_code=True)
    interest_by_region_df.reset_index(inplace=True)
    interest_by_region_df.rename(columns={'geoCode': 'geoCodeDMA'}, inplace=True)

    # add the date as a new column
    interest_by_region_df['date'] = single_date

    dfs.append(interest_by_region_df)

# concatenate the data frames and export to csv
interest_by_region_df = pd.concat(dfs, ignore_index=True)
interest_by_region_df.to_csv('~/Desktop/region_trend.csv', index=False)
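For comparison, one possible way to get the DMA / date / score layout described above is to request `interest_over_time()` once per DMA for the whole period, instead of `interest_by_region()` once per day. This is only a sketch under assumptions: the helper names `fetch_dma_history` and `collect_long_format` are hypothetical, the DMA geo strings (such as `US-NY-501`) are assumed to be accepted by Google Trends in that form, and a five-year timeframe is expected to come back weekly rather than daily. Each DMA's series would be scaled to its own peak, so scores would not be comparable across DMAs.

```python
import pandas as pd


def fetch_dma_history(keyword, dma_geo, timeframe="today 5-y"):
    """Fetch one DMA's interest-over-time series with pytrends (needs network)."""
    # Imported here so the reshaping helper below can be used without pytrends.
    from pytrends.request import TrendReq

    trends = TrendReq(hl="en-US", tz=360)
    trends.build_payload(kw_list=[keyword], geo=dma_geo, timeframe=timeframe)
    return trends.interest_over_time()


def collect_long_format(keyword, dma_geos, fetch=fetch_dma_history):
    """Stack per-DMA series into one table with columns dma, date, score."""
    frames = []
    for dma_geo in dma_geos:
        df = fetch(keyword, dma_geo)
        if df.empty:
            # Nothing returned for this DMA; skip it.
            continue
        tidy = (
            df[[keyword]]                       # drop the isPartial column
            .rename(columns={keyword: "score"})
            .rename_axis("date")
            .reset_index()
        )
        tidy.insert(0, "dma", dma_geo)
        frames.append(tidy)
    if not frames:
        return pd.DataFrame(columns=["dma", "date", "score"])
    return pd.concat(frames, ignore_index=True)
```

A call such as `collect_long_format('/m/07xn3v', ['US-NY-501', 'US-CA-803'])` would issue one request per DMA. Pausing between requests is probably wise, since Google Trends may rate-limit frequent calls.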
