Passing the name of a table into a function as an argument using pandasql


I'm working with a DataFrame called movies that contains information about movies and their genres (among other features). The genres are comma-separated, and the column can contain multiple keywords (e.g. "Horror,Thriller").

I'm trying to write a function that counts the number of movies in each genre and returns a DataFrame with the genres and their counts.

Here's what I have:

import pandas as pd
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())

def number_in_genre(genres_list):
    dataframes = []
    for genre in genres_list:
        q = """SELECT COUNT(*) 
               FROM movies 
               WHERE genres LIKE '%""" + genre + """%'"""
        df = pysqldf(q)
        df['genre'] = genre
        dataframes.append(df)
    return pd.concat(dataframes)

This code DOES return a DataFrame with each genre and the count.

However, I was hoping to use this function to analyze subsets of the data (e.g. movies.head(30) and movies.loc[movies['production_budget'] < 2000000]) and therefore want to pass in an argument "table" in addition to genres_list. I tried this:

def number_in_genre(genres_list, table):
    dataframes = []
    for genre in genres_list:
        q = """SELECT COUNT(*) 
               FROM """ + table + """ 
               WHERE genres LIKE '%""" + genre + """%'"""
        df = pysqldf(q)
        df['genre'] = genre
        dataframes.append(df)
    return pd.concat(dataframes)

But then when I call the function using the movies table, I get this error message:

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U37'), dtype('<U37')) -> dtype('<U37')

The only thing I've changed between the two blocks is replacing FROM movies with FROM """ + table + """. I've also tried putting the + and """ in different places, and even keeping the entire string on one line, but can't seem to get it to work.

Not sure why that argument seems to be behaving differently than genres_list/genre, which seem to be working just fine.

Any insight would be appreciated!
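
One possible reading, assuming the function was called as number_in_genre(genres_list, movies): table is then a DataFrame object rather than a string, so `"""... FROM """ + table` is no longer string concatenation but an element-wise addition on the DataFrame, which would explain a NumPy 'add' error. The genre argument works because each genre really is a string. Since the goal is to analyze arbitrary subsets, here is a minimal sketch that drops SQL for this step and takes the DataFrame itself, using plain pandas instead of pandasql. This is a swapped-in technique, not the original approach, and the sample data is made up for illustration.

```python
import pandas as pd

def number_in_genre(genres_list, table):
    """Count rows of `table` whose 'genres' string contains each genre.

    `table` is a DataFrame (or any subset of one), not a table name.
    """
    rows = []
    for genre in genres_list:
        # Literal substring match, roughly like SQL's LIKE '%genre%'.
        # case=False mirrors SQLite's case-insensitive LIKE for ASCII text;
        # na=False treats missing genres as "no match".
        mask = table["genres"].str.contains(genre, case=False, regex=False, na=False)
        rows.append({"genre": genre, "count": int(mask.sum())})
    return pd.DataFrame(rows, columns=["genre", "count"])

# Hypothetical sample data standing in for the real movies DataFrame.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Horror,Thriller", "Comedy", "Horror", "Drama,Thriller"],
    "production_budget": [1_000_000, 5_000_000, 500_000, 3_000_000],
})

genres = ["Horror", "Thriller"]
all_counts = number_in_genre(genres, movies)
cheap_counts = number_in_genre(
    genres, movies.loc[movies["production_budget"] < 2_000_000]
)
print(all_counts)
print(cheap_counts)
```

If you would rather stay with pandasql, the thing to keep in mind is that the query needs a table name as a string, and that name has to be resolvable in the environment dict handed to sqldf; a filtered DataFrame like movies.head(30) has no name until it is assigned to one.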
