How to display the variable name in a Python DataFrame instead of the column name?

Question

124 Views Asked by Guillaume_96 At 23 January 2024 at 20:35

I'm currently studying the basics of data analysis with Python in Colab, and for that I'm using my IMDb watchlist as a dataset.

In the Genres column, a single cell can hold several movie genres (which makes things harder). I'm trying to calculate the proportion of each genre in this dataset and then plot it, maybe as a pie or barh chart.

[Image: screenshot of the dataset]

So I created variables to store the value_counts() of each genre as True or False, as you can see below:

action = df['Genres'].str.contains('Action').value_counts()
animation = df['Genres'].str.contains('Animation').value_counts()
biography = df['Genres'].str.contains('Biography').value_counts()
comedy = df['Genres'].str.contains('Comedy').value_counts()
crime = df['Genres'].str.contains('Crime').value_counts()
drama = df['Genres'].str.contains('Drama').value_counts()
documentary = df['Genres'].str.contains('Documentary').value_counts()
family = df['Genres'].str.contains('Family').value_counts()
fantasy = df['Genres'].str.contains('Fantasy').value_counts()
film_noir = df['Genres'].str.contains('Film-Noir').value_counts()
history = df['Genres'].str.contains('History').value_counts()
horror = df['Genres'].str.contains('Horror').value_counts()
mystery = df['Genres'].str.contains('Mystery').value_counts()
music = df['Genres'].str.contains('Music').value_counts()
musical = df['Genres'].str.contains('Musical').value_counts()
romance = df['Genres'].str.contains('Romance').value_counts()
scifi = df['Genres'].str.contains('Sci-Fi').value_counts()
sport = df['Genres'].str.contains('Sport').value_counts()
thriller = df['Genres'].str.contains('Thriller').value_counts()
war = df['Genres'].str.contains('War').value_counts()
western = df['Genres'].str.contains('Western').value_counts()

Then I put these variables into a DataFrame:

genres = pd.DataFrame(
    [action, animation, biography,
     comedy, crime, drama,
     documentary, family, fantasy,
     film_noir, history, horror,
     mystery, music, musical,
     romance, scifi, sport,
     thriller, war, western],
    )
genres.head(5)

The problem is in the output:

[Image: screenshot of the output]

I'd like it to display the variable names instead of 'Genres', which is what is being shown in the first column. Is that possible?
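A note on why the label repeats: when a DataFrame is built from a list of Series, pandas takes each row label from the Series' own name, and a Python variable name is never visible to pandas. Below is a minimal sketch of one workaround, assuming a tiny made-up stand-in for the watchlist (the three sample rows are hypothetical): pass the Series in a dictionary keyed by the labels you want, then transpose.

```python
import pandas as pd

# Hypothetical stand-in for the watchlist; the real df comes from the IMDb export
df = pd.DataFrame({'Genres': ['Action, Drama', 'Comedy, Romance', 'Action, Comedy']})

action = df['Genres'].str.contains('Action').value_counts()
comedy = df['Genres'].str.contains('Comedy').value_counts()
drama = df['Genres'].str.contains('Drama').value_counts()

# Dictionary keys become column labels; .T turns them into row labels
genres = pd.DataFrame({'action': action, 'comedy': comedy, 'drama': drama}).T
print(genres)
```

The row labels now come from the dictionary keys, so they can be any strings, not only the variable names.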

There are 2 best solutions below

Marilyn Smith On 23 January 2024 at 21:01

I think you can achieve this by building the DataFrame from a dictionary whose keys are the genre names and whose values are the corresponding counts. Here's an example:

import pandas as pd

# Sample DataFrame
data = {'Genres': ['Action, Drama', 'Comedy, Romance', 'Action, Comedy', 'Drama', 'Comedy']}
df = pd.DataFrame(data)

# List of genres
genre_list = ['Action', 'Animation', 'Biography', 'Comedy', 'Crime', 'Drama', 'Documentary', 'Family',
              'Fantasy', 'Film-Noir', 'History', 'Horror', 'Mystery', 'Music', 'Musical', 'Romance',
              'Sci-Fi', 'Sport', 'Thriller', 'War', 'Western']

# Create a dictionary to store genre counts
genre_counts = {}

# Populate the dictionary with counts.
# The \b word boundaries stop 'Music' from also matching 'Musical'.
for genre in genre_list:
    genre_counts[genre] = df['Genres'].str.contains(rf'\b{genre}\b').sum()

# Create a DataFrame from the dictionary
genres_df = pd.DataFrame(list(genre_counts.items()), columns=['Genre', 'Count'])

# Display the DataFrame
print(genres_df)

This code creates a dictionary (genre_counts) where keys are genre names, and values are the counts of each genre in the 'Genres' column. Then, it converts the dictionary into a DataFrame (genres_df) and displays it. This way, the DataFrame will have 'Genre' and 'Count' columns instead of 'Genres'.
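Since the question also asks for proportions, here is a minimal sketch of how counts and proportions could be computed together. It swaps the loop for `Series.str.get_dummies`, and it assumes the genres in each cell are separated by exactly `', '`, as in the sample data above; the `summary` name is only illustrative.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({'Genres': ['Action, Drama', 'Comedy, Romance',
                              'Action, Comedy', 'Drama', 'Comedy']})

# One 0/1 indicator column per genre; assumes the separator is exactly ', '
indicators = df['Genres'].str.get_dummies(sep=', ')

# Count = number of rows listing the genre; Proportion = share of all rows
summary = pd.DataFrame({
    'Count': indicators.sum(),
    'Proportion': indicators.mean(),
})
summary.index.name = 'Genre'

print(summary.sort_values('Count', ascending=False))
```

Only genres that actually occur in the data appear in `summary`. If matplotlib is available, `summary['Proportion'].plot.barh()` should then draw the bar chart mentioned in the question.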

Laurent B. (accepted answer) On 23 January 2024 at 22:33

To avoid using a relatively slow for loop:

Let's suppose we have the following DataFrame:

                       Genres
0              Comedy, Horror
1          Comedy, Drama, War
2  Mistery, Romance, Thriller

Proposed code

import pandas as pd

# create the original DataFrame
df = pd.DataFrame({'Genres': ['Comedy, Horror', 'Comedy, Drama, War', 'Mistery, Romance, Thriller']})

# split the genres by comma and remove leading spaces
df['Genres'] = df['Genres'].str.split(',').apply(lambda x: [i.strip() for i in x])

# explode the list into separate rows
df = df.explode('Genres')

# Counting Matrix using crosstab method
genre_counts = pd.crosstab(index=df.index, columns=df['Genres'], margins=False).to_dict('index')

genre_counts = pd.DataFrame(genre_counts)

# count the number of 0s and 1s in each row
counts = ( genre_counts.apply(lambda row: [sum(row == 0), sum(row == 1)], axis=1) )

# Final count with 2 columns 'False' and 'True'
counts = pd.DataFrame(counts.tolist(), index=counts.index, columns=['False', 'True'])

print(counts)

Output

          False  True
Comedy        1     2
Drama         2     1
Horror        2     1
Mistery       2     1
Romance       2     1
Thriller      2     1
War           2     1
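For comparison, the same False/True table can probably be reached without the crosstab round trip. This is a minimal sketch, not the answerer's method: it counts each genre after `explode` and derives the False column by subtraction, assuming no genre is listed twice in the same cell. The sample data is copied as given in the answer.

```python
import pandas as pd

# Sample data copied from the answer above (spelling as given there)
df = pd.DataFrame({'Genres': ['Comedy, Horror', 'Comedy, Drama, War',
                              'Mistery, Romance, Thriller']})

n_rows = len(df)

# One genre per row, with surrounding whitespace stripped
exploded = df['Genres'].str.split(',').explode().str.strip()

# Rows containing each genre; assumes a genre appears at most once per cell
true_counts = exploded.value_counts().sort_index()

# Rows not containing the genre are simply the remainder
counts = pd.DataFrame({'False': n_rows - true_counts, 'True': true_counts})
print(counts)
```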
