Understanding numeric_only boolean parameter in Pandas


I am new to Pandas and I am trying to better understand the use of the numeric_only parameter.

My goal is to pull the median number of video shares by the author's ban status. Why do I need to specify "numeric_only = True" within the median function? And why do the results include multiple fields of the data frame when I pass only the column list, without the numeric_only parameter?


I would expect that just using median(['video_share_count']) would be enough to specify that I am interested in pulling only that specific numeric field.

Answered by mozway

"Why do the results pull multiple fields of the data frame when I remove the numeric_only parameter?"

groupby.median only accepts one parameter: numeric_only.

By running:

df.groupby('author_ban_status').median(['video_share_count'])

You're actually still using the numeric_only parameter, because the list is passed as the first (and only) positional argument. It's equivalent to:

df.groupby('author_ban_status').median(numeric_only=['video_share_count'])

And since bool(['video_share_count']) evaluates to True, you're essentially running:

df.groupby('author_ban_status').median(numeric_only=True)
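
The argument binding described above can be sketched in plain Python. This is only an illustration: median_stand_in is a hypothetical function that mimics the one-parameter shape of groupby.median, not the pandas implementation, and the returned strings are made up for the demonstration.

```python
# Hypothetical stand-in with the same one-parameter shape as groupby.median.
# It is NOT pandas code; it only illustrates how the argument is bound.
def median_stand_in(numeric_only=False):
    # Any truthy value switches on the "numeric only" branch.
    return "numeric columns only" if numeric_only else "all columns"

columns = ['video_share_count']

# The list is bound positionally to numeric_only; it is never used to pick columns.
by_position = median_stand_in(columns)
by_keyword = median_stand_in(numeric_only=columns)
explicit_true = median_stand_in(numeric_only=True)

print(bool(columns))                                # a non-empty list is truthy
print(by_position == by_keyword == explicit_true)   # all three calls behave alike
```

Under this sketch, an empty list would take the other branch, which shows that the contents of the list never matter, only its truthiness.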

So no column is being selected by name: the median is computed for every numeric column.

You might just want to select the column before aggregating:

df.groupby('author_ban_status')[['video_share_count']].median(numeric_only=True)
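
As a minimal sketch of that last call, here is a small example on toy data. The values and the extra video_view_count column are assumptions made up for illustration; only author_ban_status and video_share_count come from the question.

```python
import pandas as pd

# Hypothetical toy data; the values and video_view_count are illustrative assumptions.
df = pd.DataFrame({
    'author_ban_status': ['active', 'active', 'banned', 'banned', 'banned'],
    'video_share_count': [10, 30, 5, 7, 100],
    'video_view_count': [100, 300, 50, 70, 1000],
})

# Without a column selection, every numeric column is aggregated.
wide = df.groupby('author_ban_status').median(numeric_only=True)

# Selecting the column first restricts the result to video_share_count.
result = df.groupby('author_ban_status')[['video_share_count']].median(numeric_only=True)

print(list(wide.columns))    # both numeric columns
print(list(result.columns))  # only the selected column
print(result)
```

The double brackets keep the result as a DataFrame; single brackets, df.groupby('author_ban_status')['video_share_count'], would give a Series instead.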