I have a set of 5 bins as well as the counts of countries inside them arranged by Continents. I want the 'Rank' column to list all 5 bins for each continent, even if the counts are 0.
Continent Rank
Asia (2.212, 15.753] 7
(2.212, 15.753] 7
(2.212, 15.753] 7
(2.212, 15.753] 7
(15.753, 29.227] 4
Australia (2.212, 15.753] 7
Europe (2.212, 15.753] 7
(15.753, 29.227] 4
(15.753, 29.227] 4
(15.753, 29.227] 4
(29.227, 42.701] 2
(29.227, 42.701] 2
North America (2.212, 15.753] 7
(56.174, 69.648] 2
South America (56.174, 69.648] 2
Name: Count, dtype: int64
Below are the 2 frames that I merged together to get the dataframe.
Rank
(2.212, 15.753] 7
(15.753, 29.227] 4
(29.227, 42.701] 2
(56.174, 69.648] 2
(42.701, 56.174] 0
Continent
Asia (15.753, 29.227]
North America (2.212, 15.753]
Asia (2.212, 15.753]
Europe (2.212, 15.753]
Europe (15.753, 29.227]
North America (56.174, 69.648]
Europe (15.753, 29.227]
Asia (2.212, 15.753]
Europe (15.753, 29.227]
Asia (2.212, 15.753]
Europe (29.227, 42.701]
Europe (29.227, 42.701]
Asia (2.212, 15.753]
Australia (2.212, 15.753]
South America (56.174, 69.648]
This is the code I used to merge the two:
merge1 = renew.reset_index()
merge1 = merge1.merge(counts,how='right', on='Rank')
merge1 = merge1.dropna()
merge1 = merge1.drop('index', axis=1)
merge1
I'm going to answer the questions I believe are being asked:
A. how can I get the count of each (Continent, Rank) pair that appears in my data?
...and...
B. how can I include explicit zero counts in that output?
Let's start with your DataFrame:
We'll only need this as an input to get the job done so let's set aside the other inputs you discuss.
If we do a pandas groupby and count on these two columns (i.e.
df.groupby(['Continent', 'Rank']).count()) we will get an empty answer. This is because, unlike in SQL, pandas is expecting there to be an additional column for it to count. We'll add a dummy column to satisfy this requirement in our case, but to be clear, that behavior is intended in case we wanted to know how many rows by (Continent, Rank) pair have values present in a specific column for instance.And now we can do a groupby.
To include explicit zeros in our output, we'll need to perform additional operations after our groupby. Specifically, we can do:
which is the final answer to your question.