Is it possible in a pandas DataFrame to have some multi-indexed columns and some single-indexed columns?


In pandas, I would like a DataFrame in which some columns have a multi-index and others don't.

Visually I would like something like this:

  |   c    |    |
  |--------|  d |
  | a  | b |    |
================|
  | 1  | 4 |  0 |
  | 2  | 5 |  1 |
  | 3  | 6 |  2 |

In pandas I can do something like this:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'd': [0, 1, 2]})
columns = [('c', 'a'), ('c', 'b'), 'd']
df.columns = pd.MultiIndex.from_tuples(columns)

and the output would be:

   c      d
   a  b NaN
0  1  4   0
1  2  5   1
2  3  6   2

However, when I access the d column with df['d'], I get a pandas DataFrame, not a pandas Series. The problem is clearly that pandas applied the column MultiIndex to every column. So my question is: is there a way to apply column multi-indexing only to certain columns and leave the others as they are?

In other words, I would like df['d'] to return a Series, as in a normal DataFrame, df['c'] to return a DataFrame, as with a column MultiIndex, and df['c']['a'] to return a Series. Is this possible?


There are 2 best solutions below

8
Timeless On BEST ANSWER

You can use the empty string "" as a placeholder for the second level:

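import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "d": [0, 1, 2]})
columns = [
    ("c", "a"),
    ("c", "b"),
    ("d", ""),  # << here: empty string as the second level
]
df.columns = pd.MultiIndex.from_tuples(columns)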

Output:

type(df["d"])      # pandas.core.series.Series
type(df["c"])      # pandas.core.frame.DataFrame
type(df["c"]["a"]) # pandas.core.series.Series
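When the column list mixes plain labels and tuples, the padding with "" can be done programmatically. The following is only a minimal sketch: pad_columns is a hypothetical helper written for this illustration, not a pandas function, and it assumes every label is either a tuple or a single hashable value.

```python
import pandas as pd


def pad_columns(labels, fill=""):
    """Hypothetical helper: turn a mix of plain labels and tuples into
    equal-length tuples, padding the missing lower levels with `fill`."""
    as_tuples = [lab if isinstance(lab, tuple) else (lab,) for lab in labels]
    depth = max(len(t) for t in as_tuples)
    return [t + (fill,) * (depth - len(t)) for t in as_tuples]


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "d": [0, 1, 2]})
padded = pad_columns([("c", "a"), ("c", "b"), "d"])
df.columns = pd.MultiIndex.from_tuples(padded)

d_col = df["d"]        # Series, because the second level is ""
c_block = df["c"]      # DataFrame with columns a and b
```

The fill value is a parameter only for illustration; the Series-returning behaviour shown in this answer relies on the placeholder being the empty string.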
7
Laurent B. On

You can define a one-element tuple ('d',) like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6], 'd':[0,1,2]})
columns=[('c','a'),('c','b'), ('d',)]
df.columns=pd.MultiIndex.from_tuples(columns)

print(df)
   c      d
   a  b NaN
0  1  4   0
1  2  5   1
2  3  6   2

df['d'] is returned as a DataFrame rather than a Series because of the remaining header level. A Series has only an index and values, with no column header.

Because the DataFrame has two-level MultiIndex columns, selecting only df['d'] still leaves the second level, whose label is NaN.

To get the Series, select the full key df[('d', np.nan)] so that no header level remains.

>>> type(df['d'])
<class 'pandas.core.frame.DataFrame'>

>>> type(df[('d', np.nan)])
<class 'pandas.core.series.Series'>

pandas keeps the MultiIndex structure even when you access a single top-level column.

So the columns cannot be partly multi-indexed: every column has a label at every level, and while that label is NaN, df['d'] returns a DataFrame rather than a Series.
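If spelling out the NaN label is awkward, the one-column DataFrame can also be reduced to a Series by position. This is only a sketch under assumptions: it rebuilds the same data with an explicit NaN second level, and the names sub and d_series are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "d": [0, 1, 2]})
df.columns = pd.MultiIndex.from_tuples(
    [("c", "a"), ("c", "b"), ("d", np.nan)]  # NaN as the second-level label
)

sub = df["d"]               # still a one-column DataFrame
d_series = sub.iloc[:, 0]   # pick the only column by position, not by label
```

Positional selection avoids having to match NaN as a key, at the cost of assuming that 'd' has exactly one sub-column.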

Update in response to Roshach's comment

I prefer to expand the answer here to cover removing the header.

With your output DataFrame:

>>> type(df['d'])
<class 'pandas.core.frame.DataFrame'>

Now let's remove the header to see the effect (cf. Timeless's answer). I only give the explanation here.

>>> df.columns
MultiIndex([('c', 'a'),
            ('c', 'b'),
            ('d', nan)],
           )

>>> new_columns_list = list(df.columns)
>>> new_columns_list[2]
('d', nan)
>>> new_columns_list[2] = ('d', '')
>>> new_columns_list
[('c', 'a'), ('c', 'b'), ('d', '')]

Now we convert new_columns_list back to MultiIndex columns:

>>> df.columns = pd.MultiIndex.from_tuples(new_columns_list)

>>> df
   c     d
   a  b   
0  1  4  0
1  2  5  1
2  3  6  2
>>> type(df['d'])
<class 'pandas.core.series.Series'>

So once the NaN header label is replaced with an empty string, df['d'] returns a Series.
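The manual replacement of one tuple can be generalised to every NaN label in the column index. A minimal sketch, assuming the same data as in this answer and that any missing label should become an empty string; the name cleaned is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "d": [0, 1, 2]})
df.columns = pd.MultiIndex.from_tuples(
    [("c", "a"), ("c", "b"), ("d", np.nan)]
)

# Rebuild each column tuple, swapping missing labels for "".
cleaned = [
    tuple("" if pd.isna(label) else label for label in col)
    for col in df.columns
]
df.columns = pd.MultiIndex.from_tuples(cleaned)

d_col = df["d"]  # now a Series, as in the step-by-step session
```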