Change the type of a column multi-index level?

96 Views Asked by germ At 13 January 2024 at 00:04

I am reading a .csv file that looks like this:

,1,1,2,2,3,3...
,'A','B','A','B',...
0,1,2,3,4,...
1,2,3,4,5,...

I read this with df = pd.read_csv(fname,header=[0,1],index_col=0)

Now, deplorably, pandas sets the dtype of the first level of the column multiindex to string, even though there are only integers in the first row of the file. Unfortunately, there is no way to tell pandas what dtype to use for each row (level) of the column index. (Something like header={0:'int64',1:'string'}).

Now, it seems that there should be a simple and easy way to convert one level of the multiindex columns to int, but I searched for a long time and could not come up with anything. Right now, I am re-generating the index from scratch, but that seems overkill.

Another solution that could possibly work would be to convert the multiindex to a DataFrame, change the dtypes, then set the index from the DataFramce. That also seems like an overcomplicated process.

Suggestions?

Original Q&A

There are 2 best solutions below

Nick On 13 January 2024 at 00:45

You could use set_levels:

csvstr = '''
,1,1,2,2,3,3
,'A','B','A','B','A','B'
0,1,2,3,4,5,6
1,2,3,4,5,6,7
'''

df = pd.read_csv(StringIO(csvstr),header=[0,1],index_col=0)

df.columns.dtypes
# level_0    object
# level_1    object
# dtype: object

df.columns
# MultiIndex([('1', ''A''),
#             ('1', ''B''),
#             ('2', ''A''),
#             ('2', ''B''),
#             ('3', ''A''),
#             ('3', ''B'')],
#            )

df.columns = df.columns.set_levels(df.columns.levels[0].astype('int64'), level=0)

df.columns.dtypes
# level_0     int64
# level_1    object
# dtype: object

df.columns
# MultiIndex([(1, ''A''),
#             (1, ''B''),
#             (2, ''A''),
#             (2, ''B''),
#             (3, ''A''),
#             (3, ''B'')],
#            )

Timeless On 13 January 2024 at 02:10

it seems that there should be a simple and easy way to convert one level of the multiindex..

Unfortunately, not yet (see pandas/issues/54523).

A workaround would be to convert the header to_frame and map it with literal_eval :

from ast import literal_eval

conv_mux = df.columns.to_frame().map(literal_eval).convert_dtypes()

df.columns = pd.MultiIndex.from_frame(conv_mux) # ---------^ is optional

Output :

>>> df.columns

MultiIndex([(1, 'A'),
            (1, 'B'),
            (2, 'A'),
            (2, 'B'),
            (3, 'A')],
           names=[0, 1])

>>> df.columns.dtypes.to_dict()

{0: Int64Dtype(), 1: string[python]}

Used input :

op = """,1,1,2,2,3
,'A','B','A','B','A'
0,1,2,3,4
1,2,3,4,5"""

from io import StringIO

df = pd.read_csv(StringIO(op) ,header=[0,1], index_col=0)

Change the type of a column multi-index level?

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in CSV

Related Questions in MULTI-INDEX

Trending Questions

Popular # Hahtags

Popular Questions