Replacing NaN in dataframe with datetime index by dict doesn't work


I have a dict of hourly measurement data in which some entries are missing (gaps). My current approach is to create a dataframe with an hourly datetime index, prefilled with NaN, and then replace the values in the dataframe with those from gasDict (see below). The dataframe is later interpolated to get rid of the NaNs.

import pandas as pd
import numpy as np

dataRange = pd.date_range(pd.to_datetime('2023-01-01 01:00:00'), pd.to_datetime('2023-01-01 05:00:00'), freq='H')
df = pd.DataFrame(np.nan, index=dataRange, columns=['gas'])
df['gas'] = pd.to_numeric(df['gas'], errors='coerce')

gasDict = {'2023-01-01 01:00:00' : 40,
           '2023-01-01 03:00:00' : 20  
          }

# these 3 methods do not work here
# methods from stackoverflow remap-values-in-pandas-column-with-a-dict-preserve-nans
df1 = df['gas'].map(gasDict).fillna(df['gas']) 
print(df1)

df2 = df['gas'].map(gasDict)
print(df2)

df3 = df.replace({'gas': gasDict})
print(df3)

# this code is correct but slow:
for key, value in gasDict.items():
    df.at[pd.to_datetime(key), 'gas'] = value

print(df) 

result (only the last one is correct!):

2023-01-01 01:00:00   NaN
2023-01-01 02:00:00   NaN
2023-01-01 03:00:00   NaN
2023-01-01 04:00:00   NaN
2023-01-01 05:00:00   NaN
Freq: H, Name: gas, dtype: float64
2023-01-01 01:00:00   NaN
2023-01-01 02:00:00   NaN
2023-01-01 03:00:00   NaN
2023-01-01 04:00:00   NaN
2023-01-01 05:00:00   NaN
Freq: H, Name: gas, dtype: float64
                     gas
2023-01-01 01:00:00  NaN
2023-01-01 02:00:00  NaN
2023-01-01 03:00:00  NaN
2023-01-01 04:00:00  NaN
2023-01-01 05:00:00  NaN
                      gas
2023-01-01 01:00:00  40.0
2023-01-01 02:00:00   NaN
2023-01-01 03:00:00  20.0
2023-01-01 04:00:00   NaN
2023-01-01 05:00:00   NaN

But that last method is awfully slow (gasDict has about 10000 entries). What is the correct approach?


There are 2 best solutions below

Panda Kim On BEST ANSWER

Answer

Since the datetimes are in the index, you need to map the index. Methods 1-3 do not do that: they map the values of the gas column, which are all NaN. In addition, the index is of datetime type while the keys of gasDict are strings, so the two types have to be unified before mapping.

Code

df['gas'] = df.index.astype('str').map(gasDict)

df:

                    gas
2023-01-01 01:00:00 40.0
2023-01-01 02:00:00 NaN
2023-01-01 03:00:00 20.0
2023-01-01 04:00:00 NaN
2023-01-01 05:00:00 NaN
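
As an alternative sketch of the same idea, the type mismatch can also be resolved in the other direction: convert the string keys of gasDict to Timestamps once and map the datetime index through the converted dict. The name tsDict is made up for this illustration, and the frequency is passed as a Timedelta only so the snippet does not depend on the spelling of the hourly alias.

```python
import numpy as np
import pandas as pd

# Same setup as in the question.
dataRange = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00',
                          freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.nan, index=dataRange, columns=['gas'])

gasDict = {'2023-01-01 01:00:00': 40,
           '2023-01-01 03:00:00': 20}

# tsDict is a hypothetical helper name: the same data, but keyed by
# Timestamp so that the keys have the same type as the index entries.
tsDict = {pd.Timestamp(key): value for key, value in gasDict.items()}

# Map the index (not the 'gas' column) through the converted dict.
# Index entries without a matching key become NaN.
df['gas'] = df.index.map(tsDict)
print(df)
```

Whether converting the keys or the index is cheaper probably depends on which of the two is larger; with about 10000 dict entries either should be far faster than the row-by-row loop.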

If, unlike in your example, the gas column of your df already contains non-NaN values that you want to keep, use fillna instead:

df['gas'] = df['gas'].fillna(pd.Series(df.index.astype('str').map(gasDict), index=df.index))

The mapped values are wrapped in a Series that carries df's datetime index, so that fillna aligns them row by row. For the example data this gives the same result.

Either way, the key point is to map the index and assign the result to the gas column of df.
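
To illustrate the fillna case, here is a minimal sketch that assumes the gas column already holds one measured value (30 at 02:00, invented for this example). The dict is turned into a Series with a datetime index (gasSeries is a made-up name), so that fillna can align it with df by index and only touch the NaN rows.

```python
import numpy as np
import pandas as pd

dataRange = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00',
                          freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.nan, index=dataRange, columns=['gas'])

# Assumption for this example: one value is already present in the column.
df.loc[pd.Timestamp('2023-01-01 02:00:00'), 'gas'] = 30.0

gasDict = {'2023-01-01 01:00:00': 40,
           '2023-01-01 03:00:00': 20}

# Build a Series from the dict and give it a datetime index,
# so its labels have the same type as df's index.
gasSeries = pd.Series(gasDict)
gasSeries.index = pd.to_datetime(gasSeries.index)

# fillna aligns gasSeries with df['gas'] by index label and fills only
# the rows that are NaN; the existing value at 02:00 is left untouched.
df['gas'] = df['gas'].fillna(gasSeries)
print(df)
```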

pc18 On

I think it is better to build the dataframe from the dict first and then extend the index. To create a dataframe from your dict, you can use DataFrame.from_dict:

df = pd.DataFrame.from_dict(gasDict, orient='index', columns=['gas'])

Then convert the index to datetime type:

df.index = df.index.astype("datetime64[ns]")

After that use the reindex method to extend your index:

df = df.reindex(dataRange)

By the way, the first approaches don't work because the keys in your dict are strings while the index has type datetime.
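
Put together, the steps of this answer might look like the sketch below, followed by the interpolation step mentioned in the question. It uses pd.to_datetime for the index conversion, which is a swapped-in equivalent of the astype call, and the variable name filled is invented for the example.

```python
import pandas as pd

gasDict = {'2023-01-01 01:00:00': 40,
           '2023-01-01 03:00:00': 20}

# Full hourly range that the final dataframe should cover.
dataRange = pd.date_range('2023-01-01 01:00:00', '2023-01-01 05:00:00',
                          freq=pd.Timedelta(hours=1))

# 1) One row per dict entry; the dict keys become the (string) index.
df = pd.DataFrame.from_dict(gasDict, orient='index', columns=['gas'])

# 2) Convert the string index to a DatetimeIndex.
df.index = pd.to_datetime(df.index)

# 3) Extend to the full range; hours without a measurement become NaN.
df = df.reindex(dataRange)

# 4) Interpolate the gaps (default linear method), as planned in the question.
filled = df.interpolate()
print(filled)
```

Because no Python-level loop over the entries is involved, this should scale well to the roughly 10000 entries mentioned in the question.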