I have a pandas DataFrame and I need to fill NULL values with the last value within a partition. Specifically, for each `id`, I need to create rows for the subsequent months and fill them with the last `value` seen for that `id`.
Example of my dataset:
| id | month | value |
|---|---|---|
| 1 | 2023-01-01 | London |
| 1 | 2023-02-01 | Paris |
| 2 | 2023-01-01 | New York |
| 3 | 2023-02-01 | Paris |
| 4 | 2023-03-01 | NULL |
My desired output (extending each `id` up to April 2023):
| id | month | value |
|---|---|---|
| 1 | 2023-01-01 | London |
| 1 | 2023-02-01 | Paris |
| 1 | 2023-03-01 | Paris |
| 1 | 2023-04-01 | Paris |
| 2 | 2023-01-01 | New York |
| 2 | 2023-02-01 | New York |
| 2 | 2023-03-01 | New York |
| 2 | 2023-04-01 | New York |
| 3 | 2023-02-01 | Paris |
| 3 | 2023-03-01 | Paris |
| 3 | 2023-04-01 | Paris |
| 4 | 2023-03-01 | NULL |
| 4 | 2023-04-01 | NULL |
Thank you!
One option is to reshape to a rectangular intermediate with `pivot`/`stack`:

Or with a `MultiIndex` and `groupby.ffill`:

Output:
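A minimal sketch of both ideas, assuming NULL is stored as `NaN`, `month` is parsed as a datetime at month starts, and the end of the range is a hypothetical `END` constant; `fill_with_pivot` and `fill_with_groupby` are illustrative names. For the wide-to-long step the pivot variant uses `DataFrame.unstack`, which keeps NaN entries so that `id` 4 is not lost, and then drops the months before each `id`'s first observation:

```python
import numpy as np
import pandas as pd

# Sample data from the question (NULL represented as NaN)
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4],
    "month": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-01-01",
                             "2023-02-01", "2023-03-01"]),
    "value": ["London", "Paris", "New York", "Paris", np.nan],
})

# Hypothetical end of the range; months are assumed to be month starts
END = pd.Timestamp("2023-04-01")


def fill_with_pivot(df, end=END):
    """Wide intermediate: one row per month, one column per id."""
    wide = df.pivot(index="month", columns="id", values="value")
    months = pd.date_range(df["month"].min(), end, freq="MS", name="month")
    wide = wide.reindex(months).ffill()  # forward-fill down each id column
    # Back to long format; unstack returns a Series that keeps NaN entries
    long = wide.unstack().rename("value").reset_index()
    # Drop the months before each id's first observation
    start = df.groupby("id")["month"].min()
    return long[long["month"] >= long["id"].map(start)].reset_index(drop=True)


def fill_with_groupby(df, end=END):
    """Build every (id, month) pair, then forward-fill within each id."""
    start = df.groupby("id")["month"].min()
    idx = pd.MultiIndex.from_tuples(
        [(i, m) for i, s in start.items()
         for m in pd.date_range(s, end, freq="MS")],
        names=["id", "month"],
    )
    return (df.set_index(["id", "month"])["value"]
              .reindex(idx)
              .groupby(level="id").ffill()
              .reset_index())


print(fill_with_groupby(df))
```

Both functions produce the 13 rows of the desired output. Note that a plain forward fill would also overwrite an original NaN that follows a real value within the same `id`.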
If the original NaNs should be kept (and carried forward to the following months) rather than filled with an earlier value, you can add a helper column to identify them:
Output:
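A minimal sketch of the helper-column idea, assuming the same setup (NaN for NULL, month-start dates, a hypothetical `END` constant). The data below is a made-up example in which `id` 1 has a NULL after a real value, and `fill_keep_nans`/`was_null` are illustrative names. The helper column is forward-filled together with `value`, so every row from an original NULL up to the next real value can be masked back to NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data where id 1 has a genuine NULL after a real value
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "month": pd.to_datetime(["2023-01-01", "2023-02-01",
                             "2023-01-01", "2023-03-01"]),
    "value": ["London", np.nan, "Rome", "Oslo"],
})
END = pd.Timestamp("2023-04-01")  # hypothetical end month


def fill_keep_nans(df, end=END):
    # Helper column: 1.0 where the source value is NULL, 0.0 otherwise
    tmp = df.assign(was_null=df["value"].isna().astype(float))
    start = tmp.groupby("id")["month"].min()
    idx = pd.MultiIndex.from_tuples(
        [(i, m) for i, s in start.items()
         for m in pd.date_range(s, end, freq="MS")],
        names=["id", "month"],
    )
    # New months get NaN in both columns; forward-fill both within each id
    filled = (tmp.set_index(["id", "month"])[["value", "was_null"]]
                 .reindex(idx)
                 .groupby(level="id").ffill())
    # Rows from an original NULL up to the next real value become NaN again
    filled["value"] = filled["value"].mask(filled["was_null"].eq(1))
    return filled.drop(columns="was_null").reset_index()


print(fill_keep_nans(df))
```

Here `id` 1 comes out as London followed by three NaN months, while `id` 2 is filled normally (Rome, Rome, Oslo, Oslo).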