Replace missing values with mean using Ibis

152 Views Asked by At

How can I use Ibis to fill missing values with the mean?

For example, if I have this data:

import pandas as pd
import ibis
from ibis import _

ibis.options.interactive = True

df = pd.DataFrame(data={'fruit': ['apple', 'apple', 'apple', 'orange', 'orange', 'orange'],
                        'variety': ['gala', 'honeycrisp', 'fuji', 'navel', 'valencia', 'cara cara'],
                        'weight': [134 , 158, pd.NA, 142, 96, pd.NA]})

t = ibis.memtable(df)

Using Ibis code:

  • How would I replace the NA values in the weight column with the overall mean of weight?
  • How would I replace the NA values in the weight column with the the mean within each group (apples, oranges)?
1

There are 1 best solutions below

0
Phillip Cloud On BEST ANSWER

In the first case (replacing NULL with overall mean) you can simply pass the mean of the replacement column to fillna and ibis will figure out what you mean:

In [27]: t.mutate(weight=_.weight.fillna(_.weight.mean()))
Out[27]:
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ fruit  ┃ variety    ┃ weight  ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ string │ string     │ float64 │
├────────┼────────────┼─────────┤
│ apple  │ gala       │   134.0 │
│ apple  │ honeycrisp │   158.0 │
│ apple  │ fuji       │   132.5 │
│ orange │ navel      │   142.0 │
│ orange │ valencia   │    96.0 │
│ orange │ cara cara  │   132.5 │
└────────┴────────────┴─────────┘

In the second case of replacing the nulls per group, you can use a window function:

In [28]: t.mutate(weight=_.weight.fillna(_.weight.mean().over(group_by="fruit")))
Out[28]:
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┓
┃ fruit  ┃ variety    ┃ weight  ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━┩
│ string │ string     │ float64 │
├────────┼────────────┼─────────┤
│ orange │ navel      │   142.0 │
│ orange │ valencia   │    96.0 │
│ orange │ cara cara  │   119.0 │
│ apple  │ gala       │   134.0 │
│ apple  │ honeycrisp │   158.0 │
│ apple  │ fuji       │   146.0 │
└────────┴────────────┴─────────┘