How do I fill blanks with a median filtered on a dimension?

46 Views Asked by At

I am currently trying to solve some Datacamp exercises and wanted to try something extra for practice:

I am trying to create a calculated column (let's name it "airbnb[updated_price]") based on the original column "airbnb[price]".

The airbnb[price] column holds numerical values which represent the cost to rent an airbnb for one night per listing_id (just a simple distinct ID number stored as a whole number).

However, for some rows, the airbnb[price] is empty. After further investigation, this only happens with values that have airbnb[city] as "Sydney" (to be clear: not all Syndey rows are empty, just some of them. For other cities, there are no empty rows).

Datacamp asks me to solve this issue by creating a calculated column with:

updated_price = 
   IF(ISBLANK(airbnb[price]),
   110,
   airbnb[price]
)

Note: The 110 is given by the exercise and represents the median of airbnb[price] for listings located in Sydney.

This works very well but isn't very challenging. What I am trying to do is something along the lines of:

updated_price = 
   IF(ISBLANK(airbnb[price]),
   [Sydney_median],
   airbnb[price]
)

Here I tried using a measure that calculate the median of airbnb[price] for listings located in Sydney. The measure does:

Sydney_median = CALCULATE(
   MEDIAN(airbnb[price]),
   airbnb[city] = "Sydney"
)

Which works and returns the value of 110 when I put it in a table or card. However, when I (as seen above in the second "updated_price") use the measure inside the calculated column, it gives the error: expressions that yield variant data-type cannot be used to define calculated columns

I understand for a part that calculated columns are created data which creates a problem here as the measure is something of a more flexible nature (please correct me if I am wrong).

My question is:

Is is possible to do this another way? As in: is it possible to create a column that fills all blank values (which only exist for the city Sydney) with a median of the price for said city based on the non blank values?

Help is appreciated!

For insight, the data looks like this:

Screenshot of datatable

0

There are 0 best solutions below