Conditional non-equi join in data.table

106 Views Asked by TBP At 07 December 2022 at 16:46

I have a dataset with some missing data (in reality <1%). We are going to use data from other sites to fill in the missing data. We have a data.table named dt1, which contains the data. We have another data.table named dt2 and this is an index of the sites that can be used to replace (rsite) the missing data from a site. In dt1 I would like to create a column in dt1 named UsedSite that records which site the data we end up using are from.

Create some data to use

dt1<-data.table(site=c("s1","s1","s2","s2"), x=c(NA, 1, NA,1))
dt2<-data.table(site=c("s1","s1","s2","s2"), rsite=c("s3","s3","s4","s4"))

Since only a small fraction of data are missing, start by setting the values of UsedSite to site

dt1[,UsedSite:=site]

The following is where I can't figure out how to do this. I want to just replace the UsedSite values with rsite where x is NA. Not at all surprisingly, the line below replaces everything - I don't want that. I don't understand how to instruct it to consider only lines where a condition (is.na(x)) is being met.

dt1[dt2,UsedSite:=rsite, on=.(site)]

My natural impulse is to try dt1[dt2&is.na(x),UsedSite:=rsite, on=.(site)], but that doesn't work. Thoughts? I would like to stick with the data.table paradigm in this.

Original Q&A

There are 1 best solutions below

akrun On 07 December 2022 at 16:48 BEST ANSWER

We may use fcoalesce here to replace the NA values with the corresponding values of 'rsite' If it is a non-NA, it won't get replaced as fcoalesce will return the first non-NA for each row

library(data.table)
dt1[dt2, UsedSite := fcoalesce(as.character(x), rsite), on = .(site)]

-output

> dt1
   site  x UsedSite
1:   s1 NA       s3
2:   s1  1        1
3:   s2 NA       s4
4:   s2  1        1

Update

dt1[dt2, UsedSite := fifelse(is.na(x), rsite, site), on = .(site)]

-output

> dt1
   site  x UsedSite
1:   s1 NA       s3
2:   s1  1       s1
3:   s2 NA       s4
4:   s2  1       s2

Conditional non-equi join in data.table

There are 1 best solutions below

Update

Related Questions in R

Related Questions in DATA.TABLE

Related Questions in INNER-JOIN

Related Questions in NON-EQUI-JOIN

Trending Questions

Popular # Hahtags

Popular Questions