I have two DTs
X = data.table(names = c("a", "a", "b", "b", "c", "c"), years = c("2001", "2002", "2001", "2002", "2001", "2002"), val.1 = 1:6, key = c("names", "years"))
X
| names | years | val.1 |
| -------- | -------- |
| a | 2001 | 1 |
| a | 2002 | 2 |
| b | 2001 | 3 |
| b | 2002 | 4 |
| c | 2001 | 5 |
| c | 2002 | 6 |
and
X.update = data.table(names = c("a", "b", "b", "c", "d", "d", "d"), years = c("2003", "2002", "2003", "2003", "2001", "2002", "2003"), val.1 = 11:17, key = c("names", "years"))
X.update
| names | years | val.1 |
| -------- | -------- |
| a | 2003 | 11 |
| b | 2002 | 12 |
| b | 2003 | 13 |
| c | 2003 | 14 |
| d | 2001 | 15 |
| d | 2002 | 16 |
| d | 2003 | 17 |
The task looks natural to me. X.update supersedes all old values (val.1) for the same c("names", "year") and adds new entries everywhere else.
Here it means:
- A new row for all names for the year 2003
- A new row of d for every year (d is basically added)
- A correction of b in the year 2002
X.final
| names | year | val.1 |
| -------- | -------- |
| a | 2001 | 1 |
| a | 2002 | 2 |
| a | 2003 | 11 | <-added for a for 2003
| b | 2001 | 3 | # edited, R2Evans and Sam are right
| b | 2002 | 12 | <-corrected for b for 2002
| b | 2003 | 13 | <-added for b for 2003
| c | 2001 | 5 |
| c | 2002 | 6 |
| c | 2003 | 14 | <-added for c for 2003
| d | 2001 | 15 | <-added
| d | 2002 | 16 | <-added
| d | 2003 | 17 | <-added
Since I need this for tables with 100,000s of rows, I wanted to ask for an idiomatic (= fast) solution in DT.
If you remove from
Xwhat exists inX.updateand then row-bind, it'll be rather direct.(FYI, I think your row 4 should have
val.1 = 3instead of4.)X[!X.update,]is an anti-join, operating on the keys of the first frame.