Assume that I have several years of personal data, in which a Personal ID (indiv) identifies individuals across years, but the household ID (househ) only assigns household relation within a year.
Example:
libary(dplyr)
test_01 <- data.frame(indiv=c(1,2,3,4,5,6),househ=c(1,1,2,3,4,4),time=rep(1,6)) #1&2 form a household, 3 and 4 are single, 5&6 form a household
test_02 <- data.frame(indiv=c(2,3,4,5,6,7),househ=c(1,2,2,3,3,4),time=rep(2,6)) #1 exits, so 2 is now a new household, 3&4 now form a new household, 5&6 still do, 7 enters
test_03 <- data.frame(indiv=c(2,3,4,5,7,8,9,10),househ=c(1,2,2,3,4,5,5,6),time=rep(3,8)) #according to logic above
data_test_panel <- bind_rows(test_01,test_02,test_03)
The desired time-consistent household variable would be:
data_test_panel$true_household <- c(1,1,2,3,4,4,5,6,6,4,4,7,5,6,6,8,7,9,9,10)
So far I tried:
library(data.table)
setDT(data_test_panel)[,cons_household := .GRP,.(time,househ)] # where cons_household is the new household ID. However, this doesn't give the same household ID across time but assigns new values for every appearance of a household.
Thank you kindly for all your help! /Severin
Create a list column of unique individuals in a given household at each time period. We can then take the group number by that column.
dplyrapproachThis is much nicer than the
data.tableapproach:Note this uses per-operation grouping with the
.byparameter which requires at leastdplyr v1.1.0.data.tableapproachThe syntax for this is much more complicated because
data.tablewill automatically unroll your list column into a vector if you try to make itlist(indiv)by group.Also
data.tabledoesn't support grouping by list columns so we need to usetoString(), which is OK in this case but in general is not ideal as you have to think through the format of the column to ensure it will not somehow cause ambiguity leading two distinct groups to be merged.