I have a data.table where each row corresponds to a set with an assigned mass. Sometimes the assigned mass is missing (NA). I would like to replace it with the minimum of the non-missing masses assigned to its subsets.
Each set is encoded in the set column (bit notation stored as an integer). The imprint column is a bitmap representation of the same set, included only to make this example easier to check by eye. N is the cardinality of the set.
library(data.table)
dt <- data.table(
  id.fe = 1:15,
  set = c(1,2,4,8,3,5,9,6,10,12,7,11,13,14,15),
  imprint = c('0001','0010','0100','1000','0011','0101','1001','0110','1010','1100','0111','1011','1101','1110','1111'),
  N = c(1,1,1,1,2,2,2,2,2,2,3,3,3,3,4),
  mass = c(0.4,1,1,0,0.3,NA,NA,NA,NA,0,NA,NA,NA,NA,NA)
)
id.fe set imprint N mass
1: 1 1 0001 1 0.4
2: 2 2 0010 1 1.0
3: 3 4 0100 1 1.0
4: 4 8 1000 1 0.0
5: 5 3 0011 2 0.3
6: 6 5 0101 2 NA
7: 7 9 1001 2 NA
8: 8 6 0110 2 NA
9: 9 10 1010 2 NA
10: 10 12 1100 2 0.0
11: 11 7 0111 3 NA
12: 12 11 1011 3 NA
13: 13 13 1101 3 NA
14: 14 14 1110 3 NA
15: 15 15 1111 4 NA
For example, the set with id.fe == 6 has two elements and mass NA. We would like to replace that NA with the minimum mass of its subsets, i.e. the sets with id.fe == 1 and id.fe == 3, which gives 0.4. For illustration, I store the result in a new column newMass.
Note that the condition bitwAnd(set, idSet) == set guarantees that set is a subset of idSet. The other two conditions, N < idN and !is.na(mass), are not necessary; I hope they will speed up the calculations. So far I do it in a for loop:
for (id in dt[is.na(mass), id.fe]) {
  idSet <- dt[id.fe == id, set]
  idN <- dt[id.fe == id, N]
  dt[id.fe == id, newMass := min(dt[N < idN & !is.na(mass) & bitwAnd(set, idSet) == set, ]$mass)]
}
dt[]
id.fe set imprint N mass newMass
1: 1 1 0001 1 0.4 NA
2: 2 2 0010 1 1.0 NA
3: 3 4 0100 1 1.0 NA
4: 4 8 1000 1 0.0 NA
5: 5 3 0011 2 0.3 NA
6: 6 5 0101 2 NA 0.4
7: 7 9 1001 2 NA 0.0
8: 8 6 0110 2 NA 1.0
9: 9 10 1010 2 NA 0.0
10: 10 12 1100 2 0.0 NA
11: 11 7 0111 3 NA 0.3
12: 12 11 1011 3 NA 0.0
13: 13 13 1101 3 NA 0.0
14: 14 14 1110 3 NA 0.0
15: 15 15 1111 4 NA 0.0
Is there any possibility to remove the loop?
Try this sequence.
Pre-determine all sets' subsets:
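A minimal sketch of this step, assuming a plain cross join is affordable for the number of sets involved. The names `sets`, `pairs` and `sub_set` are only illustrative, and the filter reuses the `bitwAnd` subset test from the question.

```r
library(data.table)

# Set codes from the question (bit notation stored as integers).
sets <- as.integer(c(1, 2, 4, 8, 3, 5, 9, 6, 10, 12, 7, 11, 13, 14, 15))

# All ordered (set, sub_set) combinations; CJ() builds the cross join.
pairs <- CJ(set = sets, sub_set = sets)

# Keep proper subsets only: every bit of sub_set must also be set in set,
# and a set is not counted as its own subset.
pairs <- pairs[sub_set != set & bitwAnd(sub_set, set) == sub_set]

# Subsets found for the set encoded as 5 (imprint 0101).
pairs[set == 5L]
```

For set 5 this leaves the rows with `sub_set` 1 and 4, matching the two singletons named in the question. The cross join grows quadratically with the number of sets, so for large inputs a different enumeration may be preferable.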
Calculate and summarize each set's min of subset masses:
Join and coalesce:
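The last two steps could look roughly like the sketch below. It is one possible reading of the sequence, not a definitive implementation: `pairs`, `sub_set`, `sub_mass`, `mins` and `min_mass` are illustrative names, the subset pairs are rebuilt so the block runs on its own, and `fcoalesce()` assumes a data.table version that provides it (1.12.4 or later).

```r
library(data.table)

dt <- data.table(
  id.fe = 1:15,
  set   = as.integer(c(1, 2, 4, 8, 3, 5, 9, 6, 10, 12, 7, 11, 13, 14, 15)),
  mass  = c(0.4, 1, 1, 0, 0.3, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA)
)

# (set, sub_set) pairs where sub_set is a proper subset of set.
pairs <- CJ(set = dt$set, sub_set = dt$set)[
  sub_set != set & bitwAnd(sub_set, set) == sub_set
]

# Attach each subset's mass, then take the minimum known mass per set.
pairs[dt, on = .(sub_set = set), sub_mass := i.mass]
mins <- pairs[!is.na(sub_mass), .(min_mass = min(sub_mass)), by = set]

# Join the minima back and fall back to them only where mass is NA.
dt[mins, on = "set", min_mass := i.min_mass]
dt[, newMass := fcoalesce(mass, min_mass)][, min_mass := NULL]
dt[]
```

Because of the coalesce, `newMass` here carries the original mass where it is known, whereas the loop in the question leaves those rows as NA. For the rows with missing mass, the values agree with the loop's output.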