I have a datatable as following:
example_data <- data.table(
group_number = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
A = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
B = c(5, 6, 7, 8, 5, 6, 7, 8, 13, 14, 15, 16, 21, 22, 23, 24),
other_columns = c(-0.03821019, -1.65126924, 0.26851449, -0.95818859,
0.51500988, 1.01449388, -0.65350743, -0.26931109,
1.04377564, 0.36648139, 2.47164652, -0.49508173,
0.53431890, -0.35798755, 0.38692761, 0.62482005)
)
Here, I want to first group rows by group_number. And then if the rows of columns A and B of one group are identical to the rows of columns A and B of another group, then I would like to remove the duplicated group and leave only one group.
For example, in the above table, since rows of columns A and B from group_number==1 are identical to the rows of columns A and B from group_number==2, I would like to remove all the rows that belongs to group_number 2 (rows 5:8).
The desried output should look as follows:
example_data <- data.table(
group_number = c(1, 1, 1, 1, 3, 3, 3, 3, 4, 4, 4, 4),
A = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
B = c(5, 6, 7, 8, 13, 14, 15, 16, 21, 22, 23, 24),
other_columns = c(-0.03821019, -1.65126924, 0.26851449, -0.95818859,
1.04377564, 0.36648139, 2.47164652, -0.49508173,
0.53431890, -0.35798755, 0.38692761, 0.62482005)
)
The job can be done using distinct(example_data,A,B,.keep_all = TRUE), but this is not the answer I am seeking for. Distinct function groups data by a combination of columns A and B and then see if there are duplicated values across that group. I want to group by group_number and check if the contents of columns A and B are identical across the group_number.
Thank you very much in advance!
If I'm understanding correctly, you can try the following.
splitthe data.table into a list based ongroup_number. Then, you can extract the rows that are not duplicated based on columnsAandB. Note in thelapplywe use[to extract rows, and have an empty argument to indicate all rows should be included (and selectingAandBcolumns).Output