Randomly select one member of each family with missing data


I have a huge dataset like this one:

FAMID <- c(1,1,2,3,3,4,4,5,6)
IID <-   c(1,2,2,1,2,1,2,1,2)
Value <- c(3,6,3,5,6,7,0,4,6)

df <- as.data.frame(cbind(FAMID, IID, Value))

      FAMID IID Value
 [1,]     1   1     3
 [2,]     1   2     6
 [3,]     2   2     3
 [4,]     3   1     5
 [5,]     3   2     6
 [6,]     4   1     7
 [7,]     4   2     0
 [8,]     5   1     4
 [9,]     6   2     6

I need to randomly select one member from each family (FAMID). Each family can have up to two members (IID = 1 or IID = 2): some families have both members and some have just one (either 1 or 2). For families with both members, I need to pick one at random; for families with just one member, that member must always be included. So I would need an output like this one:

     FAMID IID Value
[1,]     1   1     3
[2,]     2   2     3
[3,]     3   2     6
[4,]     4   1     7
[5,]     5   1     4
[6,]     6   2     6

I have tried with dplyr, but I cannot make it work.

Thank you so much in advance.


There is 1 best solution below

ThomasIsCoding

Probably this is what you are after:

library(dplyr)  # the `by` argument needs dplyr >= 1.1.0

df %>%
    slice_sample(n = 1, by = FAMID)
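If your dplyr version predates the `by` argument of `slice_sample()` (it was added in dplyr 1.1.0, as far as I know), a grouped pipeline should give the same kind of result. This is only a sketch on the question's example data; `res` is just an illustrative name and the seed is arbitrary.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  FAMID = c(1, 1, 2, 3, 3, 4, 4, 5, 6),
  IID   = c(1, 2, 2, 1, 2, 1, 2, 1, 2),
  Value = c(3, 6, 3, 5, 6, 7, 0, 4, 6)
)

set.seed(1)  # arbitrary seed, only so the draw is repeatable

res <- df %>%
  group_by(FAMID) %>%        # one group per family
  slice_sample(n = 1) %>%    # draw one row at random within each group
  ungroup()
```

Families with a single member are kept automatically, because drawing one row from a one-row group can only return that row.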

or in base R you can try

subset(df, ave(FAMID, FAMID, FUN = \(x) sample(length(x))) == 1)
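To see why the base R line works, it may help to split it into steps: within each family, `sample(length(x))` numbers the rows in a random order, and only the row that received number 1 is kept. The sketch below does that on the question's data and adds two sanity checks; `pick` and `res` are illustrative names, and the seed is arbitrary.

```r
# Example data from the question
df <- data.frame(
  FAMID = c(1, 1, 2, 3, 3, 4, 4, 5, 6),
  IID   = c(1, 2, 2, 1, 2, 1, 2, 1, 2),
  Value = c(3, 6, 3, 5, 6, 7, 0, 4, 6)
)

set.seed(42)  # arbitrary seed, only so the draw is repeatable

# Within each family, number the rows in random order and flag the row numbered 1
pick <- ave(df$FAMID, df$FAMID, FUN = function(x) sample(length(x))) == 1
res <- df[pick, ]

# Sanity checks: exactly one row per family, and no family is dropped
stopifnot(!anyDuplicated(res$FAMID))
stopifnot(setequal(res$FAMID, df$FAMID))
```

For a family with one member, `sample(1)` always returns 1, so that member is always selected.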