Find unique values in a column in R

45 Views Asked by At

I have data from videos and I am trying to make a new table which summarizes the information on number of unique users so far in the video. The first table is what I have and the second table is what I am looking for using R. I tried group by Video and Minute but that gives me unique users for that minute only.

User Video Minute
a V1 1
b V1 1
b V1 1
c V1 1
d V1 2
c V1 2
e V1 2
a V2 1
b V2 2
c V2 2
Video Minute Unique Users
V1 1 3
V1 2 5
V2 1 1
V2 2 3
2

There are 2 best solutions below

0
r2evans On BEST ANSWER

I think what you're looking for is a cumulative count of distinct users, so in Minute=2, you want to count all the users in Minute 1 and 2.

dplyr

library(dplyr)
summarize(quux, .by = c("Video", "Minute"), Users = list(User)) |>
  mutate(
    .by = "Video",
    UniqueUsers = sapply(1:n(), function(i) n_distinct(unlist(Users[1:i])))
  )
#   Video Minute      Users UniqueUsers
# 1    V1      1 a, b, b, c           3
# 2    V1      2    d, c, e           5
# 3    V2      1          a           1
# 4    V2      2       b, c           3

base R

This is a little harder since it doesn't do transform in a grouped sense, so we need to nest it a bit ...

aggregate(User ~ Video + Minute, quux, FUN = list) |>
  transform(
    UniqueUsers = ave(
      seq_along(User), Video,
      FUN = function(rows) sapply(seq_along(rows), function(i) length(unique(unlist(User[rows[1:i]])))))
  )
#   Video Minute       User UniqueUsers
# 1    V1      1 a, b, b, c           3
# 2    V2      1          a           1
# 3    V1      2    d, c, e           5
# 4    V2      2       b, c           3

An alternative approach would be to Reduce the User list-column to unique-ify and cumulatively-combine them, then use lengths:

aggregate(User ~ Video + Minute, quux, FUN = function(L) list(unique(L))) |>
  transform(
    User = ave(User, Video, FUN = function(U) Reduce(function(...) unique(unlist(list(...))), U, accumulate = TRUE))
  ) |>
  transform(UniqueUsers = lengths(User))
#   Video Minute          User UniqueUsers
# 1    V1      1       a, b, c           3
# 2    V2      1             a           1
# 3    V1      2 a, b, c, d, e           5
# 4    V2      2       a, b, c           3

I don't know that it adds much value, over to you.


Data

quux <- structure(list(User = c("a", "b", "b", "c", "d", "c", "e", "a", "b", "c"), Video = c("V1", "V1", "V1", "V1", "V1", "V1", "V1", "V2", "V2", "V2"), Minute = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -10L))
0
Onyambu On

Use the code below:

df %>%
   group_by(Video, Minute)%>%
   summarise(user = list(unique(User)), .groups = 'drop_last')%>%
   mutate(user = accumulate(user, union), n_user = lengths(user))

# A tibble: 4 × 4
# Groups:   Video [2]
  Video Minute user      n_user
  <chr>  <int> <list>     <int>
1 V1         1 <chr [3]>      3
2 V1         2 <chr [5]>      5
3 V2         1 <chr [1]>      1
4 V2         2 <chr [3]>      3

To take a peek into the user list:

df %>%
   group_by(Video, Minute)%>%
   summarise(user = list(unique(User)), .groups = 'drop_last')%>%
   mutate(user = accumulate(user, union), n_user = lengths(user)) %>%
   data.frame()

  Video Minute          user n_user
1    V1      1       a, b, c      3
2    V1      2 a, b, c, d, e      5
3    V2      1             a      1
4    V2      2       a, b, c      3