I have two questions about sequential pattern mining with cspade in R. Here is my code:
library(dplyr)            # select, group_by, summarize, as_tibble
library(arulesSequences)  # read_baskets, cspade (also loads arules for inspect)
library(stringr)          # str_count

df <- data.frame(
  member_no  = c('1','1','1','2','3','4','5','4','3','2','3','1','2','2','4'),
  year_month = c('2020_Apr','2021_Mar','2021_Mar','2022_Jan','2023_May','2022_Dec','2019_Nov','2022_Feb','2021_Aug','2021_Aug','2020_Jan','2021_Mar','2021_Dec','2021_Jul','2023_Apr'),
  product    = c('A','B','B','B','C','C','A','B','B','B','A','B','B','B','C'))
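# one row per member and month, with that month's products joined into one itemset
# note: year_month is a character key, so rows sort alphabetically
# (e.g. 2021_Aug before 2021_Mar), not chronologically
dataset <- df |>
  select(member_no, year_month, product) |>
  group_by(member_no, year_month) |>
  summarize(itemset = paste(as.character(product), collapse = ','), .groups = 'drop')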
write.table(dataset, 'data.txt', sep = ',', quote = FALSE, row.names = FALSE, col.names = FALSE)
transaction <- read_baskets('data.txt', sep = ',', info = c('sequenceID', 'eventID'))
inspect(transaction)
freq.s <- cspade(transaction, parameter = list(support = 0.001),
                 control = list(verbose = TRUE))
inspect(head(freq.s, 1000, by = 'support'))
# convert all results to a tibble
df1 <- as(freq.s, "data.frame") |> as_tibble()
# number of items in each sequence (comma count + 1)
df1$pattern <- str_count(df1$sequence, ",") + 1
df1 <- df1[order(-df1$support), ]  # sort by support, descending
Q1: View(dataset) shows that member_no 1 has three items (B,B,B) in a single month (2021_Mar).
Is that month counted only once, as the sequence <{A}, {B}>, or as <{A}, {B,B,B}>?
Q2: Why is sum(df1$support) greater than 1?
Thank you guys!