I have a data.frame df with three columns: id, year, and class. id holds the user IDs, year takes the values {2018, 2019, 2020, 2021, 2022}, and class takes one of three values {class_A, class_B, class_C}. The dataset has more than 50K rows.
I would like to track the flow of users (percentage, not absolute numbers) over the years from one class to another.
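For reference, a toy dataset with the same structure can be generated as in the sketch below. The values are random placeholders, and the 1,000-user size and the one-row-per-user-per-year layout are assumptions for illustration, not my real data. The last step shows the kind of per-year percentage I mean.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the toy data is reproducible

# Toy stand-in for df: 1,000 hypothetical users, one row per user per year
# (assumed layout; classes are assigned at random)
df <- expand.grid(id = 1:1000, year = 2018:2022) %>%
  mutate(class = sample(c("class_A", "class_B", "class_C"),
                        size = n(), replace = TRUE))

# Share of users in each class within each year
shares <- df %>%
  count(year, class) %>%
  group_by(year) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

print(shares)
```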
I have been trying to follow various examples, in particular this one:
    library(ggplot2)
    library(ggalluvial)
    library(dplyr)

    data(vaccinations)
    # Reverse the labels of the response factor levels
    levels(vaccinations$response) <- rev(levels(vaccinations$response))

    # Convert frequencies to proportions within each survey
    vaccinations <- vaccinations %>%
      group_by(survey) %>%
      mutate(pct = freq / sum(freq))

    ggplot(vaccinations,
           aes(x = survey, stratum = response, alluvium = subject,
               y = pct,
               fill = response %in% c("Missing", "Never"),
               label = response)) +
      scale_x_discrete(expand = c(.1, .1)) +
      scale_y_continuous(labels = scales::percent_format()) +
      scale_fill_manual(values = c(`TRUE` = "cadetblue1", `FALSE` = "grey50")) +
      geom_flow() +
      geom_stratum(alpha = .5) +
      # after_stat() replaces the deprecated ..var.. notation (ggplot2 >= 3.3.0)
      geom_text(aes(label = paste0(after_stat(stratum), "\n",
                                   scales::percent(after_stat(count), accuracy = .1))),
                stat = "stratum", size = 3) +
      theme(legend.position = "none") +
      ggtitle("vaccination survey responses at three points in time")
But I don't know how to adapt this so that each year is its own axis and the classes are the strata.
Any guidance would be appreciated.