I have some patient data, where the individual patients change treatment groups over time. My goal is to visualize the sequence of group changes and aggregate this data into a "sequence profile" for each treatment group.
For each treatment group, I would like to show when it generally occurs in the treatment cycle (e.g. mostly at the beginning or mostly at the end). To account for differing sequence lengths, I would like to standardize these profiles to a scale from 0 (very beginning) to 1 (end).
I am looking for an efficient way to prepare and visualize the data.
Minimal Example
Structure of Data
library(dplyr)
library(purrr)
library(ggplot2)
# minimal data
cj_df_raw <- tibble::tribble(
  ~id, ~group,
    1,    "A",
    1,    "B",
    2,    "A",
    2,    "B",
    2,    "A"
  )
# compute "intervals" for each person [start, end]
cj_df_raw %>% 
  group_by(id) %>% 
  mutate(pos = row_number(),
         len = n(),
         start = (pos - 1) / len,
         end = pos / len) %>% 
  filter(group == "A")
#> # A tibble: 3 x 6
#> # Groups:   id [2]
#>      id group   pos   len start   end
#>   <dbl> <chr> <int> <int> <dbl> <dbl>
#> 1     1 A         1     2 0     0.5  
#> 2     2 A         1     3 0     0.333
#> 3     2 A         3     3 0.667 1
(So id 1 was in group A during the first 50% of its sequence, and id 2 was in group A during the first 33% and the last 33% of its sequence. This means that 2 ids were in group A between 0 and 33% of the sequence, 1 between 33 and 50%, 0 between 50 and 66%, and 1 above 66%.)
This is the result I would like to achieve, but I don't see how to transform my data efficiently.
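The counting described in the parenthesis can be sketched as follows, assuming a recent dplyr (for `group_modify()`). For each group, [0, 1] is cut at every interval boundary, and for each resulting segment the number of patient intervals covering its midpoint is counted. The helper `step_profile()` and the columns `xmin`, `xmax`, `n` are hypothetical names, not part of any package.

```r
library(dplyr)

cj_df_raw <- tibble::tribble(
  ~id, ~group,
    1,    "A",
    1,    "B",
    2,    "A",
    2,    "B",
    2,    "A"
)

# Standardized [start, end) interval of every stay within each id's sequence
intervals <- cj_df_raw %>%
  group_by(id) %>%
  mutate(start = (row_number() - 1) / n(),
         end   = row_number() / n()) %>%
  ungroup()

# Hypothetical helper: split [0, 1] at all interval boundaries and count,
# for each elementary segment, the intervals that cover its midpoint
step_profile <- function(start, end) {
  breaks <- sort(unique(c(0, 1, start, end)))
  xmin <- head(breaks, -1)
  xmax <- tail(breaks, -1)
  mid  <- (xmin + xmax) / 2
  n    <- vapply(mid, function(m) sum(start <= m & m < end), integer(1))
  tibble::tibble(xmin = xmin, xmax = xmax, n = n)
}

# One step profile per treatment group
profile <- intervals %>%
  group_by(group) %>%
  group_modify(~ step_profile(.x$start, .x$end)) %>%
  ungroup()

profile
```

For group A this yields the counts 2, 1, 0, 1 on [0, 1/3), [1/3, 1/2), [1/2, 2/3), [2/3, 1), matching the hand-computed numbers. Because `start` and `end` are each a single division of small integers, equal fractions (e.g. 2/6 and 1/3) give identical doubles, so `unique()` merges shared boundaries.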
Desired outcome
profile_treatmen_a <- tibble::tribble(
    ~x, ~y,
     0, 0L,
  0.33, 2L,
   0.5, 1L,
  0.66, 0L,
     1, 1L,
     1, 0L
  )
profile_treatmen_a %>% 
  ggplot(aes(x, y)) +
  geom_step(direction = "vh") +
  expand_limits(x = c(0, 1), y = 0)
(Ideally, the area under the curve would be shaded.)
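One way to get the shading, sketched under the assumption that the counts are stored as one row per segment (`profile_a`, `xmin`, `xmax`, `n` are hypothetical names): drawing each segment as a rectangle from 0 up to its count fills exactly the area under the step curve, which `geom_step()` alone does not do.

```r
library(ggplot2)

# Segment counts for group A from the worked example:
# 2 ids in [0, 1/3), 1 in [1/3, 1/2), 0 in [1/2, 2/3), 1 in [2/3, 1]
profile_a <- tibble::tibble(
  xmin = c(0, 1/3, 1/2, 2/3),
  xmax = c(1/3, 1/2, 2/3, 1),
  n    = c(2, 1, 0, 1)
)

# Each rectangle spans one segment and rises from 0 to its count,
# so together they fill the area under the step profile
p <- ggplot(profile_a) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = 0, ymax = n),
            fill = "steelblue", alpha = 0.6) +
  expand_limits(x = c(0, 1), y = 0) +
  labs(x = "standardized position in sequence", y = "number of ids in group A")

p

# Sanity check: the shaded area equals the summed share of sequence
# time spent in A (id 1: 1/2, id 2: 1/3 + 1/3), i.e. 7/6
sum((profile_a$xmax - profile_a$xmin) * profile_a$n)
```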
Ideal solution: via ggridges
The goal of the visualization is to compare the sequence profiles of many treatment groups at the same time. If I could prepare the data accordingly, I would like to use the ggridges package for a striking visual comparison of the treatment groups.
library(ggridges)
data.frame(group = rep(letters[1:2], each=20),
           mean = rep(2, each=20)) %>% 
  mutate(count = runif(nrow(.))) %>% 
  ggplot(aes(x=count, y=group, fill=group)) +
  geom_ridgeline(stat="binline", binwidth=0.5, scale=0.9)
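A sketch of how per-group step profiles could be passed to `ggridges::geom_ridgeline()` instead of the `binline` stat, assuming a table of segment counts per group (recomputed here with a small hypothetical helper `step_profile()` so the block runs on its own). Since `geom_ridgeline()` connects points with a line, each segment is emitted as two points at the same height (left and right edge); the duplicated x values at segment boundaries produce the vertical jumps of a step curve. Heights are divided by the overall maximum so that, with `scale = 0.9`, neighbouring ridges should not overlap.

```r
library(dplyr)
library(ggplot2)
library(ggridges)

cj_df_raw <- tibble::tribble(
  ~id, ~group,
    1,    "A",
    1,    "B",
    2,    "A",
    2,    "B",
    2,    "A"
)

# Hypothetical helper: number of covering intervals per elementary segment
step_profile <- function(start, end) {
  breaks <- sort(unique(c(0, 1, start, end)))
  xmin <- head(breaks, -1)
  xmax <- tail(breaks, -1)
  mid  <- (xmin + xmax) / 2
  n    <- vapply(mid, function(m) sum(start <= m & m < end), integer(1))
  tibble::tibble(xmin = xmin, xmax = xmax, n = n)
}

# Standardized intervals per id, then one step profile per group
profile <- cj_df_raw %>%
  group_by(id) %>%
  mutate(start = (row_number() - 1) / n(),
         end   = row_number() / n()) %>%
  group_by(group) %>%
  group_modify(~ step_profile(.x$start, .x$end)) %>%
  ungroup()

# Two points per segment (left and right edge at the same height);
# rows of `profile` are already ordered by group and position
ridge_df <- tibble::tibble(
  group  = rep(profile$group, each = 2),
  x      = as.vector(rbind(profile$xmin, profile$xmax)),
  height = rep(profile$n / max(profile$n), each = 2)
)

p <- ggplot(ridge_df, aes(x = x, y = group, height = height, fill = group)) +
  geom_ridgeline(scale = 0.9, alpha = 0.7) +
  labs(x = "standardized position in sequence", y = NULL) +
  theme_ridges()

p
```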

                        
You could build helper intervals and then simply plot a histogram. Since each patient is in either group A or group B at every point of their sequence, the two groups sum to 100%. With these helper intervals, you could also easily switch to other geoms.
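One possible reading of the helper-interval idea, sketched under assumptions (the grid resolution `res` and the names `grid` and `helper` are hypothetical): every standardized interval is replaced by the points of a fixed grid on [0, 1] that fall inside it. A histogram of these points then counts, per bin, how many ids are in each group, and `position = "fill"` shows the shares, which add up to 100% per bin because every grid point is covered by exactly one stay per id.

```r
library(dplyr)
library(ggplot2)

cj_df_raw <- tibble::tribble(
  ~id, ~group,
    1,    "A",
    1,    "B",
    2,    "A",
    2,    "B",
    2,    "A"
)

# Standardized [start, end) interval of every stay
intervals <- cj_df_raw %>%
  group_by(id) %>%
  mutate(start = (row_number() - 1) / n(),
         end   = row_number() / n()) %>%
  ungroup()

# Fixed grid of cell midpoints on [0, 1]; the sequence lengths here (2 and 3)
# divide res, so every interval boundary falls between grid points
res  <- 300
grid <- (seq_len(res) - 0.5) / res

# Helper intervals: one row per (stay, grid point inside that stay)
helper <- intervals %>%
  mutate(x = purrr::map2(start, end, ~ grid[grid >= .x & grid < .y])) %>%
  tidyr::unnest(cols = x)

p <- ggplot(helper, aes(x = x, fill = group)) +
  geom_histogram(binwidth = 0.05, boundary = 0, position = "fill") +
  labs(x = "standardized position in sequence", y = "share of ids")

p
```

Because `helper` is plain point data, the same table could be drawn with, for example, `geom_density()` or `ggridges::geom_density_ridges()` instead of a histogram.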