Scaling within (but not between) variables in ggridges


I'm trying to use the ggridges package to visualize how the distributions of two populations differ across multiple groups, but I'm having trouble with scaling when sample sizes differ substantially. Sample sizes differ both between groups and between populations. When I let the package scale automatically, the populations look as if they were the same size. When I instead scale by count (via the height aesthetic), the distributions in groups with smaller sample sizes become impossible to read. Ideally, I'd like count-based scaling within each group, but not across groups.

Here is a reproducible example.

A sample data frame with three groups (a, b, c) each consisting of two populations (dogs, cats). Across all groups, there are more dogs than cats. Group c is 10x larger than groups a and b. I'm interested in the differences in distributions between dogs and cats in each group, but not the differences in size across groups.

library(dplyr)  # for bind_rows()
set.seed(123)   # any fixed seed makes the random samples reproducible

# three groups of dogs
sa_dog <- data.frame(group = "a",
                     value = rnorm(n = 50, mean = 0, sd = 1),
                     species = "dog")
sb_dog <- data.frame(group = "b",
                     value = rnorm(n = 50, mean = 0, sd = 1),
                     species = "dog")
sc_dog <- data.frame(group = "c",
                     value = rnorm(n = 500, mean = 0, sd = 1),
                     species = "dog")

# three groups of cats
sa_cat <- data.frame(group = "a",
                     value = rnorm(n = 10, mean = 1.5, sd = 0.5),
                     species = "cat")
sb_cat <- data.frame(group = "b",
                     value = rnorm(n = 10, mean = 1.5, sd = 0.5),
                     species = "cat")
sc_cat <- data.frame(group = "c",
                     value = rnorm(n = 100, mean = 1.5, sd = 0.5),
                     species = "cat")

df <- bind_rows(sa_dog, sb_dog, sc_dog, sa_cat, sb_cat, sc_cat)

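Plotting this directly looks good, except that it visually implies there are similar numbers of dogs and cats, which we know is not true: by default each ridge is a density estimate with the same total area, regardless of how many observations it is based on.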

library(dplyr)
library(ggplot2)
library(ggridges)

df %>% 
  ggplot(aes(x=value, y=group, fill=species)) +
  geom_density_ridges(alpha=0.5, scale=0.98)


This can be fixed by switching to stat = 'density' and mapping the height aesthetic to the computed count (density multiplied by the number of observations). It's not quite as pretty, but the relative sizes of the distributions are shown accurately. However, this only works if all of the groups have comparable sample sizes.

# leaving out group c
df %>% 
  filter(group != "c") %>%
  # after_stat(count) replaces the deprecated ..count.. (ggplot2 >= 3.3.0)
  ggplot(aes(x = value, y = group, fill = species, height = after_stat(count))) +
  geom_density_ridges(alpha = 0.5, scale = 0.98, stat = "density")


When one group has a much larger sample size, though, all ridges still share a single height scale, so group c's curves dominate and the smaller groups are squashed almost flat, which makes them hard to interpret.

df %>% 
  ggplot(aes(x = value, y = group, fill = species, height = after_stat(count))) +
  geom_density_ridges(alpha = 0.5, scale = 0.98, stat = "density")

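A rough back-of-the-envelope check shows how severe the squashing is. It uses the peak of the true normal density, 1/(σ√(2π)), in place of the kernel estimates (so the real ridges will differ somewhat because of smoothing and sampling noise), and it assumes every ridge is normalized by the single tallest count curve across all groups, which is what the behaviour described above suggests.

$$
\begin{aligned}
h_{\max} &\approx n \cdot \frac{1}{\sigma\sqrt{2\pi}} \\
\text{dogs in c: } 500 \times 0.399 &\approx 199.5 \\
\text{dogs in a: } 50 \times 0.399 &\approx 19.9 \;\Rightarrow\; 19.9 / 199.5 \approx 0.10 \\
\text{cats in a: } 10 \times 0.798 &\approx 8.0 \;\Rightarrow\; 8.0 / 199.5 \approx 0.04
\end{aligned}
$$

So the cats in groups a and b reach only about 4% of the height of the tallest ridge. Normalizing within each group instead would put the dogs in a at 1 and the cats in a at about 8.0 / 19.9 ≈ 0.4, the same ratio as in group c (79.8 / 199.5 = 0.4).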

I'd like to combine the first and second plots: count-based scaling within each group, but not across groups. I don't need the plot to show that group c is bigger than groups a and b, but I do need to be able to compare the distributions of dogs and cats within each group.
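One possible workaround, offered as a sketch rather than a built-in ggridges option: compute the count-scaled densities outside ggplot, divide them by the tallest curve within each group only, and draw the result with geom_ridgeline(), which plots precomputed heights instead of estimating densities itself. The seed, the helper make_pop(), the 512-point grid and its padding are hypothetical choices for illustration. density() is called with its default bandwidth (bw.nrd0), which is assumed to be close enough to what stat = "density" uses.

```r
library(dplyr)
library(ggplot2)
library(ggridges)

set.seed(1)  # hypothetical seed, only so the sketch is reproducible

# Hypothetical helper: rebuilds the example data from the question
make_pop <- function(group, species, n, mean, sd) {
  data.frame(group = group, species = species,
             value = rnorm(n, mean = mean, sd = sd))
}
df <- bind_rows(
  make_pop("a", "dog",  50, 0,   1),   make_pop("b", "dog",  50, 0,   1),
  make_pop("c", "dog", 500, 0,   1),   make_pop("a", "cat",  10, 1.5, 0.5),
  make_pop("b", "cat",  10, 1.5, 0.5), make_pop("c", "cat", 100, 1.5, 0.5)
)

# Shared x grid (assumed: data range padded by 1 on each side),
# so every curve is evaluated at the same points
x_grid <- seq(min(df$value) - 1, max(df$value) + 1, length.out = 512)

dens <- df %>%
  group_by(group, species) %>%
  # Kernel density per group x species, multiplied by n:
  # the same quantity stat = "density" exposes as count
  group_modify(~ {
    d <- density(.x$value, n = length(x_grid),
                 from = min(x_grid), to = max(x_grid))
    data.frame(x = d$x, count = d$y * nrow(.x))
  }) %>%
  # Normalize by the tallest curve within each group only
  group_by(group) %>%
  mutate(height = count / max(count)) %>%
  ungroup()

# geom_ridgeline() draws the precomputed heights without rescaling them
p <- ggplot(dens, aes(x = x, y = group, height = height, fill = species)) +
  geom_ridgeline(alpha = 0.5, scale = 0.98)
print(p)
```

Because each group is normalized separately, the tallest ridge in every group reaches the same height, so dogs and cats stay comparable within a group while group c no longer flattens a and b. The trade-off is that heights are no longer comparable between groups, which matches what the question asks for.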
