Scaling within (but not between) variables in ggridges

39 Views Asked by Jaken At 12 February 2024 at 21:52

I'm trying to use the ggridges package to visualize differences in the population distribution across multiple groups, but am having problems with scaling when sample sizes differ significantly. I have differing sample sizes in both the groups and populations. When I let the package autoscale, it looks like the populations are the same size, but when I scale using height, the distributions of groups with smaller sample sizes are impossible to read. Ideally, I'd like to use height-based scaling within each group, but not across all groups.

Here is a reproducible example.

The sample data frame has three groups (a, b, c), each consisting of two populations (dogs, cats). In every group there are five times as many dogs as cats, and group c is 10x larger than groups a and b. I'm interested in the differences in distributions between dogs and cats in each group, but not the differences in size across groups.

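library(dplyr)  # needed for bind_rows() below
set.seed(42)    # arbitrary seed so the random draws are reproducible

#three groups of dogs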
sa_dog = data.frame(group = "a", 
              value = rnorm(n = 50, mean = 0, sd = 1),
              species = "dog")
sb_dog = data.frame(group = "b", 
              value = rnorm(n = 50, mean = 0, sd = 1),
              species = "dog")
sc_dog = data.frame(group = "c", 
              value = rnorm(n = 500, mean = 0, sd = 1),
              species = "dog")

#three groups of cats
sa_cat = data.frame(group = "a", 
              value = rnorm(n = 10, mean = 1.5, sd = 0.5),
              species = "cat")
sb_cat = data.frame(group = "b", 
              value = rnorm(n = 10, mean = 1.5, sd = 0.5),
              species = "cat")
sc_cat = data.frame(group = "c", 
              value = rnorm(n = 100, mean = 1.5, sd = 0.5),
              species = "cat")

df <- bind_rows(sa_dog, sb_dog, sc_dog, sa_cat, sb_cat, sc_cat)

Plotting this straight up looks good, except that it visually implies that there are similar numbers of dogs and cats, which we know is not true.

library(dplyr)
library(ggplot2)
library(ggridges)

df %>% 
  ggplot(aes(x=value, y=group, fill=species)) +
  geom_density_ridges(alpha=0.5, scale=0.98)

This can be fixed by switching to stat='density' and mapping the height aesthetic to 'count'. Not quite as pretty, but you can see the different sizes of the distributions reflected accurately. However, this only works if all of the groups have comparable sample sizes.

#leaving out group c
df %>% 
  filter(group != "c") %>%
  # after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
  ggplot(aes(x=value, y=group, fill=species, height = after_stat(count))) +
  geom_density_ridges(alpha=0.5, scale=0.98, stat='density')

When one group has a much larger sample size, though, it becomes difficult to interpret anything about the smaller groups.

#all three groups, including the much larger group c
df %>% 
  ggplot(aes(x=value, y=group, fill=species, height = after_stat(count))) +
  geom_density_ridges(alpha=0.5, scale=0.98, stat='density')

I'd like to combine the first and second plots; I want height-based scaling within each group, but not to scale across all groups. I don't need to be able to see from the visualization that group c is bigger than groups a and b, but I do need to be able to compare the distributions of dogs and cats in each group.
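
One possible way to get this, offered as a rough sketch rather than a confirmed solution: compute the count-scaled densities by hand, rescale them so the tallest curve in each group has height 1, and draw the result with stat = "identity". The names x_range and dens are made up for illustration, the data is a compact stand-in for the df above, and the per-curve default bandwidth of density() is an assumption that may not match what geom_density_ridges() picks on its own.

```r
library(dplyr)
library(ggplot2)
library(ggridges)

set.seed(42)  # arbitrary seed, only for reproducibility

# Compact stand-in for the example data: group c is 10x larger than a and b
df <- bind_rows(
  data.frame(group = "a", value = rnorm(50, 0, 1),      species = "dog"),
  data.frame(group = "b", value = rnorm(50, 0, 1),      species = "dog"),
  data.frame(group = "c", value = rnorm(500, 0, 1),     species = "dog"),
  data.frame(group = "a", value = rnorm(10, 1.5, 0.5),  species = "cat"),
  data.frame(group = "b", value = rnorm(10, 1.5, 0.5),  species = "cat"),
  data.frame(group = "c", value = rnorm(100, 1.5, 0.5), species = "cat")
)

# Evaluate every density on the same x grid (hypothetical padding of 1 unit)
x_range <- range(df$value) + c(-1, 1)

dens <- df %>%
  group_by(group, species) %>%
  group_modify(~ {
    d <- density(.x$value, from = x_range[1], to = x_range[2], n = 512)
    # density * n gives a count-scaled curve, like after_stat(count)
    data.frame(value = d$x, height = d$y * nrow(.x))
  }) %>%
  group_by(group) %>%
  # rescale within each group so its tallest curve peaks at 1
  mutate(height = height / max(height)) %>%
  ungroup()

p <- ggplot(dens, aes(x = value, y = group, height = height, fill = species)) +
  geom_density_ridges(stat = "identity", alpha = 0.5, scale = 0.98)

print(p)
```

Because the rescaling happens per group, the relative sizes of dogs and cats are preserved inside each row, while the difference in size between group c and groups a and b is deliberately discarded.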
