How to add count (n) / summary statistics as a label to ggplot2 boxplots?

I am new to R and trying to add count labels to my boxplots, so the sample size per boxplot shows in the graph.

This is my code:

library(tidyverse)
library(hrbrthemes)  # provides theme_ipsum()

bp_east_EC <- total %>%
  filter(year %in% c(1977, 2020, 2021, 1992),
         sampletype == "groundwater",
         East == 1,
         #EB == 1,
         #N59 == 1,
         variable %in% c("EC_uS")) %>%
  ggplot(aes(x = as.character(year), y = value, colour = as.factor(year))) +
  theme_ipsum() +
  ggtitle("Groundwater EC, eastern Curacao") +
  theme(plot.title = element_text(hjust = 0.5, size = 14)) +
  theme(legend.position = "none") +
  labs(x = "", y = "uS/cm") +
  geom_jitter(color = "grey", size = 0.4, alpha = 0.9) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 2) # shows mean; fun replaces the deprecated fun.y

I have googled a lot and tried different things (annotate, return functions, mtext, etc.), but I keep getting different errors. As a beginner, I cannot figure out how to integrate those suggestions into my own code.

Does anybody have an idea of the best way for me to approach this?

There is 1 best solution below

jrcalabrese On

I would create a new variable containing the sample size per group and plot that number with geom_label(). Since your example isn't fully reproducible, here is an example of adding counts/sample sizes to a boxplot using the iris dataset.

library(tidyverse)
data(iris)

# boxplot with no label
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) + 
  geom_boxplot()

# boxplot with label
iris %>%
  group_by(Species) %>%
  mutate(count = n()) %>%                  # sample size per group, repeated on every row
  mutate(mean = mean(Sepal.Length)) %>%    # group mean, used below to position the label
  ggplot(aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  # mutate() keeps all rows, so the same label is drawn once per observation
  geom_label(aes(label = count, y = mean + 0.75), # <- change this to move label up and down
             size = 4, position = position_dodge(width = 0.75)) +
  geom_jitter(alpha = 0.35, aes(color = Species)) +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 6) # shows mean
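
Because mutate() keeps every row, the geom_label() layer above draws the same label once per observation, stacked on top of itself. A minimal sketch of a variant that draws one label per group is below. It assumes the counts are summarised into a separate data frame first; label_df, y_pos, and the 0.3 offset are illustrative names and values, not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# One row per group: the sample size and a y position just above the highest point
label_df <- iris %>%
  group_by(Species) %>%
  summarise(n = n(),
            y_pos = max(Sepal.Length) + 0.3,  # assumed offset; adjust to taste
            .groups = "drop")

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(aes(fill = Species)) +
  # The label layer gets its own data, so exactly one label is drawn per box
  geom_text(data = label_df,
            aes(x = Species, y = y_pos, label = paste0("n = ", n)),
            inherit.aes = FALSE)

p
```

Keeping the labels in their own data frame also makes it easy to add other summary statistics (median, mean) as extra columns and paste them into the label text.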

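
If you would rather not reshape your data at all, another option is to let stat_summary() compute the label, since it is just one more layer added to an existing plot. The sketch below assumes a small helper, here called n_label (a hypothetical name), that returns the label text and its y position for each group; the 0.3 offset is likewise an assumption.

```r
library(ggplot2)

# Hypothetical helper: stat_summary() calls it once per group with that group's y values.
# It returns the y coordinate for the label and the label text itself.
n_label <- function(y) {
  data.frame(y = max(y) + 0.3,                 # place the label just above the highest point
             label = paste0("n = ", length(y)))
}

p2 <- ggplot(iris, aes(x = Species, y = Sepal.Length, colour = Species)) +
  geom_boxplot() +
  stat_summary(fun.data = n_label, geom = "text", colour = "black")

p2
```

In the plot from the question, this would mean appending `+ stat_summary(fun.data = n_label, geom = "text")` after the existing stat_summary() line, with the offset scaled to the range of the EC values.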