Adding a text label on top of the spike in ggdist::stat_spike

I have continuous gene expression data (log2TPM) on the x-axis and categorical data (gene groupings) on the y-axis. I am using the following script to get the plot below:

library(ggplot2)

ggplot(merged_data, aes(x = log2TPM, y = categ)) + 
  ggdist::stat_slab(justification = -.2, width = .6) + 
  geom_boxplot(outlier.color = NA, width = .15) +
  ggdist::stat_spike(at = gene_data$log2TPM, justification = -.2)

[Image: the resulting plot]

Is there a way to place a text label at the top of the spike? I can envision two approaches: one with the label in the upper margin of the plotting area, or another, maybe more complicated, where the label tracks the half-eye plot (stat_slab).

The goal is to get something like this, after adding the label contained in gene_data$gene_name:

[Image: the desired plot, with the gene name labeled above the spike]

Here is a sample of what my data looks like:

r$> merged_data  %>%  select(log2TPM,gene_name,categ)  %>% head()
  log2TPM gene_name     categ
3.3575520    TSPAN6 All_13421
4.5084287      DPM1 All_13421
1.6507646     SCYL3 All_13421
0.9259994  C1orf112 All_13421
4.5637683       CFH All_13421
4.8619554     FUCA2 All_13421

Answer by Matthew Kay:

The short answer is "it's complicated". The exact position of a spike's endpoint depends on two things: the scaling of the thickness aesthetic (determined by any thickness scales added to the plot, as well as by the normalize and scale parameters of stat_slab), and the justification and position arguments of the slab.

The first thing to do to make your life easier (and something you should always do when using stat_spike to label a slab) is to add scale_thickness_shared() to the plot. This ensures that stat_spike() and stat_slab() use the same thickness scale, which fixes the problem in your current plot where the endpoints of the spikes do not lie on the density curve.

With shared scaling in place, in the default case (no other changes to justification, position, or side), the endpoint of the spike will be the y position plus the thickness times 0.9 (which is the default value of scale).
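The default-case arithmetic above can be captured in a small helper. This is a minimal sketch: `spike_label_y()` is a hypothetical name, and it only holds under the same assumptions (shared thickness scale; justification, position, and side left at their defaults), with `scale` expected to match the slab's `scale` parameter.

```r
# Hypothetical helper: y position of a spike's endpoint in the default case
# (shared thickness scale; justification, position, and side left at defaults).
# `scale` should match the slab's `scale` parameter, which defaults to 0.9.
spike_label_y <- function(y, thickness, scale = 0.9) {
  # thickness values from ggdist are not plain numerics, so convert first
  y + as.numeric(thickness) * scale
}

# A group drawn at y = 2 whose scaled thickness at the spike location is 0.5
spike_label_y(2, 0.5)
#> [1] 2.45
```

Because after_scale() expressions are evaluated in the environment where the mapping was created, a helper like this should also be callable there, e.g. `after_scale = spike_label_y(y, thickness)`, which keeps the scale value in one place if you change it.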

You can use stat_spike() with geom = "label" or geom = "text", together with an after_scale() calculation, to determine the label's location. Here's a simple example with two groups:

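library(ggplot2)
library(ggdist)

set.seed(1234)

data.frame(
  g = c("a","b"),
  x = rnorm(1000, c(0,1))
) |>
  ggplot(aes(x = x, y = g)) + 
  stat_slab() +
  # plain spike at x = 1 in each group
  stat_spike(at = 1) +
  # the same spike, drawn as a label placed at the spike's endpoint
  stat_spike(
    aes(
      label = g, 
      # start from the group's y position, then (after scaling) shift up by
      # the thickness times 0.9, the default slab `scale`
      y = stage(start = g, after_scale = y + as.numeric(thickness) * 0.9)
    ), 
    at = 1, 
    geom = "label", 
    vjust = 0,  # bottom of the label sits on the spike endpoint
    hjust = 0   # left edge of the label starts at x = 1
  ) +
  # one thickness scale shared by the slab and spike layers
  scale_thickness_shared()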

[Image: two densities with a labeled spike overlaid at x = 1]

You have to use as.numeric(thickness) instead of just thickness because thickness values are a subclass of numeric that won't work directly as positional values. See the Details section of ?ggdist::thickness for more on that if you're curious.
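Applied to the data in the question, the same pattern might look like the sketch below. It has not been run against the real data and assumes that `merged_data` has columns `log2TPM` and `categ`, that `gene_data` holds a single gene (one `log2TPM` value and one `gene_name`), and that the slab keeps its default justification so the y + thickness * 0.9 formula applies; the `justification = -.2` from the question would shift the slab and break that formula. The simulated data and the "Group_B" category are placeholders.

```r
library(ggplot2)
library(ggdist)

set.seed(1)

# Placeholder data standing in for the question's merged_data; the second
# category "Group_B" is made up for illustration
merged_data <- data.frame(
  log2TPM = c(rnorm(500, mean = 3), rnorm(500, mean = 4)),
  categ = rep(c("All_13421", "Group_B"), each = 500)
)

# Placeholder gene_data holding a single gene (values from the question's sample)
gene_data <- data.frame(log2TPM = 4.5084287, gene_name = "DPM1")

p <- ggplot(merged_data, aes(x = log2TPM, y = categ)) +
  # default justification, so the endpoint is y + thickness * 0.9
  stat_slab() +
  geom_boxplot(outlier.color = NA, width = .15) +
  # spike at the gene's expression level in every category
  stat_spike(at = gene_data$log2TPM) +
  # the same spike drawn as a label sitting on the spike's endpoint
  stat_spike(
    aes(y = stage(start = categ, after_scale = y + as.numeric(thickness) * 0.9)),
    at = gene_data$log2TPM,
    geom = "label",
    label = gene_data$gene_name,  # constant label; assumes exactly one gene
    vjust = 0,
    hjust = 0
  ) +
  # share the thickness scale so the spikes end on the density curves
  scale_thickness_shared()

p
```

This places the gene name above the spike in every category row. With several genes in gene_data, a single constant label would no longer match each spike, so the names would need to be mapped to spike positions some other way.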

To be honest, this should all be easier, but that's the best approach I can suggest at the moment.