How to order the x axis based on the value matching a factor level with two facets in R ggplot2?


I am making a stacked barplot of the proportion of time (Duration) spent on using three strategies (CA.presence: Presence, Possible presence, Absence) by 20 participants (participant) having performed two same tasks (task). I would like to facet by task.

I used the following code to obtain the plot:

CA_presence %>% ggplot(aes(fill = CA.presence, y = Duration, x = participant, label = scales::percent(Duration))) +
geom_bar(width = .7, position="fill", stat="identity") +
scale_y_continuous(labels = scales::label_percent()) +
facet_grid(~task) +
ggtitle("Presence of CA in the dataset") +
theme(plot.title = element_text(size = 15)) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.text = element_text(size = 10)) +
theme(strip.text.x = element_text(size = 13)) +
labs(x = "Participant",
     y = "Proportion of discourse time",
     fill = "CA presence") +
theme(axis.text = element_text(size = 10)) +
theme(axis.title = element_text(size = 10)) +
scale_fill_brewer(palette = "Greens") +
coord_flip()

Here is the output of dput() for the head of the dataset:

structure(
  list(
    participant = c("L001", "L001", "L002", "L002", "L016", "L016"),
    task = c("T05", "T12", "T05", "T12", "T05", "T12"),
    language = c("French", "French", "French", "French", "French", "French"),
    Duration = c(8823, 46275, 2459, 38193, 20488, 160970),
    CA.presence = c("Presence", "Presence", "Presence", "Presence", "Presence", "Presence")
  ),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -6L), groups = structure(
    list(
      participant = c("L001", "L001", "L002", "L002", "L016", "L016"),
      task = c("T05", "T12", "T05", "T12", "T05", "T12"),
      .rows = structure(list(
        1L, 2L, 3L, 4L, 5L, 6L
      ), ptype = integer(0), class = c(
        "vctrs_list_of",
        "vctrs_vctr", "list"
      ))
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA, -6L), .drop = TRUE
  )
)

There is 1 best solution below

Answer by stefan

Ordering the bars in each facet individually requires some effort and a helper column. First, instead of relying on position = "fill", compute the percentage shares manually. Second, arrange your dataset by task, then by CA.presence such that the "Presence" category comes first (note: I converted CA.presence to a factor to make this easier), and finally by the newly created share-of-duration column. Third, create a helper column as the interaction of task and participant. Fourth, convert it to a factor whose levels follow the order of the re-arranged data, for which I use forcats::fct_inorder. Fifth, map this new column on y (or on x if you prefer using coord_flip). Finally, get rid of the task part in the axis labels.

In general, you could use reorder to create a factor ordered by the value of a numeric variable. Your case is a bit special because you want to reorder by one category of a stacked bar chart (and, as far as I can tell, one category of the faceting variable). For this we can use a small trick: use ifelse to set the values of the other categories to zero for the reordering, and use FUN = sum. Using this trick, I reordered participants according to the value of the "Presence" category for task "T05".
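The reorder trick described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are made up (same shape as the fake data used in this answer), the `share` and `ordering_value` names are hypothetical, and ordering by the share rather than the raw Duration is my assumption, chosen because the bars are drawn as proportions.

```r
# Made-up data with all three CA.presence categories (assumption, not real data)
set.seed(123)
dat <- expand.grid(
  participant = c("L001", "L002", "L016", "L004", "L005"),
  task = c("T05", "T12"),
  CA.presence = c("Presence", "Possible presence", "Absence"),
  stringsAsFactors = FALSE
)
dat$Duration <- rnorm(nrow(dat), 20000, 5000)

# Share of each category within one participant/task bar
dat$share <- dat$Duration / ave(dat$Duration, dat$participant, dat$task, FUN = sum)

# Keep the share only for the rows that should drive the ordering
# ("Presence" in task "T05"); every other row contributes zero to the sum
ordering_value <- ifelse(
  dat$CA.presence == "Presence" & dat$task == "T05", dat$share, 0
)

# reorder() sorts the factor levels by the per-participant sum of ordering_value
dat$participant <- reorder(dat$participant, ordering_value, FUN = sum)

levels(dat$participant)
```

The reordered `participant` factor can then be mapped to the axis as in the original plot. Because it is a single factor, it gives the same participant order in both facets, which is why ordering each facet on its own needs the helper column instead.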

As your example data contained only one of the CA.presence categories, I created some fake example data:

set.seed(123)

CA_presence <- expand.grid(
  participant = c("L001", "L002", "L016", "L004", "L005"),
  task = c("T05", "T12"),
  CA.presence = c("Presence", "Possible presence", "Absence")
) |>
  transform(
    Duration = rnorm(30, 20000, 5000)
  )

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

CA_presence <- CA_presence |>
  mutate(
    CA.presence = factor(
      CA.presence, rev(c("Presence", "Possible presence", "Absence"))
    )
  ) |> 
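  # Share of each category within one task/participant bar
  # (per-operation grouping via .by needs dplyr >= 1.1.0)
  mutate(
    duration_share = Duration / sum(Duration),
    .by = c(task, participant)
  ) |>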
  # Order by task, "Presence" first, then by duration share (descending)
  arrange(task, desc(CA.presence), desc(duration_share)) |> 
  # Create a helper column to be mapped on y
  mutate(
    group = paste(task, participant, sep = "."),
    group = forcats::fct_inorder(group)
  )

CA_presence |>
  ggplot(aes(
    fill = CA.presence,
    x = duration_share,
    y = group,
    label = scales::percent(duration_share)
  )) +
  geom_col(
    width = .7, position = "fill"
  ) +
  scale_x_continuous(labels = scales::label_percent()) +
  # Remove the task part from the labels
  scale_y_discrete(labels = \(x) gsub("^.*?\\.", "", x)) +
  facet_wrap(~task, scales = "free_y") +
  labs(
    x = "Proportion of discourse time",
    y = "Participant",
    fill = "CA presence",
    title = "Presence of CA in the dataset"
  ) +
  theme(axis.text = element_text(size = 10)) +
  theme(axis.title = element_text(size = 10)) +
  theme(plot.title = element_text(size = 15, hjust = 0.5)) +
  theme(legend.text = element_text(size = 10)) +
  theme(strip.text.x = element_text(size = 13)) +
  scale_fill_brewer(palette = "Greens")
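
To see what the helper column and the label function do, here is a minimal base R sketch on a toy data frame. The `toy` data and the `strip_task` name are made up for illustration, and `factor(levels = unique(...))` stands in for forcats::fct_inorder, which likewise sets the levels in order of first appearance.

```r
# Toy data: one share value per task/participant (made-up numbers)
toy <- data.frame(
  task = c("T05", "T05", "T12", "T12"),
  participant = c("L001", "L002", "L001", "L002"),
  share = c(0.2, 0.5, 0.6, 0.1)
)

# Sort by task, then by descending share within each task
toy <- toy[order(toy$task, -toy$share), ]

# Helper column: task and participant pasted together, with factor levels
# following the current row order
toy$group <- paste(toy$task, toy$participant, sep = ".")
toy$group <- factor(toy$group, levels = unique(toy$group))

levels(toy$group)
# "T05.L002" "T05.L001" "T12.L001" "T12.L002"

# Label function: drop everything up to and including the first dot
strip_task <- function(x) gsub("^.*?\\.", "", x)

strip_task(levels(toy$group))
# "L002" "L001" "L001" "L002"
```

The same participant can sit at a different position in each task because every task/participant pair is its own factor level, and scales = "free_y" lets each facet show only its own levels.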