How to pivot multiple columns from wide to long by matching all possible combinations in R?

40 Views Asked by At

I found this case and this other but they don't quite answer my problem.

Input data are in wide format:

dat0
    # A tibble: 5 × 6
         id gp       a1    a2    a3    a4
      <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
    1     1 A         2    10   150  1000
    2     2 B         4    20   200  3000
    3     3 C         7    30   350  4000
    4     4 D         8    40   400  7000
    5     5 E         9    50   700  8000  

dat0 <-
structure(list(id = c(1, 2, 3, 4, 5), gp = c("A", "B", "C", "D", 
"E"), a1 = c(2, 4, 7, 8, 9), a2 = c(10, 20, 30, 40, 50), a3 = c(150, 
200, 350, 400, 700), a4 = c(1000, 3000, 4000, 7000, 8000)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L)) 

How to pivot the multiple columns into two columns ax and ay so that:

  • all combinations of initial columns a1 to a4 are represented,
  • the two initial first columns (here id and gp, given as example) are replicated as many times as there are possible combinations,
  • a new column ax_vs_ay is created to categorize the combinations.

Desired output data:

print(dat1, n=30)
# A tibble: 30 × 5
      id gp    ax_vs_ay    ax    ay
   <dbl> <chr> <chr>    <dbl> <dbl>
 1     1 A     a1_vs_a2     2    10
 2     2 B     a1_vs_a2     4    20
 3     3 C     a1_vs_a2     7    30
 4     4 D     a1_vs_a2     8    40
 5     5 E     a1_vs_a2     9    50
 6     1 A     a1_vs_a3     2   150
 7     2 B     a1_vs_a3     4   200
 8     3 C     a1_vs_a3     7   350
 9     4 D     a1_vs_a3     8   400
10     5 E     a1_vs_a3     9   700
11     1 A     a1_vs_a4     2  1000
12     2 B     a1_vs_a4     4  3000
13     3 C     a1_vs_a4     7  4000
14     4 D     a1_vs_a4     8  7000
15     5 E     a1_vs_a4     9  8000
16     1 A     a2_vs_a3    10   150
17     2 B     a2_vs_a3    20   200
18     3 C     a2_vs_a3    30   350
19     4 D     a2_vs_a3    40   400
20     5 E     a2_vs_a3    50   700
21     1 A     a2_vs_a4    10  1000
22     2 B     a2_vs_a4    20  3000
23     3 C     a2_vs_a4    30  4000
24     4 D     a2_vs_a4    40  7000
25     5 E     a2_vs_a4    50  8000
26     1 A     a3_vs_a4   150  1000
27     2 B     a3_vs_a4   200  3000
28     3 C     a3_vs_a4   350  4000
29     4 D     a3_vs_a4   400  7000
30     5 E     a3_vs_a4   700  8000  

Thanks for help

2

There are 2 best solutions below

2
Jon Spring On BEST ANSWER

This might be simplest to think of as a self-join of the data in longer format + some relabeling.

dat0_long <- dat0 |> pivot_longer(-(1:2))

dat0_long |>
  inner_join(dat0_long, join_by(id, gp, name < name)) |>
  mutate(id, gp, ax_vs_ay = paste0(name.x, "_vs_", name.y),
         ax = value.x, ay = value.y, .keep = "none")

Result

# A tibble: 30 × 5
      id gp    ax_vs_ay    ax    ay
   <dbl> <chr> <chr>    <dbl> <dbl>
 1     1 A     a1_vs_a2     2    10
 2     1 A     a1_vs_a3     2   150
 3     1 A     a1_vs_a4     2  1000
 4     1 A     a2_vs_a3    10   150
 5     1 A     a2_vs_a4    10  1000
 6     1 A     a3_vs_a4   150  1000
 7     2 B     a1_vs_a2     4    20
 8     2 B     a1_vs_a3     4   200
 9     2 B     a1_vs_a4     4  3000
10     2 B     a2_vs_a3    20   200
11     2 B     a2_vs_a4    20  3000
12     2 B     a3_vs_a4   200  3000
13     3 C     a1_vs_a2     7    30
14     3 C     a1_vs_a3     7   350
15     3 C     a1_vs_a4     7  4000
16     3 C     a2_vs_a3    30   350
17     3 C     a2_vs_a4    30  4000
18     3 C     a3_vs_a4   350  4000
19     4 D     a1_vs_a2     8    40
20     4 D     a1_vs_a3     8   400
21     4 D     a1_vs_a4     8  7000
22     4 D     a2_vs_a3    40   400
23     4 D     a2_vs_a4    40  7000
24     4 D     a3_vs_a4   400  7000
25     5 E     a1_vs_a2     9    50
26     5 E     a1_vs_a3     9   700
27     5 E     a1_vs_a4     9  8000
28     5 E     a2_vs_a3    50   700
29     5 E     a2_vs_a4    50  8000
30     5 E     a3_vs_a4   700  8000
0
ThomasIsCoding On

Forgive me that I am not that proficient with dplyr, but I hope it could help

dat0 %>%
    pivot_longer(!c(id, gp)) %>%
    reframe(
        ax_vs_ay = c(combn(name, 2, paste0, collapse = "_vs_")),
        value = combn(value, 2, \(x) setNames(data.frame(t(x)), paste0("a", c("x", "y"))), simplify = FALSE), .by = c(id, gp)
    ) %>%
    unnest(value)

which gives

# A tibble: 30 × 5
      id gp    ax_vs_ay     ax    ay
   <dbl> <chr> <chr>     <dbl> <dbl>
 1     1 A     a1_vs_a2      2    10
 2     1 A     a1_vs_a3      2   150
 3     1 A     a1_vs_a4      2  1000
 4     1 A     a2_vs_a3     10   150
 5     1 A     a2_vs_a4     10  1000
 6     1 A     a3_vs_a4    150  1000
 7     2 B     a1_vs_a2      4    20
 8     2 B     a1_vs_a3      4   200
 9     2 B     a1_vs_a4      4  3000
10     2 B     a2_vs_a3     20   200
11     2 B     a2_vs_a4     20  3000
12     2 B     a3_vs_a4    200  3000
13     3 C     a1_vs_a2      7    30
14     3 C     a1_vs_a3      7   350
15     3 C     a1_vs_a4      7  4000
16     3 C     a2_vs_a3     30   350
17     3 C     a2_vs_a4     30  4000
18     3 C     a3_vs_a4    350  4000
19     4 D     a1_vs_a2      8    40
20     4 D     a1_vs_a3      8   400
21     4 D     a1_vs_a4      8  7000
22     4 D     a2_vs_a3     40   400
23     4 D     a2_vs_a4     40  7000
24     4 D     a3_vs_a4    400  7000
25     5 E     a1_vs_a2      9    50
26     5 E     a1_vs_a3      9   700
27     5 E     a1_vs_a4      9  8000
28     5 E     a2_vs_a3     50   700
29     5 E     a2_vs_a4     50  8000
30     5 E     a3_vs_a4    700  8000