Unlist/unnest list column into several columns

Question

Unlist/unnest list column into several columns

298 Views Asked by deschen At 08 December 2020 at 22:36

I know teh question of unnesting a list column in a data frame has been raised and answered multiple times. However, here's the potentially 237. problem of that kind.

I have the following data:

set.seed(666)
dat <- data.frame(sysRespNum = c(1,2,3,4,5,6),
                  product1   = sqrt(rnorm(6, 20, 5)^2),
                  product2   = sqrt(rnorm(6, 20, 5)^2),
                  product3   = sqrt(rnorm(6, 20, 5)^2))

data:

  sysRespNum  product1 product2 product3
1          1 23.766555 13.46907 24.32327
2          2 30.071773 15.98740 11.39922
3          3 18.224328 11.03880 20.67063
4          4 30.140839 19.78984 19.62087
5          5  8.915628 30.75021 24.29150
6          6 23.791981 11.14885 21.72450

Now, I want to calculate the proportion of each product among the sum of all products, so I want to calculate product1/sum(my three products), then the same for product 2 and 3. So I'm expecting three new columns.

I've tried the following:

library(tidyverse)    
dat %>%
  mutate(sum_Product = apply(across(-sysRespNum), 1, function(x) list(sum_Product = x/sum(x))))

(side question: is there maybe a more straightforward way of mutating this directly without the need to create a list. I now I could create a sum column first and then do a simple mutate along with across. But I'm wondering if teh calculations can be achieved without creating a temporary sum column first)

Now my problem is that it's difficult to unnest the sum_Product list column. unnest_wider doesn't work, the sum_Product column is still a list.

So the only thing that worked for me is

following this solution: https://stackoverflow.com/a/60824506/2725773
changing my code above and replace the list part by data.frame:

full code:

dat %>%
  mutate(sum_Product = apply(across(-sysRespNum), 1, function(x) data.frame(sum_Product = x/sum(x)))) %>%
  unnest(cols = everything()) %>%
  mutate(product = rep(1:3, nrow(.)/3)) %>%
  pivot_wider(values_from = sum_Product,
              names_from = product,
              names_prefix = "share_product")

which gives the correct result:

# A tibble: 6 x 7
  sysRespNum product1 product2 product3 share_product1 share_product2
       <dbl>    <dbl>    <dbl>    <dbl>          <dbl>          <dbl>
1          1    23.8      13.5     24.3          0.386          0.219
2          2    30.1      16.0     11.4          0.523          0.278
3          3    18.2      11.0     20.7          0.365          0.221
4          4    30.1      19.8     19.6          0.433          0.285
5          5     8.92     30.8     24.3          0.139          0.481
6          6    23.8      11.1     21.7          0.420          0.197
# … with 1 more variable: share_product3 <dbl>

However, it feels unnecessary complicated to unnest everything and then reshape with pivot_wider.

So a) is there a more elegant way to calculate my share variables and b) is there a more elegant/shorter/less verbose way of reshaping a list column into multiple vector columns?

Original Q&A

There are 3 best solutions below

ThomasIsCoding On 08 December 2020 at 22:47

I guess the following base R code should work for you

cbind(
  dat,
  setNames(dat[-1] / rowSums(dat[-1]), paste0("share_product", seq_along(dat[-1])))
)

which gives

  sysRespNum  product1 product2 product3 share_product1 share_product2
1          1 23.766555 13.46907 24.32327      0.3860783      0.2187998
2          2 30.071773 15.98740 11.39922      0.5233660      0.2782431
3          3 18.224328 11.03880 20.67063      0.3649701      0.2210688
4          4 30.140839 19.78984 19.62087      0.4333597      0.2845348
5          5  8.915628 30.75021 24.29150      0.1393996      0.4807925
6          6 23.791981 11.14885 21.72450      0.4198684      0.1967490
  share_product3
1      0.3951219
2      0.1983909
3      0.4139610
4      0.2821054
5      0.3798079
6      0.3833826

Gwang-Jin Kim On 08 December 2020 at 22:55

Good old plain basic R

rdat <- dat[-1]
rdat <- rdat/rowSums(rdat)
colnames(rdat) <- paste0("share_", colnames(rdat))
cbind(dat, rdat)

Which gives:

  sysRespNum  product1 product2 product3 share_sum_product1 share_sum_product2
1          1 23.766555 13.46907 24.32327          0.3860783          0.2187998
2          2 30.071773 15.98740 11.39922          0.5233660          0.2782431
3          3 18.224328 11.03880 20.67063          0.3649701          0.2210688
4          4 30.140839 19.78984 19.62087          0.4333597          0.2845348
5          5  8.915628 30.75021 24.29150          0.1393996          0.4807925
6          6 23.791981 11.14885 21.72450          0.4198684          0.1967490
  share_sum_product3
1          0.3951219
2          0.1983909
3          0.4139610
4          0.2821054
5          0.3798079
6          0.3833826

**akrun** · Accepted Answer · 2020-12-08T22:38:03.713000

It is easier to do this rowSums i.e. divide the 'product1' by the rowSums on the columns that starts with key word 'product'. Instead of doing rowwise with c_across, this is vectorized and should be fast as well

library(dplyr)
dat %>%
    mutate(sum_product = product1/rowSums(select(., starts_with('product'))))

NOTE: There is a mixing of base R code (apply) and the tidyverse option with across which doesn't seem to be the optimal way

If we need to do this for all the 'product' columns, create a sum column first with mutate and then use across on the columns that starts with 'product' to divide the column by 'Sum_col'

dat %>%
     mutate(Sum_col = rowSums(select(., starts_with('product'))),
           across(starts_with('product'),
        ~ ./Sum_col, .names = '{.col}_sum_product')) %>%
     select(-Sum_col)

-output

#ysRespNum  product1 product2 product3 product1_sum_product product2_sum_product product3_sum_product
#1          1 23.766555 13.46907 24.32327            0.3860783            0.2187998            0.3951219
#2          2 30.071773 15.98740 11.39922            0.5233660            0.2782431            0.1983909
#3          3 18.224328 11.03880 20.67063            0.3649701            0.2210688            0.4139610
#4          4 30.140839 19.78984 19.62087            0.4333597            0.2845348            0.2821054
#5          5  8.915628 30.75021 24.29150            0.1393996            0.4807925            0.3798079
#6          6 23.791981 11.14885 21.72450            0.4198684            0.1967490            0.3833826

Or using base R

nm1 <- startsWith(names(dat), 'product')
dat[paste0('sum_product', seq_along(nm1))] <- dat[nm1]/rowSums(dat[nm1])

Unlist/unnest list column into several columns

There are 3 best solutions below

Related Questions in R

Related Questions in LIST

Related Questions in UNNEST

Trending Questions

Popular # Hahtags

Popular Questions