I know teh question of unnesting a list column in a data frame has been raised and answered multiple times. However, here's the potentially 237. problem of that kind.
I have the following data:
set.seed(666)
dat <- data.frame(sysRespNum = c(1,2,3,4,5,6),
product1 = sqrt(rnorm(6, 20, 5)^2),
product2 = sqrt(rnorm(6, 20, 5)^2),
product3 = sqrt(rnorm(6, 20, 5)^2))
data:
sysRespNum product1 product2 product3
1 1 23.766555 13.46907 24.32327
2 2 30.071773 15.98740 11.39922
3 3 18.224328 11.03880 20.67063
4 4 30.140839 19.78984 19.62087
5 5 8.915628 30.75021 24.29150
6 6 23.791981 11.14885 21.72450
Now, I want to calculate the proportion of each product among the sum of all products, so I want to calculate product1/sum(my three products), then the same for product 2 and 3. So I'm expecting three new columns.
I've tried the following:
library(tidyverse)
dat %>%
mutate(sum_Product = apply(across(-sysRespNum), 1, function(x) list(sum_Product = x/sum(x))))
(side question: is there maybe a more straightforward way of mutating this directly without the need to create a list. I now I could create a sum column first and then do a simple mutate along with across. But I'm wondering if teh calculations can be achieved without creating a temporary sum column first)
Now my problem is that it's difficult to unnest the sum_Product list column. unnest_wider doesn't work, the sum_Product column is still a list.
So the only thing that worked for me is
- following this solution: https://stackoverflow.com/a/60824506/2725773
- changing my code above and replace the
listpart bydata.frame:
full code:
dat %>%
mutate(sum_Product = apply(across(-sysRespNum), 1, function(x) data.frame(sum_Product = x/sum(x)))) %>%
unnest(cols = everything()) %>%
mutate(product = rep(1:3, nrow(.)/3)) %>%
pivot_wider(values_from = sum_Product,
names_from = product,
names_prefix = "share_product")
which gives the correct result:
# A tibble: 6 x 7
sysRespNum product1 product2 product3 share_product1 share_product2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 23.8 13.5 24.3 0.386 0.219
2 2 30.1 16.0 11.4 0.523 0.278
3 3 18.2 11.0 20.7 0.365 0.221
4 4 30.1 19.8 19.6 0.433 0.285
5 5 8.92 30.8 24.3 0.139 0.481
6 6 23.8 11.1 21.7 0.420 0.197
# … with 1 more variable: share_product3 <dbl>
However, it feels unnecessary complicated to unnest everything and then reshape with pivot_wider.
So a) is there a more elegant way to calculate my share variables and b) is there a more elegant/shorter/less verbose way of reshaping a list column into multiple vector columns?
It is easier to do this
rowSumsi.e. divide the 'product1' by therowSumson the columns that starts with key word 'product'. Instead of doingrowwisewithc_across, this is vectorized and should be fast as wellNOTE: There is a mixing of
base Rcode (apply) and the tidyverse option withacrosswhich doesn't seem to be the optimal wayIf we need to do this for all the 'product' columns, create a
sumcolumn first withmutateand then useacrosson the columns that starts with 'product' to divide the column by 'Sum_col'-output
Or using
base R