How do I transpose observations into variables/columns?

81 Views Asked by At

I've got a data frame that looks something like this:

precinct, race, age, people
1001, black, 18-40, 1
1001, white, 18-40, 2
1001, hispanic, 18-40, 3
1001, asian, 18-40, 4
1001, black, 40 or older, 5
1001, white, 40 or older, 6
1001, hispanic, 40 or older, 7
1001, asian, 40 or older, 8

I want to make it look like this:

precinct, black, white, hispanic, asian, 18-40, 40 or older
1001, 6, 8, 10, 12, 10, 26

I've used dcast

dcast(
  data = mydataframe,
  formula = Precinct ~ race + age,
  fun.aggregate = sum,
  value.var = 'people'
)

but this does not produce my desired result.

1

There are 1 best solutions below

0
akrun On BEST ANSWER

When we create formula with + on the rhs of ~ it creates the combinations between those columns instead of having every single unique element from those columns. In order to have the latter, we may need to melt to long format and then use dcast on the single column (assuming those columns are of the same type)

library(data.table)
dcast(melt(setDT(mydataframe), id.var = c('precinct', 'people')),
    precinct ~ value, fun.aggregate = sum, value.var = 'people')

-output

Key: <precinct>
   precinct 18-40 40 or older asian black hispanic white
      <int> <int>       <int> <int> <int>    <int> <int>
1:     1001    10          26    12     6       10     8

library(dplyr)
library(tidyr)
mydataframe %>% 
  pivot_longer(cols = c(race, age), names_to = NULL) %>% 
  pivot_wider(names_from = value, values_from = people, values_fn = sum)

-output

# A tibble: 1 × 7
  precinct black `18-40` white hispanic asian `40 or older`
     <int> <int>   <int> <int>    <int> <int>         <int>
1     1001     6      10     8       10    12            26

data

mydataframe <- structure(list(precinct = c(1001L, 1001L, 1001L, 1001L, 1001L, 
1001L, 1001L, 1001L), race = c("black", "white", "hispanic", 
"asian", "black", "white", "hispanic", "asian"), age = c("18-40", 
"18-40", "18-40", "18-40", "40 or older", "40 or older", "40 or older", 
"40 or older"), people = 1:8), row.names = c(NA, -8L), 
class = "data.frame")