I want to modify my dataset in R so that I can conduct a market basket analysis


I need to reshape a dataset in R that is laid out as a classic table, with observations in rows and variables in columns. The first column contains transactions identified by an ID code (InvoiceNo), and the second column contains the product description (Description). Below is an image of the dataset as it currently looks.

[Image: the dataset in its current form, with columns InvoiceNo and Description]

My goal is a table suitable for market basket analysis: one row per transaction (InvoiceNo) and one column per product (Description). In other words, I want binary data indicating whether each product is present (1) or absent (0) in each transaction. Below is an image of the layout I would like to obtain:

[Image: the desired layout, with one row per InvoiceNo and one 0/1 column per product]
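For reference, the target 0/1 layout described above can be sketched in base R with table(), which cross-tabulates invoices against products. This is a minimal sketch, not part of the original question: the invoice numbers and product names are made-up values, and it assumes the columns are named InvoiceNo and Description as in the question.

```r
# Hypothetical long-format data: one row per (invoice, product) line item.
# Invoice 1003 lists "candle" twice, as real invoice data often does.
sales <- data.frame(
  InvoiceNo   = c("1001", "1001", "1002", "1003", "1003"),
  Description = c("candle", "mug", "mug", "candle", "candle"),
  stringsAsFactors = FALSE
)

# table() counts how often each product appears in each invoice
counts <- table(sales$InvoiceNo, sales$Description)

# Comparing with 0 and multiplying by 1 turns counts into 0/1 presence flags
basket <- (counts > 0) * 1
print(basket)

# Optional: a plain data frame with invoice numbers as row names
basket_df <- as.data.frame.matrix(basket)
```

Because the flags come from counts, an invoice that lists the same product several times still gets a 1, not the number of occurrences.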

Answer by JGr

You can use pivot_wider() from the tidyr package.

First, create a small dataset with the same structure as yours:

library(tidyr)
library(tibble)
library(dplyr)

df <- tibble(id = c(1, 1, 2, 3, 3, 3), description = c('lalalala', 'blob', 'blob', 'foo', 'lalalala', 'blob'))

Then reshape it so that each transaction is one row and each product is one 0/1 column:

df %>%
  distinct(id, description) %>% # keeping each product at most once per transaction
  mutate(for_pivot = 1) %>% # creating a new column for your binary output
  pivot_wider(id_cols = id, names_from = description, values_from = for_pivot,
              values_fill = 0) # pivoting; products absent from a transaction become 0
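If the end goal is the market basket analysis itself, the arules package (an assumption here: it is not mentioned in the question and must be installed separately) works on its own "transactions" class rather than on a 0/1 data frame. A minimal sketch, using made-up data in the same long shape as the toy dataset, is to build one vector of products per transaction and coerce it:

```r
library(arules)  # assumption: the arules package is installed

# Hypothetical long-format data: one row per (transaction, product) pair;
# transaction 3 lists "foo" twice on purpose
df <- data.frame(
  id          = c(1, 1, 2, 3, 3, 3),
  description = c("lalalala", "blob", "blob", "foo", "lalalala", "foo"),
  stringsAsFactors = FALSE
)

# One character vector of products per transaction; unique() drops products
# listed twice in the same transaction, which the coercion does not accept
baskets <- lapply(split(df$description, df$id), unique)
trans <- as(baskets, "transactions")

summary(trans)
inspect(trans)

# Mine association rules; these support/confidence thresholds are arbitrary
rules <- apriori(trans,
                 parameter = list(supp = 0.3, conf = 0.5),
                 control = list(verbose = FALSE))
inspect(rules)
```

Going straight from the long table to transactions skips the wide 0/1 table entirely. If you already have the wide table (called `wide` here, a hypothetical name), coercing its product columns as a logical matrix, for example `as(as.matrix(wide[, -1]) == 1, "transactions")`, should give the same kind of object.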