The problem centers on splitting a dataframe with the split() function, which returns a list object. Ideally, I would like to split the dataframe by an id column that contains over 2,000 unique values; doing this, however, creates memory issues, and although the split can be computed, I can neither access the result in the GUI nor reference it in the R terminal/RStudio. Alternatively, I've been looking into the ff package, but I'm unsure whether it will work and am currently looking for new approaches to the problem.
I've tried:
1) The split() function keyed on the id column
2) The split() function keyed on a different, shorter character vector (both attempts are sketched below)
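For reference, a minimal sketch of what attempts 1 and 2 look like; df and id are stand-ins for my real dataframe and id column:

## attempt 1: one list element per unique id (2,000+ elements);
## this is where the memory problems appear
by_id <- split(df, df$id)
## attempt 2: split on a coarser key with far fewer unique values,
## here the first character of the id
by_group <- split(df, substr(as.character(df$id), 1, 1))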
After splitting: I would like to combine all the transactional information from multiple rows into one row, i.e. one row representing a complete transaction (over a monthly/yearly period).
So I am moving from a granular space to a less granular space via aggregation.
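A minimal sketch of that aggregation step in base R, assuming a dataframe df with an id column plus the item columns (the column names here are hypothetical):

## collapse every non-id column into one space-separated string per id
agg <- aggregate(. ~ id, data = df, FUN = paste, collapse = " ")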
Original Data (commas are delimiters)
Bob, cat, dog, house, day 1
Bob, cat, dog, house, day 2
Bob, cat, chair, house, day 3
Expected Outcome:
Bob, cat cat cat, dog dog chair, house house house
Alternatively, an encoded method would look like this:
cat = x
dog = y
chair = a
house = b
Bob, 3x, 2y + a, 3b
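A hedged sketch of how those encoded counts could be computed, assuming the data is first reshaped to one item per row (long is a hypothetical dataframe with columns id and item):

## contingency table of item counts per id; the Bob row would read
## cat = 3, chair = 1, dog = 2, house = 3
counts <- table(long$id, long$item)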
Reproducible Code Block
L3 <- LETTERS[1:3]
fac <- sample(L3, 10, replace = TRUE)
ids <- c("Bob","John") ## ideally I would have about 100k unique ids ( 100,000)
d <- data.frame(x = ids, y = 1:10, fac = fac))
d2 <- split(d$x)
d2
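On this toy data, the aggregation itself needs no split() at all; a minimal sketch with base aggregate(), mirroring the Bob example above:

d_agg <- aggregate(fac ~ x, data = d, FUN = paste, collapse = " ")
d_agg  ## one row per id, with the fac values collapsed into a single string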