What's the most efficient way of splitting a dataframe using over 2000 unique elements in a character vector


The problem centres on splitting a data frame with the split() function, which returns a list. What I would like to do is split the data frame by an id column that contains over 2000 unique values. Doing this, however, creates memory issues: although the split itself completes, I can neither access the result in the GUI nor reference it in the R terminal/RStudio. I have also been looking into the ff package, but I am unsure whether it will work, and I am currently looking for new approaches to the problem.

I've tried:

1) The split() function by the id column
2) The split() function on a shorter character vector

After splitting, I would like to combine the transactional information from multiple rows into one row, i.e. one row representing a complete transaction (over a monthly/yearly period), moving from a granular space to a less granular space via aggregation.
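As a side note, the collapse-to-one-row step does not actually require split() at all; a minimal sketch using base R's aggregate() (the `id`/`item` column names here are illustrative, not taken from the question's data):

```r
# Toy data: several transaction rows for the same id
df <- data.frame(
  id   = c("Bob", "Bob", "Bob"),
  item = c("cat", "cat", "dog"),
  stringsAsFactors = FALSE
)

# One row per id, with the items collapsed into a single string.
# The `collapse` argument is forwarded to paste() via aggregate()'s ...
agg <- aggregate(item ~ id, data = df, FUN = paste, collapse = " ")
agg$item  # "cat cat dog"
```

The same pattern extends to several value columns by listing them on the left-hand side of the formula.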

Original data (commas are delimiters):

Bob, cat, dog, house, day 1
Bob, cat, dog, house, day 2
Bob, dog, chair, house, day 3

Expected outcome:

Bob, cat cat cat, dog dog chair, house house house

Alternatively (an encoded form would look like this):

cat = x
dog = y
chair = a
house = b

Bob, 3x, 2y + a, 3b
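The encoded form is essentially a per-id count of each item, which table() can produce directly without any splitting; a sketch with illustrative column names:

```r
# Toy data: item occurrences per id
df <- data.frame(
  id   = rep("Bob", 6),
  item = c("cat", "dog", "house", "cat", "dog", "house")
)

# Contingency table: one row per id, one column per item,
# cells holding the counts (the "3x, 2y + a, 3b" style encoding)
counts <- table(df$id, df$item)
counts["Bob", "cat"]  # 2
```

For very many ids and items, xtabs(..., sparse = TRUE) from base R (via the Matrix package) can keep this table in a sparse representation.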

Reproducible code block:

    L3  <- LETTERS[1:3]
    fac <- sample(L3, 10, replace = TRUE)
    ids <- c("Bob", "John")  ## ideally I would have about 100,000 unique ids
    d   <- data.frame(x = ids, y = 1:10, fac = fac)  # ids are recycled across the 10 rows
    d2  <- split(d, d$x)  # split() takes the data and the grouping vector
    d2
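For the scale described (2000+ unique ids), one way to sidestep the memory problem is to skip the intermediate list entirely and aggregate group-wise in a single pass; a sketch using base R's rowsum() on simulated data (the column names and sizes are illustrative):

```r
# Simulate many transactions spread over 2000 unique ids
set.seed(1)
ids <- sample(sprintf("id%04d", 1:2000), 1e5, replace = TRUE)
d   <- data.frame(id = ids, value = runif(1e5))

# Grouped sums in one pass: a matrix with one row per unique id,
# with no 2000-element list ever materialised
sums <- rowsum(d$value, group = d$id)

nrow(sums)        # number of distinct ids that actually occur in the sample
sum(sums)         # equals sum(d$value): nothing is lost in the aggregation
```

Packages such as data.table (with its `by =` grouping) or dplyr's group_by()/summarise() follow the same grouped-aggregation idea and also avoid a giant list of data frames.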
