I have a large dataset of roughly 2.5 million (25 lakh) rows to which this flagging function, status, is applied. Inside the function the operations are vectorised and apply-family functions are used; c1-c4 are columns in my data. It still takes about 5-6 hours to run.
status <- function(x) {
  x <- subset(x, RECORD_TYPE != "INPUT")
  x$c1 <- as.character(x$c1)
  x$c2 <- as.factor(x$c2)
  x$c3 <- as.factor(x$c3)
  data.frame(cbind(
    # per group of c4: "G" if the group has no "BAD" records, else the count of "BAD"
    tapply(x$c2, x$c4,
           function(v) if (!any(v == "BAD")) "G" else sum(v == "BAD")),
    # per group of c4: count of "NEG" records (x$c2D appears to be a typo for x$c3)
    tapply(x$c3, x$c4,
           function(v) sum(v == "NEG"))
  ))
}

status(mydata)
Is there any way to speed this function up further? I work on a server with 16 cores, so I believe there is room for parallelisation or other improvements.
Perhaps a data.table approach would be faster than trying to parallelize your code, but I would need a sample of your data to be sure this answer addresses your question:
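Something along these lines, as a minimal sketch: it assumes c4 is the grouping key, c2 carries the "BAD" codes, and the second tapply was meant to use c3 (no sample data was posted, so the column roles are guessed from your code, and the output column names flag/neg are just illustrative):

library(data.table)

status_dt <- function(x) {
  # filter out the INPUT records, then aggregate by c4 in one grouped pass
  dt <- as.data.table(x)[RECORD_TYPE != "INPUT"]
  dt[, .(
    # "G" if the group has no "BAD" rows, otherwise the count of "BAD"
    # (coerced to character so the column has a single consistent type)
    flag = if (!any(c2 == "BAD")) "G" else as.character(sum(c2 == "BAD")),
    # count of "NEG" rows per group
    neg  = sum(c3 == "NEG")
  ), by = c4]
}

result <- status_dt(mydata)

Because data.table does the filtering and both aggregations in a single grouped scan, this should be dramatically faster than the tapply version on 2.5 million rows; if it still isn't fast enough, setkey(dt, c4) before grouping can help, and parallelism is rarely necessary at this data size.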