I'm trying to run a numerical simulation across a range of points from a data set created with expand.grid. I'd like to use plyr or dplyr for this if possible. However, I don't understand the syntax.
Is there a small modification to the code below that applies f to each pair of x and y values individually?
f <- function(x, y) {
  A <- data_frame(a = x*runif(100) - y)
  B <- data_frame(b = A$a - rnorm(100)*y)
  sum(A$a) - sum(B$b)
}
X <- expand.grid(x = 1:10, y = 2:8)
X %>% mutate(z = f(x, y))
I had hoped ddply might make this easier.
EDIT: This seems to behave as intended:
X %>% ddply(.(x, y), transform, z = f(x, y))
Let's rewrite your function to do the same thing without the data_frame calls; just using vectors will be faster.
Since you want to apply this to every row, you could do it with plyr or dplyr. These tools are made for "split-apply-combine", where you split a data frame into pieces by some grouper, do something to each piece, and put it back together. You want to do something to every individual row, so we set both x and y as grouping variables, which works because a combination of x and y uniquely defines a row.
For both plyr and dplyr, the mutate function is used because you want to add a column to an existing data frame, keeping the same number of rows. The other common function to use is summarize, which is used when you want to condense groups that have multiple rows into a single summary row. mutate is very similar to base::transform.
There is really no advantage to using plyr for data frame manipulation; dplyr is faster and most people think it is easier to understand. dplyr really shines when you have more complex manipulations and are using groups rather than individual rows. For individual rows, the base function mapply works well (thanks to @jeremycg in the comments). You can use dplyr, but there's no reason to do so in this case.
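The approaches described above can be sketched roughly as follows. This is one possible reading rather than a definitive implementation: the vector names a and b, the set.seed call, the result name X2, and the requireNamespace guard are assumptions added for illustration, and the dplyr part only runs if that package is installed.

```r
# f rewritten with plain numeric vectors instead of data_frame() calls.
# It returns a single number for a single (x, y) pair.
f <- function(x, y) {
  a <- x * runif(100) - y
  b <- a - rnorm(100) * y
  sum(a) - sum(b)
}

X <- expand.grid(x = 1:10, y = 2:8)

# Base R: mapply() calls f once per (x, y) pair, i.e. once per row of X.
set.seed(1)  # assumption: seed added only so the run is reproducible
X$z <- mapply(f, X$x, X$y)

# dplyr: grouping by both x and y makes every group a single row,
# so inside mutate() f sees one x value and one y value at a time.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  X2 <- expand.grid(x = 1:10, y = 2:8) %>%
    group_by(x, y) %>%
    mutate(z = f(x, y)) %>%
    ungroup()
}
```

Both versions add a column z with one simulated value per row. Because f draws random numbers, the two z columns will not match each other unless the random seed and the row order are the same.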