How to use ddply or dplyr to evaluate a multivariable function with unvectorized inputs against a data frame?

Question


292 Views Asked by wdkrnls At 23 June 2015 at 20:14

I'm trying to run a numerical simulation across a range of points from a data set created with expand.grid. I'd like to use plyr or dplyr for this if possible, but I don't understand the syntax.

Is there a small change to the code below that applies f to each pair of x and y values individually?

library(dplyr)  # %>%, mutate(), data_frame()

f <- function(x, y) {
    A <- data_frame(a = x*runif(100) - y)
    B <- data_frame(b = A$a - rnorm(100)*y)
    sum(A$a) - sum(B$b)
}

X <- expand.grid(x = 1:10, y = 2:8)
X %>% mutate(z = f(x, y))

I had hoped ddply might make this easier.

EDIT: This seems to behave as intended:

 X %>% ddply(.(x, y), transform, z = f(x, y))


**Gregor Thomas** · Accepted Answer · 2015-06-23T22:00:16.057000

Let's rewrite your function to do the same thing without the data_frame calls; using plain vectors will be faster:

f <- function(x, y) {
    a = x * runif(100) - y
    b = a - rnorm(100) * y
    sum(a) - sum(b)
}
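
To see why the original mutate(z = f(x, y)) call does not work row by row, here is a minimal sketch using the f and X from this thread. mutate passes whole columns to f, so f sees all 70 values of x and y at once and sum() reduces everything to a single number. Calling f directly on the columns shows the same behavior (the name z_all is only illustrative):

```r
# f as rewritten above, using plain vectors
f <- function(x, y) {
    a <- x * runif(100) - y
    b <- a - rnorm(100) * y
    sum(a) - sum(b)
}

X <- expand.grid(x = 1:10, y = 2:8)

# Passing whole columns: the 70-element columns are recycled against the
# 100 random draws (R warns about the length mismatch, suppressed here),
# and sum() collapses the result to one value.
z_all <- suppressWarnings(f(X$x, X$y))
length(z_all)  # 1, not nrow(X)
```

That single value is what gets recycled into every row of z, which is why the function has to be called once per row instead.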

Since you want to apply this to every row, you could do it with plyr or dplyr. These tools are made for "split-apply-combine", where you split a data frame into pieces by some grouping variable, do something to each piece, and put it back together. You want to do something to every individual row, so we set both x and y as grouping variables, which works because each combination of x and y uniquely defines a row:

# plyr
ddply(X, .(x, y), plyr::mutate, z = f(x, y))

# dplyr
group_by(X, x, y) %>% dplyr::mutate(z = f(x, y))

For both plyr and dplyr, the mutate function is used because you want to add a column to an existing data frame, keeping the same number of rows. The other common function to use is summarize, which is used when you want to condense groups that have multiple rows into a single summary row. mutate is very similar to base::transform.
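
As a rough check that grouping on x and y really does call the function once per row, here is a sketch with a deterministic stand-in for f. The function g is hypothetical and exists only so the result can be verified by hand: for scalar inputs it returns 6*x - 3*y.

```r
library(dplyr)

# Hypothetical deterministic stand-in for f (not from the original post):
# for scalar x and y it returns (x - y) + (2x - y) + (3x - y) = 6x - 3y
g <- function(x, y) sum(x * c(1, 2, 3) - y)

X <- expand.grid(x = 1:10, y = 2:8)

res <- X %>%
    group_by(x, y) %>%      # one group per (x, y) pair, i.e. per row
    mutate(z = g(x, y)) %>% # g sees a single x and a single y per group
    ungroup()

# Every row should match the hand-computed value
all(res$z == 6 * res$x - 3 * res$y)
```

The ungroup() at the end is optional; it just drops the grouping so that later operations are not accidentally done one row at a time.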

There is really no advantage to using plyr for data frame manipulation: dplyr is faster and, most people think, easier to understand. dplyr really shines when you have more complex manipulations and are working with groups rather than individual rows. For individual rows, the base function mapply works well:

X$z = mapply(f, X$x, X$y)

(thanks to @jeremycg in the comments). You can use dplyr but there's no reason to do so in this case.
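
A self-contained version of the mapply approach might look like the sketch below, assuming the vector-based f from this answer. The set.seed call is an addition here, only to make the random draws reproducible:

```r
# f as rewritten in this answer
f <- function(x, y) {
    a <- x * runif(100) - y
    b <- a - rnorm(100) * y
    sum(a) - sum(b)
}

X <- expand.grid(x = 1:10, y = 2:8)

set.seed(42)  # assumption: added only for reproducibility

# mapply calls f(X$x[i], X$y[i]) for each row i and
# simplifies the results to a numeric vector
X$z <- mapply(f, X$x, X$y)

length(X$z) == nrow(X)  # one simulated value per row
```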
