I need to calculate some weights for my large data frame based on two different variables. Let's say their levels are x = c("a","b","c","d","e") and y = c("v","w","x","y","z"), which gives 25 possible combinations of the two.
The weight I will use depends on how frequently each level of x and y appears in my data frame. For example, "a" might show up 34% of the time, "b" 12% of the time, and so on. Let's say I have saved those proportions to x_prop and y_prop. My weight is the proportion of the row's level of x divided by the proportion of its level of y.
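A minimal sketch of how x_prop and y_prop might be computed, assuming x and y are character columns of df (the sample data from the edit further down is reused here). `prop.table(table(...))` gives each level's share of the rows; turning it into a named numeric vector lets you look a proportion up by level name instead of by position.

```r
# Sample data (same as the data frame shown in the question's edit)
df <- data.frame(x = c("a", "a", "c", "e", "e", "b", "a", "d"),
                 y = c("v", "w", "z", "x", "x", "x", "y", "v"))

# Share of rows taken by each level, as named numeric vectors
tab_x <- prop.table(table(df$x))
tab_y <- prop.table(table(df$y))
x_prop <- setNames(as.numeric(tab_x), names(tab_x))
y_prop <- setNames(as.numeric(tab_y), names(tab_y))

# Weight for one combination, e.g. x = "a" and y = "v": 0.375 / 0.25
x_prop["a"] / y_prop["v"]
```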
My question is, is there a better, more efficient way to go about this than to brute force it with something like:
weights <- numeric(nrow(df))  # preallocate one weight per row
for (i in seq_len(nrow(df))) {
  if (df$x[i] == "a" && df$y[i] == "v") {
    weights[i] <- x_prop[1] / y_prop[1]
  } else if (df$x[i] == "a" && df$y[i] == "w") {
    weights[i] <- x_prop[1] / y_prop[2]
  # ... one else-if branch for each remaining x/y combination ...
  } else if (df$x[i] == "e" && df$y[i] == "z") {
    weights[i] <- x_prop[5] / y_prop[5]
  }
}
I actually need to do this for multiple pairs of variables. Doing it for a single pair already takes 25 nearly identical if statements, and I can't help but think there's a much more efficient way to do this.
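One way to avoid the if/else chain entirely, sketched under the assumption that the columns hold level names: index the named proportion vectors directly with each row's level, which computes every row's weight in one vectorized step. Wrapping that in a small function makes it reusable for any pair of columns. The names `level_props` and `pair_weight` and the extra column `z` are hypothetical, for illustration only.

```r
# Sample data; column z is a hypothetical extra variable to show a second pair
df <- data.frame(x = c("a", "a", "c", "e", "e", "b", "a", "d"),
                 y = c("v", "w", "z", "x", "x", "x", "y", "v"),
                 z = c("p", "q", "p", "p", "q", "q", "p", "p"))

# Named vector of level proportions for one column
level_props <- function(v) {
  tab <- prop.table(table(v))
  setNames(as.numeric(tab), names(tab))
}

# Weight per row = proportion of the row's xcol level / proportion of its ycol level.
# as.character() guards against factor columns, which would otherwise be
# indexed by their integer codes instead of their level names.
pair_weight <- function(data, xcol, ycol) {
  xv <- as.character(data[[xcol]])
  yv <- as.character(data[[ycol]])
  unname(level_props(xv)[xv] / level_props(yv)[yv])
}

# Apply to several column pairs without writing any per-level branches
pairs <- list(c("x", "y"), c("x", "z"))
for (p in pairs) {
  df[[paste0("w_", p[1], "_", p[2])]] <- pair_weight(df, p[1], p[2])
}
df
```

Adding another pair is then one more entry in `pairs` rather than another 25 branches.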
Any advice?
Edit: my data frame looks like the following:
df <- data.frame(x = c("a", "a", "c", "e", "e", "b", "a", "d"),
y = c("v", "w", "z", "x", "x", "x", "y", "v"))
There are other variables in there too (in reality, my data frame has thousands of rows), but what I need to do is take an action on another variable based on every possible combination of x and y.
One way to do this would be to make a lookup table with all the possible x/y combinations and their respective weights.
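A minimal base-R sketch of the lookup-table idea, assuming the proportions are built with `prop.table(table(...))`: `expand.grid()` lists every x/y combination (25 here), each combination gets its weight once, and `merge()` attaches the matching weight to every row. Because `merge()` sorts its output by the key columns, a temporary `row_id` column (a hypothetical helper) is used to restore the original row order.

```r
df <- data.frame(x = c("a", "a", "c", "e", "e", "b", "a", "d"),
                 y = c("v", "w", "z", "x", "x", "x", "y", "v"))

# Named vector of level proportions for one column
level_props <- function(v) {
  tab <- prop.table(table(v))
  setNames(as.numeric(tab), names(tab))
}
x_prop <- level_props(df$x)
y_prop <- level_props(df$y)

# Lookup table: one row per possible x/y combination, with its weight
lookup <- expand.grid(x = names(x_prop), y = names(y_prop),
                      stringsAsFactors = FALSE)
lookup$weight <- unname(x_prop[lookup$x] / y_prop[lookup$y])

# Join the weights onto the data, then restore the original row order
df$row_id <- seq_len(nrow(df))
out <- merge(df, lookup, by = c("x", "y"), all.x = TRUE)
out <- out[order(out$row_id), ]
out$row_id <- NULL
rownames(out) <- NULL
out
```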
Or we could skip the lookup table and do two joins instead. Either way should be quite fast and is pretty legible.
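The two-join variant can be sketched the same way in base R (dplyr's `left_join()` would be an equivalent choice but is not used here). Each variable gets its own small table of level proportions, both are joined onto the data, and the weight is a plain column division. The helper names and the temporary `row_id` column are hypothetical.

```r
df <- data.frame(x = c("a", "a", "c", "e", "e", "b", "a", "d"),
                 y = c("v", "w", "z", "x", "x", "x", "y", "v"))

# Named vector of level proportions for one column
level_props <- function(v) {
  tab <- prop.table(table(v))
  setNames(as.numeric(tab), names(tab))
}
x_props <- level_props(df$x)
y_props <- level_props(df$y)

# One small table per variable: level and its proportion
x_df <- data.frame(x = names(x_props), x_prop = unname(x_props))
y_df <- data.frame(y = names(y_props), y_prop = unname(y_props))

df$row_id <- seq_len(nrow(df))
out <- merge(df, x_df, by = "x", all.x = TRUE)   # adds x_prop
out <- merge(out, y_df, by = "y", all.x = TRUE)  # adds y_prop
out <- out[order(out$row_id), ]                  # undo merge()'s sorting
out$weight <- out$x_prop / out$y_prop
out$row_id <- NULL
rownames(out) <- NULL
out
```

Compared with the lookup table, this never materializes all 25 combinations; it only needs one row per level of each variable.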