Consider empirically estimating the conditional distribution Pr(Y|X), where X and Y are both discrete.
Both variables have been mapped to integer sets such that
X in {1, ..., N_X} and Y in {1, ..., N_Y}.
I have a data frame of observations, obs, where obs$x[t] and obs$y[t] are the observed X and Y values for event t.
My question, then, is: what is the most efficient way to convert obs into a matrix F of empirical conditional distributions, such that
F[i,j] = sum((obs$x == i) & (obs$y == j)) / sum(obs$x == i)?
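One way to see why a single pass over the data should suffice: writing n_ij for the number of events with X = i and Y = j (notation introduced here for illustration), the definition above is a contingency table of counts with each row divided by its own total. The estimate is undefined (0/0) for any i that never occurs.

```latex
\hat{F}_{ij} = \frac{n_{ij}}{n_{i\cdot}}, \qquad
n_{ij} = \sum_{t} \mathbf{1}[x_t = i,\ y_t = j], \qquad
n_{i\cdot} = \sum_{j=1}^{N_Y} n_{ij} = \sum_{t} \mathbf{1}[x_t = i]
```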
Of course, I could use a double for loop over i in 1:N_X and j in 1:N_Y, but I'm looking for the most efficient way.
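A minimal sketch of a loop-free alternative in base R, assuming obs$x and obs$y hold integer codes in 1..N_X and 1..N_Y as described: it counts every (x, y) pair with a single tabulate() call and then divides each row by its total. The function names cond_dist_tabulate and cond_dist_loop and the simulated data are hypothetical; the loop version is included only to check that the two agree.

```r
# Hypothetical helper: one pass over the data instead of N_X * N_Y passes.
cond_dist_tabulate <- function(x, y, N_X, N_Y) {
  # Map each pair (i, j) to its column-major linear index (j - 1) * N_X + i,
  # so the bin counts can be reshaped directly into an N_X x N_Y matrix.
  idx <- (y - 1L) * N_X + x
  counts <- matrix(tabulate(idx, nbins = N_X * N_Y), nrow = N_X, ncol = N_Y)
  # Divide row i by its total n_i. The length-N_X vector from rowSums()
  # is recycled down each column, which is exactly row-wise division.
  counts / rowSums(counts)
}

# Hypothetical baseline: direct translation of the double-loop definition.
cond_dist_loop <- function(x, y, N_X, N_Y) {
  res <- matrix(0, nrow = N_X, ncol = N_Y)
  for (i in seq_len(N_X)) {
    for (j in seq_len(N_Y)) {
      res[i, j] <- sum((x == i) & (y == j)) / sum(x == i)
    }
  }
  res
}

# Simulated example data (hypothetical)
set.seed(1)
N_X <- 4L
N_Y <- 3L
obs <- data.frame(x = sample(N_X, 1000, replace = TRUE),
                  y = sample(N_Y, 1000, replace = TRUE))

F_fast <- cond_dist_tabulate(obs$x, obs$y, N_X, N_Y)
F_loop <- cond_dist_loop(obs$x, obs$y, N_X, N_Y)
all.equal(F_fast, F_loop)  # TRUE
```

Rows i with no observations come out as NaN (0/0), matching the definition. A more readable alternative is prop.table(table(factor(obs$x, levels = 1:N_X), factor(obs$y, levels = 1:N_Y)), margin = 1), though it returns a table object with dimnames and is likely slower on large inputs because it builds factors first.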
Here is a method using data.table, which can probably be optimized further; I would love to learn a faster way to calculate this.
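As a hedged sketch of what a data.table approach could look like (not necessarily the method referred to above; the helper name cond_dist_dt and the example data are hypothetical), one can count only the (x, y) pairs that actually occur with .N, normalize within each x group by reference with :=, and then scatter the probabilities into a dense matrix:

```r
library(data.table)

# Hypothetical helper: sparse counting with data.table, dense result matrix.
cond_dist_dt <- function(obs, N_X, N_Y) {
  dt <- as.data.table(obs)
  # Joint counts n_ij, one row per (x, y) pair that occurs in the data
  counts <- dt[, .N, by = .(x, y)]
  # Conditional probability within each x group: n_ij / n_i
  counts[, p := N / sum(N), by = x]
  res <- matrix(0, nrow = N_X, ncol = N_Y)
  # Indexing with a two-column (row, col) matrix fills only the observed cells
  res[cbind(counts$x, counts$y)] <- counts$p
  res
}

# Small hypothetical example
obs <- data.frame(x = c(1, 1, 2, 2, 2), y = c(1, 2, 2, 2, 3))
F_dt <- cond_dist_dt(obs, N_X = 2, N_Y = 3)
```

Unlike the definition, rows i with no observations stay at 0 here rather than becoming NaN. Because the grouping step only touches pairs that occur, this may help when the N_X x N_Y table is large but sparsely populated.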