I have the following code that draws 1000 samples of 4 rows from iris, calculates the bias of each column, and repeats the whole thing 100 times.
library(SimDesign)
library(data.table)
do.call(rbind, lapply(1:100, function(x) {
  bias(
    setDT(copy(iris))[as.vector(sapply(1:1000, function(X) sample(1:nrow(iris), 4)))][
      , lapply(.SD, mean), by=rep(c(1:1000), 4), .SDcols=c(1:4)][, c(2:5)],
    parameter=c(5, 3, 2, 1), # parameter is the true population value used to calculate bias
    type='relative'          # denotes the type of bias being calculated
  )
}))
This takes 1000 samples of 4 rows and calculates the mean within each sample, giving me 1000 means per column. The bias of those 1000 means is computed for each column, and the whole procedure is then repeated 99 more times, giving me a distribution of bias estimates for each column. This mimics a random sampling design. However, I also want to do this for a stratified design, so I tried the stratified function from splitstackshape.
do.call(rbind, lapply(1:100, function(x) {
  bias(
    setDT(copy(iris))[as.vector(sapply(1:1000, function(X) stratified(iris, group="Species", size=1)))][
      , lapply(.SD, mean), by=rep(c(1:1000), 4), .SDcols=c(1:4)][, c(2:5)],
    parameter=c(5, 3, 2, 1),
    type='relative'
  )
}))
I would've thought it was just a matter of swapping out the functions, but I keep getting this error: "i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT". I think it might be related to setDT, but I'm not sure how to fix it. Does anybody know where I'm going wrong?
I've split this into a couple of functions for you.
Load data.table, SimDesign, and splitstackshape
Function to get n stratified samples of size sampsize and return column means of those samples
Now, let's get the distribution of bias across y such iterations of these samples
Usage (using defaults)
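The steps described above can be sketched roughly as follows. This is a minimal reconstruction, not necessarily the code originally posted: the function names get_sample_means and bias_dist are hypothetical, sampsize is assumed to be the number of rows drawn per Species, and rbindlist's idcol argument is used to tag each sample with an id.

```r
library(data.table)
library(SimDesign)
library(splitstackshape)

# Hypothetical helper: draw n stratified samples (sampsize rows per Species,
# an assumption) and return one row of column means per sample.
get_sample_means <- function(n = 1000, sampsize = 1) {
  num_cols <- names(iris)[1:4]
  samples <- lapply(seq_len(n), function(i) {
    stratified(iris, group = "Species", size = sampsize)
  })
  # idcol adds an "id" column holding each sample's position in the list
  stacked <- rbindlist(samples, idcol = "id")
  stacked[, lapply(.SD, mean), by = id, .SDcols = num_cols][, num_cols, with = FALSE]
}

# Hypothetical helper: repeat the above y times and stack the bias estimates,
# giving one row per iteration and one column per measurement.
bias_dist <- function(y = 100, n = 1000, sampsize = 1,
                      parameter = c(5, 3, 2, 1), type = "relative") {
  do.call(rbind, lapply(seq_len(y), function(i) {
    bias(get_sample_means(n, sampsize), parameter = parameter, type = type)
  }))
}

# bias_dist() with its defaults mirrors the 100 x 1000 design from the question;
# smaller values are used here only to keep the run short.
set.seed(123)
res <- bias_dist(y = 5, n = 20)
dim(res)
```

The result is a matrix with y rows and 4 columns. The values depend on the random draws, so they will differ from run to run unless a seed is set.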
Output:
Some comments on what was going wrong above
When you call stratified(iris, group="Species", size=1), you get a 3-row data.table, because you are effectively selecting one row at random from each of the three Species.
When you wrap that call in sapply(1:1000, function(x) ...), you get a 5 x 1000 matrix in which each column contains 5 lists of length 3. For example, sapply(1:6, function(x) ...) gives a 5 x 6 matrix of this kind. Because the elements are lists, as.vector() does not flatten it, so data.table still receives a matrix as i, which is what the error message is complaining about.
This is not really what you want, because you cannot then lapply over these the way you intended. What you want to do instead is use lapply(1:1000, function(x) ...) to create a list of such 3-row data.tables, and then bind them together (after adding an id column to each one).
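To make the difference concrete, here is a small sketch comparing the two approaches with 6 draws instead of 1000. The variable names are illustrative only, and idcol is used as one convenient way to add the id column.

```r
library(data.table)
library(splitstackshape)

set.seed(1)

# A single stratified draw: one row per Species
one <- stratified(iris, group = "Species", size = 1)
nrow(one)      # 3

# sapply simplifies the six 5-column results into a 5 x 6 matrix of lists
m <- sapply(1:6, function(x) stratified(iris, group = "Species", size = 1))
dim(m)         # 5 6
is.list(m)     # TRUE

# lapply keeps each draw as its own table; rbindlist stacks them and
# idcol records which draw each row came from
l <- lapply(1:6, function(x) stratified(iris, group = "Species", size = 1))
stacked <- rbindlist(l, idcol = "id")
dim(stacked)   # 18 6 (6 draws x 3 rows, 5 iris columns plus id)
```

With the stacked table, grouping by id gives one set of column means per draw, which is the shape bias() expects.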