What is the proper way to use the raked weights created by the rake() function in the R survey package?
R documentation gives the following example:
library(survey)
data(api)
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
pop.types <- data.frame(stype=c("E","H","M"), Freq=c(4421,755,1018))
pop.schwide <- data.frame(sch.wide=c("No","Yes"), Freq=c(1072,5122))
dclus1r<-rake(dclus1, list(~stype,~sch.wide), list(pop.types, pop.schwide))
svymean(~api00, dclus1r)
mean SE
api00 641.23 23.704
But when I use the weights directly, I get a different SE estimate:
apiclus1$rwt<-weights(dclus1r)
rakeddesign<-svydesign(id=~dnum, weights=~rwt, fpc=~fpc, weight=~rwt,data=apiclus1)
svymean(~api00, rakeddesign)
mean SE
api00 641.23 23.806
The difference is small in this example, but the difference is very large in other data that I have.
The answer you get after raking is correct and the answer you get by just putting the raked weights into
svydesignand not telling it they were raked is wrong (but usually not very wrong).Using
rakeadjusts the weights, and also adds information to the survey design object that allows for correct standard error estimation. Raking adds information (from the population totals that you specify) and this information reduces the standard errors relative to what you'd get if you just happened to have that set of weights.Raking is done by iterated post-stratification. Computing standard errors after post-stratification involves subtracting off the post-stratum means from the residuals that will be used in the standard error estimate. For a raking estimator, this subtraction is done iteratively.
The relevant code (inside
survey::svyrecvar) looks likeIf you are given the raked weights and you know the raking variables, you can tell
svydesignthat the weights have already been raked with thecalibrate.formulaoption.Sometimes, with public-use data collected by someone else, you don't have the raking variables and you have to just pretend the raked weights are the sampling weights. When you do that, the standard error estimates become slightly larger than they should be.