Analyzing Survey data in R after creating raked weights with rake() from the R survey package

325 Views Asked by At

What is the proper way to use the raked weights created by the rake() function in the R survey package?

R documentation gives the following example:

library(survey)
   data(api)
   dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
   pop.types <- data.frame(stype=c("E","H","M"), Freq=c(4421,755,1018))
   pop.schwide <- data.frame(sch.wide=c("No","Yes"), Freq=c(1072,5122))
   dclus1r<-rake(dclus1, list(~stype,~sch.wide), list(pop.types, pop.schwide))
   svymean(~api00, dclus1r)


            mean     SE
     api00 641.23 23.704

But when I use the weights directly, I get a different SE estimate:

 apiclus1$rwt<-weights(dclus1r)
   rakeddesign<-svydesign(id=~dnum, weights=~rwt,  fpc=~fpc, weight=~rwt,data=apiclus1)
   svymean(~api00, rakeddesign)


          mean     SE
   api00 641.23 23.806

The difference is small in this example, but the difference is very large in other data that I have.

1

There are 1 best solutions below

3
Thomas Lumley On

The answer you get after raking is correct and the answer you get by just putting the raked weights into svydesign and not telling it they were raked is wrong (but usually not very wrong).

Using rake adjusts the weights, and also adds information to the survey design object that allows for correct standard error estimation. Raking adds information (from the population totals that you specify) and this information reduces the standard errors relative to what you'd get if you just happened to have that set of weights.

Raking is done by iterated post-stratification. Computing standard errors after post-stratification involves subtracting off the post-stratum means from the residuals that will be used in the standard error estimate. For a raking estimator, this subtraction is done iteratively.

The relevant code (inside survey::svyrecvar) looks like

            else if (inherits(psvar, "raking")) {
                for (iterations in 1:10) {
                  for (margin in psvar) {
                    psw <- attr(margin, "weights")
                    x <- x - psw * apply(x/psw, 2, ave, margin)
                  }
                }
            }

If you are given the raked weights and you know the raking variables, you can tell svydesign that the weights have already been raked with the calibrate.formula option.

Sometimes, with public-use data collected by someone else, you don't have the raking variables and you have to just pretend the raked weights are the sampling weights. When you do that, the standard error estimates become slightly larger than they should be.