Cross multiplication to equalize sample proportions

I have a larger dataset; below is a subset of it. Category is the dependent variable, and Day_1 and Day_2 are the independent variables.

ID <- c("e-1", "e-2", "e-3", "e-8", "e-9", "e-10", "e-13", "e-16", "e-17", "e-20")
Day_1 <- c(0.58, 0.62, 0.78, 0.18, 0.98, 0.64, 0.32, 0.54, 0.94, 0.87)
Day_2 <- c(0.58, 0.65, 0.25, 0.34, 0.17, 0.82, 0.67, 0.39, 0.49, 0.86)
Category <- c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1)

df <- data.frame(ID, Day_1, Day_2, Category)

Because the sample sizes of the two categories differ (3 rows in Category 0 and 7 rows in Category 1), I want to perform a cross multiplication: repeat every Category 0 row 7 times and every Category 1 row 3 times, so that both categories end up with a new sample size of 7 * 3 = 21. The final data frame should contain all the columns of 'df', plus all the added rows.

How am I supposed to do this in R?

jay.sf On BEST ANSWER

This might be the wrong approach, as you will increase the overall sample size and thus inflate the test statistics (the standard errors shrink even though no new information was added).

See this small example, also with a binary dependent variable. Doubling the sample size (without changing the proportions of "am") leaves the estimate unchanged but gives a smaller standard error and a smaller p-value.

summary(glm(am ~ mpg, mtcars, family='binomial'))
#             Estimate Std. Error z value Pr(>|z|)   
# mpg           0.3070     0.1148   2.673  0.00751 **
  
summary(glm(am ~ mpg, rbind(mtcars, mtcars), family='binomial'))
#             Estimate Std. Error z value Pr(>|z|)   
# mpg          0.30703    0.08121   3.781 0.000156 ***
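
For reference, the literal row replication described in the question could be sketched as below. This is only a minimal illustration using the question's `df`; the names `n0`, `n1`, `times` and `df_rep` are made up here, and the same caveat about inflated test statistics applies to any model fitted on the result.

```r
# Data from the question
ID <- c("e-1", "e-2", "e-3", "e-8", "e-9", "e-10", "e-13", "e-16", "e-17", "e-20")
Day_1 <- c(0.58, 0.62, 0.78, 0.18, 0.98, 0.64, 0.32, 0.54, 0.94, 0.87)
Day_2 <- c(0.58, 0.65, 0.25, 0.34, 0.17, 0.82, 0.67, 0.39, 0.49, 0.86)
Category <- c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1)
df <- data.frame(ID, Day_1, Day_2, Category)

# Number of rows in each category (3 and 7 for this subset)
n0 <- sum(df$Category == 0)
n1 <- sum(df$Category == 1)

# Repeat each row as many times as the *other* category has rows
times <- ifelse(df$Category == 0, n1, n0)
df_rep <- df[rep(seq_len(nrow(df)), times = times), ]
rownames(df_rep) <- NULL

# Both categories now have n0 * n1 = 21 rows
table(df_rep$Category)
```

All columns of `df` are kept; only the number of rows changes (from 10 to 42 here).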

What you want instead are frequency weights, which you derive by dividing the population proportions (which in your case are both 0.5) by the sample proportions. You can use mapply for that.

mtcars <- transform(mtcars, 
                    w=mapply(`/`, 
                             c(`0`=.5, `1`=.5), 
                             proportions(table(am)))[as.character(am)])

summary(glm(am ~ mpg, mtcars, weights=w, family='binomial'))
#             Estimate Std. Error z value Pr(>|z|)   
# mpg           0.3005     0.1123   2.676  0.00746 **
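
Applied to the data frame from the question, the same weighting idea might look like the sketch below. It assumes, as above, target proportions of 0.5 for each category; `target`, `sample_prop` and the column name `w` are illustrative choices, not anything required by glm.

```r
# Data from the question
ID <- c("e-1", "e-2", "e-3", "e-8", "e-9", "e-10", "e-13", "e-16", "e-17", "e-20")
Day_1 <- c(0.58, 0.62, 0.78, 0.18, 0.98, 0.64, 0.32, 0.54, 0.94, 0.87)
Day_2 <- c(0.58, 0.65, 0.25, 0.34, 0.17, 0.82, 0.67, 0.39, 0.49, 0.86)
Category <- c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1)
df <- data.frame(ID, Day_1, Day_2, Category)

# Assumed target (population) proportions, one per category
target <- c(`0` = 0.5, `1` = 0.5)

# Observed sample proportion of each category (0.3 and 0.7 here)
sample_prop <- sapply(names(target), function(k) mean(df$Category == as.numeric(k)))

# Weight = target proportion / sample proportion, looked up per row
df$w <- unname((target / sample_prop)[as.character(df$Category)])

# The weights sum to the same total in both categories
tapply(df$w, df$Category, sum)
```

The column can then be passed to a model through its `weights` argument, e.g. `glm(Category ~ Day_1 + Day_2, df, weights = w, family = 'binomial')`. With fractional weights, glm typically warns about non-integer successes under the binomial family, and a subset this small may not give a stable fit.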