I have a larger dataset and below is a subset of that data. The category is the dependent variable and Day_1 and Day_2 are independent variables.
ID <- c("e-1", "e-2", "e-3", "e-8", "e-9", "e-10", "e-13", "e-16", "e-17", "e-20")
Day_1 <- c(0.58, 0.62, 0.78, 0.18, 0.98, 0.64, 0.32, 0.54, 0.94, 0.87)
Day_2 <- c(0.58, 0.65, 0.25, 0.34, 0.17, 0.82, 0.67, 0.39, 0.49, 0.86)
Category <- c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1)
df <- data.frame(ID, Day_1, Day_2, Category)
As the sample sizes of Category 0 and 1 differ (3 rows in Category 0 and 7 rows in Category 1), I want to perform a cross multiplication. That means repeating every Category 0 row 7 times and every Category 1 row 3 times, so that both categories end up with 7 * 3 = 21 rows. The final data frame should contain the same columns as 'df', plus all the added rows.
How am I supposed to do this in R?
This might be the wrong approach, as you will increase the overall sample size and thus inflate the t-statistic.
See this small example, also with a binary dependent variable. By doubling the sample size (without changing the proportions of "am") you get different results.
What you want are frequency weights, which you derive by dividing the population proportions (which in your case are both .5) by the sample proportions. You can use mapply for that.
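A minimal sketch of that weighting step, using the data from the question and assuming target (population) proportions of .5 for each category. The names pop_prop, samp_prop, w and the weight column are illustrative and not from the original post:

```r
# Data from the question
ID <- c("e-1", "e-2", "e-3", "e-8", "e-9", "e-10", "e-13", "e-16", "e-17", "e-20")
Day_1 <- c(0.58, 0.62, 0.78, 0.18, 0.98, 0.64, 0.32, 0.54, 0.94, 0.87)
Day_2 <- c(0.58, 0.65, 0.25, 0.34, 0.17, 0.82, 0.67, 0.39, 0.49, 0.86)
Category <- c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1)
df <- data.frame(ID, Day_1, Day_2, Category)

# Assumed target proportions: both categories should count equally
pop_prop <- c("0" = 0.5, "1" = 0.5)

# Observed sample proportions per category (here 0.3 and 0.7)
tab <- prop.table(table(df$Category))
samp_prop <- setNames(as.numeric(tab), names(tab))

# One weight per category: population proportion / sample proportion
w <- mapply(function(p, s) p / s, pop_prop, samp_prop[names(pop_prop)])

# Attach the matching weight to every row; no rows are duplicated
df$weight <- unname(w[as.character(df$Category)])
```

With these weights each category contributes the same total weight (5 each here) while the number of rows stays at 10, so the sample size is not inflated. The weight column can then be handed to the weights argument of a model-fitting function that supports one.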