I'd like to create a simple randomization of a dataset. The goal is to have 500 observations in treatment and 500 in control. This question is about Stata efficiency: I want to do it in one line.
I can do it in one line with imbalanced groups or three lines with perfect balance.
One line:
clear all
set obs 1000
//one line
gen treatment = mod(floor(runiform() * 1000),2)
This is most likely imbalanced.
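To put a rough number on that claim, here is a small check. It assumes each draw behaves like an independent fair coin flip, so the number of treated observations is binomial with n = 1000 and p = 0.5. binomialp() is Stata's built-in binomial probability function.

```stata
// Probability that 1000 independent fair 0/1 draws split exactly 500/500
display binomialp(1000, 500, 0.5)
```

The result is roughly 0.025, so an exact 500/500 split should turn up only about one time in forty.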
Three lines:
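gen rand_n = runiform()                 // one uniform draw per observation
summarize rand_n, detail                // stores the median in r(p50)
gen treatment_again = rand_n <= r(p50)  // 1 if at or below the median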
Clunky, terrible: you can't even bysort in a single line like this!
I want to do this in one line, maybe two.
Why? Because Stata.
Since splitsample is precluded (it is slow), there are two options.
First, you can repackage your clunky code into a program on the fly. I am not sure if that counts as a solution in your mind, but it is a good strategy if you have to sample multiple times.
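A minimal sketch of that repackaging might look like the following. The program name balanced_split and the seed are hypothetical choices for illustration, and the body is just the median-split logic from the question, with the draw stored as a double so that ties at the median are practically impossible.

```stata
clear all
set obs 1000
set seed 12345   // arbitrary seed, only for reproducibility

// Hypothetical helper: balanced 0/1 assignment via a median split
capture program drop balanced_split
program define balanced_split
    syntax newvarname                        // name of the variable to create
    tempvar u
    generate double `u' = runiform()         // one uniform draw per observation
    quietly summarize `u', detail            // median is left in r(p50)
    generate byte `varlist' = `u' <= r(p50)  // 1 if at or below the median
end

// After the definition, each assignment is a single line
balanced_split treatment
tabulate treatment
```

Once the program is defined, every further randomization is one call, which is where this approach pays off if you sample repeatedly.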
Second, use egenmore, a package of additional functions for egen (short for extended generate). egenmore is usually where one-line solutions to such problems can be found. You will need to install it with ssc install egenmore, as it is a community-contributed package.
Here's an example of all three producing balanced groups of 500: