I am trying to generate test data using Python. Is something like the following achievable? I input a number, say 20, and need 20 rows to be generated. One column contains a range of values for each group, and values should be generated within that range. (Some rows don't have a range, only a string or comma-separated values, which should be split.) Sample input and expected output are below.
Sample input:
    VarName    Label           Score
0   LtoV       0.00<=to>=0.68  20
1   LtoV       0.69<=to>=2     33
2   LtoV       2toHigh         40
3   Age        0to20           36
4   Age        21to40          15
5   Age        41to60          50
6   Indicator  A               30
7   Indicator  B               40
8   Indicator  C               40
9   Oc Code    100,20,30       10
10  Oc Code    5,10,16         20
Output:
When I give 20 as the value, I need 20 rows generated for LtoV, with column2 holding any value between 0.00 and High and the corresponding score assigned based on that value. For a string column, the values should just be repeated across the 20 rows. For a column with values separated by ",", the values should be split and each one put in its own row. Likewise, 20 rows for Age.
SNo  VarName    column2  Score
1    LtoV       0.00     20
2    LtoV       0.42     20
3    LtoV       1        33
4    LtoV       2.5      40
...
20   LtoV       50       40
1    Indicator  A        30
2    Indicator  B        40
3    Indicator  C        40
4    Indicator  A        30
5    Indicator  B        40
6    Indicator  C        40
7    Indicator  A        30
8    Indicator  B        40
....
20   Indicator  C        40
1    Oc Code    100      10
2    Oc Code    20       10
3    Oc Code    30       10
4    Oc Code    5        20
5    Oc Code    10       20
6    Oc Code    15       20
....
20   Oc Code    5        20
So: generate 20 such values for LtoV and 20 such values for Age, with the score determined by the range each value falls in.
Is this feasible?
********************
You could extract the lower and upper bound of each range, optionally setting a default low/high where one is missing (otherwise `dropna`). Then `sample` 20 rows per group with `replace=True` and generate the random values in a vectorized way with `numpy.random.uniform`.
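The approach described in this answer could look roughly like the sketch below. It is not the original answer's code: the regex, the variable names, and the `DEFAULT_HIGH = 50` cap used for "High" are all assumptions made for illustration (50 is simply the largest value shown in the question's expected output). Only the range-style rows are handled here; one `Indicator` row is included to show it being dropped.

```python
import numpy as np
import pandas as pd

# Subset of the question's sample input (one non-range row kept on purpose)
df = pd.DataFrame({
    "VarName": ["LtoV", "LtoV", "LtoV", "Age", "Age", "Age", "Indicator"],
    "Label": ["0.00<=to>=0.68", "0.69<=to>=2", "2toHigh",
              "0to20", "21to40", "41to60", "A"],
    "Score": [20, 33, 40, 36, 15, 50, 30],
})

N = 20               # rows to generate per VarName
DEFAULT_HIGH = 50.0  # assumption: stand-in upper bound for "High"

# Pull the numbers on either side of "to". A non-numeric bound such as
# "High", or a label that is not a range at all, yields NaN.
pattern = r"^(?P<low>\d+(?:\.\d+)?)\D*to\D*?(?P<high>\d+(?:\.\d+)?)?$"
bounds = df["Label"].str.extract(pattern).apply(pd.to_numeric, errors="coerce")

ranges = (
    df.join(bounds)
      .dropna(subset=["low"])           # drop rows that are not ranges
      .fillna({"high": DEFAULT_HIGH})   # default for open-ended ranges
)

# Sample N range rows per group (with replacement), then draw one uniform
# value inside each sampled row's [low, high) interval in a single call.
sampled = (
    ranges.groupby("VarName", sort=False)
          .sample(n=N, replace=True, random_state=0)
          .reset_index(drop=True)
)
np.random.seed(0)  # only to make the sketch reproducible
sampled["column2"] = np.random.uniform(sampled["low"], sampled["high"]).round(2)
sampled["SNo"] = sampled.groupby("VarName").cumcount() + 1

out = sampled[["SNo", "VarName", "column2", "Score"]]
print(out.head())
```

Because each value is drawn from the interval of the row it was sampled with, the `Score` always matches the range. A side effect is that nothing is generated in the gaps between labels (for example between 0.68 and 0.69).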