How can I evenly divide records into N groups based on the values?

75 Views Asked by At

For a table as follows, how can I divide these records evenly into 3 groups based on the value of “factor_value”?

sym    date       factor_value
------ ---------- ------------
100000 2022.04.27 1           
100001 2022.04.27 2           
100002 2022.04.27 3           
100003 2022.04.27 4           
100004 2022.04.27 5           
100005 2022.04.27 6           
100006 2022.04.27 7           
100007 2022.04.27 8           
100008 2022.04.27 9           
100009 2022.04.27 10          
100010 2022.04.28             
100000 2022.04.28             
100001 2022.04.28             
100002 2022.04.28 3           
100003 2022.04.28 4           
100004 2022.04.28 5           
100005 2022.04.28 6           
100006 2022.04.28 7           
100007 2022.04.28 8           
100008 2022.04.28 9    

1

There are 1 best solutions below

0
FFF On

This can be implemented by DolphinDB functions cutPoints and asof.

sym=take(string(100000..100010),20)
date=sort(take(2022.04.27..2022.04.28,20))
factor_value= 1..10 join take(int(),3) join 3..9
tb= table( sym, date, factor_value)
select *,asof(cutPoints(int(factor_value*100000),3),factor_value*100000)+1 as factor_quantile from tb context by date  csort  factor_value having  size(distinct(factor_value*100000))>3 

First, use contexy by with csort to sort the column factor_value. Then allocate the records into 3 groups evenly with cutPoints. asof returns the grouping number for each element in the group.

output:

sym    date       factor_value factor_quantile
------ ---------- ------------ ---------------
100000 2022.04.27 1            1              
100001 2022.04.27 2            1              
100002 2022.04.27 3            1              
100003 2022.04.27 4            1              
100004 2022.04.27 5            2              
100005 2022.04.27 6            2              
100006 2022.04.27 7            2              
100007 2022.04.27 8            3              
100008 2022.04.27 9            3              
100009 2022.04.27 10           3              
100010 2022.04.28              1              
100000 2022.04.28              1              
100001 2022.04.28              1              
100002 2022.04.28 3            1              
100003 2022.04.28 4            2              
100004 2022.04.28 5            2              
100005 2022.04.28 6            2              
100006 2022.04.28 7            3              
100007 2022.04.28 8            3              
100008 2022.04.28 9            3