I have a dataset with 1000 distinct ids; let's call them id_ref.
The same dataset also contains many other ids (let's call them id_to_sample). An id_to_sample can be associated with multiple id_ref.
For each id_ref, I want to select 4 id_to_sample. Moreover, each id_to_sample must be selected randomly and be unique, i.e. it must not be picked for more than one id_ref.
The result should be a dataset with 4000 rows, containing 1000 distinct id_ref and 4000 distinct id_to_sample.
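To make the target concrete, here is a rough check that I would expect the final result to pass. The table name result is only a placeholder (an assumption) for wherever the sampled rows end up; the column names are the ones described above.

```sql
-- Sanity check on the sampled output.
-- "result" is a hypothetical table holding the final (id_ref, id_to_sample) pairs.
select
    count(*)                     as row_cnt,     -- should be 4000
    count(distinct id_ref)       as ref_cnt,     -- should be 1000
    count(distinct id_to_sample) as sample_cnt   -- should be 4000 (no id_to_sample reused)
from result;
```

If sample_cnt comes out lower than row_cnt, at least one id_to_sample has been assigned to more than one id_ref.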
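So far I have tried this method:
-- Attempt: tag every row with a random number, then keep 4 rows per id_ref.
-- Note: nothing here prevents the same id_to_sample from being picked for several id_ref.
sel distinct id_ref, id_to_sample
from
(sel id_ref, id_to_sample, random(1,10000) as randommm
from dataset) t
qualify row_number() over (partition by id_ref order by randommm) <= 4;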
Any ideas? Thanks for your help!