Random shuffled value generator?

111 Views Asked by At

Is there an algorithm that can be used as pseudo random value generator as if the sequence was produced by shuffling ? By that I mean that in a set of k values and k generator calls each value is returned exactly once.

The goal is to have a pseudo random generator that has no duplicates.

The shuffling algorithm is well known and documented. It randomly picks a value in the set of not yet returned values.

This is simple and fine when the list of k values left to pick from may be stored in memory. When the set is big, like for instance the whole set of 32 bit values, it is not feasible and optimal.

EDIT: Thank you very much for the answers and comments. I forgot to mention that it should be difficult to predict the sequence of values even when the user managed collect a large set of them.

This is why this question is not a duplicate of this question.

Ciphering the sequence number 0 to 2^32-1 as suggested by @rossum seam to be a promising method.

I have the impression that the use of asymmetric ciphering would be preferable to make it harder to deduce the key from a sequence of values. A solution like this one could probably do the job. I now need to derive the code to do that. :)

2

There are 2 best solutions below

3
rossum On

Use an encryption. Because encryption is one-to-one if you start with a unique number, you will get a unique number out. So, if you encrypt 0, 1, 2, 3, ... with a fixed key you will get a set of unique numbers out. All you have to keep track of is how far you have got in the input sequence, and the key you are using.

For 32 bit numbers, use a 32 bit block cipher. For 64 bit numbers use a 64 bit cipher etc.

For other ranges you could use some form of Format Preserving Encryption, likely cycle walking, for just numbers in a given range.

1
Dave On

Say we want to shuffle n integers, and n is too big to fit in memory. Choose k s.t. 4n/k fits in memory.

Divide the range 0-(n-1) into k consecutive subranges (0-(n/k-1), n/k to 2n/k-1, ...).

Process these subranges in order.

For each subrange, perform Fisher-Yates as if we were performing it for the full 0-(n-1) range, but when you'd swap with something not in memory, store the swap instead. Each swap is a pair of integers, so at the end of each subrange we might have 3x the size of the subrange in memory. As you store swaps, keep together swaps that are in the same remote subrange (e.g. by using a hash from subrange_id => swap).

After building this hash for a given subrange, in turn load each remote subrange that has at least one swap into memory. Parse your swaps and swap as appropriate. Save the remote subrange back to file once processed.

Then, save the local subrange to file and move on to the next subrange.