I am trying to generate "random" numbers from a uniform distribution inside a CUDA __global__ kernel using two different approaches. The first is using the cuRAND device API, and the second is using thrust. For each approach I have created a different class.
Here is my cuRAND solution:
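#include <curand_kernel.h>

// Per-thread generator built on the cuRAND device API.
template<typename T>
struct RNG1
{
    __device__
    RNG1(unsigned int tid) {
        // seed = tid, sequence = tid, offset = 0
        curand_init(tid, tid, 0, &state);
    }

    __device__ T
    operator ()(void) {
        return curand_uniform(&state);
    }

    curandState state;
};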
And here is my thrust solution:
#include <thrust/random.h>

// Per-thread generator built on Thrust's random engines.
template<typename T>
struct RNG2
{
    __device__
    RNG2(unsigned int tid)
        : gen(tid)
        , dis(0, 1) { gen.discard(tid); }

    __device__ T
    operator ()(void) {
        return dis(gen);
    }

    thrust::default_random_engine gen;
    thrust::uniform_real_distribution<T> dis;
};
The way I use them is the following:
template<typename T> __global__ void
mykernel(/* args here */)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;

    RNG1<T> rng(tid);
    // or: RNG2<T> rng(tid);

    T a_random_number = rng();
    // do stuff here
}
Both of them work but the cuRAND solution is much slower (more than 3 times slower). If I set the second parameter of curand_init (sequence number) to 0, then the performance is the same as that of the thrust solution, but the random numbers are "bad". I can see patterns and artefacts in the resulting distribution.
Here are my questions:
- Can someone explain to me why the cuRAND solution with a non-zero sequence is slower?
- How can thrust be as fast as cuRAND with zero sequence, but also generate good random numbers?
- While searching on Google, I noticed that most people use cuRAND, and very few use thrust to generate random numbers inside device code. Is there something I should be aware of? Am I misusing thrust?
Thank you.
Perhaps the performance difference arises because cuRAND and Thrust use different PRNG algorithms with different performance profiles and memory demands. Note that cuRAND supports five different PRNG algorithms, and your code doesn't name one explicitly: the plain curandState type selects the library's default generator, XORWOW.
Thrust's default_random_engine is currently minstd_rand, but its documentation notes that this "may change in a future version". (A comment written after I wrote mine also noted that it's minstd_rand.) minstd_rand is a simple linear congruential generator that may be faster than whatever PRNG cuRAND is using.

This was a comment converted to an answer and edited.
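To make the "simple linear congruential generator" point concrete, here is a minimal host-side C++ sketch of the minstd_rand recurrence. It uses std::minstd_rand from the C++ standard library, which, as far as I know, has the same parameters as thrust::minstd_rand (multiplier 48271, increment 0, modulus 2^31 - 1). The helper names lcg_step, lcg_nth and std_nth are made up for this illustration and are not part of Thrust or cuRAND. This is only meant to show how little work and state one draw needs, not to reproduce Thrust's device implementation.

```cpp
#include <cstdint>
#include <random>

// One step of the minstd_rand recurrence:
//   x_{n+1} = (48271 * x_n) mod (2^31 - 1)
// (hypothetical helper, written out by hand for illustration)
std::uint32_t lcg_step(std::uint32_t x) {
    const std::uint64_t a = 48271u;
    const std::uint64_t m = 2147483647u;  // 2^31 - 1
    // 64-bit intermediate so the product cannot overflow
    return static_cast<std::uint32_t>((a * x) % m);
}

// Value produced by the n-th call (n >= 1) of the hand-written generator.
// The whole generator state is the single 32-bit integer x.
// Assumes 0 < seed < 2^31 - 1, so no seed adjustment is needed.
std::uint32_t lcg_nth(std::uint32_t seed, unsigned n) {
    std::uint32_t x = seed;
    for (unsigned i = 0; i < n; ++i) {
        x = lcg_step(x);
    }
    return x;
}

// Value produced by the n-th call (n >= 1) of std::minstd_rand.
std::uint32_t std_nth(std::uint32_t seed, unsigned n) {
    std::minstd_rand gen(seed);
    std::uint32_t x = 0;
    for (unsigned i = 0; i < n; ++i) {
        x = static_cast<std::uint32_t>(gen());
    }
    return x;
}
```

For the same seed and n >= 1, lcg_nth and std_nth should agree. Each draw costs one multiplication and one modulo on a single integer of state, which fits the guess above that this engine has a lighter per-thread footprint than cuRAND's default generator.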