Behavior of discard for boost random number generators

294 Views Asked by At

I am wrapping boost random number generators with an adapter class to implement a Monte Carlo routine. When writing unit tests on the member functions of the class, I assumed that the behavior of .discard(unsigned int N) is to draw N random numbers without storing them, thus advancing the state of the rng. The boost code is :

void discard(boost::uintmax_t z)
{
    if(z > BOOST_RANDOM_MERSENNE_TWISTER_DISCARD_THRESHOLD) {
        discard_many(z);
    } else {
        for(boost::uintmax_t j = 0; j < z; ++j) {
            (*this)();
        }
    }
}

which supports my assumption. However, I find that the sequence that results from .discard(1) is not one number different than the same sequence without discard. The code:

#include <iostream>
#include <iomanip>
#include <random>
#include <boost/random.hpp>

int main()
{
    boost::mt19937 uGenOne(1);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > distOne(uGenOne, boost::normal_distribution<>());

    boost::mt19937 uGenTwo(1);
    boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > distTwo(uGenTwo, boost::normal_distribution<>());
    distTwo.engine().discard(1);

    unsigned int M = 10;
    std::vector<double> variatesOne(M);
    std::vector<double> variatesTwo(M);

    for (unsigned int m = 0; m < M; ++m) {
        variatesOne[m] = distOne();
        variatesTwo[m] = distTwo();
    }

    for (unsigned int m = 0; m < M; ++m)
        std::cout << std::left << std::setw(15) << variatesOne[m] << variatesTwo[m] << std::endl;

    return 0;
}

outputs

2.28493        0.538758  
-0.668627      -0.0017866
0.00680682     0.619191  
0.26211        0.26211   
-0.806832      -0.806832 
0.751338       0.751338  
1.50612        1.50612   
-0.0631903     -0.0631903
0.785654       0.785654  
-0.923125      -0.923125

Is my interpretation of how .discard operates incorrect? Why are the two sequences different in the first three outputs, and then identical?

(This code was compiled on msvc 19.00.23918 and g++ 4.9.2 on cygwin, with identical results).

2

There are 2 best solutions below

5
NathanOliver On BEST ANSWER

The issue here seems to be that the engine is not being modified correctly or the distrobution is adding doing some extra work. If we use the engine directly like

int main()
{
    boost::mt19937 uGenOne(1);

    boost::mt19937 uGenTwo(1);
    uGenTwo.discard(1);

    unsigned int M = 10;
    std::vector<double> variatesOne(M);
    std::vector<double> variatesTwo(M);

    for (unsigned int m = 0; m < M; ++m) {
        variatesOne[m] = uGenOne();
        variatesTwo[m] = uGenTwo();
    }

    for (unsigned int m = 0; m < M; ++m)
        std::cout << std::left << std::setw(15) << variatesOne[m] << variatesTwo[m] << std::endl;

    return 0;
}

It produces

1.7911e+09     4.28288e+09
4.28288e+09    3.09377e+09
3.09377e+09    4.0053e+09
4.0053e+09     491263
491263         5.5029e+08
5.5029e+08     1.29851e+09
1.29851e+09    4.29085e+09
4.29085e+09    6.30312e+08
6.30312e+08    1.01399e+09
1.01399e+09    3.96591e+08

Which is a 1 shifted sequence as we expect since we discarded the first output.

So you are correct in how discard works. I am not exactly sure why there is a discrepancy when you do it through the boost::variate_generator. I cannot see why the first three numbers differ but all of the rest of the output matches.

0
tompdavis On

Just to add on an important detail that was in the comments of the previous answer. As @NathanOliver mentioned, the .discard increments the generator, which sends uniforms to the normal distribution, which converts the uniforms to normal distribution. boost::normal_distribution uses the Ziggurat algorithm which is an "acceptance/rejection" type of algorithm. It draws a random uniform, manipulates it, and then checks to see if it is in the desired distribution. If not, it is rejected and a new random uniform.

for(;;) {
        std::pair<RealType, int> vals = generate_int_float_pair<RealType, 8>(eng);
        int i = vals.second;
        int sign = (i & 1) * 2 - 1;
        i = i >> 1;
        RealType x = vals.first * RealType(table_x[i]);
        if(x < table_x[i + 1]) return x * sign;
        if(i == 0) return generate_tail(eng) * sign;
        RealType y = RealType(table_y[i]) + uniform_01<RealType>()(eng) * RealType(table_y[i + 1] - table_y[i]);
        if (y < f(x)) return x * sign;
    }

The crucial point is that if the last if fails then the for loop is started again and the call to generate_int_float_pair will be triggered again. This means the number of times the underlying generator is incremented is not known.

The normal sequences will therefore have different numbers, until the point where the sum of rejections in the sub-sequence is the same, at which point the remaining uniform sequence is identical. This happened at the third position in the example posted in the question. (It actually is a bit more subtle because the underlying generator can be called once or twice in the Ziggurat algorithm, but the essence is the same - once the sequences get in sync they can never produce different variates).