Checking Uniqueness of Shifting Sieve of Eratosthenes

109 Views Asked by At

My code is as follows:

int main(){
    vector<int> primes;
    vector<int> primesSum;
    int tally = 1;
    bool primeFound = false;
    while (true) {
        int i = 0;
        while (!primeFound) {
            primeFound = true;
            tally++;
            for (i = 0; i < (int)primes.size(); i++) {
                if (tally == primesSum[i]) {
                    primeFound = false;
                    primesSum[i] += primes[i];
                }
            }
        }
        primeFound = false;
        primesSum.push_back(tally*2);
        primes.push_back(tally);
    }
}

It seems to generate prime numbers fine but I was wondering if I could increase its efficiency by implementing the rule that all primes are 6n +/- 1 after 2 and 3. That would seem to sacrifice my achievement of initially empty vectors though.

I've seen prime number verifiers that make use of the 6n +/- i rule but not really in sieve programs, likely as it breaks the 2 filter.

Edit: as an aside I would prefer to not use modulo as that is another achievement for this program, unless a modulo is more efficient (time-wise not space-wise) than what I currently have.

2

There are 2 best solutions below

2
RandomBits On BEST ANSWER

Here is some code that I wrote to efficiently generate primes using a sieve that only considers integers that are +-1 mod 6. It is written as a generator using a coroutine library, but the logic is equivalent. It uses a vector of uint64_t's as a bit mask.

There are no modulo's related to computing the primes, but they are used to access the bitmap.

Update

I extracted my original code from the generator and created a simple function that puts the primes in a vector. One method uses the +-1 mod 2 search and the other +-1 mod 6 search. I also extracted the OP's code into a function.

I ran some simple timing measurements on an M1 Max (arm64).

function primes<1M primes<10M primes<100M
sieve_index 12904 ms still running...
sieve_2n 2 ms 30 ms 222 ms
sieve_6n. 1 ms 15 ms 132 ms

The code and build instructions can be found at the GitHub repo cpp-core/so

Code

auto sieve_mod6_prime_seq(int max = int{1} << 20) {
    std::vector<int> primes;
    primes.push_back(2);
    primes.push_back(3);

    auto max_index = max / 3;
    auto bits_per = sizeof(uint64_t) * CHAR_BIT;
    auto nwords = (bits_per + max_index - 1) / bits_per;
    std::vector<uint64_t> words(nwords);

    words[0] |= 1;
    size_t wdx = 0;
    while (wdx < nwords) {
        auto b = std::countr_one(words[wdx]);
        auto p = 3 * (64 * wdx + b) + 1 + (b bitand 1);
        if (b < 64 and p < max) {
            primes.push_back(p);

            for (auto j = p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }

            for (auto j = 5 * p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }
        }
        else {
            ++wdx;
        }
    }
    return primes;
}

auto sieve_mod2_prime_seq(int max = int{1} << 20) {
    std::vector<int> primes;

    auto bits_per = 2 * sizeof(uint64_t) * CHAR_BIT;
    auto nwords = (bits_per + max - 1) / bits_per;
    std::vector<uint64_t> words(nwords);
    if (max > 2)
        primes.push_back(2);
    words[0] |= 1;
    size_t idx = 0;
    while (idx < nwords) {
        auto bdx = 2 * std::countr_one(words[idx]) + 1;
        auto p = 128 * idx + bdx;
        if (bdx < 128 and p < max) {
            primes.push_back(p);
            for (auto j = p; j < max; j += 2 * p) {
                auto jdx = j / 128;
                auto jmask = uint64_t{1} << ((j % 128) / 2);
                words[jdx] |= jmask;
            }
        }
        else {
            ++idx;
        }
    }
    return primes;
}

auto sieve_index(int max = int{1} << 20) {
    std::vector<int> primes;
    std::vector<int> primesSum;
    int tally = 1;
    bool primeFound = false;
    while (tally < max) {
        int i = 0;
        while (!primeFound) {
            primeFound = true;
            tally++;
            for (i = 0; i < (int)primes.size(); i++) {
                if (tally == primesSum[i]) {
                    primeFound = false;
                    primesSum[i] += primes[i];
                }
            }
        }
        primeFound = false;
        primesSum.push_back(tally*2);
        primes.push_back(tally);
    }
    return primes;
}

template<class Work>
void measure(std::ostream& os, std::string_view desc, Work&& work) {
    chron::StopWatch timer;
    timer.mark();
    work();
    auto millis = timer.elapsed_duration<std::chrono::milliseconds>().count();
    os << fmt::format("{:>30s}: {} ms", desc, millis) << endl;
}

int tool_main(int argc, const char *argv[]) {
    ArgParse opts
        (
         argValue<'n'>("number", 1000, "Number of primes"),
         argFlag<'v'>("verbose", "Verbose diagnostics")
         );
    opts.parse(argc, argv);
    auto n = opts.get<'n'>();
    // auto verbose = opts.get<'v'>();

    measure(cout, "sieve_index", [&]() { sieve_index(n); });
    measure(cout, "sieve_2n", [&]() { sieve_mod2_prime_seq(n); });
    measure(cout, "sieve_6n", [&]() { sieve_mod6_prime_seq(n); });
    return 0;
}

End Update

You can easily adapt it to construct a vector of primes by replacing each co_yield with vec.push_back.

coro::Generator<uint64_t> sieve_mod6_prime_seq(uint64_t max = uint64_t{1} << 20) {
    co_yield 2;
    co_yield 3;

    auto max_index = max / 3;
    auto bits_per = sizeof(uint64_t) * CHAR_BIT;
    auto nwords = (bits_per + max_index - 1) / bits_per;
    std::vector<uint64_t> words(nwords);

    words[0] |= 1;
    size_t wdx = 0;
    while (wdx < nwords) {
        auto b = std::countr_one(words[wdx]);
        auto p = 3 * (64 * wdx + b) + 1 + (b bitand 1);
        if (b < 64 and p < max) {
            co_yield p;

            for (auto j = p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }

            for (auto j = 5 * p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }
        }
        else {
            ++wdx;
        }
    }
    co_return;
}
0
Will Ness On

I must say, the naming of variables and the code structure as posted in the question, with the double while instead of an if, is very non-conducive to the code clarity.

Consequently, I'd re-structure your code as

int main(){
    vector<int> primes;   // primes known so far
    vector<int> mults;    // smallest multiple of each prime, above c
    int c = 1;            // candidates: 2,3,...
    bool composite;
    while (true) {
        composite = false;
        c++;              // 2,3,...
        for (int i = 0; i < (int)primes.size(); i++) {
            if (c == mults[i]) {  // if c is a multiple of ith prime,
                composite = true;   // then it's composite.
                mults[i] += primes[i]; // next multiple of this prime
            }
        }
        if( !composite ) {
            mults.push_back(c*2);   // first multiple of 
            primes.push_back(c);    //   newly found prime
        }
    }
}

Now we can see, first of all, a major algorithmic deficiency in your code: the wrong "data structure" you use to maintain the primes' multiples. Instead of min-heap priority queue which would immediately produce the next known multiple of a prime, you maintain two linear vectors and scan through them for each candidate number. This is bound to have major implications, complexity-wise. And with worse complexity no amount of constant factors improvements will help much.

But putting this aside, on to your question. You enumerate the multiples of each prime p as 2p, 3p, 4p, ....

First step is to switch it to p², p²+p, p²+2p, ....

Next, we notice that for all odd primes, is odd as well and so the enumeration can be changed to p², p²+2p, p²+4p, ..., to enumerate all odd multiples of it.

And what to do with the even ones? How to test for them now? The whole point to this "wheel factorization" optimization is to not do any such tests, but just ignore them. That's why it's an optimization in the first place.

And the way to ignore them is to not generate any even candidates, in the first place -- because we known in advance that all of them aren't primes any way. And that is the essence of the wheel optimization:

int main(){
    vector<int> primes;   // primes known so far
    vector<int> mults;    // smallest multiple of each prime, above c
    int c = 1;            // candidates: 3,5,...
    bool composite;
    while (true) {
        composite = false;
        c+=2;             // 3,5,...
        for (int i = 0; i < (int)primes.size(); i++) {
            if (c == mults[i]) {  // if c is a multiple of ith prime,
                composite = true;   // then it's composite
                mults[i] += 2*primes[i]; // next multiple of this prime
            }
        }
        if( !composite ) {
            mults.push_back(c*c);   // first multiple of 
            primes.push_back(c);    //   newly found prime
        }
    }
}

Next, switching to your desired mod 6 +-1 enumeration, we will ignore the multiples of 2 and 3. But then it means, we'll need to not generate any of them as well. The enumeration of multiples will be m=p²; m+=4; m+=2; m+=4; m+=2; m+=4; m+=2; ... for some, and m=p²; m+=2; m+=4; m+=2; m+=4; m+=2; m+=4; ... for others (you can figure out the details).

And the enumeration of the candidates will just start from 5, and also proceed as c=5; c+=2; c+=4; c+=2; c+=4; c+=2; c+=4; ....

Hopefully you can write this down now in an actual code.

P.S. you can find more details in several of my answers on the matter, e.g. here and here etc..