lock-free queue to count prime numbers in c performance

140 Views Asked by At

I try to use multi-threads with lock-free queues to do the best performance to count prime numbers without editing the naive prime check function and in a maximum of only 1.8MB of RAM space.

On the first try, I did a multi-thread but I had a problem with the memory, so I moved to a linked list to reduce memory, and after I read about the lock-free queue.

but I stuck with my attempt to improve the performance.

In the code, I get a very large amount of numbers and I need to count how many prime numbers from stdin.

My question is: how I can reduce the time to best performance?

Second Attempt:

#include <stdio.h>
#include <stdbool.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_ARRAY_SIZE 500000 

int array[MAX_ARRAY_SIZE];
atomic_int shared_counter = ATOMIC_VAR_INIT(0);
int total_primes = 0;

// Function to check if a number is prime
bool isPrime(int n)
{
    if (n <= 1)
    {
        return false;
    }
    for (int i = 2; i * i <= n; i++)
    {
        if (n % i == 0)
        {
            return false;
        }
    }
    return true;
}

// Worker function to find prime numbers in the array
void *worker(void *arg)
{
    int local_counter = 0;
    int index;

    while ((index = atomic_fetch_sub(&shared_counter, 1)) > 0)
    {
        if (isPrime(array[index]))
        {
            local_counter++;
        }
    }

    return (void *)local_counter;
}

int main()
{
    int num;
    pthread_t threads[4];

    // Read numbers from stdin and store them in the array until end of file or array full
    int array_size = 0;
    while (scanf("%d", &num) != EOF && array_size < MAX_ARRAY_SIZE)
    {
        array[array_size++] = num;
    }

    atomic_store(&shared_counter, array_size);

    // Create worker threads
    for (int i = 0; i < 4; i++)
    { // Create 4 threads, you may adjust this number based on your system
        pthread_create(&threads[i], NULL, worker, NULL);
    }

    // Join worker threads and accumulate prime counts
    for (int i = 0; i < 4; i++)
    {
        void *result;
        pthread_join(threads[i], &result);
        total_primes += (int)result;
    }

    printf("%d total primes.\n", total_primes);

    return 0;
}
1

There are 1 best solutions below

14
Maxim Egorushkin On

Your particular problem of counting prime numbers in an array using multiple threads has a minimalist solution which doesn't require anything more than one shared atomic counter.

The algorithm:

  1. Load the integers into a plain integer array.
  2. Create a signed atomic counter initialized with the number of elements in the array.
  3. Create the worker threads sharing the array of integers and the atomic counter. Each worker thread maintains its own non-atomic prime number count.
  4. Each worker thread does fetch_sub(shared_counter, 1) - 1 to get the index of the next array element to check for whether it is a prime number.
  5. If the index is non-negative, check whether array element at that index is a prime number and update the prime number counter. Goto 4.
  6. Otherwise, if the index is negative, the worker thread terminates, returning its non-shared prime number count (cast to void*).
  7. The main thread joins the worker threads and sums up their prime number count return values.

This way, your worker threads contend only on fetch_sub of the shared counter. Contending on one shared atomic counter is as simple and efficient solution as it ever gets.

To minimise contention to absolute bare minimum, each worker thread may want to grab not the next 1 index to process, but, say, next 1,000,000 indexes to process.


The ideas I want to convey here are:

  1. One shared atomic counter can be enough to spread task processing between multiple worker threads, and that is as cheap and efficient as it gets. No fancy multi-producer-multi-consumer data structures may be necessary.
  2. Be mindful to not bottleneck on contending updating the shared counter too often. Increasing the chunk size is a simple way to minimize this contention. My arbitrary 1,000,000 chunk size above may not be ideal for this particular problem.