I try to use multi-threads with lock-free queues to do the best performance to count prime numbers without editing the naive prime check function and in a maximum of only 1.8MB of RAM space.
On the first try, I did a multi-thread but I had a problem with the memory, so I moved to a linked list to reduce memory, and after I read about the lock-free queue.
but I stuck with my attempt to improve the performance.
In the code, I get a very large amount of numbers and I need to count how many prime numbers from stdin.
My question is: how I can reduce the time to best performance?
Second Attempt:
#include <stdio.h>
#include <stdbool.h>
#include <pthread.h>
#include <stdatomic.h>
#define MAX_ARRAY_SIZE 500000
int array[MAX_ARRAY_SIZE];
atomic_int shared_counter = ATOMIC_VAR_INIT(0);
int total_primes = 0;
// Function to check if a number is prime
bool isPrime(int n)
{
if (n <= 1)
{
return false;
}
for (int i = 2; i * i <= n; i++)
{
if (n % i == 0)
{
return false;
}
}
return true;
}
// Worker function to find prime numbers in the array
void *worker(void *arg)
{
int local_counter = 0;
int index;
while ((index = atomic_fetch_sub(&shared_counter, 1)) > 0)
{
if (isPrime(array[index]))
{
local_counter++;
}
}
return (void *)local_counter;
}
int main()
{
int num;
pthread_t threads[4];
// Read numbers from stdin and store them in the array until end of file or array full
int array_size = 0;
while (scanf("%d", &num) != EOF && array_size < MAX_ARRAY_SIZE)
{
array[array_size++] = num;
}
atomic_store(&shared_counter, array_size);
// Create worker threads
for (int i = 0; i < 4; i++)
{ // Create 4 threads, you may adjust this number based on your system
pthread_create(&threads[i], NULL, worker, NULL);
}
// Join worker threads and accumulate prime counts
for (int i = 0; i < 4; i++)
{
void *result;
pthread_join(threads[i], &result);
total_primes += (int)result;
}
printf("%d total primes.\n", total_primes);
return 0;
}
Your particular problem of counting prime numbers in an array using multiple threads has a minimalist solution which doesn't require anything more than one shared atomic counter.
The algorithm:
fetch_sub(shared_counter, 1) - 1to get the index of the next array element to check for whether it is a prime number.void*).This way, your worker threads contend only on
fetch_subof the shared counter. Contending on one shared atomic counter is as simple and efficient solution as it ever gets.To minimise contention to absolute bare minimum, each worker thread may want to grab not the next 1 index to process, but, say, next 1,000,000 indexes to process.
The ideas I want to convey here are: