Counting digits in a vector of random numbers from 0 to 9 using threads


Problem description: Implement a program that performs the following tasks.

  • Generate a vector of 10^8 positions, each randomly filled with a digit between 0 and 9 (one or more threads may be used for this);

  • Count and store how many times each possible digit appears (that is, 10 counters, one per digit from 0 to 9);

  • Display in real time the count of how many digits have been found so far;

  • Allow choosing how many threads perform the counting task: 1, 2, 5, or 10;

  • How the threads are organized to perform the search is up to the team;

  • Ensure mutual exclusion using the semaphore and lock strategies, comparing the performance of the two for each of the four cases;

  • Run the experiment 30 times for each case, that is, for each of the 8 combinations of thread count and mutual exclusion strategy;

  • In the document to be delivered, compare the average execution time in each case (showing the mean time and confidence interval).

import threading
import time
import random

def count_vector(vector):
    # Count the number of elements in the vector
    count = 0
    for i in vector:
        count += 1
    return count

# Thread counts to benchmark
threads = [1, 2, 5, 10]

# Generate a list of 10^8 random digits between 0 and 9
vector = [random.randint(0, 9) for _ in range(10**8)]

# Measure the time it takes for each set of threads to count the vector
for num_threads in threads:
    start_time = time.perf_counter()

    # Create a list of threads
    thread_list = []
    for i in range(num_threads):
        t = threading.Thread(target=count_vector, args=(vector,))
        thread_list.append(t)

    # Start the threads
    for t in thread_list:
        t.start()

    # Wait for the threads to finish
    for t in thread_list:
        t.join()

    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    print(f'{num_threads} threads: {elapsed_time:.2f} seconds')

This is my output: [image not included]

I would like each thread to tell me how many numbers they counted, so I decided to code it this way:

import threading
import random

def count_digits(vector, digit):
    # Count the number of occurrences of the specified digit in the vector
    count = 0
    for i in vector:
        if i == digit:
            count += 1
    return count

# Thread counts to benchmark (not used in this version)
threads = [1, 2, 5, 10]

# Generate a list of 10^8 random digits between 0 and 9
vector = [random.randint(0, 9) for _ in range(10**8)]

# Create a list of threads to count the digits
thread_list = []
for i in range(10):
    t = threading.Thread(target=count_digits, args=(vector, i))
    thread_list.append(t)

# Start the threads
for t in thread_list:
    t.start()

# Wait for the threads to finish
for t in thread_list:
    t.join()

# Print the results
for i, t in enumerate(thread_list):
    result = t.get_result()
    print(f'Number of {i}: {result}')

But this is the result I am getting: [image not included]

How can I solve this? The desired output would look something like this: [image not included]

Answer by Alex Bochkarev:

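Threads don't return whatever your function returns: a threading.Thread object discards the return value of its target and has no get_result() method. They perform computations and that's it. You have to store the result in shared storage at the end of the count_digits function. Something like this: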

thread_list = [None] * 10
thread_results = [None] * 10

def count_digits(vector, digit):
    # Count the number of occurrences of the specified digit in the vector
    count = 0
    for i in vector:
        if i == digit:
            count += 1
    thread_results[digit] = count

...

for i in range(len(thread_list)):
    t = threading.Thread(target=count_digits, args=(vector, i))
    thread_list[i] = t

...

for i, result in enumerate(thread_results):
    print(f'Number of {i}: {result}')
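For reference, here is one way the fragments above could be assembled into a complete script. This is only a sketch under some assumptions: the start and join loops are copied from the question's code to stand in for the elided parts, and the vector is shrunk to 10**5 elements (instead of 10^8) so it finishes quickly.

```python
import random
import threading

# Assumption: a much smaller vector than the question's 10**8, for a quick run.
vector = [random.randint(0, 9) for _ in range(10**5)]

thread_list = [None] * 10
thread_results = [None] * 10  # one slot per digit, written by exactly one thread

def count_digits(vector, digit):
    # Count the occurrences of `digit` and store the count in this thread's slot.
    count = 0
    for i in vector:
        if i == digit:
            count += 1
    thread_results[digit] = count

# One thread per digit, as in the question.
for i in range(len(thread_list)):
    thread_list[i] = threading.Thread(target=count_digits, args=(vector, i))

# Start the threads, then wait for all of them to finish.
for t in thread_list:
    t.start()
for t in thread_list:
    t.join()

for i, result in enumerate(thread_results):
    print(f'Number of {i}: {result}')
```

Because each thread writes only to its own index of thread_results, and the results are read only after join(), no lock is needed in this particular layout.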

However, I want to highlight that your use of parallelism is rather odd. You won’t see any speedup compared to a single-threaded version of the code, since each thread iterates through the whole vector. To achieve a speedup, you have to assign each thread only a part of the vector and then aggregate the results.
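A minimal sketch of that partitioning idea, which also touches the assignment's lock versus semaphore requirement. The names count_chunk and count_with_guard are hypothetical, the vector is reduced to 10**5 elements for a quick run, and using threading.Semaphore(1) as the semaphore strategy is an assumption about what the assignment intends.

```python
import random
import threading

def count_chunk(chunk, totals, guard):
    # Count into a thread-local list first, so the guard is taken
    # once per thread rather than once per element.
    local = [0] * 10
    for d in chunk:
        local[d] += 1
    # Both Lock and Semaphore support `with`, which acquires and releases.
    with guard:
        for digit in range(10):
            totals[digit] += local[digit]

def count_with_guard(vector, num_threads, guard):
    totals = [0] * 10
    size = (len(vector) + num_threads - 1) // num_threads  # ceiling division
    threads = []
    for i in range(num_threads):
        chunk = vector[i * size:(i + 1) * size]  # note: slicing copies the chunk
        threads.append(
            threading.Thread(target=count_chunk, args=(chunk, totals, guard))
        )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals

# Assumption: smaller than the assignment's 10**8 so the sketch runs quickly.
vector = [random.randint(0, 9) for _ in range(10**5)]

lock_totals = count_with_guard(vector, 5, threading.Lock())
sem_totals = count_with_guard(vector, 5, threading.Semaphore(1))

for digit, total in enumerate(lock_totals):
    print(f'Number of {digit}: {total}')
```

One caveat: in standard CPython builds, the global interpreter lock lets only one thread execute Python bytecode at a time, so a pure-Python counting loop like this may show little or no speedup even when the vector is partitioned. Wrapping each count_with_guard call in time.perf_counter() readings, as in the question's first script, would give the per-case timings to compare.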