concurrent.futures code is failing with "BrokenProcessPool"

116 Views Asked by At

Hello Stack Overflow community,

I am encountering a 'BrokenProcessPool' error while using 'concurrent.futures.ProcessPoolExecutor'. The error indicates that a process in the process pool was terminated abruptly while the future was running or pending, and I am having difficulty understanding the root cause.

The relevant code is as follows:

from concurrent.futures import ProcessPoolExecutor
import os
import pickle
from tqdm import tqdm
import numpy as np
from numpy.linalg import norm
from tensorflow.keras.preprocessing import image

# Function to extract features in parallel using ProcessPoolExecutor
def extract_features_parallel(file_path):
    # Call the extract_features function with the global 'model'
    return extract_features(file_path, model)

# Create an empty list to store feature vectors
feature_list = []

# Process files in small batches (e.g., 200 files)
batch_size = 200

# Use ProcessPoolExecutor to initiate parallel processes
with ProcessPoolExecutor() as executor:
    # Iterate over files in batches, tracking progress with tqdm
    for batch_start in tqdm(range(0, len(file_names), batch_size)):
        batch_files = file_names[batch_start:batch_start + batch_size]

        # Apply extract_features_parallel function in parallel using executor.map
        batch_features = list(executor.map(extract_features_parallel, batch_files))

        # Remove None values from the list
        batch_features = [x for x in batch_features if x is not None]

        # Extend the global list with the obtained feature vectors
        feature_list.extend(batch_features)

# Specify the output directory path
output_directory = '/content/drive/MyDrive/ara_proje/pklb5'

# Save the obtained feature vectors to the specified path
output_file_path = os.path.join(output_directory, 'embeddings.pkl')
pickle.dump(feature_list, open(output_file_path, 'wb'))

# Save the file names to the specified path
output_file_names_path = os.path.join(output_directory, 'filenames.pkl')
pickle.dump(file_names, open(output_file_names_path, 'wb'))

The error message is as follows:

  0%|          | 0/595 [00:00<?, ?it/s]
---------------------------------------------------------------------------
BrokenProcessPool                         Traceback (most recent call last)
<ipython-input-16-63a9ec49812d> in <cell line: 19>()
     23 
     24         # extract_features_parallel fonksiyonunu paralel olarak uygula
---> 25         batch_features = list(executor.map(extract_features_parallel, batch_files))
     26 
     27         # None değerleri temizle

4 frames
/usr/lib/python3.10/concurrent/futures/process.py in _chain_from_iterable_of_lists(iterable)
    573     careful not to keep references to yielded objects.
    574     """
--> 575     for element in iterable:
    576         element.reverse()
    577         while element:

/usr/lib/python3.10/concurrent/futures/_base.py in result_iterator()
    619                     # Careful not to keep a reference to the popped future
    620                     if timeout is None:
--> 621                         yield _result_or_cancel(fs.pop())
    622                     else:
    623                         yield _result_or_cancel(fs.pop(), end_time - time.monotonic())

/usr/lib/python3.10/concurrent/futures/_base.py in _result_or_cancel(***failed resolving arguments***)
    317     try:
    318         try:
--> 319             return fut.result(timeout)
    320         finally:
    321             fut.cancel()

/usr/lib/python3.10/concurrent/futures/_base.py in result(self, timeout)
    456                     raise CancelledError()
    457                 elif self._state == FINISHED:
--> 458                     return self.__get_result()
    459                 else:
    460                     raise TimeoutError()

/usr/lib/python3.10/concurrent/futures/_base.py in __get_result(self)
    401         if self._exception:
    402             try:
--> 403                 raise self._exception
    404             finally:
    405                 # Break a reference cycle with the exception in self._exception

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Despite my research, I am having difficulty understanding the cause of the error. My code uses parallel processes to extract feature vectors from a specific dataset, and I am encountering this error.

What did you try and what were you expecting?

I have attempted the following troubleshooting steps to address the BrokenProcessPool error:

  1. Memory Usage Check: I carefully monitored the memory usage to ensure that the processes in the pool were not exceeding the allocated memory limits.

  2. Debugging extract_features_parallel: I expanded the extract_features_parallel function with additional debugging statements and error handling to gain insights into the specific point of failure. However, despite these efforts, I couldn't pinpoint the exact cause of the error.

  3. Thread Safety Considerations: Considering the parallel nature of the code, I paid attention to thread safety concerns and utilized appropriate synchronization mechanisms to prevent potential conflicts.

Expectation: I expected that the steps mentioned above would help identify and resolve the BrokenProcessPool error. However, despite these efforts, the issue persists, and I am seeking assistance from the Stack Overflow community to understand the root cause and find a resolution.

This additional context will provide more information to those who might help you on Stack Overflow.

0

There are 0 best solutions below