Why does adding if __name__ == '__main__': not fix the problem in my case?


I am using auto-sklearn in Python.

The code works fine, but when I set the parameter n_jobs=-1 I get this error:

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
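
Since the message talks about not using fork, it may be worth checking which start method Python is actually using on this machine. This is only a minimal sketch using the standard library multiprocessing module; describe_start_methods is a made-up helper name, not something from auto-sklearn:

```python
import multiprocessing as mp


def describe_start_methods():
    # Return the start method currently in effect and every method
    # this platform supports (e.g. "fork", "spawn", "forkserver").
    return mp.get_start_method(), mp.get_all_start_methods()


if __name__ == "__main__":
    current, available = describe_start_methods()
    print("current start method:", current)
    print("available start methods:", available)
```

If the current method is not "fork", child processes re-import the main module, which is the situation the error message describes.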

I searched and found a suggested solution:

https://github.com/automl/auto-sklearn/issues/996

The solution states that using

if __name__ == '__main__':

should fix the problem
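
As far as I understand it, the idiom is meant to look roughly like this in a plain multiprocessing script. This is a minimal sketch with the standard library only, assuming the "spawn" start method; square and run_in_pool are hypothetical names used for illustration and have nothing to do with auto-sklearn:

```python
import multiprocessing as mp


def square(x):
    # Worker function defined at module level so that child
    # processes can find it when they re-import this module.
    return x * x


def run_in_pool(values, processes=2):
    # "spawn" starts a fresh interpreter for each worker and re-imports
    # this module, so nothing that starts processes may run at import time.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Only the parent process runs this block; the workers import the
    # module under a different name and skip it.
    print(run_in_pool([1, 2, 3]))
```

The point is that everything that creates processes sits under the guard, while imports and function definitions stay at module level.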

I did that, but I still get the same error.

Am I using it the wrong way?

Can someone advise whether I am placing that line correctly, and how I should use it?

Here is my code:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
import pyodbc 
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
import datetime
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_predict
#import winsound
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import AdaBoostClassifier
import time
import autosklearn.classification

if __name__ == '__main__':

    df = pd.read_csv("c:\\my.csv")

    # Assumption: 'Code' is the label column; everything else is a feature.
    y = df['Code']
    X = df.drop('Code', axis=1, errors='ignore')
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    mdl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=60*5,
        per_run_time_limit=30*1,
        n_jobs=-1,
        memory_limit=1024 * 10,
        initial_configurations_via_metalearning=0,
        smac_scenario_args={'runcount_limit': 50},
    )

    mdl.fit(X_train, y_train)
    y_pred = mdl.predict(X_test)