Pandarallel cannot apply two transformations


I have a pandas DataFrame (~57 million rows of floats) to which I want to apply two transformations.

These are the two functions to apply the transformations:

import time
from pandarallel import pandarallel

def apply_feature_aggregation(df, weights, scale, shift, cores):  # This runs without a problem
    t1 = time.time()
    pandarallel.initialize(nb_workers=cores, progress_bar=False)

    new_df = df[['id']].copy()
    new_df['weight'] = df.parallel_apply(calculate_weight, axis=1, weights=weights, scale=scale, shift=shift)

    t2 = time.time()
    print(f'init weights {t2-t1}')

    return new_df


def apply_weight_scaling(df, cores):  # The parallel_apply step only finishes when I click Run in PyCharm

    pandarallel.initialize(nb_workers=cores, progress_bar=False)  # initialize(36) or initialize(os.cpu_count()-1)
    t2 = time.time()
    new_df, second_min, current_max = interval_mapping_preprocessing(df, 'weight')
    t3 = time.time()

    print(f'initial mapping preprocessing finished {t3-t2}')

    # Under nohup this call never finishes
    new_df['weight'] = new_df.parallel_apply(apply_linear_transformation, axis=1, second_min=second_min, current_max=current_max)

    t4 = time.time()

    print(f'I calculated second weights {t4-t3}')  # This is not printed under nohup

    return new_df

 

The problem is that when I run my code in PyCharm by clicking Run, both transformations are applied successfully. When I run it with nohup, however, I can see the parallel workers start twice in top, but the second run never finishes.

My question is: how can I run two consecutive transformations? I even tried putting both transformations in the same wrapper function, but I ran into the same problem.

This is the output I get in nohup:

INFO: Pandarallel will run on 36 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
init weights 135.39842891693115
INFO: Pandarallel will run on 36 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
initial mapping preprocessing finished 2.7095065116882324

This is the output when I run it with PyCharm:

INFO: Pandarallel will run on 36 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
init weights 143.19737672805786
INFO: Pandarallel will run on 36 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
initial mapping preprocessing finished 2.6010115146636963
I calculated second weights 117.14078521728516

In the nohup case, once the parallel workers of the second function finish, only a single memory-intensive process remains and nothing else happens.
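To narrow down where the nohup run stalls, something like the following might help. It is only a diagnostic sketch using the standard library, not a fix: `timed_stage` and its parameters `name`, `dump_after` and `out` are hypothetical names made up for this example. It prints flushed start/finish markers and can optionally ask `faulthandler` to dump the Python tracebacks of all threads at a fixed interval while a stage is running.

```python
import faulthandler
import sys
import time
from contextlib import contextmanager


@contextmanager
def timed_stage(name, dump_after=None, out=sys.stderr):
    """Log the start and end of a stage with flushed output.

    Hypothetical helper for this sketch. If `dump_after` (seconds) is
    given, faulthandler writes the tracebacks of all threads to stderr
    every `dump_after` seconds for as long as the stage is running.
    """
    print(f'{name} started', file=out, flush=True)
    if dump_after is not None:
        # Requires sys.stderr to be backed by a real file descriptor.
        faulthandler.dump_traceback_later(dump_after, repeat=True)
    start = time.time()
    try:
        yield
    finally:
        if dump_after is not None:
            faulthandler.cancel_dump_traceback_later()
        print(f'{name} finished {time.time() - start}', file=out, flush=True)
```

Wrapping the second `parallel_apply` call in `with timed_stage('second weights', dump_after=600):` would then show, in whatever file stderr is redirected to, which line of the main process is still executing after the workers have gone.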

Thanks.
