use ray to parallelize a list of processing tasks

I have data preprocessing code that consists of a list of perfectly parallelizable tasks. Is there a better way to leverage Ray on a single multi-core machine than the following?

import ray
import pandas as pd

df = pd.read_csv('file.csv')

ray.init(num_cpus=1)

df_ref = ray.put(df)

@ray.remote
def pow2(data):
    pows = []
    for i in range(len(data)):
        pows.append(data['values'].iloc[i] ** 2)
    return pows

@ray.remote
def pow3(data):
    pows = []
    for i in range(len(data)):
        pows.append(data['values'].iloc[i] ** 3)
    return pows

result_pow2 = pow2.remote(df_ref)
result_pow3 = pow3.remote(df_ref)

pow2_result, pow3_result = ray.get([result_pow2, result_pow3])

df["pow2"] = pow2_result
df["pow3"] = pow3_result

print(df)

Will the tasks be parallelized correctly in this way? Would the first one block the execution of the second?
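For comparison, here is the vectorized pandas baseline I would measure against (no Ray at all). This is only a sketch: the inline frame stands in for `file.csv`, and it assumes the file has a numeric `values` column.

```python
import pandas as pd

# Stand-in for pd.read_csv('file.csv'); assumption: a numeric 'values' column.
df = pd.DataFrame({"values": [1.0, 2.0, 3.0]})

# Element-wise powers are vectorized in pandas, so each column is computed
# without a per-row Python loop or any task-dispatch overhead.
df["pow2"] = df["values"] ** 2
df["pow3"] = df["values"] ** 3

print(df)
```

For transforms this cheap, the vectorized version may well beat dispatching remote tasks, since Ray adds serialization and scheduling overhead per task.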
