I have data preprocessing code that consists of a list of perfectly parallelizable tasks. Is there a better way to leverage Ray on a single multi-core machine than the following?
import ray
import pandas as pd
df = pd.read_csv('file.csv')
ray.init(num_cpus=1)
df_ref = ray.put(df)
@ray.remote
def pow2(data):
    # Square the 'values' column row by row
    pows = []
    for i in range(len(data)):
        pows.append(data['values'].iloc[i] ** 2)
    return pows

@ray.remote
def pow3(data):
    # Cube the 'values' column row by row
    pows = []
    for i in range(len(data)):
        pows.append(data['values'].iloc[i] ** 3)
    return pows
result_pow2 = pow2.remote(df_ref)
result_pow3 = pow3.remote(df_ref)
pow2_result, pow3_result = ray.get([result_pow2, result_pow3])
df["pow2"] = pow2_result
df["pow3"] = pow3_result
print(df)
Will the tasks be parallelized correctly in this way? Would the first one block the execution of the second?
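For context, here is a minimal pandas-only sketch of the same computation with the per-row loops replaced by vectorized column operations (this is the body each Ray task would run; it uses a synthetic DataFrame, since file.csv is only assumed, and omits the @ray.remote wiring):

```python
import pandas as pd

# Synthetic stand-in for pd.read_csv('file.csv'), which is only assumed here.
df = pd.DataFrame({"values": [1.0, 2.0, 3.0]})

# Vectorized equivalent of the per-row loops above; with Ray, this function
# would be decorated with @ray.remote and invoked as pow_n.remote(df_ref, n).
def pow_n(data, n):
    # Operates on the whole column at once instead of iterating row by row.
    return data["values"] ** n

df["pow2"] = pow_n(df, 2)
df["pow3"] = pow_n(df, 3)
print(df)
```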