Making an apply function faster in Python


I am running the following code on about 6 million rows. It is so slow that it never finishes.

df['City'] = df['POSTAL_CODE'].apply(lambda x: nomi.query_postal_code(x).county_name)

It assigns a corresponding city to each postal code. When I run it on a slice of the dataset (e.g., 1000 rows), it works well. But running it on the whole dataset never produces any output.

Can anyone modify the code to make it faster?

Thank you!


There is 1 best solution below

DejaVuSansMono On
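!pip3 install multiprocess

import numpy as np
import pandas as pd
from multiprocess import Pool

def parallelize_dataframe(data, func, n_cores=4):
    # Split the data into one chunk per worker process
    data_split = np.array_split(data, n_cores)
    # func is called once per chunk (not once per row), each in its own process
    with Pool(n_cores) as pool:
        result = pd.concat(pool.map(func, data_split))
    return result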


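# Each worker receives a whole chunk (a Series), so the per-row lookup is applied inside it
df['City'] = parallelize_dataframe(df['POSTAL_CODE'], lambda chunk: chunk.apply(lambda x: nomi.query_postal_code(x).county_name), 4)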
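
Something else that may help even more than parallelizing: 6 million rows probably contain far fewer distinct postal codes, so the lookup could be run once per unique code and the result mapped back onto every row. Below is a minimal sketch of that idea. It assumes the lookup result depends only on the postal code value. `apply_on_unique` and `fake_lookup` are hypothetical names made up for this illustration, and `fake_lookup` only stands in for the slow `nomi.query_postal_code(x).county_name` call.

```python
import pandas as pd

def apply_on_unique(series, func):
    """Call func once per distinct non-null value, then map results back to all rows."""
    unique_values = series.dropna().unique()
    lookup = {value: func(value) for value in unique_values}
    # Values missing from the lookup (e.g. nulls) become NaN
    return series.map(lookup)

# Hypothetical stand-in for the slow per-code lookup; counts how often it is called
def fake_lookup(code):
    fake_lookup.calls += 1
    return {"10001": "New York", "94105": "San Francisco"}.get(code)

fake_lookup.calls = 0

codes = pd.Series(["10001", "94105", "10001", None, "94105"])
cities = apply_on_unique(codes, fake_lookup)
print(cities)
print("lookups performed:", fake_lookup.calls)
```

With the data from the question this would look something like `df['City'] = apply_on_unique(df['POSTAL_CODE'], lambda x: nomi.query_postal_code(x).county_name)`. The number of slow calls then scales with the number of distinct postal codes rather than the number of rows, and it can still be combined with the multiprocessing approach if needed.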