dask_cudf has no access to map_partitions

I tried to create a dask_cudf DataFrame from a cuDF DataFrame, but got an error.

import pandas as pd
import dask_cudf
import cudf

# Example pandas DataFrame with a datetime string column
pdf = pd.DataFrame({'datetime_str': ['2024-03-19 12:00:00', '2024-03-19 10:00:00', '2024-03-19 11:00:00']})

# Convert the pandas DataFrame to a cuDF DataFrame
cdf = cudf.from_pandas(pdf)

# Convert the cuDF DataFrame to a Dask cuDF DataFrame
ddf = dask_cudf.from_cudf(cdf, npartitions=2) # error 

The error is:

AttributeError: DataFrame object has no attribute map_partitions
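Errors like this are easier to diagnose when the question states the installed versions of dask, dask_cudf and cudf, since a mismatch between them is one plausible source of missing-attribute errors. Below is a minimal standard-library sketch for collecting them; the helper name installed_versions is hypothetical, and the distribution names are assumptions (pip wheels for RAPIDS are often published with a CUDA suffix, for example cudf-cu12, so adjust the list to your install).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Assumed distribution names; pip installs may use e.g. "cudf-cu12" / "dask-cudf-cu12".
    for pkg, ver in installed_versions(["dask", "dask-cudf", "cudf", "pandas"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```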

I checked where map_partitions exists:

cudf.core.dataframe.DataFrame            # no map_partitions
dask_cudf.DataFrame.map_partitions       # exists
dask_cudf.core.map_partitions            # exists
dask_cudf.core.DataFrame.map_partitions  # exists
dask.dataframe.map_partitions            # exists
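The same check can be scripted so it is easy to rerun after changing package versions. This is a minimal sketch; resolve_dotted and has_attr_path are hypothetical helpers, not part of cudf or Dask. It imports the longest importable module prefix of a dotted path, walks the remaining attributes, and reports None when the path cannot be loaded at all (missing package, missing attribute, or an import-time failure such as a GPU library that cannot initialise).

```python
import importlib


def resolve_dotted(path):
    """Import the longest importable module prefix of `path`, then getattr the rest."""
    parts = path.split(".")
    # Try the longest prefix first, e.g. "cudf.core.dataframe" before "cudf".
    for i in range(len(parts), 0, -1):
        module_name = ".".join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue
        # Remaining parts are attributes (classes, functions) of that module.
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"no importable module prefix in {path!r}")


def has_attr_path(path, attr="map_partitions"):
    """Return True/False if the object at `path` has `attr`, or None if it can't be loaded."""
    try:
        obj = resolve_dotted(path)
    except Exception:
        return None
    return hasattr(obj, attr)


if __name__ == "__main__":
    for p in [
        "cudf.core.dataframe.DataFrame",
        "dask_cudf.DataFrame",
        "dask_cudf.core",
        "dask_cudf.core.DataFrame",
        "dask.dataframe",
    ]:
        print(p, has_attr_path(p))
```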

How can I make "dask_cudf.from_cudf" find map_partitions? Thanks.

Answer by Dmitry:

According to the documentation:

dask_cudf.from_cudf is a thin wrapper around dask.dataframe.from_pandas()

and the first parameter, data, is expected to be a pandas.DataFrame or pandas.Series.

So the first option is to pass the pandas DataFrame directly:

import pandas as pd
import dask_cudf

# Sample data frame
pdf = pd.DataFrame({
    'datetime_str': [
        '2024-03-19 12:00:00', 
        '2024-03-19 10:00:00', 
        '2024-03-19 11:00:00']})

# Create Dask-cuDF DataFrame
ddf = dask_cudf.from_cudf(pdf, npartitions=2)
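To make npartitions=2 and map_partitions concrete, here is a toy pure-Python model. It is an illustration only, not Dask's implementation: split_into_partitions, map_partitions and parse_datetimes are hypothetical names. Rows are split into contiguous chunks of ceil(n / npartitions) rows, and a function is applied to each chunk independently, which is the general idea behind map_partitions; the exact partition boundaries Dask chooses may differ.

```python
import math
from datetime import datetime


def split_into_partitions(rows, npartitions):
    """Split `rows` into at most `npartitions` contiguous chunks of ceil(n / npartitions) rows."""
    if npartitions < 1:
        raise ValueError("npartitions must be >= 1")
    chunksize = max(1, math.ceil(len(rows) / npartitions))
    return [rows[i:i + chunksize] for i in range(0, len(rows), chunksize)]


def map_partitions(func, partitions):
    """Toy stand-in: apply `func` to each partition on its own and collect the results."""
    return [func(part) for part in partitions]


def parse_datetimes(part):
    # Per-partition work: parse the datetime strings from the question.
    return [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in part]


if __name__ == "__main__":
    rows = ["2024-03-19 12:00:00", "2024-03-19 10:00:00", "2024-03-19 11:00:00"]
    parts = split_into_partitions(rows, npartitions=2)
    print(parts)  # two partitions: first two rows, then the last row
    print(map_partitions(parse_datetimes, parts))
```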

The second option, also from the documentation, is to read the data with Dask and then convert it:

For on-disk data that are not supported directly in Dask-cuDF, we recommend using Dask’s data reading facilities, followed by calling from_dask_dataframe() to obtain a Dask-cuDF object.