How to convert a datetime string to a timestamp in dask cudf and then sort the dataframe by this column

33 Views Asked by user3448011 At 19 March 2024 at 20:49

I would like to convert a datetime string to timestamp in dask cudf and then sort the dataframe by this column.

Example:

import cudf  # needed for cudf.DataFrame below
import dask_cudf as ddf
import pandas as pd

# Sample data (replace with your actual data)
cdf = cudf.DataFrame({
    'city': ['Dallas', 'Bogota', 'Chicago', 'Juarez'],
    'timestamp': ['2019-12-29 14:15:08 UTC', '2019-12-30 10:30:15 UTC', '2019-12-31 18:45:30 UTC', '2020-01-01 03:20:45 UTC']
})

# Create a Dask-cuDF DataFrame
dask_df = ddf.from_cudf(cdf, npartitions=2)

def to_timestamp(x):
    import time
    import datetime
    element = datetime.datetime.strptime(x,"%Y-%m-%d %H:%M:%S UTC")
    return datetime.datetime.timestamp(element)

dask_df['timestamp'] = dask_df['timestamp'].map_partitions(to_timestamp, meta=("timestamp", "str"))

dask_df.head()

I got this error:

TypeError: strptime() argument 1 must be str, not Series

How can I do this for a large dataframe in dask cudf?

==========update ==========

I have tried this:

   dask_df["timestamp"] = dask_df["timestamp"].map_partitions(to_timestamp, meta=("timestamp", "str"))

and got the same error:

  TypeError: strptime() argument 1 must be str, not Series

There is 1 best solution below

UnicornOnAzur On 19 March 2024 at 21:37

This map_partitions thread seems to cover all the tricks of using map_partitions on a row-by-row basis.

Furthermore, you can refactor your function somewhat. The import statements can be moved outside the function so they are not executed again on every call. The function only uses datetime, so the import of time can be dropped. The function could then look like this:

import datetime

def to_timestamp(x):
    # x is a single string such as "2019-12-29 14:15:08 UTC"
    datetime_object = datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S UTC")
    timestamp = datetime.datetime.timestamp(datetime_object)
    return timestamp
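
Note that the TypeError in the question comes from map_partitions handing the function one whole Series per partition rather than one string at a time, so strptime cannot be called on the argument directly. As an alternative to a row-by-row function, here is a minimal sketch that parses the whole Series at once. It is not taken from the linked thread: the helper name to_datetime_column is made up for illustration, it assumes every value has exactly the form "YYYY-MM-DD HH:MM:SS UTC", and it falls back to pandas when cudf is not installed so it can be tried without a GPU.

```python
try:
    import cudf as xdf  # GPU path; assumes a working RAPIDS install
except ImportError:
    import pandas as xdf  # CPU fallback so the sketch can be tried without a GPU


def to_datetime_column(series):
    """Parse a whole Series of 'YYYY-MM-DD HH:MM:SS UTC' strings at once.

    Hypothetical helper: it receives a Series (one partition), not a
    single string, which is what map_partitions passes in.
    """
    # Keep the first 19 characters, which drops the trailing " UTC".
    trimmed = series.str.slice(0, 19)
    # Vectorized parse; the result is a datetime64 Series, which sorts
    # chronologically without any further conversion.
    return xdf.to_datetime(trimmed, format="%Y-%m-%d %H:%M:%S")


sample = xdf.Series([
    "2020-01-01 03:20:45 UTC",
    "2019-12-29 14:15:08 UTC",
])
print(to_datetime_column(sample).sort_values())

# With dask_cudf, the same function would be applied per partition and the
# frame sorted afterwards (assumption: the meta dtype matches what your
# cudf version returns):
#   dask_df["timestamp"] = dask_df["timestamp"].map_partitions(
#       to_datetime_column, meta=("timestamp", "datetime64[ns]"))
#   dask_df = dask_df.sort_values("timestamp")
```

A datetime64 column is already sortable, so converting to a float epoch value is only needed if later code really requires a number.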
