I have a cuDF dataframe with the following columns:
    columns = ["col1", "col2", "dt"]
The dt column has dtype datetime64[ns].
I would like to write a UDF that is applied to each group in this dataframe and returns the max of dt for that group. Here is what I am trying, but it seems that Numba doesn't support datetime64[ns] values in UDFs.
    def f1(dt, out):
        l = len(dt)
        maxvalue = dt[0]
        for i in range(cuda.threadIdx.x, l, cuda.blockDim.x):
            if dt[i] > maxvalue:
                maxvalue = dt[i]
        out[:0] = maxvalue

    gdf = df.groupby(["col1", "col2"], method="cudf")
    df = gdf.apply_grouped(f1, incols={"dt": "dt"}, outcols=dict(out=numpy.datetime64))
Here is the error I get:
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7effda063510>)
[2] During: typing of call at <string> (10)
I have similar functions that work fine with integers and floats. Does this mean that Numba doesn't support datetimes?
apply_grouped won't give you what I think you're after, which is a groupby with the max of dt. You need to use agg with max on dt; cudf's groupby functions will do the rest. To get your values in datetime64[ms], use astype() and save the result back to the dataframe (very fast). See my example: the dt column values are formatted to between 0.1 and 40 milliseconds, stored as nanoseconds since Jan 1st, 1970, and then printed out.
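
A minimal sketch of the aggregation approach described above, assuming cuDF's pandas-style groupby API. The sample values are hypothetical, the column names come from the question, and the pandas fallback is only there so the snippet also runs on a machine without a GPU.

```python
import numpy as np

try:
    import cudf as xdf  # GPU dataframe library, if installed
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption: same groupby calls work)

# Hypothetical sample data using the column names from the question.
df = xdf.DataFrame({
    "col1": [1, 1, 2, 2],
    "col2": ["a", "a", "b", "b"],
    "dt": np.array(
        [
            "2020-01-01T00:00:00.001",
            "2020-01-03T00:00:00.040",
            "2020-01-02T00:00:00.010",
            "2020-01-01T00:00:00.020",
        ],
        dtype="datetime64[ns]",
    ),
})

# Built-in aggregation: max of dt per (col1, col2) group, no UDF needed.
result = df.groupby(["col1", "col2"]).agg({"dt": "max"}).reset_index()

# Optionally request millisecond resolution and store the column back.
result["dt"] = result["dt"].astype("datetime64[ms]")

print(result)
```

Because the max is computed by the library's own aggregation, no Numba kernel has to be compiled for the datetime column, which sidesteps the typing error from the question.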