I need to find the difference between successive datetimes ... and split the file when I find a time difference of more than 5 minutes. When I subtract the datetimes I get answers in milliseconds, and there is also a missing value in the lagged column. If I skip the missing value, I can't update the dataframe. Not sure how to proceed.
using CSV
using DataFrames
using ShiftedArrays: lag
using Dates
df = CSV.read(IOBuffer("""
date
2024-01-25 19:15
2024-01-25 19:20
2024-01-25 19:25
2024-01-25 21:20
2024-01-25 21:25
"""),DataFrame; dateformat=DateFormat("yy-mm-dd H:M"))
condf=transform(df,[:date,:date] => ((x,y)->Dates.CompoundPeriod.(skipmissing(x-lag(y,1)))) =>:ldate)
It is not fully clear what you want to do with the rows with a missing value.
If you want to get rid of them, use the dropmissing function before your operation.
If you want to keep them, it depends on what you assume the missing value actually held. The easiest approach is probably to use the Impute.locf or Impute.nocb function from Impute.jl to fill each missing value with the one that gives you the desired split.
If what I wrote is not clear, please comment (it would be best if you showed the input and the desired output, including missing values in the example).
Some additional cases to consider:
missing values?
missing values?
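For the split itself, here is a minimal sketch under two assumptions that the question does not confirm: the rows are already sorted by date, and any rows with a missing date have been dropped beforehand (for example with dropmissing). It uses diff from Base instead of lag, so no missing value appears in the subtraction. The column names gap_minutes and block are made up for illustration.

```julia
using DataFrames
using Dates

# Same timestamps as in the question, built directly to keep the sketch self-contained.
df = DataFrame(date = [
    DateTime(2024, 1, 25, 19, 15),
    DateTime(2024, 1, 25, 19, 20),
    DateTime(2024, 1, 25, 19, 25),
    DateTime(2024, 1, 25, 21, 20),
    DateTime(2024, 1, 25, 21, 25),
])

# Assumption: df is sorted by :date and has no missing timestamps.
# diff returns Millisecond periods, one fewer than the number of rows.
gaps = diff(df.date)

# Express the gaps in whole minutes; the first row has no predecessor,
# so its gap stays missing.
df.gap_minutes = [missing; Dates.value.(gaps) .÷ 60_000]

# A new block starts wherever the gap exceeds 5 minutes.
# The first row never starts a new block.
starts_new = [false; gaps .> Minute(5)]

# The running count of block starts gives a block id per row (1, 2, ...).
df.block = cumsum(starts_new) .+ 1

# One sub-data-frame per block; each can then be written to its own file.
blocks = groupby(df, :block)
```

With the sample data this should put the three 19:xx rows in one block and the two 21:xx rows in another. If the missing dates have to be kept, they would need to be filled first (as described above), since diff cannot decide which block a missing timestamp belongs to.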