I am trying to calculate how long an item has been active in a specific state with KQL. Currently I am using window functions together with partitioning. This works well when I need to calculate the total "time-in-state" over a period, but the issue arises when trying to break it down per day (or any arbitrary granularity).
Input:
let inputData=datatable(id:string, status: string, timestamp: datetime) [
"id1","P",datetime(2024-03-12T05:30:15),
"id1","F",datetime(2024-03-14T10:10:00),
"id2","P",datetime(2024-03-12T05:30:15)
];
let startDate=datetime(2024-03-12T00:00:00);
let endDate=datetime(2024-03-15T00:00:00);
Applying the following query yields the total time each id has spent in each state:
inputData
| partition hint.strategy=native by id
    (
        order by timestamp asc
        | extend tsDiff = min_of(endDate, next(timestamp)) - timestamp
        | extend pTime = iif(status == "P", tsDiff, timespan(0))
        | extend fTime = iif(status == "F", tsDiff, timespan(0))
    )
| summarize totalPTime=sum(pTime), totalFTime=sum(fTime) by id
Results:
id totalPTime totalFTime
id1 2.04:39:45 13:50:00
id2 2.18:29:45 00:00:00
Now I'm stumped on how to break this down on a day-to-day basis. Trying to use for example make-series gives:
inputData
| partition hint.strategy=native by id
    (
        order by timestamp asc
        | extend tsDiff = min_of(endDate, next(timestamp)) - timestamp
        | extend pTime = iif(status == "P", tsDiff, timespan(0))
        | extend fTime = iif(status == "F", tsDiff, timespan(0))
    )
| make-series
    totalP=sum(pTime),
    totalF=sum(fTime)
    on timestamp from startDate to endDate step 1d by id
Results:
id totalP totalF timestamp
id1 ["2.04:39:45","00:00:00","00:00:00"] ["00:00:00","00:00:00","13:50:00"] ["2024-03-12","2024-03-13","2024-03-14"]
id2 ["2.18:29:45","00:00:00","00:00:00"] ["00:00:00","00:00:00","00:00:00"] ["2024-03-12","2024-03-13","2024-03-14"]
What I need is for each day's value to be capped at 1 day, with any overflow carried over onto the next day, if that makes sense. E.g.:
id totalP totalF timestamp
id1 ["1.00:00:00","1.00:00:00","04:39:45"] ["00:00:00","00:00:00","13:50:00"] ["2024-03-12","2024-03-13","2024-03-14"]
id2 ["1.00:00:00","1.00:00:00","18:29:45"] ["00:00:00","00:00:00","00:00:00"] ["2024-03-12","2024-03-13","2024-03-14"]
Am I on the right track here or should I utilize some other functionality for this purpose? Any help is greatly appreciated!
I just wanted to say thank you for such an interesting question; it's occupied my mind almost all weekend. At first glance it seemed fairly logical, but the more I thought about it and tested some code, the more difficult it got! I'm sure you were in the exact same position...
I'm confident someone else will come and give you a much more eloquent and computationally efficient solution, but I do believe you are on the right path with what you were doing. The thing that seemed to be missing is that, like most query languages, KQL isn't good at showing things which aren't there. So my personal preference is to always start with something we know. For your question I took the time ranges you already had available.
My approach uses a good amount of partition (as you were using), along with range, scan and dynamic variables.
There are some caveats to this solution: in its current form it's only accurate to the minute, we may not need to calculate the final status, and I'm inferring that the status is only ever P or F.
I know I'm not calculating the time in the same way as your query, but hopefully you may be able to tweak it as needed.
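For illustration, here is a minimal sketch of one way to split each state interval across day boundaries. It is not the scan-based approach described above. It uses the range() scalar function with mv-expand, and the column names intervalStart, intervalEnd, dayStart and overlap are made up for this example. It assumes every timestamp falls between startDate and endDate and that the status is only ever P or F.

```kusto
let inputData = datatable(id:string, status:string, timestamp:datetime) [
    "id1","P",datetime(2024-03-12T05:30:15),
    "id1","F",datetime(2024-03-14T10:10:00),
    "id2","P",datetime(2024-03-12T05:30:15)
];
let startDate = datetime(2024-03-12T00:00:00);
let endDate = datetime(2024-03-15T00:00:00);
inputData
| partition hint.strategy=native by id
    (
        order by timestamp asc
        // Turn each row into an interval; the last row of an id has no next
        // timestamp, so it is closed at endDate.
        | extend intervalStart = timestamp,
                 intervalEnd = min_of(endDate, coalesce(next(timestamp), endDate))
    )
// One array element per calendar day the interval touches (assumed helper column).
| extend dayStart = range(startofday(intervalStart), intervalEnd, 1d)
| mv-expand dayStart to typeof(datetime)
// Time spent inside this particular day: the interval clipped to [dayStart, dayStart + 1d).
| extend overlap = min_of(intervalEnd, dayStart + 1d) - max_of(intervalStart, dayStart)
// range() includes its stop value, so an interval ending exactly at midnight
// produces one extra day with zero overlap; drop it.
| where overlap > timespan(0)
| make-series
    totalP = sum(iff(status == "P", overlap, timespan(0))),
    totalF = sum(iff(status == "F", overlap, timespan(0)))
    on dayStart from startDate to endDate step 1d by id
```

Note that this attributes time to the calendar day in which it was actually spent, rather than filling whole days first and carrying the remainder forward. With the sample data, id1 should therefore show 18:29:45, 1.00:00:00 and 10:10:00 in P across the three days (plus 13:50:00 in F on 2024-03-14), which still adds up to the same 2.04:39:45 total as your original query. If you prefer one row per id and day instead of arrays, swapping make-series for a summarize ... by id, dayStart should work in the same way.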