Kusto user defined aggregate function


I am looking to create a user-defined aggregate function in KQL for use in a summarize operator. I want to calculate the average value of a device sensor, weighted by how long the sensor held each value.

I've searched online, including the Microsoft documentation and Stack Overflow, but couldn't find an example that helps.

Note: The code examples in this question are slimmed down to just include the relevant information.

My table setup is as follows:

.create table tempMetrics (
    TimeStamp: datetime,
    Name: string,
    Value: decimal
    )

Let's use this as input data:

.ingest inline into table tempMetrics <|
2024-01-03T10:00:00Z,Sensor1,1
2024-01-03T10:03:00Z,Sensor1,5
2024-01-03T10:05:00Z,Sensor1,3
2024-01-03T10:06:00Z,Sensor1,7
2024-01-03T10:00:00Z,Sensor2,5
2024-01-03T10:02:00Z,Sensor2,6
2024-01-03T10:05:00Z,Sensor2,3
2024-01-03T10:09:00Z,Sensor2,1

I want to calculate the average value per five minutes, per metric. A simple query to get this would be:

tempMetrics
| summarize Average=avg(Value) by bin(TimeStamp, 5m), Name

However, I want a time-weighted average. For example, if a sensor reads 1 at the start of a five-minute block and changes to 5 three minutes in, the average should not be 3 (the plain mean of 1 and 5) but 2.6, because 1 was in effect for three minutes and 5 for two.

The query above gives the following results. The last column shows the time-weighted average I expect.

TimeStamp            | Name    | Average | Expected (time-weighted)
2024-01-03T10:00:00Z | Sensor1 | 3       | 2.6
2024-01-03T10:00:00Z | Sensor2 | 5.5     | 5.6
2024-01-03T10:05:00Z | Sensor1 | 5       | 6.2
2024-01-03T10:05:00Z | Sensor2 | 2       | 2.6
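
The expected values follow from weighting each reading by the minutes it stays in effect inside its five-minute bin, assuming a value holds until the next reading or until the end of the bin. With v_i a reading and Δt_i its duration (in minutes) inside the bin of width Δ = 5 minutes:

```latex
\begin{aligned}
\bar{v} &= \frac{1}{\Delta}\sum_i v_i\,\Delta t_i, \qquad \Delta = 5\ \text{min}\\
\text{Sensor1, 10:00:}\quad &\frac{1\cdot 3 + 5\cdot 2}{5} = \frac{13}{5} = 2.6\\
\text{Sensor2, 10:00:}\quad &\frac{5\cdot 2 + 6\cdot 3}{5} = \frac{28}{5} = 5.6\\
\text{Sensor1, 10:05:}\quad &\frac{3\cdot 1 + 7\cdot 4}{5} = \frac{31}{5} = 6.2\\
\text{Sensor2, 10:05:}\quad &\frac{3\cdot 4 + 1\cdot 1}{5} = \frac{13}{5} = 2.6
\end{aligned}
```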

I'd like to solve this with a user-defined aggregate function, because that would keep the query clean.

Does anyone have any experience with this?

Daniel Perez Efremova

I suggest using the range() function to create a time series for each sensor, and then summarizing the average by time slot and Name.

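tempMetrics
| extend TimeSlot = range(bin(TimeStamp, 5m), bin(TimeStamp + 5m, 5m), 1m)
// range() returns a dynamic array and summarize cannot group by a dynamic column,
// so expand it to one row per 1-minute slot first.
| mv-expand TimeSlot to typeof(datetime)
| summarize Average = avg(Value) by TimeSlot, Name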

EDIT: This solution assumes that the time intervals are aligned, and it might need adjustments.

Aswin

Creating a user-defined aggregate function for this average is not possible in Kusto. Instead, you can implement the logic directly with existing Kusto operators and functions:

let tempMetrics = datatable(TimeStamp:datetime, Name:string, Value:decimal)
[
    datetime(2024-01-03 10:00:00), 'Sensor1', 1,
    datetime(2024-01-03 10:03:00), 'Sensor1', 5,
    datetime(2024-01-03 10:05:00), 'Sensor1', 3,
    datetime(2024-01-03 10:06:00), 'Sensor1', 7,
    datetime(2024-01-03 10:00:00), 'Sensor2', 5,
    datetime(2024-01-03 10:02:00), 'Sensor2', 6,
    datetime(2024-01-03 10:05:00), 'Sensor2', 3,
    datetime(2024-01-03 10:09:00), 'Sensor2', 1,
];
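tempMetrics
// One series per sensor (by Name): the highest reading in each 1-minute step, null when a minute has none.
| make-series Value = max(Value) default=long(null) on TimeStamp from datetime("2024-01-03 10:00:00") to datetime("2024-01-03 10:10:00") step 1m by Name
// Expand back to one row per minute; series_fill_forward carries the last reading into empty minutes.
| mv-expand TimeStamp, Value = series_fill_forward(Value)
// todouble rather than tolong, so fractional sensor readings are not truncated.
| extend TimeStamp = todatetime(TimeStamp), Value = todouble(Value)
// Every minute now counts once, so a plain avg per 5-minute bin is time-weighted.
| summarize Average = avg(Value) by bin(TimeStamp, 5m), Name
// Drop bins that contain no readings at all.
| where not(isnan(Average))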

This code uses the make-series operator to build, for each sensor, a one-minute time series of the maximum reading. mv-expand then turns the series back into rows, while series_fill_forward fills empty minutes with the last known value. Because every minute now carries a value, summarize with avg per five-minute bin yields the time-weighted average per sensor.

Output

TimeStamp            | Name    | Average
2024-01-03T10:00:00Z | Sensor1 | 2.6
2024-01-03T10:00:00Z | Sensor2 | 5.6
2024-01-03T10:05:00Z | Sensor1 | 6.2
2024-01-03T10:05:00Z | Sensor2 | 2.6
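
If the main goal is to keep the calling query clean, one possible sketch is to wrap the same make-series and series_fill_forward steps in a query-defined function with a tabular parameter and call it with invoke. The function name TimeWeightedAvg and its parameters are hypothetical, the sketch assumes one-minute resolution is fine enough for the weighting, and it assumes the tempMetrics table from the question. It is not an aggregate function, so it cannot be used inside summarize or a materialized view.

```kql
// Hypothetical helper (not part of the answer above): a query-defined function
// with a tabular parameter T, applied to a table with `invoke`.
// Assumes 1-minute resolution is fine enough for the time weighting.
let TimeWeightedAvg = (T:(TimeStamp:datetime, Name:string, Value:decimal), startTime:datetime, endTime:datetime, binSize:timespan)
{
    T
    | extend Value = todouble(Value)
    // One series per sensor: the highest reading in each 1-minute step (null if none).
    | make-series Value = max(Value) default=double(null) on TimeStamp from startTime to endTime step 1m by Name
    // One row per minute, with empty minutes filled from the previous reading.
    | mv-expand TimeStamp, Value = series_fill_forward(Value)
    | extend TimeStamp = todatetime(TimeStamp), Value = todouble(Value)
    // Every minute counts once, so avg per bin is a time-weighted average.
    | summarize Average = avg(Value) by bin(TimeStamp, binSize), Name
    | where not(isnan(Average))
};
tempMetrics
| invoke TimeWeightedAvg(datetime(2024-01-03 10:00:00), datetime(2024-01-03 10:10:00), 5m)
```

With the sample data this should give the same four averages as the output above. Readings before startTime are not carried into the range, so minutes before a sensor's first reading stay null and are ignored by avg. The same body could also be saved as a stored function with .create-or-alter function.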

G.Smulders

I have been in contact with a Microsoft Cloud Solution Architect who is assisting us, and he confirmed that it is not possible to create a user-defined aggregate function in Kusto.

The query is meant to be used in a materialized view, so serialization is not possible (order by, partition, etc.). That is why I was looking into a user-defined aggregate function. Since that isn't possible, I will look into other options.