For every Azure resource in all my subscriptions I need to get the total time resource has been unavailable in a specified timeframe and based on that calculate the percentage of availbility in a given timeframe. I don't want to use the application insights in Azure Monitor because it requires me to check every app individually and I'd like to get all resources at once. I have the following KQL script in graph explorer which returns the time of each availbility state change along with the information whether it changed to available or unavailable.
healthresourcechanges
| where id contains "/providers/Microsoft.ResourceHealth/availabilityStatuses/current"
| project id, resourceId = tostring(properties.targetResourceId), name, type, location, resourceGroup, subscriptionId, timeStamp = tostring(properties.changeAttributes.timestamp), avStatePrev = properties.changes['properties.availabilityState'].previousValue, avStateNew = properties.changes['properties.availabilityState'].newValue
| order by timeStamp desc
I want to take the state changes it gives me, group them by resource, find every change where it becomes unavailable, calculate the time it took from that moment to the next change to available, add that time to the total time the resource has been down and at the end divide the total downtime of each resource by the total time I'm monitoring (for example 7 days). How can the script be modified to achieve that? Is it possible to do in the resource graph explorer? If not then what other tools can I use to achieve that result for all my resources at once?
I tried to group the status changes by resource with the following line:
| summarize count=count() by resourceId, tostring(avStateNew)
but that way it only returns the number of state changes to every status without the timestamp of each change so I can't use it to calculate time between them.
To calculate the total downtime and availability percentage for each resource based on the changes in availability state in the
healthresourcechangestable, below KQL query is used. I have reproduced with sample data inhealthresourcechangestable.projectoperator is used to extract theresourceId,avStatePrev,avStateNew, andtimestampcolumns from thehealthresourcechangestable. Then the results are sorted byresourceIdandtimestampin ascending order. Next, to calculate the duration of each downtime period for each resource,extendis used. The value ofduration_downtime_in_daysis set as the difference between the timestamp of the change where the availability state changed from "Unavailable" to "Available". For other case, we set the value ofduration_downtime_in_daysas 0. Thensummarizeoperator is used to calculate the total downtime and total time for each resource. Finally, the value ofdown_time_percentageis calculated as the total downtime divided by the total time, and the value ofavailable_percentageto 1 minus the downtime percentage.fiddle
Output for sample data