I have events table in Snowflake, one row per event. status can be failed that implies error or completed that implies error resolution.
For each customer_id, I want to find two timestamps resolved_at and first_failed_at and ultimately calculate the error resolution in days between these two timestamps. How to achieve this?
I find the resolved_at like this but couldn't find first_failed_at.
Note that there can be multiple events failed consecutively.
with windowed as (
select
*,
lag(status) over (partition by customer_id order by created_at) as prev_status,
lead(status) over (partition by customer_id order by created_at) as next_status,
lead(created_at) over (partition by customer_id order by created_at) as next_created_at
from events
)
select
event_id,
customer_id,
status,
created_at,
case
when status = 'failed' and next_status = 'completed' then next_created_at
else null
end as resolved_at
from windowed
Here is the example data and two additional desired columns:
+--------------------------------------+-------------+-----------+---------------------+---------------------+---------------------+
| event_id | customer_id | status | created_at | resolved_at | first_failed_at |
+--------------------------------------+-------------+-----------+---------------------+---------------------+---------------------+
| e7921c9c-aa8f-49bb-83fa-31fa37d629ee | a | completed | 2024-01-16 16:23:31 | | |
| 1d6a5bdf-e254-441d-9574-821844dc6c05 | a | failed | 2024-01-17 19:17:33 | | 2024-01-17 19:17:33 |
| 1a06783a-2ca4-4f68-80f0-7e7c84ac57fd | a | failed | 2024-01-22 18:07:54 | | 2024-01-17 19:17:33 |
| 61ecc8da-1e7f-4b19-92f1-fda93661a3af | a | failed | 2024-01-23 17:54:12 | 2024-01-24 19:34:48 | 2024-01-17 19:17:33 |
| 068e4220-bd57-410a-85f3-1fbb18f86053 | a | completed | 2024-01-24 19:34:48 | | |
| 4ad30d16-ff13-48f3-b315-8a588d869c08 | a | completed | 2024-01-25 17:30:54 | | |
| 82f8c57a-0e80-4ea6-96ad-3b578e5ec68a | a | failed | 2024-01-27 17:45:11 | | 2024-01-27 17:45:11 |
| 17d736d6-40de-4440-a220-e4a0612acb71 | a | failed | 2024-01-28 17:51:15 | 2024-01-29 17:23:31 | 2024-01-27 17:45:11 |
| be657472-69d0-4cd5-82e0-b33434ff213a | a | completed | 2024-01-29 17:23:31 | | |
| 3bce3901-2639-489d-8280-04518a253b33 | b | failed | 2023-12-15 15:06:32 | | 2023-12-15 15:06:32 |
| 99b1bd8b-0ac3-4868-b687-73fc93ef9ddf | b | failed | 2023-12-16 15:07:21 | 2023-12-17 15:10:42 | 2023-12-15 15:06:32 |
| 406b42e6-63c6-402e-9519-87240d007315 | b | completed | 2023-12-17 15:10:42 | | |
| 946e4c25-db5a-44e4-8c29-d7144ff5b03c | c | failed | 2023-12-15 15:06:10 | | 2023-12-15 15:06:10 |
| ba0c96d2-4b68-43e5-99a6-0f30530dd1ad | c | failed | 2023-12-16 15:06:32 | | 2023-12-15 15:06:10 |
+--------------------------------------+-------------+-----------+---------------------+---------------------+---------------------+
See example for SQL Server. I think, this is applicable for snowflake.
Set number for each group or rows, where 0,1 or more 'failed' rows and 0 or 1 'completed' row. Group number is number of 'completed' row for customer_id, ordered by created_ad.
Calculate datetime for 'completed' row for group (null if no sach row), and calculate first 'failed' row.
Query output
Fiddle