I have the following table:
| AppId | Id | Direction | Text | Date |
|---|---|---|---|---|
| aaa | 11 | in | hello | 11/2/2021 3:03:00 PM |
| aaa | 22 | out | yes? | 11/2/2021 3:04:00 PM |
| aaa | 33 | in | need help! | 11/3/2021 3:06:00 PM |
| aaa | 44 | in | you there? | 11/4/2021 3:10:00 PM |
| aaa | 55 | out | yes! | 11/5/2021 4:00:00 PM |
| bb | 111 | out | welcome! | 11/6/2021 6:09:00 PM |
| bb | 222 | in | can i call? | 11/6/2021 6:39:00 PM |
| bb | 333 | out | sure. | 11/6/2021 8:22:00 PM |
| cc | 1111 | out | hello? | 11/8/2021 2:22:00 PM |
| cc | 2222 | in | Whatsup! | 11/8/2021 3:22:00 PM |
Id is a primary key, direction is basically telling us if it is an incoming message or outgoing. App id is to identify various ids that are part of a single conversation. I want to identify the first response time after an incoming message as:
| AppId | Id | Direction | Text | Date | ReplyDate |
|---|---|---|---|---|---|
| aaa | 11 | in | hello | 11/2/2021 3:03:00 PM | 11/2/2021 3:04:00 PM |
| aaa | 22 | out | yes? | 11/2/2021 3:04:00 PM | null |
| aaa | 33 | in | need help! | 11/3/2021 3:06:00 PM | 11/5/2021 4:00:00 PM |
| aaa | 44 | in | you there? | 11/4/2021 3:10:00 PM | 11/5/2021 4:00:00 PM |
| aaa | 55 | out | yes! | 11/5/2021 4:00:00 PM | null |
| bb | 111 | out | welcome! | 11/6/2021 6:09:00 PM | null |
| bb | 222 | in | can i call? | 11/6/2021 6:39:00 PM | 11/6/2021 8:22:00 PM |
| bb | 333 | out | sure. | 11/6/2021 8:22:00 PM | null |
| cc | 1111 | out | hello? | 11/8/2021 2:22:00 PM | null |
| cc | 2222 | in | Whatsup! | 11/8/2021 3:22:00 PM | null |
For any 'out' text the reply column is null, but for every 'in' text it brings the next latest timestamp for 'out'. If there is no outcoming text following an incoming text then the 'ReplyDate' for incoming is also null as in the case of 'cc'.
Can this be done in SQL. I am using vertica and it doesn't allow recursive or loop statements so I must achieve this without them.
I have been able to use lead () to get when the next one is outcoming text but not able to fill it for all previous incoming texts.
This is what I have tried so far but this doesn't get me the required result:
with cte as (
select
row_number() over(partition by AppId order by date asc) as rn,
Id,
AppId
Direction,
Text,
Date,
lead(Direction, 1) over(order by Date asc) as lead_direction,
lead(Date, 1) over (order by Date asc) as lead_date,
from table
order by Date desc)
select
Id,
AppId
Direction,
Text
Date,
case when Direction = 'Out' then null
when lead_direction = null then null
when rn <> 1 and Direction = 'In' and Direction = lead_direction then null
when rn <> 1 and Direction = 'In' and Direction <> lead_direction then lead_date
end as ReplyDate
from cte
Any help will be greatly appreciated.
Took a few attempts, but now I think I got it. The first CTE in the
WITHclause is not part of the final query - it's just putting your original input into a self contained demo query.The real query, and the real
WITHclause starts after that.As the closing 'out' row for each 'in' row can come one, two, or several rows later, this can be solved with a behavioural pattern: one or more 'in' rows, followed by one 'out' row. That's what the
MATCH()clause is for. Only rows fulfilling the pattern are returned in the query containing the clause.The depending function
PATTERN_ID()returns the ordinal number of the found pattern within thePARTITION BY... ORDER BYexpression.GROUPing by the
PARTITION BYcolumns and thePATTERN_IDwill help me get the last timestamp of the pattern, which I need for thereplyts(I changed the column names to avoid reserved words likeDATEorTEXT).At the end, I just need to left join the
indataCTE with the query containing theMATCH()clause, on equality of theidcolumn and thedirectionbeing equal to 'in', and that second query with the grouping query.I left the intermediate result as an illustration of the mechanism within the CTE expressions ...