I'm struggling to optimize a query which currently uses multiple subqueries. The subqueries run fast individually, but when joined together, are very slow. Below I've outlined a sample set of data and my current query.
Purpose of Query
To display the date different events occurred on an order
Structure of data
There is an Orders table and an Order History table
Orders
| Order | Part | Status |
|---|---|---|
| 1 | A | Planned |
| 2 | B | Released |
| 3 | A | Closed |
| 4 | B | Released |
| 5 | C | Closed |
Order History
| Timestamp | Order | Message | NewStatus |
|---|---|---|---|
| 10:00am | 1 | Created | Planned |
| 10:15am | 2 | Created | Planned |
| 10:16am | 2 | Status Changed | Released |
| 10:20am | 3 | Created | Planned |
| 10:24am | 3 | Status Changed | Released |
| 10:25am | 2 | Status Changed | Planned |
| 10:30am | 4 | Created | Planned |
| 10:35am | 3 | Status Changed | Closed |
| 10:40am | 5 | Created | Planned |
| 10:45am | 4 | Status Changed | Released |
| 10:50am | 5 | Status Changed | Released |
| 10:55am | 2 | Status Changed | Released |
| 11:00am | 5 | Status Changed | Planned |
| 11:05am | 5 | Status Changed | Released |
| 11:15am | 5 | Status Changed | Closed |
| 11:20am | 3 | Status Changed | Released |
| 11:25am | 3 | Status Changed | Closed |
Desired Result
| Order | Part | Status | Created Date | First Release | Latest Release | Closed Date |
|---|---|---|---|---|---|---|
| 1 | A | Planned | 10:00am | |||
| 2 | B | Released | 10:15am | 10:16am | 10:55am | |
| 3 | A | Closed | 10:20am | 10:24am | 11:20am | 11:25am |
| 4 | B | Released | 10:30am | 10:45am | 10:45am | |
| 5 | C | Closed | 10:40am | 10:50am | 11:05am | 11:15am |
Current Query
My current query does a subquery against the history table for each timestamp field (Created, First Release, Latest Release, Closed) and then joins them all up:
SELECT
c.order
,c.part
,c.status
,c.cDate
,fr.frDate
,lr.lrDate
,cd.cdDate
FROM
(SELECT
o.order
,o.part
,o.status
,min(oh.timestamp)
FROM order o
JOIN order_history oh
ON o.order = oh.order
WHERE oh.message = 'Created' AND oh.NewStatus = 'Planned'
group by o.order, o.part, o.status) c
JOIN
(SELECT
o.order
,o.part
,o.status
,min(oh.timestamp)
FROM order o
JOIN order_history oh
ON o.order = oh.order
WHERE oh.message = 'Status Changed' AND oh.NewStatus = 'Released'
group by o.order, o.part, o.status) fr
ON c.order = fr.order and c.part = fr.part and c.status = fr.status
JOIN
(SELECT
o.order
,o.part
,o.status
,max(oh.timestamp)
FROM order o
JOIN order_history oh
ON o.order = oh.order
WHERE oh.message = 'Status Changed' AND oh.NewStatus = 'Released'
group by o.order, o.part, o.status) lr
ON c.order = lr.order and c.part = lr.part and c.status = lr.status
JOIN
(SELECT
o.order
,o.part
,o.status
,max(oh.timestamp)
FROM order o
JOIN order_history oh
ON o.order = oh.order
WHERE oh.message = 'Status Changed' AND oh.NewStatus = 'Closed'
group by o.order, o.part, o.status) cd
ON c.order = cd.order and c.part = cd.part and c.status = cd.status
So as you can see... it's ugly! And really slow. What can I do to optimize?
Please keep in mind this is much simplified, fake example. I've tried to show the complexity in the data and query. Apologies if there are mistakes in the query, but I can verify that the current query does work to give me the results I want, so I'm not looking for just corrections, but optimizations.
It's impossible to optimize a query without access to the real query, table and index structures, and execution plan. However simplifying it to remove the subqueries and reduce the number of joins is unlikely to hurt, and might give you a better place to start investigating the plan and look for real optimization opportunities.
It also doesn't help that your 'simplified, fake' version uses illegal identifiers and has various other errors (including needing outer joins to get all five rows back), which you said to ignore but as Mitch commented, those things and the general fakeness make it kind of useless. Still, as a general idea of where to start, perhaps, with your example you could rewrite it using conditional aggregation, for example as:
fiddle including a working version of your query with legal identifiers (though 'timestamp' still isn't ideal) and other changes, and this modified query.
(This works with the 'timestamp' as strings with your fake values, but would also work with real dates or timestamps, or course. And it doesn't necessarily need to be an outer join if there is always a 'Created' record, which seems likely.)