Running on AWS Aurora MySQL 8.0
I have a visit_data table of _record_id, location_id, visit_date.
I need to provide the query a list of location_id and determine if one of those location_id was the last visit for any of the record_id.
I do have a functional query that does work and in production but performance is very slow.
SELECT
ud.record_id,
vd.location_id
FROM
user_data as ud
visit_data as vd
WHERE ud.record_id = vd._record_id
AND vd._campaign_id in (123)
AND vd.location_id = (SELECT
pvd.location_id
FROM
visit_data as pvd
WHERE
pvd._campaign_id in (123)
AND pvd._record_id = ud.record_id
AND pvd.location_id IN ('location-1', 'location-2',
'location-3', 'location-4',
'location-5', 'location-1800')
ORDER BY pvd.visit_date DESC
LIMIT 1)
AND vd.location_id IN ('location-1', 'location-2',
'location-3', 'location-4',
'location-5', 'location-1800')
AND vd.visit_date BETWEEN '2022-01-01' AND '2024-01-31'
GROUP BY ud.record_id
I do get the expected results however on a table with only little over 2 million records, it is taking upwards of 60 seconds.
I have used all the permutation of indexes using record_id (rid), campaign_id (cid), poi and visit_date, keeping visit_date last. I could think and I just can't seem to get the right combination.
| select_type | table | type | possible_key | key | key_len | ref | rows | filtered | extra |
|---|---|---|---|---|---|---|---|---|---|
| Primary | rd | range | _poi__visit_date__index | 770 | 105917 | 9.5 | Using index condition; Using where; Using temporary | ||
| Primary | fd | eq_ref | Primary | Primary | 4 | 1 | 100 | Using index | |
| Dependent | pvd | ref | _cid__rid__index | 4 | 3 | 12.17 | Using index condition; Using where; Using filesort |
I am hoping there is some strategic logic I am not seeing and using the right query structure that some else might see.
I hope I provided enough info to help.
Thanks --FS