We use MariaDB 10.6.14.
I wrote this rather messy query
SELECT DISTINCT t.text
FROM dba.questionnaire_translation t
JOIN dba.filter_translation ft ON t.translationId = ft.translationId
JOIN dba.filter f ON ft.filterId = f.id
WHERE t.text IN (SELECT DISTINCT fi.filterName
FROM analyse.filter_interaction fi
WHERE fi.filterName NOT REGEXP "^\\S.*\\S$|^\\S$"
AND fi.filterName != "");
which returns 16 values from our database (disregard that it is not ideal in any way, it's not going to be used).
However, the inner query
SELECT DISTINCT filterName
FROM analytics.filter_interaction fi
WHERE fi.filterName NOT REGEXP "^\\S.*\\S$|^\\S$"
and fi.filterName != "";
returns only 11 values. The 5 "new" values are completely different from the other 11 values.
Why does this happen?
I asked the AI of my IDE to refactor the query to see if I am doing something wrong.
It came up with using a common table expression (CTE) for the subquery and joining instead of using IN:
WITH unusual_filter_names AS (SELECT DISTINCT filterName
FROM analyse.filter_interaction
WHERE filterName NOT REGEXP "^\\S.*\\S$|^\\S$"
AND filterName != "")
SELECT DISTINCT t.text
FROM dba.questionnaire_translation t
JOIN dba.filter_translation ft ON t.translationId = ft.translationId
JOIN dba.filter f ON ft.filterId = f.id
JOIN unusual_filter_names afi ON afi.filterName = t.text;
This query now only returns the expected 11 values.
However, failing to see the significant logical difference, I also checked what happens when using IN instead of the JOIN with the same CTE:
WITH unusual_filter_names AS (SELECT DISTINCT filterName
FROM analyse.filter_interaction
WHERE filterName NOT REGEXP "^\\S.*\\S$|^\\S$"
AND filterName != "")
SELECT DISTINCT t.text
FROM dba.questionnaire_translation t
JOIN dba.filter_translation ft ON t.translationId = ft.translationId
JOIN dba.filter f ON ft.filterId = f.id
WHERE t.text IN(SELECT filterName FROM unusual_filter_names);
This appears to me as identical to my original query, but it still returns the correct 11 values. Me and two of my colleagues were unable to see the logical difference between the queries, so I am looking for enlightenment here.