Apparently, sorting is somehow applied incorrectly in mysql 8 (checked on 8.0.33 to 8.0.35) when query has ORDER BY + GROUP BY + GROUP_CONCAT() + COUNT(*) OVER() window function. See test case below (NOTE that it's synthetic and oversimplified for clarity - obviously in a real case grouping wouldn't be run on a single table).
Question 1: why in the first query ordering is not applied as I expect it (i.e. ascending by sort field)?
Question 2: what would be a recommended fix (or work around)? Duplicating the ordering specification into COUNT(*) OVER() doesn't seem a very elegant or robust solution.
Schema and test data
CREATE TABLE IF NOT EXISTS users (
id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(255),
email VARCHAR(255),
sort INT
);
INSERT INTO users (username, email, sort) VALUES ('user1', '[email protected]', 50);
INSERT INTO users (username, email, sort) VALUES ('user2', '[email protected]', 30);
INSERT INTO users (username, email, sort) VALUES ('user3', '[email protected]', 20);
INSERT INTO users (username, email, sort) VALUES ('user4', '[email protected]', 90);
INSERT INTO users (username, email, sort) VALUES ('user5', '[email protected]', 40);
INSERT INTO users (username, email, sort) VALUES ('user6', '[email protected]', 70);
Unexpected result
Query:
SELECT
sort,
username,
GROUP_CONCAT(email) AS email_concat,
COUNT(*) OVER () AS total_count
FROM users
GROUP BY id
ORDER BY sort;
(Note: total_count is supposed to give a total number of results in a result set, for when query in paginated (pagination omitted from the example)
Result (unexpected, not ordered by sort):
| sort | username | email_concat | total_count |
| ---- | -------- | ----------------- | ----------- |
| 50 | user1 | [email protected] | 6 |
| 30 | user2 | [email protected] | 6 |
| 20 | user3 | [email protected] | 6 |
| 90 | user4 | [email protected] | 6 |
| 40 | user5 | [email protected] | 6 |
| 70 | user6 | [email protected] | 6 |
Execution plan:
-> Window aggregate with buffering: count(0) OVER ()
-> Table scan on <temporary> (cost=2.5..2.5 rows=0)
-> Temporary table (cost=0..0 rows=0)
-> Group aggregate: group_concat(users.email separator ',')
-> Sort: users.id
-> Stream results (cost=0.85 rows=6)
-> Sort: users.sort (cost=0.85 rows=6)
-> Table scan on users (cost=0.85 rows=6)
Expected result
However, if we add ORDER BY into window function (which, in my understanding, is a no-op/redundant in this case), sorting is applied as expected:
SELECT
sort,
username,
GROUP_CONCAT(email) AS email_concat,
COUNT(*) OVER (ORDER BY NULL) AS total_count
FROM users
GROUP BY id
ORDER BY sort;
Result (as wanted, ordered by sort):
| sort | username | email_concat | total_count |
| ---- | -------- | ----------------- | ----------- |
| 20 | user3 | [email protected] | 6 |
| 30 | user2 | [email protected] | 6 |
| 40 | user5 | [email protected] | 6 |
| 50 | user1 | [email protected] | 6 |
| 70 | user6 | [email protected] | 6 |
| 90 | user4 | [email protected] | 6 |
Execution plan:
-> Sort: users.sort
-> Table scan on <temporary> (cost=2.5..2.5 rows=0)
-> Temporary table (cost=0..0 rows=0)
-> Window aggregate with buffering: count(0) OVER (ORDER BY NULL )
-> Sort: NULL
-> Stream results
-> Group aggregate: group_concat(users.email separator ',')
-> Sort: users.id
-> Stream results (cost=0.85 rows=6)
-> Table scan on users (cost=0.85 rows=6)
It appears to be a bug triggered by the use of
GROUP_CONCAT(), as removing that produces the correct ordering.A workaround is to put the grouped query in a subquery, and use
ORDER BY sortin the main query.And if you want to get a running count instead of a total count, you need to use
OVER (ORDER BY id). This also works around the bug somehow.