Apparently, sorting is somehow applied incorrectly in mysql 8 (checked on 8.0.33 to 8.0.35) when query has ORDER BY + GROUP BY + GROUP_CONCAT() + COUNT(*) OVER() window function. See test case below (NOTE that it's synthetic and oversimplified for clarity - obviously in a real case grouping wouldn't be run on a single table).

Question 1: why in the first query ordering is not applied as I expect it (i.e. ascending by sort field)?

Question 2: what would be a recommended fix (or work around)? Duplicating the ordering specification into COUNT(*) OVER() doesn't seem a very elegant or robust solution.

Schema and test data

CREATE TABLE IF NOT EXISTS users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(255),
    email VARCHAR(255),
    sort INT
);

INSERT INTO users (username, email, sort) VALUES ('user1', '[email protected]', 50);
INSERT INTO users (username, email, sort) VALUES ('user2', '[email protected]', 30);
INSERT INTO users (username, email, sort) VALUES ('user3', '[email protected]', 20);
INSERT INTO users (username, email, sort) VALUES ('user4', '[email protected]', 90);
INSERT INTO users (username, email, sort) VALUES ('user5', '[email protected]', 40);
INSERT INTO users (username, email, sort) VALUES ('user6', '[email protected]', 70);

Unexpected result

Query:

SELECT
    sort,
    username,
    GROUP_CONCAT(email) AS email_concat,
    COUNT(*) OVER () AS total_count
FROM users
GROUP BY id
ORDER BY sort;

(Note: total_count is supposed to give a total number of results in a result set, for when query in paginated (pagination omitted from the example)

Result (unexpected, not ordered by sort):

| sort | username | email_concat      | total_count |
| ---- | -------- | ----------------- | ----------- |
| 50   | user1    | [email protected] | 6           |
| 30   | user2    | [email protected] | 6           |
| 20   | user3    | [email protected] | 6           |
| 90   | user4    | [email protected] | 6           |
| 40   | user5    | [email protected] | 6           |
| 70   | user6    | [email protected] | 6           |

Execution plan:

 -> Window aggregate with buffering: count(0) OVER () 
    -> Table scan on <temporary>  (cost=2.5..2.5 rows=0)
        -> Temporary table  (cost=0..0 rows=0)
            -> Group aggregate: group_concat(users.email separator ',')
                -> Sort: users.id
                    -> Stream results  (cost=0.85 rows=6)
                        -> Sort: users.sort  (cost=0.85 rows=6)
                            -> Table scan on users  (cost=0.85 rows=6)

DB Fiddle 1 (wrong result)

Expected result

However, if we add ORDER BY into window function (which, in my understanding, is a no-op/redundant in this case), sorting is applied as expected:

SELECT
    sort,
    username,
    GROUP_CONCAT(email) AS email_concat,
    COUNT(*) OVER (ORDER BY NULL) AS total_count
FROM users
GROUP BY id
ORDER BY sort;

Result (as wanted, ordered by sort):

| sort | username | email_concat      | total_count |
| ---- | -------- | ----------------- | ----------- |
| 20   | user3    | [email protected] | 6           |
| 30   | user2    | [email protected] | 6           |
| 40   | user5    | [email protected] | 6           |
| 50   | user1    | [email protected] | 6           |
| 70   | user6    | [email protected] | 6           |
| 90   | user4    | [email protected] | 6           |

Execution plan:

 -> Sort: users.sort
    -> Table scan on <temporary>  (cost=2.5..2.5 rows=0)
        -> Temporary table  (cost=0..0 rows=0)
            -> Window aggregate with buffering: count(0) OVER (ORDER BY NULL ) 
                -> Sort: NULL
                    -> Stream results
                        -> Group aggregate: group_concat(users.email separator ',')
                            -> Sort: users.id
                                -> Stream results  (cost=0.85 rows=6)
                                    -> Table scan on users  (cost=0.85 rows=6)

DB Fiddle 2 (correct result)

3

There are 3 best solutions below

2
Barmar On

It appears to be a bug triggered by the use of GROUP_CONCAT(), as removing that produces the correct ordering.

A workaround is to put the grouped query in a subquery, and use ORDER BY sort in the main query.

SELECT *
FROM (
  SELECT
      sort,
      username,
      GROUP_CONCAT(email) AS email_concat,
      COUNT(*) OVER () AS total_count
  FROM users
  GROUP BY id) AS x
ORDER BY sort;

And if you want to get a running count instead of a total count, you need to use OVER (ORDER BY id). This also works around the bug somehow.

1
Valentin Marolf On

Question 1: I think, you have a combined issue which leeds to an unexpected behavior. Some facts:

  1. First of all, when using GROUP BY, the columns in the SELECT clause must either be part of the grouping or be used with an aggregate function. Grouping by id makes no sense, as all id's are different
  2. COUNT(*) OVER is a window function which will applied after the GROUP BY clause
  3. COUNT(*) OVER (), without ORDER BY, calculates a count of all rows in the result set without regard to any specific order

Question 2: A good a approach for problem like this is, that you first select your data, then use it as a subquery and then order the result as a second step:

SELECT sort, username, email_concat, total_count
FROM (
  ... your query ...
) AS q1
ORDER BY sort;
1
nbk On

a Window funcrtion needs an order to give the data and it self the right display order.

so add an ODER BY NULL for exanple and the order is correct

SELECT
    sort,
    username,
    GROUP_CONCAT(email) AS email_concat,
    COUNT(*) OVER (ORDER BY NULL) AS total_count
FROM users
GROUP BY id
ORDER BY sort;

But if you need the count of 1,2,3,4,5,6 the you need tpo add a column to siort by

SELECT
    sort,
    username,
    GROUP_CONCAT(email) AS email_concat,
    COUNT(*) OVER (ORDER BY NULL) AS total_count
FROM users
GROUP BY id
ORDER BY sort;
sort username email_concat total_count
20 user3 [email protected] 6
30 user2 [email protected] 6
40 user5 [email protected] 6
50 user1 [email protected] 6
70 user6 [email protected] 6
90 user4 [email protected] 6
SELECT
    sort,
    username,
    GROUP_CONCAT(email) AS email_concat,
    COUNT(*) OVER (ORDER BY sort) AS total_count
FROM users
GROUP BY id
ORDER BY sort;
sort username email_concat total_count
20 user3 [email protected] 1
30 user2 [email protected] 2
40 user5 [email protected] 3
50 user1 [email protected] 4
70 user6 [email protected] 5
90 user4 [email protected] 6

fiddle