MySQL query to find the customers who have placed orders on consecutive days

72 Views Asked by At

I am trying to get the ids of the customers who have placed orders on consecutive days. The table is created as follows:

create table orders(
  orderid INT,
  orderdate date,
  customerid int
);

The values:

insert into orders (orderid, orderdate, customerid)
values(1,'2023-06-20',1),
    (2, '2023-06-21', 2),
    (3, '2023-06-22', 3),
    (4, '2023-06-22', 1),
    (5, '2023-06-23', 3),
    (6, '2023-06-22', 1),
    (7, '2023-06-26', 4),
    (8, '2023-06-27', 4),
    (9, '2023-06-29', 4),
    (10, '2023-06-29', 5),
    (11, '2023-06-30', 5),
    (12, '2023-06-28', 5),
    (13, '2023-06-25', 4),
    (14, '2023-06-24', 4),
    (15, '2023-06-30', 4);

The code that I wrote gave the output if ids with orders consecutive days but left the ids of customers who had a gap in their order despite having more number of orders before the gap actually occurred. The code that I wrote:

with t1 as(
select customerid, orderdate,
case when lead(orderdate) over (partition by customerid order by orderdate) is null then 1
else abs(orderdate - lead(orderdate) over (partition by customerid order by orderdate)) end as gap
from orders)
select customerid, sum(gap) as consecutive
from t1
where gap>0
group by customerid
having count(*)=sum(gap) and count(*)>1;

The output:

+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
|          3 |                2 |
|          5 |                3 |
+------------+------------------+

The output I would want:

+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
|          3 |                2 |
|          4 |                4 |
|          4 |                2 |
|          5 |                3 |
+------------+------------------+

Since customer with customerid 4 has ordered from 2023-06-24 to 2023-06-27. The next order from the same customer is on 2023-06-29 and 2023-06-30 hence not continuous and should occur as a separate row.

Edit: The orders placed must be on consecutive days, irrespective of the number of orders placed on a single day.

3

There are 3 best solutions below

4
SelVazi On BEST ANSWER

This is an other solution to resolve this gaps and islands, it can be resolved by calculating the difference between row numbers (DENSE_RANK is used because various customers may place orders on the same day) , which assigns a unique identifier to each sequence of consecutive orders for a customer :

WITH cte AS (
  SELECT
    *, DENSE_RANK() over(ORDER BY orderdate) - DENSE_RANK() over (PARTITION BY customerid ORDER BY orderdate) as grp
  FROM orders
)
SELECT customerid, COUNT(DISTINCT orderdate) AS consecutive_days 
FROM cte
WHERE grp > 0
GROUP BY customerid, grp
HAVING consecutive_days > 1;

Results :

customerid  consecutive_days
3           2
4           4
4           2
5           3

Demo here

0
user1191247 On

This is a typical gaps and islands problem and here is one way of getting your desired result:

SELECT customerid, COUNT(DISTINCT orderdate) AS consecutive_days
FROM (
    SELECT *, orderdate - INTERVAL DENSE_RANK() OVER (PARTITION BY customerid ORDER BY orderdate) DAY AS grp
    FROM orders
) AS order_groups
GROUP BY customerid, grp
HAVING consecutive_days > 1;

Outputs:

customerid consecutive_days
3 2
4 4
4 2
5 3

Here's a db<>fiddle.

If you look at the result of the derived table you can see that by subtracting the DENSE_RANK from the orderdate we create a value we can use for grouping.

orderid orderdate customerid grp
1 2023-06-20 1 2023-06-19
4 2023-06-22 1 2023-06-20
6 2023-06-22 1 2023-06-20
2 2023-06-21 2 2023-06-20
3 2023-06-22 3 2023-06-21
5 2023-06-23 3 2023-06-21
14 2023-06-24 4 2023-06-23
13 2023-06-25 4 2023-06-23
7 2023-06-26 4 2023-06-23
8 2023-06-27 4 2023-06-23
9 2023-06-29 4 2023-06-24
15 2023-06-30 4 2023-06-24
12 2023-06-28 5 2023-06-27
10 2023-06-29 5 2023-06-27
11 2023-06-30 5 2023-06-27

Note that I have used DENSE_RANK, instead of the more typical ROW_NUMBER, to handle customers placing multiple orders on the same day.

1
Zsolt Szilagyi On

It really depends on yxour table size. If you only have a couple hundred orders, and O=N² is viable, go with something as simple as

SELECT DISTINCT o1.customerid
FROM orders o1
JOIN orders o2 ON o1.customerid = o2.customerid
AND DATEDIFF(o1.orderdate, o2.orderdate) = 1;

For performance, put a single index over both customerid and orderdate.

The best way, though somewhat complex, are the MySQL LEAD and LAG functions.