Query to find active records on a given date in a table recording "start" and "stop" dates

342 Views Asked by At

I have a list of "start / stop" activity logged in a table, each one associated with a date. I need to determine which users had "started" on a particular date - i.e. were in progress with the task. My current setup and query can be represented by this simplistic view of it:

CREATE TABLE `registration_statuses` (
    `status_id` INT(11) NOT NULL AUTO_INCREMENT,
    `status_user_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
    `status_activity` ENUM('start','stop') DEFAULT 'start',
    `status_date` DATE NULL DEFAULT NULL,
    PRIMARY KEY (`status_id`),
    INDEX `status_user_id` (`status_user_id`)
);

INSERT INTO `registration_statuses` (`status_user_id`, `status_activity`, `status_date`)
VALUES (1, 'start', '2020-01-01'),
       (2, 'start', '2020-01-02'),
       (1, 'stop', '2020-01-19'),
       (1, 'start', '2020-01-25'),
       (2, 'stop', '2020-01-31'),
       (1, 'stop', '2020-01-31');

I am then running this query:

SELECT `rs`.`status_user_id`
FROM `registration_statuses` `rs`
  INNER JOIN (
    SELECT `status_user_id`, MAX(status_date) `last_date`
    FROM `registration_statuses`
    WHERE `status_date` < '2020-01-03'
    GROUP BY `status_user_id`
  ) `srs` ON `rs`.`status_user_id` = `srs`.`status_user_id`
            AND `rs`.`status_date` = `srs`.`last_date`
WHERE `status_activity` = 'start';

(See http://sqlfiddle.com/#!9/c8d371/1/0)

By changing the date in the query, this query returns a list of user ids that tell me who is engaged (i.e. has started a task) on that specific date. However, the users are considered (in real life) to have been engaged with the task on the actual date that they stop it. This query doesn't allow for this in that if you were to change the date in the query to reflect 2020-01-19, the day on which user 1 stopped, the query would only return user 2.

I tried changing the <= condition to a strict <, and while this solves that part of the problem, users are not considered busy on the day that they start. With a strict <, only user is returned on '2019-01-25' whereas I want both users to appear.

My only "viable" solution, at this point, is to merge the results of the two versions of the queries (in the form of a DISTINCT / UNION query), but I can't help but think that there must be a more efficient way of obtaining the results I need.

2

There are 2 best solutions below

2
Strawberry On BEST ANSWER

Does this help?

SELECT a.status_id
     , a.status_user_id 
     , a.status_date start
     , MIN(b.status_date) stop
  FROM registration_statuses a
  LEFT
  JOIN registration_statuses b
    ON b.status_user_id = a.status_user_id
   AND b.status_id > a.status_id
   AND b.status_activity = 'stop'
 WHERE a.status_activity = 'start'
 GROUP 
    BY a.status_id;

+-----------+----------------+------------+------------+
| status_id | status_user_id | start      | stop       |
+-----------+----------------+------------+------------+
|         1 |              1 | 2020-01-01 | 2020-01-19 |
|         2 |              2 | 2020-01-02 | 2020-01-31 |
|         4 |              1 | 2020-01-25 | 2020-01-31 |
+-----------+----------------+------------+------------+
2
Gordon Linoff On

One method is a correlated subquery:

select rs.*
from registration_statuses rs
where rs.status_date = (select max(rs2.status_date)
                        from registration_statuses rs2
                        where rs2.status_user_id = rs.status_user_id and
                              rs2.status_date <= ?
                       ) and
      rs.status_activity = 'active';

For performance, you want an index on registration_statuses(status_user_id, status_date).

There are other interesting methods. If you just want the user_id, here is an approach only using aggregation:

select rs.user_id
from registration_statuses rs
where rs.status_date <= ?
group by rs.user_id
having max(rs.status_date) = max(case when rs.status_activity = 'active' then status_date end);

That is, select users whose most recent status date as of a particular date is "active".