MS SQL Server: How to get the counts of each visit type per date range per user?

I have 2 tables.

The first table Users contains attributes id, session, start_date, and end_date. The id is an int that uniquely identifies each user. The session is an int that keeps track of each continuous date range for each user. start_date and end_date reflect the date range of each session for each user. Users can show up multiple times in this table, and you may assume that if a user is included multiple times, their next session's start_date happens after their previous session's end_date.

Here is an example of the Users table:

| id | session | start_date   | end_date     |
|----|---------|--------------|--------------|
| 1  | 1       | '2017-01-01' | '2017-03-01' |
| 1  | 2       | '2017-04-01' | '2017-10-01' |
| 2  | 1       | '2017-01-01' | '2017-12-31' |
| 3  | 1       | '2017-01-01' | '2017-12-31' |

The second table Visits contains attributes id, visit_date, and visit_type. id is the same id as in the Users table. visit_date (date) and visit_type (string) detail the date of a visit for each user and the type of each visit. Users can show up multiple times in this table or not show up at all.

Here is an example of the Visits table:

| id | visit_date   | visit_type |
|----|--------------|------------|
| 1  | '2017-01-05' | 'a'        |
| 1  | '2017-02-06' | 'a'        |
| 1  | '2017-04-01' | 'a'        |
| 1  | '2017-05-02' | 'b'        |
| 1  | '2017-06-13' | 'b'        |
| 1  | '2017-01-12' | 'c'        |
| 2  | '2017-04-01' | 'a'        |
| 2  | '2017-12-01' | 'd'        |
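
To reproduce the problem, the two sample tables could be set up roughly as follows. This is only a sketch: the column types (int, date, varchar(10)) and the primary key on Users are assumptions, since the description above only says that id and session are ints, visit_date is a date, and visit_type is a string.

```sql
-- Assumed schema: types and key are guesses based on the description above.
CREATE TABLE Users (
    id         int  NOT NULL,
    session    int  NOT NULL,
    start_date date NOT NULL,
    end_date   date NOT NULL,
    PRIMARY KEY (id, session)   -- assumption: one row per user session
);

CREATE TABLE Visits (
    id         int         NOT NULL,   -- same id as Users.id
    visit_date date        NOT NULL,
    visit_type varchar(10) NOT NULL    -- assumption: short string codes
);

-- Sample rows copied from the two example tables.
INSERT INTO Users (id, session, start_date, end_date) VALUES
      (1, 1, '2017-01-01', '2017-03-01')
    , (1, 2, '2017-04-01', '2017-10-01')
    , (2, 1, '2017-01-01', '2017-12-31')
    , (3, 1, '2017-01-01', '2017-12-31');

INSERT INTO Visits (id, visit_date, visit_type) VALUES
      (1, '2017-01-05', 'a')
    , (1, '2017-02-06', 'a')
    , (1, '2017-04-01', 'a')
    , (1, '2017-05-02', 'b')
    , (1, '2017-06-13', 'b')
    , (1, '2017-01-12', 'c')
    , (2, '2017-04-01', 'a')
    , (2, '2017-12-01', 'd');
```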

I need to write a query that returns the number of visits of each visit_type for each user session in the Users table, where the session is at least 90 days long and the visit_date falls within the session's date range (inclusive). If a session has no visits at all, it should appear once with a visit_type of NULL and a visit_count of 0.

Here is the desired output based off of the above tables:

| id | session | start_date   | end_date     | visit_type | visit_count |
|----|---------|--------------|--------------|------------|-------------|
| 1  | 2       | '2017-04-01' | '2017-10-01' | 'a'        | 1           |
| 1  | 2       | '2017-04-01' | '2017-10-01' | 'b'        | 2           |
| 2  | 1       | '2017-01-01' | '2017-12-31' | 'a'        | 1           |
| 2  | 1       | '2017-01-01' | '2017-12-31' | 'd'        | 1           |
| 3  | 1       | '2017-01-01' | '2017-12-31' | NULL       | 0           |

Here is the query I have so far:

SELECT Users.id, Users.session, Users.start_date, Users.end_date, 
  Visits.visit_date, Visits.visit_type
FROM Users
LEFT JOIN Visits ON Users.id = Visits.id
WHERE (
    DATEDIFF(DAY, Users.start_date, Users.end_date) >= 90
  )
ORDER BY Users.id, Users.session
;

From this code, I expected an output where for each user and session the visit_date and visit_type were recorded. The only sessions included would be those that were 90 days or longer.

However, I'm running into an error using the DATEDIFF function where it says my parameters are incorrect.

How can I use DATEDIFF to filter out the sessions shorter than 90 days in length?

Answer by Adrian Maxwell:

Your DATEDIFF call looks valid for SQL Server (the datepart is not case-sensitive, so day and DAY are equivalent), so the error may be coming from somewhere else. You also need a GROUP BY to get the visit count:

SELECT
      Users.id
    , Users.session
    , Users.start_date
    , Users.end_date
    , Visits.visit_type
    , COUNT(Visits.visit_type) AS visit_count
FROM Users
LEFT JOIN Visits ON Users.id = Visits.id
    -- range check lives in the join so sessions with no visits keep their NULL row
    AND Visits.visit_date BETWEEN Users.start_date AND Users.end_date
WHERE DATEDIFF(day, Users.start_date, Users.end_date) >= 90
GROUP BY
      Users.id
    , Users.session
    , Users.start_date
    , Users.end_date
    , Visits.visit_type
ORDER BY
      Users.id
    , Users.session

see: TSQL datediff
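
As a quick check of the 90-day filter, here is a small sketch that runs DATEDIFF over the three distinct session ranges from the question. The column aliases are made up for illustration, and the literals are cast to date explicitly so the result does not depend on the session's date format settings.

```sql
-- DATEDIFF(day, start, end) returns the number of day boundaries crossed.
SELECT
      DATEDIFF(day, CAST('2017-01-01' AS date), CAST('2017-03-01' AS date)) AS user1_session1_days  -- 59
    , DATEDIFF(day, CAST('2017-04-01' AS date), CAST('2017-10-01' AS date)) AS user1_session2_days  -- 183
    , DATEDIFF(day, CAST('2017-01-01' AS date), CAST('2017-12-31' AS date)) AS full_year_days;      -- 364
```

Only the first range falls below 90, which is why user 1's session 1 is absent from the desired output. Note that DATEDIFF does not count the start day itself, so if "90 days long" is meant to include both endpoints, the threshold would need to be 89 instead.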

NB: this assumes you are using SQL Server. If that is not the case, please tag your question with the correct DBMS. Also note that "sql" is not an alias for "sql server".
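
For a way to try this without creating any tables, here is a self-contained sketch that feeds the question's sample rows in through CTEs. The CTEs named Users and Visits only stand in for the real tables, and the VALUES constructor and date type assume SQL Server 2008 or later. The visit_date range check sits in the ON clause rather than in WHERE, because a WHERE condition on Visits columns discards the NULL-extended rows that a LEFT JOIN produces for sessions without visits.

```sql
WITH Users AS (
    SELECT id, session
         , CAST(start_date AS date) AS start_date
         , CAST(end_date   AS date) AS end_date
    FROM (VALUES
          (1, 1, '2017-01-01', '2017-03-01')
        , (1, 2, '2017-04-01', '2017-10-01')
        , (2, 1, '2017-01-01', '2017-12-31')
        , (3, 1, '2017-01-01', '2017-12-31')
    ) AS v (id, session, start_date, end_date)
),
Visits AS (
    SELECT id, CAST(visit_date AS date) AS visit_date, visit_type
    FROM (VALUES
          (1, '2017-01-05', 'a')
        , (1, '2017-02-06', 'a')
        , (1, '2017-04-01', 'a')
        , (1, '2017-05-02', 'b')
        , (1, '2017-06-13', 'b')
        , (1, '2017-01-12', 'c')
        , (2, '2017-04-01', 'a')
        , (2, '2017-12-01', 'd')
    ) AS v (id, visit_date, visit_type)
)
SELECT
      Users.id
    , Users.session
    , Users.start_date
    , Users.end_date
    , Visits.visit_type
    , COUNT(Visits.visit_type) AS visit_count   -- counts non-NULL values only, so unmatched sessions get 0
FROM Users
LEFT JOIN Visits
       ON Users.id = Visits.id
      AND Visits.visit_date BETWEEN Users.start_date AND Users.end_date  -- inclusive on both ends
WHERE DATEDIFF(day, Users.start_date, Users.end_date) >= 90
GROUP BY
      Users.id
    , Users.session
    , Users.start_date
    , Users.end_date
    , Visits.visit_type
ORDER BY
      Users.id
    , Users.session
    , Visits.visit_type;
-- Result for the sample data (id, session, visit_type, visit_count):
--   1, 2, 'a',  1
--   1, 2, 'b',  2
--   2, 1, 'a',  1
--   2, 1, 'd',  1
--   3, 1, NULL, 0
```

These five rows match the desired output in the question, including the NULL/0 row for user 3, who has no visits at all.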