I have following tables:
CREATE TABLE IF NOT EXISTS auth_user
(
id SERIAL PRIMARY KEY,
username VARCHAR(30),
first_name VARCHAR(30),
last_name VARCHAR(30)
);
CREATE TABLE IF NOT EXISTS statistics_restaccesslog
(
id SERIAL PRIMARY KEY,
user_id INTEGER,
url VARCHAR(200),
course_id INTEGER,
date_visited TIMESTAMP,
ip_address VARCHAR(16)
);
Consider below records:
INSERT INTO auth_user (username, first_name, last_name)
VALUES
('user1', 'John', 'Doe'),
('user2', 'Jane', 'Smith'),
('user3', 'Michael', 'Johnson'),
('user4', 'Emily', 'Brown'),
('user5', 'David', 'Miller'),
('user6', 'Olivia', 'Davis'),
('user7', 'Daniel', 'Wilson'),
('user8', 'Sophia', 'Anderson'),
('user9', 'Andrew', 'Taylor'),
('user10', 'Emma', 'Thomas');
INSERT INTO statistics_restaccesslog (user_id, url, course_id, date_visited, ip_address)
VALUES
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.1'),
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2'),
(3, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2'),
(4, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.3'),
(5, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.4'),
(6, 'https://example.com/page', 2, CURRENT_TIMESTAMP, '192.168.0.5'),
(7, 'https://example.com/page', 2, CURRENT_TIMESTAMP, '192.168.0.6'),
(8, 'https://example.com/page', 3, CURRENT_TIMESTAMP, '192.168.0.7'),
(9, 'https://example.com/page', 4, CURRENT_TIMESTAMP, '192.168.0.8'),
(10, 'https://example.com/page', 5, CURRENT_TIMESTAMP, '192.168.0.9');
Point 1: Note the records with same combination of (user_id, url) for two different ip_addresses:
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.1')
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2')
Point 2: Note the records with same ip_address with two different combination of (user_id, url):
(1, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2')
(3, 'https://example.com/page', 1, CURRENT_TIMESTAMP, '192.168.0.2')
I want to build SQL query which will return all such records. I tried following:
SELECT rat.id, rat.user_id, rat.url, rat.course_id, rat.date_visited,
rat.ip_address, u.username, CONCAT(u.first_name, ' ', u.last_name) AS name
FROM statistics_restaccesslog rat,
auth_user u
WHERE ((url, user_id) IN (
SELECT url, user_id
FROM statistics_restaccesslog
GROUP BY url, user_id
HAVING COUNT(DISTINCT ip_address) > 1
)
OR ip_address IN (
SELECT ip_address
FROM statistics_restaccesslog
GROUP BY ip_address
HAVING COUNT(DISTINCT user_id) > 1
))
AND
rat.user_id = u.id
;
On db-fiddle, it correctly returns first three records:
My real database have following records:
(The greyed out urls are exactly the same.)
However, when I run above query, it returns only first record, that is one with id=17. Record with id=18 is not returned, even though it satisfies point 2. Why is this so? What I am missing?
Update
I have added four more records to statistics_restaccesslog:
-- same ip_address, two different (user_id, url)
(17, '/module1/api/rest1/', 1019, CURRENT_TIMESTAMP, '10.0.0.3'),
(18, '/module1/api/rest1/', 1019, CURRENT_TIMESTAMP, '10.0.0.3'),
-- same (user_id, url), two different ip_address
(19, '/module1/api/rest1/', 1019, CURRENT_TIMESTAMP, '10.0.0.4'),
(19, '/module1/api/rest1/', 1019, CURRENT_TIMESTAMP, '10.0.0.5');
First two satisfy point 2 and last two satisfy point 1. So all four records should appear in the query output. But they dont. The query output stays the same! I am not getting why these records are not coming in the output.
Also below is equivalent query with JOIN. It also results in similar behavior:
SELECT s.user_id, s.url, s.course_id, s.date_visited, s.ip_address, u.username, u.first_name, u.last_name
FROM statistics_restaccesslog AS s
JOIN auth_user AS u ON s.user_id = u.id
WHERE (s.url, s.user_id) IN (
SELECT url, user_id
FROM statistics_restaccesslog
GROUP BY url, user_id
HAVING COUNT(DISTINCT ip_address) > 1
)
OR s.ip_address IN (
SELECT ip_address
FROM statistics_restaccesslog
GROUP BY ip_address
HAVING COUNT(DISTINCT user_id) > 1
);
PS: Am using postgresql 11


Find the "keys" of the ip-address, resp. the user_id/url combination that matches the criteria using a GROUP BY Query and inner join that group by query back to the input table.
You need to formulate a clever
HAVINGclause:For a
user_id><urlcombination, you need to make a single expression of the two to make a MIN() / MAX() out of it. I used casting theuser_idtoVARCHARand concatenating that withurl.I am almost completely sure you don't get the data you're looking for because there are no
auth_userrows with theidas theuser_ids returned from the two CTE-s.To make my point, I left join the result tables to
auth_user, and add theuser_idcolumn to the result table: