I have a source system where I get for example a count of ID of 2318.
I made two queries, 1 is showing only count of the IDs and a second one has a group by clause. But in the end the outcome is different.
1. Query
SELECT
COUNT(DISTINCT ID)
FROM my_table fact
JOIN d_time time ON ( time.time_5_min_utc = (fact.host_tlt_utc -(fact.host_tlt_utc % (300*1000))))
WHERE fact.host_tlt_utc >= ${fromTimestamp}
AND fact.host_tlt_utc < ${fromTimestamp} + 86400000
AND time.time_5_min_utc >= ${fromTimestamp}
AND time.time_5_min_utc < ${fromTimestamp} + 86400000;
Output of this query is 2318
| COUNT(DISTINCT ID) |
|---|
| 2318 |
2. Query
SELECT
status,
COUNT(DISTINCT ID)
FROM my_table fact
JOIN d_time time ON ( time.time_5_min_utc = (fact.host_tlt_utc -(fact.host_tlt_utc % (300*1000))))
WHERE fact.host_tlt_utc >= ${fromTimestamp}
AND fact.host_tlt_utc < ${fromTimestamp} + 86400000
AND time.time_5_min_utc >= ${fromTimestamp}
AND time.time_5_min_utc < ${fromTimestamp} + 86400000
group by status;
Output if I sum up all the lines is much higher.
| status | COUNT(DISTINCT ID) |
|---|---|
| Open | 1383 |
| In_Progress | 980 |
| Ready | 20 |
Is there any explanation why such different output is shown? I think in the end it should be the same.
Thank you.