I am attempting to identify what quartile values in a particular column in my temp table fall into. Using the ntile function to sort these values into their respective quartiles, the maximum value of one quartile is the same as the minimum value of the following quartile, leading to rows with that value yielding a row representing both quaritles. I'd like to prevent this, and instead have every value fall into only one partile.
Below is the code used to create the quartile table, along with the creation of a random example table.
create table temp_example(
engagement_year integer,
cnt_total_days_engaged integer
)
DROP TABLE IF EXISTS temp_example;
INSERT INTO temp_example (engagement_year,
cnt_total_days_engaged)
VALUES
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int)),
(cast (random() * 2 as int), cast (random() * 100 as int))
SELECT * FROM temp_example;
DROP TABLE IF EXISTS temp_example_quartiles;
SELECT ntile(4) over (partition by engagement_year
order by cnt_total_days_engaged asc) quartile,
cnt_total_days_engaged,
engagement_year
INTO temp_example_quartiles
FROM temp_member_engagement;
SELECT quartile, cnt_total_days_engaged
FROM temp_example_quartiles
WHERE engagement_year =2020
GROUP BY quartile, cnt_total_days_engaged
ORDER BY cnt_total_days_engaged;
For example, when querying on the resulting temp_quartliles table, the value of 70 is both the minimum value of quaritle 3 and the maximum value of quartlie 2, so if I would get duplicate rows for both quartile 2 and 3 wherever cnt_total_days_engaged equals 70. I'd like to change this so that 70 falls into quartlie 2, but not 3, but I'm struggling on how to most efficiently do this.
I greatly appreciate any help that can be offered!