I have a ROAD_PROJECTS table in Oracle 18c:
with road_projects (proj_id, road_id, year_, status, from_measure, to_measure) as (
select 100, 1, 2022, 'APPROVED', null, 100.1 from dual union all
select 101, 1, 2022, 'APPROVED', 0, 100.1 from dual union all
select 102, 1, 2022, 'APPROVED', 0, 200.6 from dual union all
select 103, 1, 2022, 'APPROVED', 0, 199.3 from dual union all
select 104, 1, 2022, 'APPROVED', 0, 201 from dual union all
select 105, 2, 2023, 'PROPOSED', 0, 50 from dual union all
select 106, 2, 2023, 'PROPOSED', 75, 100 from dual union all
select 107, 3, 2024, 'DEFERRED', 0, 100 from dual union all
select 108, 3, 2025, 'DEFERRED', 0, 110 from dual union all
select 109, 4, 2026, 'PROPOSED', 0, null from dual union all
select 110, 4, 2026, 'DEFERRED', 0, null from dual)
select * from road_projects
PROJ_ID ROAD_ID YEAR_ STATUS FROM_MEASURE TO_MEASURE
---------- ---------- ---------- -------- ------------ ----------
100 1 2022 APPROVED null 100.1 --duplicates (other than PROJ_ID); null is to be treated as zero
101 1 2022 APPROVED 0 100.1
102 1 2022 APPROVED 0 200.6 --duplicates: TO_MEASURES are approximately the same (within a 5 metre tolerance)
103 1 2022 APPROVED 0 199.3
104 1 2022 APPROVED 0 201
105 2 2023 PROPOSED 0 50 --not duplicates: FROM_MEASURES are different and TO_MEASURES are different
106 2 2023 PROPOSED 75 100
107 3 2024 DEFERRED 0 100 --not duplicates: YEARS are different and TO_MEASURES are different
108 3 2025 DEFERRED 0 110
109 4 2026 PROPOSED 0 null --not duplicates: STATUSES are different
110 4 2026 DEFERRED 0 null
I want to select rows where the ROAD_ID, YEAR_, STATUS, FROM_MEASURES, and TO_MEASURES are duplicated.
In the case of FROM_MEASURES and TO_MEASURES, I want to use a 5 metre tolerance. For example, these TO_MEASURES would be considered duplicates: 200.6, 199.3, and 201.
- Note: To be honest, I haven't wrapped my head around how the tolerance would work. Would it be 5 +/- from the "middle" of the range? I'm open to ideas -- whatever's easiest.
When comparing FROM_MEASURES or TO_MEASURES, I want to treat null as 0. But when it comes to the output, returning either null or 0 would be fine.
The result would look like this:
PROJ_ID ROAD_ID YEAR_ STATUS FROM_MEASURE TO_MEASURE
---------- ---------- ---------- -------- ------------ ----------
100 1 2022 APPROVED null 100.1 --duplicates
101 1 2022 APPROVED 0 100.1
102 1 2022 APPROVED 0 200.6 --duplicates
103 1 2022 APPROVED 0 199.3
104 1 2022 APPROVED 0 201
How can I select duplicates -- including near-duplicate numbers?
You could use an
existsclause to look for a close duplicate:fiddle
The tolerance is applied by subtracting the two from (or to) measures, getting the absolute (unsigned) value for that difference, and seeing if that is less than 5.
Partly for fun, partly because of @Jorge's comment... you could also do this with
match_recognizepattern matching, which would allow you to keep track of which duplicates go together, and the min/max from/to across a group of duplicates, if that sort of thing might be useful:fiddle