How do I create multiple join conditions from the same table?

81 Views Asked by At

Context: i'm working on a table that contains 2 fields that may include geo-identifying data.

The fields can either contain NUTS codes or Postcodes (even partial). (https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics).

Now I have a second table that includes all (or most) possibles cases for these and i'm trying to match them.

regionDataTable

If a match is made, save the NUTS1/2/3 into the corresponding fields.

Examples :

Country First_Field Second_Field Output
FR NULL 75324 Paris
FR FR101 NULL Paris
FR FRZZZ NULL Extra Region NUTS3
FR NULL 94 Paris
DE NULL 94 Passau Kreisfreie Stadt
DE NULL 936 NULL

Current version

-- Currently i'm updating an existing table, but i could create a copy and add the fields during creation
update src_table
SET geo_1 =  COALESCE(tripleZPost.geo_1, directMatch.geo_1
            ,purgedPostFirstThree.geo_1, purgedPostFirstTwo.geo_1
            )
    ,geo_2 =  COALESCE(tripleZPost.geo_2, directMatch.geo_2
            ,purgedPostFirstThree.geo_2, purgedPostFirstTwo.geo_2
            )
    ,geo_3 =  COALESCE(tripleZPost.geo_3, directMatch.geo_3
            ,purgedPostFirstThree.geo_3, purgedPostFirstTwo.geo_3
            )
FROM src_table as mainTable
-- Check if it's a generic NUTS code (FRZZZ, DEZZZ, ESZZZ)
LEFT JOIN geocodeIndex as tripleZPost on (
    tripleZPost.country = mainTable.COUNTRY
    and mainTable.FIRST_FIELD = mainTable.COUNTRY + 'ZZZ'
    and mainTable.FIRST_FIELD = tripleZPost.geocode)
-- Check if we have a full match
LEFT JOIN geocodeIndex as directMatch on (
    directMatch.country = mainTable.COUNTRY 
    and (
        mainTable.FIRST_FIELD = directMatch.geocode
        or
        mainTable.SECOND_FIELD = directMatch.geocode)
)
-- If no full match is found, try a partial.
-- 75324 is a full postcode, we only keep the first 2-3 characters
LEFT JOIN geocodeIndex as purgedPostFirstTwo on (
    purgedPostFirstTwo.country = mainTable.COUNTRY 
    and LEFT(SECOND_FIELD, 2) = purgedPostFirstTwo.geocode
)
LEFT JOIN geocodeIndex as purgedPostFirstThree on (
    purgedPostFirstThree.country = mainTable.COUNTRY 
    and LEFT(SECOND_FIELD, 3) = purgedPostFirstThree.geocode
)

My inquiry now is two-fold :

  • Is there a cleaner/more efficient way of doing multiple checks like these without using LEFT-JOIN's ?
  • I've used coalesce to take the first non-NULL entry here to make a priority system, is there a way to run the LEFT-joins only if the previous entry didn't result in a match ?

Keep in mind, my table is around 800 million entries so while performance is a concern space/query runtime isn't, I can run it overnight, I just can't make my server crash.

0

There are 0 best solutions below