I have two tables:
CREATE TABLE locations (
site_no integer PRIMARY KEY,
name text,
geom geometry(Point,4283)
);
CREATE TABLE location_attributes (
site_no integer NOT NULL,
qualifier text NOT NULL,
value text NOT NULL
);
location_attributes contains a bunch features at most sites. Not all sites are present in location_attributes and some sites have multiple values of the same qualifier.
I'm trying to generate a wide, human-readable table showing each site_no with a column for each existing qualifier as below:
| site_no | name | qualifier1 | qualifier2 |
|---|---|---|---|
| 1 | name1 | value1.1; value1.2 | value4 |
| 2 | name2 | value2 | value5 |
See my wokring solution below. However, this is not very performant and becomes a lot slower with each added qualifier due to the joins.
WITH aggr AS (
SELECT
site_no,
qualifier,
array_agg(value) as arr
FROM location_attributes
GROUP BY 1,2
)
SELECT
locations.site_no
, locations.name
, array_to_string(qualifier1.arr,'; ') as qualifier1
, array_to_string(qualifier2.arr,'; ') as qualifier2
FROM locations
LEFT JOIN
(SELECT * FROM aggr WHERE qualifier = 'qualifier1') as qualifier1
ON locations.site_no = qualifier1.site_no
LEFT JOIN
(SELECT * FROM aggr WHERE qualifier = 'qualifier2') as qualifier2
ON locations.site_no = qualifier2.site_no
There are a few problems I'd like to solve:
- The number of qualifiers/keys can change over time. The query should get the qualifiers dynamically.
- Re-write the query to be much more performant.
- Which columns to index to improve performance further.
Thanks!