Optimize PostgreSQL VIEW


I have a piece of code that works very well in every way but one. It takes too long to execute. My suspicion is that the long execution time is caused by an overuse of JOINs (11 to be precise). Here is what I am trying to do:

SETUP I have 3 tables:

  1. v_na_measurement_status

     id    start_measurement                   stop_measurement
     0     2020-09-01 22:19:48.480668+00:00    2020-09-01 22:29:29.713952+00:00
     1     2020-09-01 22:17:10.392089+00:00    2020-09-01 22:18:08.139668+00:00
     2     2020-09-01 22:12:49.795404+00:00    2020-09-01 22:15:14.564597+00:00
     ...
  2. sensor_double_precision

     id       sensor_name             timestamp                         value_cal
     40548    "curved_mirror_steps"   "2020-09-01 22:29:23.526468+00"   432131
     40547    "na_average_enable"     "2020-09-01 22:29:23.410416+00"   1
     40546    "na_averages"           "2020-09-01 22:29:23.295404+00"   16
     40545    "na_power"              "2020-09-01 22:29:23.174255+00"   -5
     40544    "na_stop_freq"          "2020-09-01 22:29:23.05868+00"    18000000000
     40543    "na_start_freq"         "2020-09-01 22:29:22.944205+00"   15000000000
     ...
  3. sensor_double_precision_arr

     id      sensor_name               timestamp                         value_cal
     3831    "na_s11_iq_data_trace2"   "2020-09-01 22:29:27.456345+00"   [array with ~2000 points]
     3830    "na_s21_iq_data"          "2020-09-01 22:29:27.389617+00"   [array with ~2000 points]
     3829    "na_s11_iq_data_trace2"   "2020-09-01 22:29:20.543466+00"   [array with ~2000 points]
     3828    "na_s21_iq_data"          "2020-09-01 22:29:20.443416+00"   [array with ~2000 points]

GOAL Using these 3 tables I want to create a VIEW called v_na_log that looks like this:

  start_measurement                 stop_measurement                  curved_mirror_steps   na_averages   na_start_freq   na_stop_freq   na_s21_iq_data              na_s11_iq_data
  "2020-09-01 22:29:22.913366+00"   "2020-09-01 22:29:27.478287+00"   432131                16            15000000000     18000000000    [array with ~2000 points]   [array with ~2000 points]
  ...

Basically, I want to transpose sensor_double_precision_arr and sensor_double_precision such that the rows in the sensor_name column become columns themselves.

ONE SOLUTION This is the long code I use to achieve this:

DROP VIEW IF EXISTS v_na_log;
CREATE VIEW v_na_log AS
  SELECT
    final_view.id, vnms.start_measurement,
    vnms.stop_measurement, final_view.curved_mirror_steps,
    final_view.na_averages, final_view.na_start_freq,
    final_view.na_stop_freq, iq_data.freq_resolution,
    iq_data.na_s21_iq_data, iq_data.na_s11_iq_data
  -- First crosstab: pivot the scalar sensors in sensor_double_precision
  -- into one column per sensor_name, keyed by measurement id.
  FROM crosstab(
    'WITH sorted_data AS (WITH current_data AS
      (SELECT DISTINCT ON (id, sensor_name)
         id, sensor_name, value_cal, timestamp
       FROM sensor_double_precision
       WHERE sensor_name IN (''na_start_freq'', ''na_stop_freq'',
                             ''na_averages'', ''curved_mirror_steps'')
       ORDER BY id, sensor_name, timestamp ASC)
     SELECT
       m.id,
       s.sensor_name,
       s.value_cal
     FROM v_na_measurement_status m
     INNER JOIN current_data s ON s.timestamp
       BETWEEN m.start_measurement AND m.stop_measurement),
    log_ids AS (SELECT DISTINCT id FROM sorted_data),
    sensor_names AS (SELECT DISTINCT sensor_name FROM sorted_data)
    SELECT log_ids.id, sensor_names.sensor_name, sorted_data.value_cal
    FROM log_ids CROSS JOIN sensor_names
    LEFT JOIN sorted_data ON (log_ids.id = sorted_data.id AND
                              sensor_names.sensor_name = sorted_data.sensor_name)')
    AS final_view(id bigint, curved_mirror_steps double precision,
                  na_averages double precision, na_start_freq double precision,
                  na_stop_freq double precision)
  LEFT JOIN v_na_measurement_status vnms ON vnms.id = final_view.id
  -- Second crosstab: same pattern for the array sensors in sensor_double_precision_arr.
  LEFT JOIN
    (SELECT final_view.id, vnms.start_measurement, vnms.stop_measurement,
            array_length(final_view.na_s21_iq_data, 1) AS freq_resolution,
            final_view.na_s11_iq_data, final_view.na_s21_iq_data
     FROM crosstab(
       'WITH sorted_data AS (WITH current_data AS
         (SELECT DISTINCT ON (id, sensor_name)
            id, sensor_name, value_cal, timestamp
          FROM sensor_double_precision_arr
          WHERE sensor_name LIKE ''%na_s21_iq_data%'' OR sensor_name LIKE ''%na_s11_iq_data%''
          ORDER BY id, sensor_name, timestamp ASC)
        SELECT
          m.id,
          s.sensor_name,
          s.value_cal
        FROM v_na_measurement_status m
        INNER JOIN current_data s ON s.timestamp
          BETWEEN m.start_measurement AND m.stop_measurement),
       log_ids AS (SELECT DISTINCT id FROM sorted_data),
       sensor_names AS (SELECT DISTINCT sensor_name FROM sorted_data)
       SELECT log_ids.id, sensor_names.sensor_name, sorted_data.value_cal
       FROM log_ids CROSS JOIN sensor_names
       LEFT JOIN sorted_data ON (log_ids.id = sorted_data.id AND sensor_names.sensor_name = sorted_data.sensor_name)')
       AS final_view(id bigint, na_s11_iq_data double precision[], na_s21_iq_data double precision[])
     LEFT JOIN v_na_measurement_status vnms ON vnms.id = final_view.id) iq_data ON iq_data.id = final_view.id
  ORDER BY vnms.start_measurement ASC;

PROBLEM The code works very well, and so far I haven't run into issues even on edge cases. But it takes an annoyingly long time to run. For example, running SELECT * FROM v_na_log LIMIT 10 takes about 13 seconds, and it takes even longer if I remove the LIMIT clause. I usually have to deal with more than 10 rows, so the longer the query takes, the harder it is for me to analyze the data. As I said, I think this has to do with the use of so many JOINs, but I do not see a better way of doing this. I want to know if there is a better solution, because I suspect this problem will keep getting worse as the tables get larger.
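For reference, the plan and actual per-node timings can be inspected with PostgreSQL's standard EXPLAIN; this is just the generic diagnostic command, not output from my setup:

-- Generic PostgreSQL diagnostic: shows the plan and measured timings
-- for the slow query against the view.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM v_na_log LIMIT 10;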

I have posted about simplified versions of similar tasks before and received invaluable suggestions, which I have incorporated into my code. Here is the link to a related question of mine: Crosstab using data from 2 different tables

Any suggestions on a better organization of the tables in the database itself are welcome as well.


There is 1 answer below.

Answer by Parfait:

Based on the posted data and desired output, and given that the pivoted values are known in advance, consider conditional aggregation, i.e. Postgres' selective aggregates with FILTER().

SELECT m.start_measurement
       , m.stop_measurement
       , MAX(p.value_cal) FILTER(WHERE p.sensor_name = 'curved_mirror_steps') AS curved_mirror_steps
       , ...
       , MAX(pa.value_cal) FILTER(WHERE pa.sensor_name = 'na_s11_iq_data') AS na_s11_iq_data
      
FROM v_na_measurement_status m 
LEFT JOIN sensor_double_precision p 
  ON p.timestamp BETWEEN m.start_measurement 
                     AND m.stop_measurement
LEFT JOIN sensor_double_precision_arr pa
  ON pa.timestamp BETWEEN m.start_measurement 
                     AND m.stop_measurement
GROUP BY m.start_measurement
       , m.stop_measurement
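
As a rough, untested sketch against the posted schema (column names are taken from the question; adjust the types, the LIKE patterns, and the freq_resolution expression as needed), the same idea wrapped into the view, plus plain btree indexes on the timestamp columns so each measurement window can be matched with an index range scan, could look like this:

DROP VIEW IF EXISTS v_na_log;
CREATE VIEW v_na_log AS
SELECT m.id
     , m.start_measurement
     , m.stop_measurement
     , MAX(p.value_cal)  FILTER (WHERE p.sensor_name = 'curved_mirror_steps') AS curved_mirror_steps
     , MAX(p.value_cal)  FILTER (WHERE p.sensor_name = 'na_averages')         AS na_averages
     , MAX(p.value_cal)  FILTER (WHERE p.sensor_name = 'na_start_freq')       AS na_start_freq
     , MAX(p.value_cal)  FILTER (WHERE p.sensor_name = 'na_stop_freq')        AS na_stop_freq
     -- LIKE mirrors the question's filter, which also matches e.g. na_s11_iq_data_trace2
     , MAX(pa.value_cal) FILTER (WHERE pa.sensor_name LIKE '%na_s21_iq_data%') AS na_s21_iq_data
     , MAX(pa.value_cal) FILTER (WHERE pa.sensor_name LIKE '%na_s11_iq_data%') AS na_s11_iq_data
     , array_length(MAX(pa.value_cal) FILTER (WHERE pa.sensor_name LIKE '%na_s21_iq_data%'), 1) AS freq_resolution
FROM v_na_measurement_status m
LEFT JOIN sensor_double_precision p
  ON (p.timestamp BETWEEN m.start_measurement AND m.stop_measurement)
 AND p.sensor_name IN ('curved_mirror_steps', 'na_averages', 'na_start_freq', 'na_stop_freq')
LEFT JOIN sensor_double_precision_arr pa
  ON (pa.timestamp BETWEEN m.start_measurement AND m.stop_measurement)
GROUP BY m.id
       , m.start_measurement
       , m.stop_measurement
ORDER BY m.start_measurement;

-- Supporting indexes (names are illustrative) so the BETWEEN join can use an
-- index range scan per measurement window instead of scanning the whole table:
CREATE INDEX IF NOT EXISTS sensor_double_precision_ts_idx
    ON sensor_double_precision (timestamp);
CREATE INDEX IF NOT EXISTS sensor_double_precision_arr_ts_idx
    ON sensor_double_precision_arr (timestamp);

This avoids the two crosstab() calls and the cross join of ids and sensor names entirely: each measurement row is matched to its readings once, and the pivot happens in the aggregate step.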