How to get a table with NULL values from a PostgreSQL INNER JOIN and UNION ALL query?

71 Views Asked by At

I have a PostgreSQL query for a time series graph with an inner join. A list of tags is used as a where condition for the inner join. I use union all to get a result table with data from two cases: a query with tags in the list and a query with an empty tag list. I want rows with null values from the first select before the union all if the tag list is empty but PostgreSQL returns SELECT 0 i.e. no rows at all. I want null rows because without them I can't programmatically separate the cases.

How can I get a result with null rows? Or is there a way of obtaining the result only from the select before union all if the tag list is not empty and the result only from the select after the union all if the tag list is empty? If it is not possible to achieve any of this, then suggestions about how to extract the data of the two cases from the current query or a modification of it are very welcome. There is no need for handling cases where a restaurant has multiple tags: either a restaurant has one tag or it does not have any tags.

Examples of the query, tables with dummy values and comments here: db<>fiddle.

-- I want null values from this first SELECT before UNION ALL unless
-- a more elegant/efficient solution is possible. But I get 'SELECT 0'.
SELECT 
  f.days AS days,
  SUM (f.waste_kgs) AS waste,
  SUM (f.prepared_kgs) AS prepared,
  f.waste_type_id AS waste_id,
  f.restaurant_id AS restaurant
FROM food f
INNER JOIN tags t
  ON f.restaurant_id = t.restaurant_id
WHERE t.name = ANY (array[]::text[]) -- Empty tag list as an argument.
  AND f.days BETWEEN '2000-1-1' and '2099-1-1'
  AND f.restaurant_id = ANY (array[1, 2, 3])
GROUP BY days, restaurant, waste_id 
-- Separately running the first SELECT returns
-- days     waste   prepared    waste_id    restaurant
-- SELECT 0

UNION ALL 
-- The same query without the INNER JOIN and tag list argument. 
SELECT
  f.days AS days,
  SUM (f.waste_kgs) AS waste,
  SUM (f.prepared_kgs) AS prepared,
  f.waste_type_id AS waste_id,
  f.restaurant_id AS restaurant
FROM food f
WHERE f.days BETWEEN '2000-1-1' and '2099-1-1'
  AND f.restaurant_id = ANY (array[1, 2, 3])
GROUP BY days, restaurant, waste_id;

-- The whole query returns
-- days       waste     prepared    waste_id    restaurant
-- 2023-01-01   2.5        16.0           1              1
-- 2023-01-02   8.6         7.3           1              2
-- 2023-01-03   10.5        1.8           1              3
-- 2023-01-03   0.8         0.0           2              3
-- SELECT 4
-- No null values to differentiate the empty tag list argument case.

Here are the tables and values from db<>fiddle used in the examples.

CREATE TABLE restaurants (
  id int PRIMARY KEY,
  name text,
  type text
);
CREATE TABLE food (
  id int PRIMARY KEY,
  restaurant_id int REFERENCES restaurants (id),
  waste_type_id smallint NOT NULL,
  product_id int NOT NULL,
  waste_kgs decimal NOT NULL, 
  prepared_kgs decimal NOT NULL,
  customers smallint NOT NULL,
  days date NOT NULL
);
CREATE TABLE tags (
  id int PRIMARY KEY,
  restaurant_id int REFERENCES restaurants (id),
  name text
);
INSERT INTO restaurants VALUES
(1, 'restaurant_1'),
(2, 'restaurant_2'),
(3, 'restaurant_3');
INSERT INTO food VALUES 
(1, 1, 1, 1, 1.7, 8.0, 96, '2023-1-1'),
(2, 1, 1, 10, 0.5, 7.0, 96, '2023-1-1'),
(3, 1, 1, 15, 0.3, 1.0, 96, '2023-1-1'),
(4, 2, 1, 12, 7.0, 0.8, 39, '2023-1-2'),
(5, 2, 1, 10, 1.1, 5.0, 39, '2023-1-2'),
(6, 2, 1, 11, 0.5, 1.5, 39, '2023-1-2'),
(7, 3, 1, 8, 10.0, 0.3, 97, '2023-1-3'),
(8, 3, 2, 17, 0.8, 0.0, 97, '2023-1-3'),
(9, 3, 1, 11, 0.5, 1.5, 39, '2023-1-3');
INSERT INTO tags VALUES
(1, 1, 'tag_1');

The following query from the same tables with an empty tag list gives a table with null values from the inner join so I can use it with union all on itself - like I attempt to do with the query above - and separate the results easily. It is demonstrated in db<>fiddle, too.

-- The first row of the result displays calculations from the tagged restaurants. 
-- In this case it has null values because tag list is empty.
-- The second row of the result displays calculations from all the restaurants without
-- considering any tags.
SELECT Total, Total - Drinks AS "Without drinks", Drinks   
FROM (
  SELECT 
  SUM (f.prepared_kgs) 
  FILTER (
    WHERE f.days BETWEEN '2000-1-1' AND '2099-1-1'
    AND f.restaurant_id = ANY (array[1, 2, 3])
  ) AS Total,
  SUM (f.prepared_kgs)
  FILTER (
    WHERE (f.product_id = 10 OR f.product_id = 17) 
    AND f.days BETWEEN '2000-1-1' AND '2099-1-1'
    AND f.restaurant_id = ANY (array[1, 2, 3])
  ) AS Drinks
  FROM food f
  INNER JOIN tags t 
    ON t.restaurant_id = f.restaurant_id 
  WHERE t.name = ANY (array[]::text[]) -- Empty tag list as an argument.  
)
UNION ALL
-- The same query without the INNER JOIN and tag list argument.
SELECT Total, Total - Drinks AS "Without drinks", Drinks   
FROM (
  SELECT 
  SUM (f.prepared_kgs) 
  FILTER (
    WHERE f.days BETWEEN '2000-1-1' AND '2099-1-1'
    AND f.restaurant_id = ANY (array[1, 2, 3])
  ) AS Total,
  SUM (f.prepared_kgs)
  FILTER (
    WHERE (f.product_id = 10 OR f.product_id = 17) 
    AND f.days BETWEEN '2000-1-1' AND '2099-1-1'
    AND f.restaurant_id = ANY (array[1, 2, 3])
  ) AS Drinks
  FROM food f
);
-- If tag list had elements, my program uses the first row.
-- Otherwise it uses the second row. Time series query returns 
-- rows for each day but the logic I try to use is the same.
total   Without drinks  drinks
 null             null    null
 25.1             13.1    12.0
SELECT 2
1

There are 1 best solutions below

2
bobflux On

how to extract the data of the two cases from the current query

SELECT 1 AS query, columns... FROM query1
UNION ALL    
SELECT 2 AS query, columns... FROM query2

The result will contain a column "query" with value 1 or 2 depending on which branch of the UNION ALL the row came from.

However in your case I think there's a better (and faster) solution.

The first query does an aggregate on a subset of the second. Here's a simplified version, because I'm too lazy to copy all your columns and tables:

SELECT group, sum(qty) FROM table WHERE key IN (...) GROUP BY group
UNION ALL
SELECT group, sum(qty) FROM table GROUP BY group

"key in (...)" is your tag condition, but it could be any boolean condition. If you do instead:

SELECT group, key IN (...) AS flag, sum(qty) FROM table GROUP BY group, flag

Then the result will contain the same data, organized differently: one row for the sum where the condition is satisfied, and one where it is not. To get the total sums, you have to add the sums for these two rows in the application.

You could also use ROLLUP, in which case postgres will do the addition.

This makes for a much simpler query, but if your application code expects a specific format and you can't change it, maybe it won't work. Although you could put the above query in a materialized CTE and reuse it twice to format the results in the way you need.

IMO the database's job is to return data as quickly as possible, and choosing the format in which data is returned is part of the optimization. In other words, formatting is not the database's job, but the application's job. So as suggested, I would move some of the logic into the application.