We can use the ARRAY_AGG aggregate function to generate an array from data.
The goal is to find a way to limit the input to a specific number of entries, like ARRAY_AGG(...) WITHIN GROUP(... LIMIT 3), without restructuring the main query.
Sidenote: UDAFs (user-defined aggregate functions) are not available at the time of writing.
For sample data:
CREATE OR REPLACE TABLE tab(grp TEXT, col TEXT) AS
SELECT * FROM VALUES
('Grp1', 'A'),('Grp1', 'B'),('Grp1', 'C'),('Grp1', 'D'), ('Grp1', 'E'),
('Grp2', 'X'),('Grp2', 'Y'),('Grp2', 'Z'),('Grp2', 'V'),
('Grp3', 'M'),('Grp3', 'N'),('Grp3', 'M');
Desired output:
GRP ARR_LIMIT_3
Grp3 [ "M", "M", "N" ]
Grp2 [ "V", "X", "Y" ]
Grp1 [ "A", "B", "C" ]
Using ARRAY_SLICE is not an option if the underlying ARRAY_AGG result exceeds 16 MB:
SELECT grp,
ARRAY_SLICE(ARRAY_AGG(col), 0, 3) -- indexes are 0-based and the end index is exclusive
FROM big_table
JOIN ...
GROUP BY grp;
-- Result array of ARRAYAGG is too large
It is possible to achieve a similar effect by using the MIN_BY/MAX_BY functions:
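A minimal sketch of such a query against the sample table tab, relying on the three-argument form MIN_BY(<col_to_return>, <col_containing_minimum>, <maximum_number_of_values_to_return>). Ordering by col itself is an assumption made here so that the three smallest values per group are kept, and the alias arr_limit_3 is taken from the output header above.

```sql
-- Sketch: keep at most 3 values of col per group,
-- chosen by the smallest values of col (the second argument).
SELECT grp,
       MIN_BY(col, col, 3) AS arr_limit_3
FROM tab
GROUP BY grp;
```

Because the limit is applied inside the aggregate, the full per-group array is never materialized, which is what sidesteps the size limit hit by the ARRAY_SLICE(ARRAY_AGG(...)) variant.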
Output:
If the sorting is irrelevant, then use MIN_BY(col, 'some_constant', 3).
ARRAY_UNIQUE_AGG or ARRAY_AGG(DISTINCT ...) is:
Output:
It is possible to handle WITHIN GROUP(ORDER BY <some_col> ASC/DESC) too:
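One way this could look, as a sketch rather than a definitive form: the second argument plays the role of the ORDER BY column, with MIN_BY standing in for ASC and MAX_BY for DESC. The sample table has no separate sort column, so col is used in place of <some_col>, and the aliases first_3_asc and first_3_desc are hypothetical.

```sql
-- Sketch: emulate WITHIN GROUP (ORDER BY <some_col> ASC/DESC) with a limit of 3.
-- Here col doubles as <some_col>; substitute the real sort column as needed.
SELECT grp,
       MIN_BY(col, col, 3) AS first_3_asc,   -- like ORDER BY col ASC, limited to 3
       MAX_BY(col, col, 3) AS first_3_desc   -- like ORDER BY col DESC, limited to 3
FROM tab
GROUP BY grp;
```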