We are collecting error information from machines in BigQuery. Each error consists of two rows, one is written when the error occurred, the second when it was resolved. All error related information (error description, involved machine, meta information...) is compiled when the error occurs, i.e. in the first row. The second row, written when the error is resolved, contains mainly the timestamp of when it was resolved. Both rows share a common errorIdentifier that connects them.
The error messages and hints to solve the issue are available as ARRAYs of STRUCTs (containing the language code as key and the error text as value). The error text itself is saved as an ARRAY of several textlines (due to historical reasons).
To clarify, the data structure looks like this:
CREATE TEMP TABLE ErrorData (
timestampOccurred TIMESTAMP,
timestampResolved TIMESTAMP,
machineId STRING,
errorIdentifier STRING,
errorDescription ARRAY<STRUCT<languageCode STRING, textlines ARRAY<STRING>>>,
solution ARRAY<STRUCT<languageCode STRING, textlines ARRAY<STRING>>>
);
INSERT INTO ErrorData
VALUES(Timestamp('2024-02-12 10:0:0'), NULL, 'machine1', 'uuid-1', [STRUCT("en", ['en-line1', 'en-line2']), STRUCT("de", ['de-line1', 'de-line2'])], [STRUCT("en", ['en-line1', 'en-line2']), STRUCT("de", ['de-line1', 'de-line2'])]);
INSERT INTO ErrorData
VALUES(NULL, Timestamp('2024-02-12 11:0:0'), NULL, 'uuid-1', NULL, NULL);
INSERT INTO ErrorData
VALUES(Timestamp('2024-02-12 11:30:0'), NULL, 'machine2', 'uuid-2', [STRUCT("en", ['en-line1', 'en-line2']), STRUCT("de", ['de-line1', 'de-line2'])], [STRUCT("en", ['en-line1', 'en-line2']), STRUCT("de", ['de-line1', 'de-line2'])]);
INSERT INTO ErrorData
VALUES(NULL, Timestamp('2024-02-12 12:30:0'), NULL, 'uuid-2', NULL, NULL);
In order to further process the error data, I want to combine both rows into one. The final result of the example above should be:
Because we have tons of error data and the data does not change as soon as the error is resolved, I want to realize this using a (partitioned and clustered) Materialized View. However, I did not yet find a way to aggregate the ARRAYs of STRUCTs in the grouping process. ANY_VALUE was my first approach, but it can result in NULL, because there is no IGNORE NULLS for this operation. In my opinion, something like an ANY_NON_NULL_VALUE aggregation function would be needed for the error data. The definition of the Materialized View would then look something like this:
CREATE MATERIALIZED VIEW `project-id.dataset.my-mat-view` AS (
SELECT
MAX(timestampOccurred) AS timestampOccurred,
MAX(timestampResolved) AS timestampResolved,
MAX(machineId) AS machineId,
errorIdentifier,
ANY_NON_NULL_VALUE(errorDescription) AS errorDescription,
ANY_NON_NULL_VALUE(solution) AS solution
FROM
`project-id.dataset.my-table`
GROUP BY
errorIdentifier)
Another idea was to use ARRAY_AGG in combination with IGNORE NULLS on the UNNESTed arrays, but this operation is not supported to create Materialized Views.
Does anyone have a solution for this problem?

