Bigquery: Aggregate ARRAY of STRUCTs in an Materialzed View

43 Views Asked by Till K. At 12 February 2024 at 22:56

We are collecting error information from machines in BigQuery. Each error consists of two rows, one is written when the error occurred, the second when it was resolved. All error related information (error description, involved machine, meta information...) is compiled when the error occurs, i.e. in the first row. The second row, written when the error is resolved, contains mainly the timestamp of when it was resolved. Both rows share a common errorIdentifier that connects them.

The error messages and hints to solve the issue are available as ARRAYs of STRUCTs (containing the language code as key and the error text as value). The error text itself is saved as an ARRAY of several textlines (due to historical reasons).

To clarify, the data structure looks like this:

  CREATE TEMP TABLE ErrorData ( 
    timestampOccurred TIMESTAMP,
    timestampResolved TIMESTAMP,
    machineId STRING,
    errorIdentifier STRING,
    errorDescription ARRAY<STRUCT<languageCode STRING, textlines ARRAY<STRING>>>,
    solution ARRAY<STRUCT<languageCode STRING, textlines ARRAY<STRING>>>
  );
  

INSERT INTO ErrorData 
VALUES(Timestamp('2024-02-12 10:0:0'), NULL, 'machine1', 'uuid-1', [STRUCT("en", ['en-line1', 'en-line2']), STRUCT("de", ['de-line1', 'de-line2'])], [STRUCT("en", ['en-line1', 'en-line2']), STRUCT("de", ['de-line1', 'de-line2'])]);

INSERT INTO ErrorData 
VALUES(NULL, Timestamp('2024-02-12 11:0:0'), NULL, 'uuid-1', NULL, NULL);

INSERT INTO ErrorData 
VALUES(Timestamp('2024-02-12 11:30:0'), NULL, 'machine2', 'uuid-2', [STRUCT("en", ['en-line1', 'en-line2']), STRUCT("de", ['de-line1', 'de-line2'])], [STRUCT("en", ['en-line1', 'en-line2']), STRUCT("de", ['de-line1', 'de-line2'])]);

INSERT INTO ErrorData 
VALUES(NULL, Timestamp('2024-02-12 12:30:0'), NULL, 'uuid-2', NULL, NULL);

In order to further process the error data, I want to combine both rows into one. The final result of the example above should be:

Because we have tons of error data and the data does not change as soon as the error is resolved, I want to realize this using a (partitioned and clustered) Materialized View. However, I did not yet find a way to aggregate the ARRAYs of STRUCTs in the grouping process. ANY_VALUE was my first approach, but it can result in NULL, because there is no IGNORE NULLS for this operation. In my opinion, something like an ANY_NON_NULL_VALUE aggregation function would be needed for the error data. The definition of the Materialized View would then look something like this:

CREATE MATERIALIZED VIEW `project-id.dataset.my-mat-view` AS (
  SELECT
    MAX(timestampOccurred) AS timestampOccurred,
    MAX(timestampResolved) AS timestampResolved,
    MAX(machineId) AS machineId,
    errorIdentifier,
    ANY_NON_NULL_VALUE(errorDescription) AS errorDescription,
    ANY_NON_NULL_VALUE(solution) AS solution
  FROM
    `project-id.dataset.my-table`
  GROUP BY
    errorIdentifier)

Another idea was to use ARRAY_AGG in combination with IGNORE NULLS on the UNNESTed arrays, but this operation is not supported to create Materialized Views.

Does anyone have a solution for this problem?

Original Q&A

Bigquery: Aggregate ARRAY of STRUCTs in an Materialzed View

There are 0 best solutions below

Related Questions in ARRAYS

Related Questions in STRUCT

Related Questions in GROUP-BY

Related Questions in GOOGLE-BIGQUERY

Related Questions in MATERIALIZED-VIEWS

Trending Questions

Popular # Hahtags

Popular Questions