Tables with unapplied upsert data cannot be queried through prefix

560 Views Asked by At

In BigQuery, I have several tables with the same name prefix. The suffix of each table is a month, written in format yyyy_mm like 2023_09.

These tables are created by Google Datastream, copied over from PostgreSQL partitioned table.

For the previous months I have used the following query to create a view to combine these tables

SELECT
  *,
  REPLACE(_TABLE_SUFFIX, '_', '-') AS month,
FROM
  `my_project_id.my_dataset.my_table_*`
WHERE
  REGEXP_CONTAINS(_TABLE_SUFFIX, '^[0-9]{4}_[0-9]{2}$')

It worked well, but recently I encountered a new error in BigQuery which stops me from querying the view.

Tables with unapplied upsert data cannot be queried through prefix. First table with upsert data is my_project_id:my_dataset.my_table_2022_09.

I don't know how to fix this or what workaround can be applied, except writing all tables explicitly in the query.

3

There are 3 best solutions below

0
vilozio On BEST ANSWER

I've raised a Google Support Case and received an answer that this is a known issue, but the current error message is not very accurate. There will be a support for this feature.

The suggested workaround is to use the ‘UNION’, the easiest way to query all tables.

I created the view and added a fixed value column to each table. Filtering by this column reduces the scanned bytes.

CREATE VIEW some.view as ( 
  SELECT *, '2023-01' as month, 
  FROM `my_project_id.my_dataset.my_table_2023_01`
  UNION ALL
  SELECT *, '2023-02' as month, 
  FROM `my_project_id.my_dataset.my_table_2023_02`
  UNION ALL
  SELECT *, '2023-03' as month, 
  FROM `my_project_id.my_dataset.my_table_2023_03`
);

SELECT * FROM some.view WHERE month = '2023-03'
1
Delx On

Did you consider replicating the table by partition root? See here for details. This way you do not have to deal with querying different tables.

In any case this behaviour is intentional, BigQuery identifies that on one partition there are upsert events which have not been processed yet. Therefore, it doesn't allow usage in the materialised view. I suspect that using the table my_project_id:my_dataset.my_table_2022_09 in any view will give the same error.

As for why this is happening now. I have opened an issue at Google IssueTracker, where I am facing the same problem. It seems that even if there are no unprocessed CDC events (in the streaming buffer), the upsert_stream_apply_watermark is not updated correctly. You can find more details on how this should work here.

Also it seems that in the past there was a similar problem with Datastream. Perhaps opening an issue there might get attention.

As a temporary workaround you can try to force a backfill of the table through the Datastream UI. This should force update the upsert_stream_apply_watermark in my experience.

0
swapnil bhoite On

It seems like the issue is related to querying tables with unapplied upsert data in BigQuery. This error typically occurs when you're trying to query a table that has unapplied upsert mutations.

Upsert mutations are a type of operation in BigQuery that allows you to update existing rows in a table and insert new rows if they do not already exist. When you perform an upsert operation, BigQuery applies these changes in two stages. First, it writes the upsert mutations to a separate delta log.

Then, it applies these mutations to the base table. If you try to query the table while these mutations are still in the delta log and not yet applied to the base table, you'll encounter the error you're seeing.