BigQuery update table with COALESCE

56 Views Asked by At

I've tried to follow a data cleaning tutorial on youtube . The mentor uses SQL Server and I use BigQuery (too much trouble install SQL Server on a Mac and I would like learn to adapt SQL syntax to BigQuery).

When it was instructed in SQL Server how to Populate Property Address data, the SQL syntax is

Select *
From PortfolioProject.dbo.NashvilleHousing
--Where PropertyAddress is null
order by ParcelID



Select a.ParcelID, a.PropertyAddress, b.ParcelID, b.PropertyAddress, ISNULL(a.PropertyAddress,b.PropertyAddress)
From PortfolioProject.dbo.NashvilleHousing a
JOIN PortfolioProject.dbo.NashvilleHousing b
    on a.ParcelID = b.ParcelID
    AND a.[UniqueID ] <> b.[UniqueID ]
Where a.PropertyAddress is null


Update a
SET PropertyAddress = ISNULL(a.PropertyAddress,b.PropertyAddress)
From PortfolioProject.dbo.NashvilleHousing a
JOIN PortfolioProject.dbo.NashvilleHousing b
    on a.ParcelID = b.ParcelID
    AND a.[UniqueID ] <> b.[UniqueID ]
Where a.PropertyAddress is null

I've managed to adapt some to BigQuery language

Select *
From `sturdy-filament-415311.NashvilleHousing.NHData`
order by ParcelID
;

Select a.ParcelID, a.PropertyAddress, b.ParcelID, b.PropertyAddress, COALESCE(a.PropertyAddress,b.PropertyAddress)
From `sturdy-filament-415311.NashvilleHousing.NHData` a
JOIN `sturdy-filament-415311.NashvilleHousing.NHData` b
  on a.ParcelID = b.ParcelID
  AND a.UniqueID_ <> b.UniqueID_
Where a.PropertyAddress is null
;

But somehow got error when it comes to the below query

Update a
SET PropertyAddress = COALESCE(a.PropertyAddress,b.PropertyAddress)
From sturdy-filament-415311.NashvilleHousing.NHData a
JOIN sturdy-filament-415311.NashvilleHousing.NHData b
    on a.ParcelID = b.ParcelID
    AND a.UniqueID_ <> b.UniqueID_ 
Where a.PropertyAddress is null

I've got Table "a" must be qualified with a dataset (e.g. dataset.table).

Then I fix it by this

Update sturdy-filament-415311.NashvilleHousing.NHData
SET PropertyAddress = COALESCE(a.PropertyAddress,b.PropertyAddress)
From sturdy-filament-415311.NashvilleHousing.NHData a
JOIN sturdy-filament-415311.NashvilleHousing.NHData b
    on a.ParcelID = b.ParcelID
    AND a.UniqueID_ <> b.UniqueID_ 
Where a.PropertyAddress is null

And got UPDATE/MERGE must match at most one source row for each target row

How do I fix this?

2

There are 2 best solutions below

4
Parfait On

Unlike some other DBMS's, SQL Server allows an update on multiple matching rows. However, without additional logic, such updates run only on first instance of joined matches based on indexes which is a non-deterministic outcome that can change with each run and hence not reliable.

Possibly, due to this scenario, BigQuery raises an error on such multi-row updates.

If goal is to populate missing property addresses with any non-missing property address by corresponding ParcelID, consider using an aggregate query.

Also, be sure to consistently use backticks as hyphen - in table names can raise syntax errors. Also, let's heed, Bad Habits to Kick: Table aliases like (a, b, c) or (t1, t2, t3), by using more informative aliases:

UPDATE `sturdy-filament-415311.NashvilleHousing.NHData` nh
SET nh.PropertyAddress = agg.MaxPropertyAddress
FROM (
   SELECT
       ParcelID, 
       MAX(agg.PropertyAddress) AS MaxPropertyAddress
   FROM `sturdy-filament-415311.NashvilleHousing.NHData`
   GROUP BY 
       ParcelID
) agg
WHERE nh.ParcelID = agg.ParcelID
  AND nh.PropertyAddress IS NULL
1
Linh Nguyen On

I resort to ChatGPT and here's the solution

UPDATE 
    `sturdy-filament-415311.NashvilleHousing.NHData`
SET 
    PropertyAddress = (
        SELECT 
            COALESCE(a.PropertyAddress, b.PropertyAddress)
        FROM 
            `sturdy-filament-415311.NashvilleHousing.NHData` AS a
        JOIN 
            `sturdy-filament-415311.NashvilleHousing.NHData` AS b
        ON 
            a.ParcelID = b.ParcelID
            AND a.UniqueID_ <> b.UniqueID_
        WHERE 
            a.ParcelID = b.ParcelID
            AND a.PropertyAddress IS NULL
        LIMIT 1
    )
WHERE 
    PropertyAddress IS NULL;