I have two very big tables interleaved by its primary key (just one column, so it is one-to-one relationship). A few rows on the parent table have no child on the other and I want to find them.
Currently, I am doing a JOIN query and searching by NULL values:
SELECT Users.userID
FROM Users
LEFT JOIN Licenses
ON Users.userID = Licenses.userID
WHERE Licenses.license IS NULL
But this query still needs to read all Users table to do the JOIN, what is really slow.
I know that if the license column was in the table Users I could create an index with it and would only need to read the rows with NULL license values, but it is not an option to put the column "license" in the same "Users" table.
Is there a way to just pass through the userIDs that do not have yet a license using different tables? e.g. an index with columns from different tables. (I am using interleaved but would it be better foreign keys?)
To clarify, is the following the correct understanding?
In order to avoid scanning the Users table, perhaps the Licenses table can include even UserIds without licenses. Then, an Index on Licenses.license can be created to help search for licenses that are NULL.
Currently, the Licenses table does not have all the UserIds. If going forward you would like to populate Licenses even when the UserId has a NULL license, then one way is:
First update the application upon Users table insertion to also insert into Licenses even when license is NULL. Also update any other applicable areas of the application.
Once that change is deployed, do a one-time backfill of Licenses with all UserIds in Users that do not have a corresponding row in Licenses. Although this step may take some time, the scan over the Users table and corresponding check in the Licenses table can be done with a read-only transaction [1] which does not take locks.
[1] https://cloud.google.com/spanner/docs/transactions#read-only_transactions
Alternatively, the application can track additional state when inserting/deleting UserIds in Users and inserting/deleting licenses in Licenses. For example, keep track of a MissingLicense table containing all UserIds without a license. The MissingLicense table would have a primary key UserId. When inserting a UserId without a license, insert into MissingLicense as well. When inserting a license for a user, delete from MissingLicense. When deleting a UserId from Users, delete from MissingLicense. MissingLicense can be interleaved under the Users table with ON DELETE CASCADE.