So I have this case where I made a python script that make some API calls, collect a bunch of data, encrypt some sensitive fields and them write it on a BigQuery table.
Everything went fine up to here.
Now I need to create a view, querying all columns from this table and apply the decrypt on the fields that were encrypted on previous step.
To encrypt the data I've used python library google-cloud-kms as shown below:
import base64
from google.cloud import kms_v1
class kms_util:
def __init__(self, project_id:str, location:str, key_ring:str, crypto_key:str) -> None:
self.__project_id = project_id
self.__location = location
self.__key_ring = key_ring
self.__crypto = crypto_key
self.client_kms = kms_v1.KeyManagementServiceClient()
self.__key_name = self.client_kms.crypto_key_path(project=self.__project_id, location=self.__location, key_ring=self.__key_ring, crypto_key=self.__crypto)
def encrypt_data(self, data):
plaintext = str(data).encode('utf-8')
response = self.client_kms.encrypt(request={'name': self.__key_name, 'plaintext': plaintext})
return str(base64.b64encode(response.ciphertext),'utf8')
def decrypt_data(self, ciphertext):
response = self.client_kms.decrypt(request={'name': self.__key_name, 'ciphertext': base64.b64decode(ciphertext)})
return response.plaintext.decode('utf-8')
Explaining it a little bit its a class that need all the params to specify the path to the key on Google Cloud KMS. From that It will give us access to kms client, the encrypt_data and decrypt_data methods who will use the key specified with the params used in the constructor.
Note that to encrypt I just need to call kms_v1.client.encrypt method passing the path of the key and the data to encrypt.
The big issue here is on BigQuery. I can't figure out how to decrypt the data using the same process.
My plan was to create an UDF function to decrypt the data, something like this:
CREATE OR REPLACE FUNCTION `project.udf.fnc_decrypt`(ciphertext ANY TYPE)
RETURNS STRING AS (
(
SELECT
# doc: AEAD.DECRYPT_STRING(keyset, ciphertext, additional_data)
AEAD.DECRYPT_STRING(
# doc: KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset)
KEYS.KEYSET_CHAIN(
'gcp-kms://projects/<project>/locations/<location>/keyRings/<key_ring>/cryptoKeys/<crypto_key>',
first_level_keyset),
FROM_BASE64(ciphertext), additional_data)
)
);
And then call it on the fields of the view who were encrypted:
SELECT
foo,
bar,
`project.udf.fnc_decrypt`(col_to_decrypt)
FROM `project.dataset.table_with_api_data`
BUT consulting BigQuery encryption functions documentation I just found AEAD functions, that need more than just the key and the chiphertext to do so.
It also needs first_level_keyset and an associated_data, as shown in the second snippet, things that I didn't use in my python script, using google kms library.
Any suggestion how I should proceed?
Thanks in advance
There are multiple things being confused here:
The keyset that the BigQuery AEAD methods use is different than a KMS key. You can use a KMS key to wrap a keyset, but not directly doing BigQuery AEAD encryption/decryption as you presented. I mean your first script should work fine alone, but not together with the second script. You can create a new keyset, wrap it with a KMS key, use it for encryption, and store it within BigQuery. Upon decryption, you could read and unwrap the key into a keyset, and use it for decryption.
The additional authentication data (AAD) is part of AES-GCM definition (RFC-7714), but it's optional (and unencrypted) (see link) and you have omitted it in the first script. If you provide an AAD during encryption, you have to provide the same AAD upon decryption too.