Using DVC with gcloud build


Using DVC with gcloud / Cloud Build and Google Cloud Storage isn't well documented.

Use case: Build a docker image with a model stored on google storage

Authentication is assumed to require service account keys: https://dvc.org/doc/user-guide/data-management/remote-storage/google-cloud-storage#custom-authentication

Google authentication requires a key file, with the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to the location of the service account's private key on disk.

This raises the problem of how to store those keys for your build. There are several discussions around using gcloud auth application-default login that don't solve the problem.
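For reference, the conventional key-file flow looks something like the following sketch (the key path is a placeholder); this is exactly the pattern the rest of the answer avoids:

```shell
# Conventional (undesirable) key-file auth: the private key must exist
# on disk inside the build environment, which is what we want to avoid.
export GOOGLE_APPLICATION_CREDENTIALS=/secrets/sa-key.json  # placeholder path

# DVC's gs remote picks up the credentials from the environment variable.
dvc pull
```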

1 answer (from pjaol):

Options

  • GCSFuse: mount a storage bucket as a file system
    • Viable, but requires modifying the dvc remote to use a local cache
  • Use KMS to store and decrypt a service key file
    • Still leaves the overhead of managing a key file
  • Have the Cloud Build service account perform the dvc pull
    • Does not require any persistent authentication files

Using the Cloud Build service account appears to be the best option.

Assumptions

  • You are using dvc and google storage
  • You are using source control and storing your .dvc reference files in source control
  • You are using cloud build with a repo cloudbuild.yaml file (can also be done in json)

The approach is to grant the build principal read permission on the bucket you are using for your models / data:

gcloud storage buckets add-iam-policy-binding \
                gs://BUCKET_NAME \
                --member=CLOUD_BUILD_PRINCIPAL_IDENTIFIER \
                --role=roles/storage.objectViewer

Substitute CLOUD_BUILD_PRINCIPAL_IDENTIFIER with the Cloud Build service account (in the form serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com), which is generally found on the settings tab of your Cloud Build page: https://console.cloud.google.com/cloud-build/settings/service-account. Substitute BUCKET_NAME with the bucket you are storing your models / data in. Note that --role takes a role name (e.g. roles/storage.objectViewer, which includes the storage.objects.get permission), not a bare permission.
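The default Cloud Build service account name can also be derived from the project number, so a sketch like the following (PROJECT_ID and BUCKET_NAME are placeholders for your own values) performs the lookup and the binding in one go:

```shell
# The default Cloud Build service account is named
# PROJECT_NUMBER@cloudbuild.gserviceaccount.com, so look up the number first.
PROJECT_NUMBER=$(gcloud projects describe PROJECT_ID --format='value(projectNumber)')

# Grant that service account read access to the model/data bucket.
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member="serviceAccount:${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com" \
    --role=roles/storage.objectViewer
```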

Once that's done, modify the cloudbuild.yaml to add a step that uses a Python image, installs dvc, and pulls the models referenced by your source control. Modify dvc pull models as required.

steps:
  - name: python
    entrypoint: bash
    args: ['-c', 'pip install -U "dvc[gs]"; dvc pull models;']
    id: Model_Pull
..... 

DVC will authenticate using the metadata server available on the build machine, and will not require GOOGLE_APPLICATION_CREDENTIALS or service key files. All steps in a Cloud Build run in Docker images that mount /workspace (your code); any modifications to that file system remain available to the next step, e.g. the build step.
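Putting it together, a cloudbuild.yaml along these lines (image tags and paths are illustrative) pulls the models in one step and then builds an image that can COPY them in the next:

```yaml
steps:
  - name: python
    entrypoint: bash
    args: ['-c', 'pip install -U "dvc[gs]"; dvc pull models;']
    id: Model_Pull
  # /workspace (including the pulled models/ directory) persists into this step
  - name: gcr.io/cloud-builders/docker
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-model-image', '.']
    id: Build
```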

At which point you will have performed a dvc pull on your models, allowing a Dockerfile with a

COPY models /destination

to copy your models to the appropriate destination. An additional tip: you may need to run a chown or chmod on the destination directory if you are using a non-root user, e.g.

RUN chmod -R a+rX /models

(Note: directories need the execute bit to be traversable, so a blanket 644 would make them unreadable; a+rX adds read everywhere and execute only to directories.)
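As a quick illustration of the directory-permission point (the paths under /tmp are arbitrary):

```shell
# Create a directory tree resembling a pulled models/ directory.
umask 022   # fix the umask so the resulting modes are predictable
mkdir -p /tmp/models_demo/sub
touch /tmp/models_demo/sub/model.bin

# A recursive 644 would strip the execute (traversal) bit from directories;
# a+rX adds read everywhere and execute only where needed (directories).
chmod -R a+rX /tmp/models_demo

stat -c %a /tmp/models_demo/sub            # 755: directory keeps traversal bit
stat -c %a /tmp/models_demo/sub/model.bin  # 644: plain file stays non-executable
```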