I'm running a PySpark job on a Dataproc cluster using Cloud Composer as the orchestration tool.
The job instantiates a `MyClass` object whose `bucket` attribute is defined like so:
```python
from google.cloud import storage

from .constants import GCS_BUCKET_NAME


class MyClass:
    bucket = storage.Client().bucket(GCS_BUCKET_NAME)

    def load_data(self, filepath: str, contents: str) -> None:
        blob = self.bucket.blob(filepath)
        blob.upload_from_string(contents)
```
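For context, the job then calls `load_data` roughly like this (a simplified sketch; the real file path and contents differ):

```python
# Illustrative usage only; the file path and contents are placeholders.
loader = MyClass()
loader.load_data("outputs/result.json", '{"status": "ok"}')  # this is the call that fails when run from Composer
```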
`GCS_BUCKET_NAME` is defined in a `constants.py` file:

```python
import os

GCS_BUCKET_NAME = os.getenv("DATA_BUCKET_NAME")
```
The `DATA_BUCKET_NAME` environment variable is set on the cluster nodes by an initialization script:

```bash
#!/bin/bash
DATA_BUCKET_NAME=$(/usr/share/google/get_metadata_value attributes/DATA_BUCKET_NAME)
echo "DATA_BUCKET_NAME=${DATA_BUCKET_NAME}" >> /etc/environment
```
When I SSH into the Dataproc cluster nodes, the environment variable is correctly defined.
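By "correctly defined" I mean that a quick check along these lines (illustrative, not the exact commands I ran) shows the expected value:

```bash
# On a cluster node, after the initialization script has run:
grep DATA_BUCKET_NAME /etc/environment   # shows DATA_BUCKET_NAME=<my bucket name>
echo "$DATA_BUCKET_NAME"                 # prints the bucket name in a fresh login shell
```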
But when the DAG runs in Cloud Composer, the job fails when the `load_data` function is called, with the following error:

```
ValueError: Cannot determine path without bucket name
```
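For reference, the DAG submits the PySpark job with something along these lines (an illustrative sketch rather than my exact DAG; the project, region, cluster name, and file URI are placeholders):

```python
# Illustrative sketch of the Composer DAG submitting the PySpark job.
# All ids, names, and URIs below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

PYSPARK_JOB = {
    "reference": {"project_id": "my-project"},
    "placement": {"cluster_name": "my-dataproc-cluster"},
    "pyspark_job": {"main_python_file_uri": "gs://my-code-bucket/jobs/main.py"},
}

with DAG(
    dag_id="load_data_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    submit_pyspark = DataprocSubmitJobOperator(
        task_id="submit_pyspark_job",
        job=PYSPARK_JOB,
        region="europe-west1",
        project_id="my-project",
    )
```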
What I've tried so far, but none of these seem to work:

- use the `storage.Client().get_bucket` method instead of `storage.Client().bucket`
- specify the project id: `storage.Client().get_bucket(GCS_BUCKET_NAME, PROJECT_ID)`
- grant more storage permissions to the Dataproc service account (see the sketch after this list)
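For the last point, the kind of change I made looks roughly like this (illustrative; the project id, service account email, and role are placeholders for what I actually granted):

```bash
# Illustrative: granting broader GCS access to the service account used by the Dataproc VMs.
# Project id and service account email are placeholders.
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-dataproc-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
```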
Thanks for your help!