Having trouble reading a file in DBFS


I'm having some issues reading a file stored in DBFS. It was not a problem before, but I made some modifications to my workspace and storage account. Let me list them:

  1. I wanted to enable Unity Catalog, so I created a metastore and assigned it to my workspace.
  2. Then I created storage credentials and external locations (with the access connector having the Storage Blob Data Contributor role on my storage account).
  3. I also changed my cluster from No Isolation Shared to Shared access mode, which supports Unity Catalog.

Before that, the storage account was only mounted.

Below is my code, where I try to create the directory if it does not exist and then read the file from that directory, using the functions defined in the class.

  import os
  import gnupg

  class gpg_encryption(gnupg.GPG):

    # Keyfiles are stored on DBFS via the POSIX-style /dbfs mount.
    asc_key_path = '/dbfs/keys/'

    def __init__(self):
      self.gpg = gnupg.GPG()
      self.gpg.encoding = 'utf-8'
      self.create_key_path_dir()

    def create_key_path_dir(self):
      """
      Creates a folder to store keyfiles.
      """
      if not os.path.exists(self.asc_key_path):
        os.makedirs(self.asc_key_path)

    def create_asc_keyfile(self, keyfile_name, key):
      """
      Write key as armored ASCII.
      """
      with open(self.asc_key_path + keyfile_name + '.asc', 'w') as f:
        f.write(key)

The problem occurs when I check whether the directory exists: it wants me to change the path from /dbfs/keys/ to dbfs:/keys. After changing that, it has a problem opening the file with that path and asks me to change back to /dbfs/keys/.

Could you please help me fix this? Some documentation would be great too, as I'm reading the docs myself and got a little confused.


There is 1 answer below.

Answered by JayashankarGS:

Whenever you are working in a Databricks context, such as reading through Spark or listing with dbutils, you need to use the URI scheme form: scheme:/path_to_data (see the sketch after the examples below).

Example:

  1. dbfs:/tmp/sample.csv for accessing a file in DBFS
  2. file:/tmp/demo.csv for accessing a file on the driver's local root filesystem
  3. wasbs://<container>@<storage-account>.blob.core.windows.net/<path> for Azure Blob Storage
  4. abfss://<container>@<storage-account>.dfs.core.windows.net/<path> for ADLS Gen2
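
For instance, a minimal sketch of Databricks-context access in a notebook cell, where dbutils and spark are the provided notebook globals (the paths are the hypothetical examples from the list above):

  # Databricks context: use the URI scheme form for dbutils and Spark.
  dbutils.fs.mkdirs('dbfs:/keys/')             # create a DBFS directory
  display(dbutils.fs.ls('dbfs:/keys/'))        # list it with the dbfs:/ URI
  df = spark.read.csv('dbfs:/tmp/sample.csv')  # Spark reads also expect URIs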

When it comes to a local or Python context, you need to access files from the root location with POSIX-style paths, prefixing the path with /, as sketched below.

Example:

  1. /dbfs/... for accessing a file in DBFS
  2. /Workspace/... for accessing a file in the Workspace

Usually, Python modules can only use POSIX-style paths.
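
For example, a sketch of Python-context access with plain file APIs (example.asc is a hypothetical keyfile name):

  import os

  # Python context: plain modules see DBFS through the /dbfs POSIX mount.
  os.makedirs('/dbfs/keys/', exist_ok=True)       # create the directory
  with open('/dbfs/keys/example.asc', 'w') as f:  # write a hypothetical keyfile
    f.write('dummy key material')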

Below is the same idea: creating folders in the different contexts. (The original answer illustrated this with a screenshot.)
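A sketch of what that looks like, applied to your /dbfs/keys/ folder:

  import os

  # Python context: POSIX-style path through the /dbfs mount.
  if not os.path.exists('/dbfs/keys/'):
    os.makedirs('/dbfs/keys/')

  # Databricks context: the same folder via the dbfs:/ URI scheme.
  dbutils.fs.mkdirs('dbfs:/keys/')
  dbutils.fs.ls('dbfs:/keys/')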

To learn more, refer to the "Work with files on Databricks" page in the Databricks documentation.