Get the list of tables from a Fabric workspace using the ABFS path

I am trying to retrieve the list of tables from a lakehouse in a separate workspace by using its ABFS (Azure Blob File System) path. None of the approaches I have tried so far have produced the desired result, and I am starting to wonder whether this is feasible at all. My code is shown below.

olspath = "abfss://path................"

# Attempts so far -- none of these returned the list of tables:
# df = spark.read.format('delta').load(olspath)
# df = spark.read.load(olspath)
df = spark.read.schema(olspath)
# df.write.mode("overwrite").format('delta').save("Tables/" + "Account")
df.show()

2 Answers

Answer by JayashankarGS (accepted):

Using dbutils you can list the paths under your ABFS location; by checking whether each path is a Delta table, you get the tables.

dbutils.fs.ls("<Your_abfss_path>")


Code:

from delta.tables import DeltaTable

files = dbutils.fs.ls("<Your_abfss_path>")
print("Tables in the given path:")
for file in files:
    # Only paths containing a Delta log are actual tables
    if DeltaTable.isDeltaTable(spark, file.path):
        DeltaTable.forPath(spark, file.path).toDF().show()
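
If you only need the table names rather than their contents, a small variant of the same loop collects them instead (a sketch under the same assumptions; in Databricks-style dbutils the FileInfo entries expose name and path, with directory names ending in "/"):

files = dbutils.fs.ls("<Your_abfss_path>")
# Keep only the entries that hold a valid Delta table
table_names = [f.name.rstrip("/") for f in files
               if DeltaTable.isDeltaTable(spark, f.path)]
print(table_names)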


[Screenshot: the same tables shown in OneLake]

Answer by Mart Roben:

Edit:

Just as I finished this long post, I found another way to do it with a simple one-liner:

# no need for imports
mssparkutils.fs.ls("abfss://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx@onelake.dfs.fabric.microsoft.com/yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/Tables/")
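
The call returns a list of FileInfo objects, so extracting just the table names is straightforward (a sketch; I am assuming the Fabric FileInfo exposes the same name and isDir properties as in Synapse):

files = mssparkutils.fs.ls("abfss://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx@onelake.dfs.fabric.microsoft.com/yyyyyyyy-yyyy-yyyy-yyyyyyyyyyyy/Tables/")
# Each table lives in its own directory under Tables/
table_names = [f.name for f in files if f.isDir]
print(table_names)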

Original post:

I was able to achieve this in Fabric by using the fsspec module, a unified Python interface for interacting with different types of file systems. It seems to be what PySpark itself uses under the hood.

Initiate the filesystem:

import fsspec                       # Python unified interface for different file systems

# Inputs
filesystem_code = "abfss"           # Azure Blob File System Secure
onelake_account_name = "onelake"    # This is "onelake" for everyone
onelake_host = "onelake.dfs.fabric.microsoft.com"

# Initiate filesystem
onelake_filesystem_class = fsspec.get_filesystem_class(filesystem_code)

onelake_filesystem = onelake_filesystem_class(
    account_name=onelake_account_name,
    account_host=onelake_host)

This gives an fsspec_wrapper.trident.core.OnelakeFileSystem object, which seems to be available only within Fabric; I got an error when trying to get an "abfss" filesystem class on my local machine.
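
As an aside, the standard fsspec.filesystem() helper combines the class lookup and instantiation into a single call, so the same object can be obtained more compactly (a minimal sketch using the same arguments as above):

import fsspec

# Equivalent to get_filesystem_class("abfss") followed by instantiation
onelake_filesystem = fsspec.filesystem(
    "abfss",
    account_name="onelake",
    account_host="onelake.dfs.fabric.microsoft.com")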

List files (tables) with the filesystem object:

import os

# Inputs
workspace_id = "xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxxx"
lakehouse_id = "yyyyyyyy-yyyy-yyyy-yyyyyyyyyyyyyyyyy"
directory = os.path.join("Tables")     # add additional nesting as needed

# Get file list
dir_path = os.path.join(workspace_id, lakehouse_id, directory)
dir_contents = onelake_filesystem.ls(dir_path, detail=True)

for item in dir_contents:
    print(item)

Sample result:

{'name': 'xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxxx/yyyyyyyy-yyyy-yyyy-yyyyyyyyyyyyyyyyy/Tables/mytable1', 'size': None, 'type': 'directory'}
{'name': 'xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxxx/yyyyyyyy-yyyy-yyyy-yyyyyyyyyyyyyyyyy/Tables/othertable', 'size': None, 'type': 'directory'}
{'name': 'xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxxx/yyyyyyyy-yyyy-yyyy-yyyyyyyyyyyyyyyyy/Tables/anothertable', 'size': None, 'type': 'directory'}
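
Since each entry is a plain dict, filtering the listing down to bare table names is a one-liner (a sketch based on the result shape above):

import os

# Tables are the directory entries under Tables/
table_names = [os.path.basename(item["name"])
               for item in dir_contents
               if item["type"] == "directory"]
print(table_names)   # e.g. ['mytable1', 'othertable', 'anothertable']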

You can get the workspace_id and lakehouse_id from the abfss path of any table in the lakehouse you're trying to list. The abfss path can be found in the object's properties and has the following format:

abfss://xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxxx@onelake.dfs.fabric.microsoft.com/yyyyyyyy-yyyy-yyyy-yyyyyyyyyyyyyyyyy/Tables/table_name
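
If you would rather derive the IDs programmatically, the path can be split apart with the standard library (a minimal sketch based on the format above):

from urllib.parse import urlparse

abfss_path = "abfss://xxxxxxxx-xxxx-xxxx-xxxxxxxxxxxxxxxxx@onelake.dfs.fabric.microsoft.com/yyyyyyyy-yyyy-yyyy-yyyyyyyyyyyyyyyyy/Tables/table_name"

parsed = urlparse(abfss_path)
workspace_id = parsed.netloc.split("@")[0]            # the part before the @
lakehouse_id = parsed.path.lstrip("/").split("/")[0]  # the first path segment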


I'm not quite sure whether specific permissions are needed; I simply have access to both workspaces in Fabric, and it seems to work.