Currently, in my Azure Databricks workspace, Unity Catalog is enabled with external locations configured.
I can read the file using the DataFrame API:
testDf = spark.read.option("header", True).format("csv").load('abfss://container_name@storage_account_name.dfs.core.windows.net/RDD_testing/input.txt')
However, if I read the same file using the RDD API:
rdd_in = sc.textFile('abfss://container_name@storage_account_name.dfs.core.windows.net/RDD_testing/input.txt')
it fails with the error:
Failure to initialize configuration for storage account storage_account_name.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
I also tried setting the configuration below, but with no luck:
accessKey = 'my_access_key'
spark.conf.set("spark.hadoop.fs.azure.account.key.storage_account_name.dfs.core.windows.net", accessKey)
I want to read the file using the RDD API directly, instead of reading it as a DataFrame and then converting it to an RDD.
How can I fix this?
Thanks
The error suggests the account key never reached the configuration that the RDD API actually reads. spark.conf.set only updates the Spark session configuration, which the DataFrame reader consults; sc.textFile goes through the SparkContext's underlying Hadoop configuration, which that call does not touch (and the spark.hadoop. prefix is only applied when the key is set in the cluster's Spark config at startup, not at runtime). Setting the key directly on the Hadoop configuration, as early as possible after initializing your Spark session, is the more direct method and often resolves this. Here is a minimal sketch, reusing the placeholder account, container, and key names from your question:
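accessKey = 'my_access_key'  # placeholder, as in the question

# Set the key on the SparkContext's Hadoop configuration (note: no
# 'spark.hadoop.' prefix here), since this is the configuration that
# sc.textFile consults when it resolves an abfss:// path.
sc._jsc.hadoopConfiguration().set(
    "fs.azure.account.key.storage_account_name.dfs.core.windows.net",
    accessKey
)

# The RDD read should now find the account key:
rdd_in = sc.textFile('abfss://container_name@storage_account_name.dfs.core.windows.net/RDD_testing/input.txt')
print(rdd_in.take(5))

An equivalent alternative is to add the key in the cluster's Spark config (in the cluster UI) as spark.hadoop.fs.azure.account.key.storage_account_name.dfs.core.windows.net, so that it lands in the Hadoop configuration at startup. One caveat: as far as I know, Unity Catalog clusters in shared access mode do not expose the RDD API at all, so the above assumes a single-user (assigned) cluster.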
Hope this helps solve your problem.