Can I parameterize the mssparkutils.fs.mount statement?


Goal: Automate the cyclical processing of data files in Synapse by using a 'config' file (csv) that lists the desired input and output file paths for specific subject types. Full disclosure: I'm a newbie and have spent hours googling options but have not been successful. If there is a better way, please advise.

The csv file (myconfig.csv) contains columns SubjectType, FullInputFilePath, FullOutputFilePath. I want to use these fields to parameterize the mssparkutils.fs.mount statement.

My sample values in the csv are SubjectType = 'Customer', FullInputFilePath = 'abfss://[email protected]', FullOutputFilePath = 'abfss://[email protected]'
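As an aside on the config-lookup step itself: outside of Spark, the same "SubjectType → FullInputFilePath" lookup can be sketched with only the standard library. This is a minimal illustration, not the Synapse code; the paths below are hypothetical placeholders, and in the notebook the same columns would instead come from reading myconfig.csv into a DataFrame.

```python
import csv
import io

# Inline stand-in for myconfig.csv, using the three columns from the
# question (SubjectType, FullInputFilePath, FullOutputFilePath) with
# hypothetical placeholder paths.
config_csv = """SubjectType,FullInputFilePath,FullOutputFilePath
Customer,abfss://input@account/customer.csv,abfss://output@account/customer.csv
"""

# Build a lookup from subject type to its input path.
paths = {row["SubjectType"]: row["FullInputFilePath"]
         for row in csv.DictReader(io.StringIO(config_csv))}

print(paths["Customer"])  # -> abfss://input@account/customer.csv
```

The point of the sketch is only that the lookup yields a plain `str`, which is the type a mount call would need as its source argument.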

After successfully mounting myconfig.csv, I need to read the FullInputFilePath field and populate another mssparkutils.fs.mount statement in order to dynamically change the mount point. I've used the first statement below to extract the column that I want, but when I attempt to use it as a parameter I get 'TypeError: Row(fullinputpath='abfss://[email protected]/customer.csv') has the wrong type - (&lt;class 'str'&gt;,) is expected'.

fullinputpath = (df
    .where(df.SubjectType == 'Customer')
    .select(df.FullInputFilePath)
    .first())

mssparkutils.fs.mount(
    fullinputpath,
    "/test",
    {"LinkedService": "mylinkedservice"}
)
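For context on the TypeError: in PySpark, `first()` returns a `Row` object, which behaves like a named tuple rather than a plain string, so passing it directly as the mount source fails the string type check. A minimal stand-in sketch of the unwrapping step, using `collections.namedtuple` in place of a real `pyspark.sql.Row` (the value below is hypothetical; the real one comes from the DataFrame query above):

```python
from collections import namedtuple

# Stand-in for the Row that df...first() returns in the question.
Row = namedtuple("Row", ["FullInputFilePath"])
row = Row(FullInputFilePath="abfss://container@account/customer.csv")  # hypothetical path

# Unwrap by field name (row.FullInputFilePath) or by position (row[0])
# to get the plain str that a mount call expects as its first argument.
fullinputpath = row.FullInputFilePath

print(type(fullinputpath).__name__)  # -> str
```

The same unwrapping (`.first()[0]` or `.first().FullInputFilePath`) applied to the DataFrame query would yield a string suitable for parameterizing the mount.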
This is in an effort to remove hardcoded locations, as they may change as new data is added. Any assistance is greatly appreciated.

I've googled and tried everything I can find: casting to str, using withColumn, sqlContext, and many other options.
