Goal: Automate the cyclical processing of data files in Synapse by using a 'config' file (csv) that lists the desired input and output file paths for specific subject types. Full disclosure: I'm a newbie and have spent hours googling options without success. If there is a better way, please advise.
The csv file (myconfig.csv) contains columns SubjectType, FullInputFilePath, FullOutputFilePath. I want to use these fields to parameterize the mssparkutils.fs.mount statement.
My sample values in the csv are SubjectType = 'Customer', FullInputFilePath = 'abfss://[email protected]', FullOutputFilePath = 'abfss://[email protected]'
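To make the layout concrete, here is a minimal sketch of how a config like this could be read into plain Python strings with the standard csv module. It assumes the file has a header row with exactly those three column names. The paths in SAMPLE_CONFIG are placeholders rather than my real ones, and the 'Order' row and the load_config name are made up for illustration.

```python
import csv
import io

# Hypothetical config contents; placeholder paths, and 'Order' is an invented second row.
SAMPLE_CONFIG = """SubjectType,FullInputFilePath,FullOutputFilePath
Customer,abfss://in-placeholder/customer.csv,abfss://out-placeholder/customer.csv
Order,abfss://in-placeholder/order.csv,abfss://out-placeholder/order.csv
"""


def load_config(text):
    """Map each SubjectType to its (input path, output path) as plain strings."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["SubjectType"]: (row["FullInputFilePath"], row["FullOutputFilePath"])
        for row in reader
    }


config = load_config(SAMPLE_CONFIG)

# Each pass of the loop has ordinary str values that could be handed to a mount call.
for subject_type, (input_path, output_path) in config.items():
    print(subject_type, input_path, output_path)
```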
After I successfully mount myconfig.csv, I need to read the FullInputFilePath field and use it to populate another mssparkutils.fs.mount statement in order to dynamically change the mount point. I've used the first statement below to extract the column that I want, but when I attempt to use it as a parameter I get: 'TypeError: Row(fullinputpath='abfss://[email protected]/customer.csv') has the wrong type - (<class 'str'>,) is expected.'
fullinputpath = (df
    .where(df.SubjectType == 'Customer')
    .select(df.FullInputFilePath)
    .first())
mssparkutils.fs.mount(
    fullinputpath,
    "/test",
    {"LinkedService": "mylinkedservice"}
)
This is in an effort to remove hardcoded locations, as they may change as new data is added. Any assistance is greatly appreciated.
I've googled and tried everything I can find: casting as a str, using withColumn, sqlContext, and many more options.
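The error message suggests the cause: DataFrame.first() returns a Row (a tuple subclass), or None when nothing matches, while the mount call expects a plain str. A minimal sketch of one possible fix is to take the single selected column out of the Row by position before mounting. The helper names first_value_as_str and mount_input_for are hypothetical, and the mount function is passed in as an argument so the sketch does not assume anything about the runtime beyond the call shape already used above.

```python
def first_value_as_str(row):
    """Return the only column of a single-column Row as a plain str.

    DataFrame.first() yields a Row (a tuple subclass) or None, so the
    value is read by position instead of passing the Row itself along.
    """
    if row is None:
        raise ValueError("no config row matched the filter")
    value = row[0]
    if value is None:
        raise ValueError("the matched config row has an empty path")
    return str(value)


def mount_input_for(df, subject_type, mount, mount_point="/test",
                    linked_service="mylinkedservice"):
    """Look up the input path for subject_type and pass it to mount as a str."""
    row = (df
           .where(df.SubjectType == subject_type)
           .select(df.FullInputFilePath)
           .first())
    source = first_value_as_str(row)
    # Same three arguments as the original call, but source is now a str.
    mount(source, mount_point, {"LinkedService": linked_service})
    return source
```

In a Synapse notebook this would presumably be called as mount_input_for(df, "Customer", mssparkutils.fs.mount). Reading by position avoids depending on the exact case of the column name, which appears as fullinputpath in the error but FullInputFilePath in the config.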