I'm really new to Azure. I started setting up the pipeline, but I'm stuck.
I have .xlsx files stored in an Azure Storage account, and each file contains one or more sheets.
I want to set up a pipeline that goes through all the folders and subfolders of a specified path and converts the .xlsx files into CSV files.
Two cases are possible. In the first case, the 'AAA' file contains three tabs, 'Sheet1', 'Sheet2' and 'Sheet3', so the pipeline must generate three CSV files, stored in a specific subfolder:
- 'AAA_Sheet1'
- 'AAA_Sheet2'
- 'AAA_Sheet3'
In the second case, the 'AAA' file contains only one tab, 'Sheet1', so a single CSV file is generated: 'AAA_Sheet1'.
I would appreciate any help solving this problem. Thanks!
You cannot get the sheet names of an Excel file directly in ADF. Instead, you can run Python code in a Synapse notebook to get the sheet names from the Excel files.
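A minimal sketch of such a notebook cell, assuming the storage container is reachable through a local or mounted file-system path (the mount step is not shown, and the root path in the comment is hypothetical). Rather than pandas or openpyxl, it swaps in a standard-library-only technique: an .xlsx file is a ZIP package, so the sheet names can be read from the `<sheet>` elements in `xl/workbook.xml`. The record keys `folderPath`, `fileName` and `sheetName` are illustrative choices.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by the <sheet> elements in xl/workbook.xml
NS = {"s": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(xlsx_path):
    """Return the sheet (tab) names of an .xlsx file, in workbook order."""
    with zipfile.ZipFile(xlsx_path) as package:
        root = ET.fromstring(package.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.findall("s:sheets/s:sheet", NS)]


def list_sheets(root_dir):
    """Walk root_dir and all its subfolders; return one record per sheet."""
    records = []
    for folder, _subdirs, files in os.walk(root_dir):
        rel_folder = os.path.relpath(folder, root_dir)
        # Storage paths use "/", and the top-level folder becomes ""
        folder_path = "" if rel_folder == "." else rel_folder.replace(os.sep, "/")
        for name in sorted(files):
            if not name.lower().endswith(".xlsx"):
                continue
            for sheet in sheet_names(os.path.join(folder, name)):
                records.append({
                    "folderPath": folder_path,
                    "fileName": name,
                    "sheetName": sheet,
                })
    return records


# Hypothetical mount point of the storage container:
# print(list_sheets("/mnt/excel-input"))
```

Walking the tree with `os.walk` covers every subfolder in one pass; the order of folders is not guaranteed, which does not matter here because each record is processed independently later.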
You will get the sheet names as shown below:
Run the notebook with a Notebook activity. Then add a ForEach activity after the Notebook activity and use the below expression for its Items setting:
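One way to hand the sheet list to the pipeline is to serialize it as a JSON string and return it as the notebook's exit value. This sketch assumes one record per sheet with the hypothetical keys `folderPath`, `fileName` and `sheetName`; `mssparkutils.notebook.exit` is only available inside a Synapse notebook, so that call is left as a comment. In Synapse the exit value is typically exposed in the Notebook activity output under `status.Output.result.exitValue`, so an Items expression along the lines of `@json(activity('Notebook1').output.status.Output.result.exitValue)` could parse it back into an array; `Notebook1` is a placeholder activity name, and the output path should be checked against the activity output of your own debug run.

```python
import json


def to_exit_value(records):
    """Serialize per-sheet records as a JSON array string.

    After the pipeline parses it with json(), each element becomes one ForEach item.
    """
    # Keep only the keys the Copy activity parameters need (hypothetical names)
    items = [
        {"folderPath": r["folderPath"], "fileName": r["fileName"], "sheetName": r["sheetName"]}
        for r in records
    ]
    return json.dumps(items)


# Hypothetical example records for one workbook with two tabs
records = [
    {"folderPath": "input/2024", "fileName": "AAA.xlsx", "sheetName": "Sheet1"},
    {"folderPath": "input/2024", "fileName": "AAA.xlsx", "sheetName": "Sheet2"},
]
exit_value = to_exit_value(records)
# Inside a Synapse notebook, end the run with:
# mssparkutils.notebook.exit(exit_value)
```

Returning a single JSON string keeps the exit value a plain string, which the pipeline then turns into an array for the ForEach.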
Inside the ForEach, add a Copy activity with an Excel dataset as the source and a DelimitedText dataset as the sink. Give the datasets the parameters sheetName and fileName, and set their values with @item() expressions. Debug the pipeline; it will run successfully as shown below:
And the sheets are copied successfully in .CSV format as shown below:
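For each item, the Copy activity needs values for the source and sink dataset parameters. The sketch below mimics that mapping in Python for a single item, so the naming rule from the question (`<file>_<sheet>.csv`) can be checked against both cases. The parameter names are hypothetical, and keeping the source's relative subfolder for the output is an assumption; in the pipeline itself each value would come from an expression such as `@item().sheetName`.

```python
import posixpath


def copy_parameters(item):
    """Map one ForEach item to hypothetical source/sink dataset parameter values."""
    stem, _ext = posixpath.splitext(item["fileName"])  # "AAA.xlsx" -> "AAA"
    return {
        # Excel source dataset: which workbook and which sheet to read
        "source": {
            "folderPath": item["folderPath"],
            "fileName": item["fileName"],
            "sheetName": item["sheetName"],
        },
        # DelimitedText sink dataset: one CSV per sheet, same relative subfolder
        "sink": {
            "folderPath": item["folderPath"],
            "fileName": f"{stem}_{item['sheetName']}.csv",
        },
    }


# The two cases from the question
three_tabs = [{"folderPath": "in/x", "fileName": "AAA.xlsx", "sheetName": s}
              for s in ("Sheet1", "Sheet2", "Sheet3")]
one_tab = [{"folderPath": "in/x", "fileName": "AAA.xlsx", "sheetName": "Sheet1"}]
print([copy_parameters(i)["sink"]["fileName"] for i in three_tabs])
# ['AAA_Sheet1.csv', 'AAA_Sheet2.csv', 'AAA_Sheet3.csv']
print([copy_parameters(i)["sink"]["fileName"] for i in one_tab])
# ['AAA_Sheet1.csv']
```

Because every sheet is its own ForEach item, a workbook with three tabs yields three copies and a workbook with one tab yields one, with no special-casing.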
Here is the pipeline JSON for your reference: