Copy files from source based on wildcard into separate folders in Sink using Azure Data Factory


I'm looking to configure an ADF job that uses the Copy activity to transfer .txt files from a source container into their respective folders in an ADLS sink container.

For example:

Source Container A Directory
- fileA-20240201.txt
- fileB-20240302.txt
- fileB-20230201.txt
- fileC-20230201.txt

Sink Container B Directory
- Folder_A
- Folder_B
- Folder_C

The job should match files by wildcard and route them as follows: files matching fileA* go to Folder_A, files matching fileB* go to Folder_B, and files matching fileC* go to Folder_C.

I'm looking for help with the configuration details for setting up this ADF job.

So far I have tried a ForEach activity containing a Switch activity, with one case per file name, each followed by a Copy activity.

Rakesh Govindula

If you know the number and names of the target folders in advance, you can try the approach below.

These are my sample input files:

sourcecon
    folder
        fileA-20240201.txt
        fileA-20240311.txt
        fileB-20230201.txt
        fileB-20240302.txt
        fileC-20230201.txt
        fileC-20240310.txt

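First, create an array variable holding the letters that distinguish the file groups, for example ["A","B","C"], with one entry per required folder. Pass this array to the items setting of a ForEach activity. (The pipeline JSON at the end builds the same array inline with @createArray('A','B','C').)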
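The array variable described above could be declared roughly as follows. This is a minimal sketch, not the answer's actual pipeline: the variable name letters is hypothetical (the answer does not name it), and the inner activities list is left empty here for brevity.

```json
{
    "name": "pipeline1",
    "properties": {
        "variables": {
            "letters": {
                "type": "Array",
                "defaultValue": ["A", "B", "C"]
            }
        },
        "activities": [
            {
                "name": "ForEach1",
                "type": "ForEach",
                "typeProperties": {
                    "items": {
                        "value": "@variables('letters')",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": []
                }
            }
        ]
    }
}
```

Keeping the letters in a variable (or a pipeline parameter) rather than an inline expression means that adding a new folder only requires changing the default value.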

Create the source and target datasets. In the target dataset, create a string parameter named folder_name and use it as the folder path of the dataset through the expression @dataset().folder_name.
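A parameterized target dataset along these lines might look like the sketch below. The dataset name targetfiles and the parameter folder_name come from the answer; the linked service name AzureDataLakeStorage1 and the file system name targetcon are assumptions made for illustration and should be replaced with your own values.

```json
{
    "name": "targetfiles",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "folder_name": {
                "type": "string"
            }
        },
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "folderPath": {
                    "value": "@dataset().folder_name",
                    "type": "Expression"
                },
                "fileSystem": "targetcon"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}
```

Because the folder path is resolved from the parameter at run time, a single dataset can serve every target folder.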

Inside the ForEach loop, add a Copy activity and assign it the source and sink datasets created above. In the source settings, choose the wildcard file path option, set the wildcard folder path to folder, and set the wildcard file name to the expression file@{item()}*.

In the sink, pass the expression Folder_@{item()} to the dataset parameter folder_name.

Debug the pipeline, and all the required files will be copied to their respective target folders.

My pipeline JSON:

{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@createArray('A','B','C')",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "Copy data1",
                            "type": "Copy",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "source": {
                                    "type": "DelimitedTextSource",
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "wildcardFolderPath": "folder",
                                        "wildcardFileName": {
                                            "value": "file@{item()}*",
                                            "type": "Expression"
                                        },
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextReadSettings"
                                    }
                                },
                                "sink": {
                                    "type": "DelimitedTextSink",
                                    "storeSettings": {
                                        "type": "AzureBlobFSWriteSettings"
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextWriteSettings",
                                        "quoteAllText": true,
                                        "fileExtension": ".txt"
                                    }
                                },
                                "enableStaging": false,
                                "translator": {
                                    "type": "TabularTranslator",
                                    "typeConversion": true,
                                    "typeConversionSettings": {
                                        "allowDataTruncation": true,
                                        "treatBooleanAsNumber": false
                                    }
                                }
                            },
                            "inputs": [
                                {
                                    "referenceName": "sourcefiles",
                                    "type": "DatasetReference"
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "targetfiles",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "folder_name": {
                                            "value": "Folder_@{item()}",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                }
            }
        ],
        "annotations": [],
        "lastPublishTime": "2024-03-11T13:35:47Z"
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}