Read a file based on a path derived from a column of another file in Azure Data Factory

I have a TSV file in Azure Data Lake which contains two main columns, among other columns:

Id ResponsePath
1 response path for id 1
2 response path for id 2

The response path for each record is a path to a JSON file which is also stored in Azure Data Lake.

I want to iterate over each record in the file, read the JSON stored at the response path for that record, extract a property from that JSON, add it as a column alongside the data from the original TSV file, and upload this output to another data store.

So the end result would look something like this:

Id ResponsePath trace id
1 response path for id 1 61sd3414e
2 response path for id 2 6153eidkl
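
For example, the JSON at a response path might look something like this (a hypothetical structure; only the trace id property is relevant here):

    {
      "traceId": "61sd3414e",
      "status": "completed"
    }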

How do I do this in Azure Data Factory?

I tried setting up a Lookup to fetch the column, but I cannot query for files in ADLS from within the Lookup activity. I was thinking about using a data flow, but I have no idea how to set it up for this purpose, since I would have to create a source based on a column from another file.

Answered by Aswin:

You can use a Lookup activity to read the TSV file, then a ForEach activity to iterate over each row. Inside the ForEach activity, use a Data Flow activity to get the field value from the JSON, passing the ResponsePath as a data flow parameter.
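
A minimal sketch of the pipeline JSON, assuming the Lookup dataset points at the TSV file and the data flow exposes a string parameter named ResponsePath (all names here are illustrative):

    {
      "activities": [
        {
          "name": "LookupTsv",
          "type": "Lookup",
          "typeProperties": {
            "source": { "type": "DelimitedTextSource" },
            "dataset": { "referenceName": "TsvDataset", "type": "DatasetReference" },
            "firstRowOnly": false
          }
        },
        {
          "name": "ForEachRow",
          "type": "ForEach",
          "dependsOn": [ { "activity": "LookupTsv", "dependencyConditions": [ "Succeeded" ] } ],
          "typeProperties": {
            "items": { "value": "@activity('LookupTsv').output.value", "type": "Expression" },
            "isSequential": true,
            "activities": [
              {
                "name": "ExtractTraceId",
                "type": "ExecuteDataFlow",
                "typeProperties": {
                  "dataflow": {
                    "referenceName": "ExtractTraceIdFlow",
                    "type": "DataFlowReference",
                    "parameters": {
                      "ResponsePath": { "value": "'@{item().ResponsePath}'", "type": "Expression" }
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }

Note the single quotes around @{item().ResponsePath}: the value passed to a data flow string parameter is itself evaluated as a data flow expression, so a literal string must be quoted. Also, since each iteration writes back to the same file (see the sink step below), the ForEach is set to sequential to avoid concurrent writes.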

  • In the data flow, take a source transformation (source1) and read the JSON file whose path comes from the data flow parameter.
  • Take a flatten transformation and a derived column transformation to flatten the JSON and fetch the required data from the JSON file.
  • Using the derived column transformation, create a new column and store the ResponsePath parameter value in it.
  • Take another source transformation (source2) and read the data from the TSV file.
  • Take a join transformation and join source2 with the derived column transformation output on the ResponsePath column.
  • Take only the required columns from the join output using a select transformation.
  • Then take a sink transformation with the same dataset as source2. This makes sure that when the next iteration runs, the data extracted from the JSON is written into the same file. (A rough script sketch of this flow follows below.)

Dataflow inside the ForEach activity:
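
A rough Data Flow Script sketch of the above (as seen in the flow's Script view), assuming the JSON is a single document with traceId at the top level; if traceId is nested inside an array, add the flatten step described above. Transformation and column names are illustrative:

    parameters{
        ResponsePath as string
    }
    source(allowSchemaDrift: true,
        validateSchema: false,
        documentForm: 'singleDocument',
        wildcardPaths:[($ResponsePath)]) ~> source1
    source(allowSchemaDrift: true,
        validateSchema: false) ~> source2
    source1 derive(ResponsePath = $ResponsePath) ~> derivedColumn1
    source2, derivedColumn1 join(source2@ResponsePath == derivedColumn1@ResponsePath,
        joinType:'inner',
        broadcast: 'auto') ~> join1
    join1 select(mapColumn(
        Id,
        ResponsePath = source2@ResponsePath,
        traceId
    )) ~> select1
    select1 sink(allowSchemaDrift: true,
        validateSchema: false) ~> sink1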