The pipeline is metadata driven. The error message is:
"Code": 21104, "Message": "ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source 'PomOP_PRF_Status20240311_033449-00001.txt' with row number 11: found more columns than expected column count 14.,Source=Microsoft.DataTransfer.Common,'
I have figured out the bug: the source data is pipe separated, and one column value itself contains a pipe. Below is a sample of it.
This is the value for that row:
2307906|1/17 - Bridge product may need to be authorized for next infusion1st Infusion 11/30/2023 | PA Approved 11/1/2023
The file is pipe separated and we have set the delimiter in the metadata table as '|', but the part below is a single column value. It is being read as two columns because it contains a pipe that is not a separator:
1/17 - Bridge product may need to be authorized for next infusion1st Infusion 11/30/2023 | PA Approved 11/1/2023
It has to be handled in the pipeline, as we can't alter the source file.
The issue stems from the source data containing the pipe delimiter character within a column value, which causes the pipeline to interpret it as a column separator and split that value across two columns. Since you cannot alter the source file, you can handle it in the pipeline in the following ways.
One way to handle this issue is to use a file format that supports quoting of the delimiter character within column values. For example, keep the pipe-delimited format but have the column values enclosed in double quotes, then update the metadata table for this file to specify the double quote character as the text qualifier (quote character); a sample and a dataset sketch follow below.
Sample data:
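For illustration only, quote-enclosed, pipe-delimited data might look like the rows below. The first two rows and the column layout are hypothetical (only three of the 14 columns are shown, headers omitted); row 3 reuses the problem value from the question:

```text
"2307901"|"Initial authorization approved"|"11/15/2023"
"2307903"|"Awaiting payer response"|"11/21/2023"
"2307906"|"1/17 - Bridge product may need to be authorized for next infusion1st Infusion 11/30/2023 | PA Approved 11/1/2023"|"11/30/2023"
```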
In this sample data, row 3 has a pipe symbol within the data itself. When the data is enclosed within double quotes, you can use the double quote as the quote character so that the pipe inside a quoted value is treated as data rather than as a column separator.
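As a minimal sketch (not your actual pipeline definition), a parameterized DelimitedText dataset could expose the delimiter and quote character so that your metadata table can supply both at runtime. The dataset name, linked service name, parameter names, and file location below are placeholders:

```json
{
    "name": "DS_DelimitedText_Source",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "LS_SourceStorage",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "columnDelimiter": { "type": "string", "defaultValue": "|" },
            "quoteChar": { "type": "string", "defaultValue": "\"" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "landing",
                "folderPath": "Csv",
                "fileName": "PomOP_PRF_Status20240311_033449-00001.txt"
            },
            "columnDelimiter": {
                "value": "@dataset().columnDelimiter",
                "type": "Expression"
            },
            "quoteChar": {
                "value": "@dataset().quoteChar",
                "type": "Expression"
            }
        }
    }
}
```

With quoteChar set to the double quote, a pipe that appears inside a quoted value is read as part of the value instead of starting a new column, while the column delimiter itself still comes from the metadata table.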
If the data is not quote-enclosed, you can follow the approach below instead.
The other way to handle this issue is to use the fault tolerance feature of the Copy activity. This feature allows you to detect, skip, and log incompatible tabular rows during the copy. Rows that contain the expected number of columns are copied successfully to the sink store, while rows that contain more than the expected number of columns are detected as incompatible and are skipped and logged according to the configuration.
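As a rough sketch under assumed names (the activity, dataset, linked service, sink type, and log path below are placeholders), the Copy activity could enable skipping of incompatible rows and write a session log like this:

```json
{
    "name": "CopySourceFileToSink",
    "type": "Copy",
    "inputs": [ { "referenceName": "DS_DelimitedText_Source", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "DS_Sink_Table", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": { "type": "AzureBlobFSReadSettings" },
            "formatSettings": { "type": "DelimitedTextReadSettings" }
        },
        "sink": { "type": "AzureSqlSink" },
        "enableSkipIncompatibleRow": true,
        "logSettings": {
            "enableCopyActivityLog": true,
            "copyActivityLogSettings": {
                "logLevel": "Warning",
                "enableReliableLogging": false
            },
            "logLocationSettings": {
                "linkedServiceName": {
                    "referenceName": "LS_ErrorLogStorage",
                    "type": "LinkedServiceReference"
                },
                "path": "copylogs/incompatible-rows"
            }
        }
    }
}
```

With logLevel set to Warning, the skipped rows and the reason they were skipped are written to the configured storage path.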
In the Copy activity settings, this corresponds to setting Fault tolerance to "Skip incompatible rows" and enabling logging. You can then fix the logged incompatible rows in the storage account manually or through some other process and reprocess them through the pipeline.