Handling inconsistent record delimeters in ssis

58 Views Asked by At

I have a "fun" flat file I need to import. It's mostly fixed width with every record consisting of 516 characters, except the last record is only 514 and when importing as fixed width it skips that last record because of the missing 2 characters. (FYI there is no useful data in the final characters of each record so I don't even include them in the FF source).

The final characters are a record delimiter which would enable ragged right, except for two hiccups. First, the delimiter is usually "B", which can be problematic given we have things like names, addresses, and emails in the record. Secondly, I said it's usually "B", but sometimes it's inexplicably a double quotation mark. Simple example below, each record is 22 but the last is only 20

'George    Bush      B Berry     Goldwater B Silver    Surfer    " Mork      Mindy     '

How would you handle importing this? As a possible work around, can you add two spaces to the last line of the incoming flat file source data without modifying the actual file?

1

There are 1 best solutions below

0
John Cappelletti On

Just a thought... Normalize your delimiters.

This assumes your source file does not exceed 2GB.

[NewVal] can be parsed as necessary

Example

Declare @S varchar(max) = 'George    Bush      B Berry     Goldwater B Silver    Surfer    " Mork      Mindy     '

Select NewVal = trim(Value)
      ,Seq    = ordinal
 From  string_split(replace(replace(@S,' B ','|'),' " ','|'),'|',1)

Or another OPTION if truly fixed width

Select NewVal = substring(@S,n*22,20)
      ,Seq    = N+1
 From (Select Top 5000 N=-1+Row_Number() Over (Order By 1/0) From master..spt_values n1 )  A
 Where N<=len(@S)/22

Results

NewVal               Seq
George    Bush       1
Berry     Goldwater  2
Silver    Surfer     3
Mork      Mindy      4

I should add... to Import your source file

Declare @S varchar(max); 
Select @S = BulkColumn FROM  OPENROWSET(BULK 'c:\working\testdata.txt', SINGLE_BLOB) x;