How to ignore a missing-newline error when loading files from S3 into Redshift


I have a set of files that are copied from S3 to Redshift on a daily schedule, and many of them fail with the error Missing newline: Unexpected character 0x3a found at location 48 (the character and location vary) on the last line of the file. Although these errors point to different characters within the raw line, I believe the real cause is that the files have no newline character at the end of the file (i.e. after the last line). The files are not copied one at a time; the COPY statement uses a prefix to load all of them from S3 in one pass:

COPY schema.table
FROM 's3://bucket/prefix'
iam_role 'iam_role_specfic'
delimiter '|'
escape IGNOREBLANKLINES TRIMBLANKS BLANKSASNULL ACCEPTINVCHARS EMPTYASNULL TRUNCATECOLUMNS FILLRECORD
null as '\0'

Files in the S3 bucket:
prefix_25-character-guid-1234567
prefix_25-character-guid-2345678
prefix_25-character-guid-3456789

I think that because the COPY treats these not as separate files but as one continuous stream of data, and because there is no newline character at the end of each file (as confirmed by viewing the files in a text editor), the last line of one file and the first line of the next are being run together. That would explain why the error lands on the last line of every individual file that is copied over. A quick way to check the missing newlines at scale is sketched below.
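
The missing trailing newlines can be confirmed across all the files rather than by opening them one at a time in an editor. A minimal sketch using boto3, where the bucket and prefix names are placeholders for my real ones:

import boto3

s3 = boto3.client("s3")
bucket = "bucket"    # placeholder: your bucket name
prefix = "prefix_"   # placeholder: your file prefix

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Fetch only the final byte of each object instead of the whole file.
        tail = s3.get_object(Bucket=bucket, Key=key, Range="bytes=-1")
        if tail["Body"].read() != b"\n":
            print(key, "has no trailing newline")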

I'm wondering how to avoid this. In other words, what can I add to the COPY statement shown above so that Redshift tolerates the missing trailing newlines?
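
For completeness, the only fallback I know of is to rewrite each object with a trailing newline appended before the COPY runs, along these lines (again a sketch with the same placeholder names; I'd prefer an option on the COPY itself):

import boto3

s3 = boto3.client("s3")
bucket = "bucket"    # placeholder: your bucket name
prefix = "prefix_"   # placeholder: your file prefix

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        if not body.endswith(b"\n"):
            # Re-upload with a newline appended so each file's last record
            # ends cleanly even if the loader concatenates the files.
            s3.put_object(Bucket=bucket, Key=key, Body=body + b"\n")

Note that this downloads and re-uploads every affected object, so it is only practical for reasonably sized files.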
