Best way to write a csv.gz file and upload it to an S3 bucket


I need to query an S3 inventory report file some_name.csv.gz, filter its contents, change some columns, and then replace it (under the same key, some_name.csv.gz) with the updated version.

All of this is to be done in a Python Lambda.

For the query/filter part I am using s3.select_object_content:

s3 = boto3.client('s3')
resp = s3.select_object_content(
    Bucket=bucket_name,
    Key=filename,
    ExpressionType='SQL',
    Expression="select * from s3object s where s._6 like '%NOT-SSE%'",
    # FileHeaderInfo must be the string 'NONE' (not 'None')
    InputSerialization={'CSV': {'FileHeaderInfo': 'NONE'}, 'CompressionType': 'GZIP'},
    OutputSerialization={'CSV': {}}
)

For the write part I am looping over the returned records and writing them to a file after some potential modifications. The Records events arrive as arbitrary-size chunks rather than whole rows, so they have to be buffered and split on newlines before each row is processed:

# buffer the stream and only process complete lines;
# Lambda can only write under /tmp
buffer = ''
with open('/tmp/csvfile.csv', 'w') as file:
    for event in resp['Payload']:
        if 'Records' in event:
            buffer += event['Records']['Payload'].decode('utf-8')
            lines = buffer.split('\n')
            buffer = lines.pop()  # keep any partial trailing row
            for rec in lines:
                if not rec:
                    continue
                row = rec.split(',')
                # change one of the columns if it is empty
                if row[2] == '':
                    row[2] = 'NOT_EMPTY'
                    rec = ','.join(row)
                file.write(rec)
                file.write('\n')

# gzip file

# upload it back to s3  
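
For those two remaining steps, a minimal sketch using the standard-library gzip module and boto3's upload_file. The /tmp paths are an assumption (Lambda's filesystem is read-only outside /tmp), and the function's role would need s3:PutObject on the bucket:

import gzip
import shutil

# compress the filtered CSV (paths under /tmp assume a Lambda environment)
with open('/tmp/csvfile.csv', 'rb') as f_in:
    with gzip.open('/tmp/csvfile.csv.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

# overwrite the original inventory object under the same key
s3.upload_file('/tmp/csvfile.csv.gz', bucket_name, filename)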

Is it a bad idea to write the file line by line (what if the file is very large)? Are there better/more scalable alternatives?
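
One alternative I am considering (a hypothetical sketch, not tested): write the rows straight into an in-memory gzip stream and upload it with put_object, so no intermediate file is written at all. The buffer-and-modify loop from above would feed gz.write; memory usage is then bounded by the size of the compressed output:

import gzip
import io

# hypothetical: gzip on the fly into an in-memory buffer; the per-row
# modification loop from above would go here in place of the raw write
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    for event in resp['Payload']:
        if 'Records' in event:
            gz.write(event['Records']['Payload'])

buf.seek(0)
s3.put_object(Bucket=bucket_name, Key=filename, Body=buf)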
