Python CSV: Append data from S3, duplicate entries


FYI I am a complete Python novice. I have a for loop that extracts some object info from an S3 bucket and writes it to a CSV file, one row per object as the details are retrieved. My issue is that I am getting duplicate entries in the CSV. What I am expecting in the CSV is:

account_id;arn
key1;body1
key2;body2
key3;body3
... (until the loop runs through all objects in that folder)

But what I am getting is:

account_id;arn
key1;body1
account_id;arn
key1;body1
account_id;arn
key2;body2
account_id;arn
key1;body1
account_id;arn
key2;body2
account_id;arn
key3;body3

Also, every time I run the script, it keeps appending the old data, which compounds the problem.

My current piece of code is:

for objects in my_bucket.objects.filter(Prefix="folderpath"):
    key = objects.key
    body = objects.get()['Body'].read()
    field = ["account_id","arn"]
    data = [
        [key, body]
    ]
    with open("my_file.csv", "a") as f:
        writer = csv.writer(f, delimiter=";", lineterminator="\n")
        writer.writerow(field)
        writer.writerows(data)

There are 2 answers below.

Answer from Kayvan Shah:
import csv

# Assuming `my_bucket` and `folderpath` are defined earlier

# Open the CSV file in write mode
with open("my_file.csv", "w", newline="") as f:
    writer = csv.writer(f)

    # Write header row once at beginning of file
    writer.writerow(["account_id", "arn"])

    # Create a list to store content for all rows
    data = []

    # Iterate over objects in the S3 bucket
    for objects in my_bucket.objects.filter(Prefix="folderpath"):
        key = objects.key
        body = objects.get()["Body"].read()

        # Append the row
        data.append([key, body])

    # Write all the data at end in a single I/O operation
    writer.writerows(data)
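One thing to watch, beyond what this answer covers: objects.get()["Body"].read() returns bytes, so the CSV cell will be written as its repr, e.g. b'...'. If the object content is text, decoding it first avoids that. A minimal sketch, using a hypothetical payload in place of the S3 read:

```python
# Hypothetical payload standing in for objects.get()["Body"].read()
body = b"arn:aws:iam::123456789012:root"

# Decode the raw bytes to a str before appending it to the row,
# so the CSV cell reads arn:... rather than b'arn:...'
arn = body.decode("utf-8")
print(arn)  # arn:aws:iam::123456789012:root
```

With that, the loop would append [key, body.decode("utf-8")] instead of [key, body].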
Answer from wombat:

It's much easier if you use the DictWriter class from Python's csv module.

Start by defining your headers and preparing your CSV file like so:

import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['account_id', 'arn']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    
    for objects in my_bucket.objects.filter(Prefix="folderpath"):
        key = objects.key
        body = objects.get()['Body'].read()
        
        writer.writerow({'account_id': key, 'arn': body})
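To sanity-check the DictWriter pattern without touching S3, the same code can be exercised against an in-memory buffer; the keys and values below are made up:

```python
import csv
import io

# In-memory stand-in for the CSV file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["account_id", "arn"])

# Header is written exactly once, before the loop
writer.writeheader()
writer.writerow({"account_id": "key1", "arn": "body1"})
writer.writerow({"account_id": "key2", "arn": "body2"})

print(buffer.getvalue())
# account_id,arn
# key1,body1
# key2,body2
```

Because writeheader() is called once, up front, the header no longer repeats per row, which is exactly the duplication the question describes.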