Python CSV: Append data from S3, duplicate entries


FYI I am a complete Python novice. I have a for loop that extracts some object info from an S3 bucket and writes it to a CSV file, one row per object as the details are retrieved. My issue is that I am getting duplicate entries in the CSV. What I am expecting in the CSV is:

account_id;arn
key1;body1
key2;body2
key3;body3
... (until the loop runs through all objects in that folder)

But what I am getting is:

account_id;arn
key1;body1
account_id;arn
key1;body1
account_id;arn
key2;body2
account_id;arn
key1;body1
account_id;arn
key2;body2
account_id;arn
key3;body3

Also, every time I run the script, it keeps appending the old data, which compounds the problem.

My current piece of code is:

for objects in my_bucket.objects.filter(Prefix="folderpath"):
    key = objects.key
    body = objects.get()['Body'].read()
    field = ["account_id","arn"]
    data = [
        [key, body]
    ]
    with open("my_file.csv", "a") as f:
        writer = csv.writer(f, delimiter=";", lineterminator="\n")
        writer.writerow(field)
        writer.writerows(data)

There are 2 answers below.

Answer from Kayvan Shah:
import csv

# Assuming `my_bucket` and `folderpath` are defined earlier

# Open the CSV file in write mode
with open("my_file.csv", "w", newline="") as f:
    writer = csv.writer(f)

    # Write header row once at beginning of file
    writer.writerow(["account_id", "arn"])

    # Create a list to store content for all rows
    data = []

    # Iterate over objects in the S3 bucket
    for objects in my_bucket.objects.filter(Prefix="folderpath"):
        key = objects.key
        body = objects.get()["Body"].read()

        # Append the row
        data.append([key, body])

    # Write all the data at end in a single I/O operation
    writer.writerows(data)
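One thing to watch, beyond what this answer covers: objects.get()["Body"].read() returns bytes, so the CSV cell will be written as its repr, e.g. b'...'. If the object content is text, decoding it first avoids that. A minimal sketch, using a hypothetical payload in place of the S3 read:

```python
# Hypothetical payload standing in for objects.get()["Body"].read()
body = b"arn:aws:iam::123456789012:root"

# Decode the raw bytes to a str before appending it to the row,
# so the CSV cell reads arn:... rather than b'arn:...'
arn = body.decode("utf-8")
print(arn)  # arn:aws:iam::123456789012:root
```

With that, the loop would append [key, body.decode("utf-8")] instead of [key, body].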
Answer from wombat:

It's much easier if you use the DictWriter class from Python's csv module.

Start by defining your headers and preparing your CSV file like so:

import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['account_id', 'arn']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    
    for objects in my_bucket.objects.filter(Prefix="folderpath"):
        key = objects.key
        body = objects.get()['Body'].read()
        
        writer.writerow({'account_id': key, 'arn': body})
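To sanity-check the DictWriter pattern without touching S3, the same code can be exercised against an in-memory buffer; the keys and values below are made up:

```python
import csv
import io

# In-memory stand-in for the CSV file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["account_id", "arn"])

# Header is written exactly once, before the loop
writer.writeheader()
writer.writerow({"account_id": "key1", "arn": "body1"})
writer.writerow({"account_id": "key2", "arn": "body2"})

print(buffer.getvalue())
# account_id,arn
# key1,body1
# key2,body2
```

Because writeheader() is called once, up front, the header no longer repeats per row, which is exactly the duplication the question describes.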