Bulk insert and update values in MongoDB


I run a bulk insert cron job every day. Some values get missed, and when I rerun the job, the values are added alongside the existing data instead of updating it. Is there a way to insert only the documents that have not yet been inserted?

My code:

query = bigQuery.get_data(query)
bulk = col.initialize_unordered_bulk_op()

for i, row in enumerate(query):
    bulk.insert({
        'date': str(row['day_dt']),
        'dt': datetime.strptime(str(row['day_dt']), '%Y-%m-%d'),
        'site': row['site_nm'],
        'val_counts': row[8]
    })

bulk_result = bulk.execute()

Right now, it re-inserts all the values every time the query runs. Is there a way to add only the values that have not yet been added?

There is 1 best solution below

securisec On

I obviously don't fully know your data structure, and I'm not fully clear on what you are trying to do, but I think this should work.

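from datetime import datetime

query = bigQuery.get_data(query)

new_things = []
for row in query:
    doc = {
        'date': str(row['day_dt']),
        'dt': datetime.strptime(str(row['day_dt']), '%Y-%m-%d'),
        'site': row['site_nm'],
        'val_counts': row[8]
    }
    # your_query should match the fields that uniquely identify a document;
    # date + site is only an assumption here, adjust it to your data
    your_query = {'date': doc['date'], 'site': doc['site']}
    if not col.find_one(your_query):  # make sure that the document does not exist already
        new_things.append(doc)  # add data to a list

# use insert_many to insert all the new documents
# (insert_many raises on an empty list, so skip it when nothing is new)
if new_things:
    bulk_result = col.insert_many(new_things)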

Check the comments next to the code for an explanation. If you are new to this, as you mentioned, I would stick to the simpler way of doing things and scale your code as your experience grows.
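If you later want reruns to update existing documents instead of skipping them, one option is a bulk upsert. This is only a minimal sketch, not necessarily what fits your data: it assumes that `date` plus `site` uniquely identify a document, and that each row is a mapping with `day_dt`, `site_nm` and `val_counts` keys (`val_counts` is a hypothetical name standing in for `row[8]`). The helper names `build_upsert_specs` and `upsert_rows` are made up for illustration; `UpdateOne` and `bulk_write` are the pymongo calls doing the work.

```python
from datetime import datetime


def build_upsert_specs(rows):
    """Turn rows into (filter, update) pairs for upserts.

    Assumption: 'date' + 'site' uniquely identify a document, and each row
    is a mapping with 'day_dt', 'site_nm' and 'val_counts' keys.
    """
    specs = []
    for row in rows:
        date = str(row['day_dt'])
        # Fields used to find an existing document; on insert they are
        # copied into the new document as well.
        key = {'date': date, 'site': row['site_nm']}
        # Fields that are overwritten on every run.
        update = {'$set': {
            'dt': datetime.strptime(date, '%Y-%m-%d'),
            'val_counts': row['val_counts'],
        }}
        specs.append((key, update))
    return specs


def upsert_rows(col, rows):
    """Send one unordered bulk of upserts to a pymongo collection."""
    # Imported here so build_upsert_specs can be used without pymongo installed.
    from pymongo import UpdateOne

    ops = [UpdateOne(key, update, upsert=True)
           for key, update in build_upsert_specs(rows)]
    if not ops:  # bulk_write rejects an empty list of operations
        return None
    return col.bulk_write(ops, ordered=False)
```

With this, a rerun would call `upsert_rows(col, rows)`: rows that already exist get their `dt` and `val_counts` overwritten, and missing ones are inserted. It also avoids one `find_one` round trip per row, at the cost of being a bit less obvious than the loop above.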