MongoDB - mongoimport duplicate _id in JSON array


I would like to ask for help. When I import JSON into MongoDB via Compass, it throws a duplicate _id error. I then switched to the terminal and used mongoimport, which runs successfully and reports that every document was imported without error, but some documents are missing from the collection. How can I solve this problem?

This is the command I ran in Windows cmd:

mongoimport D:\DimplomaThesis_data\transfer_json\180000-190000.json -d diplomovka -c transfer --jsonArray --stopOnError --maintainInsertionOrder --upsertFields _id

This is the structure of a record in the JSON array:

{
  "_id":"5d6566d086dc8b72382bc376",
  "name":"Peter",
  "surname":"Zubrík",
  "titles":{
    "before":"",
    "after":""
  },
  "sex":"M",
  "citizenship":"SVK",
  "birthyear":1991,
  "age":31,
  "transfer":{
    "source_ppo":"tj-polana-siba.futbalnet.sk",
    "org_profile_id":"sportovnik-klub-fc-mukarov.futbalnet.sk",
    "org_id":"5d5d3974eccb8850917918cd",
    "sector":{
      "_id":"sport:futbal:futbal",
      "category":"sport",
      "itemId":"futbal",
      "sectorId":"futbal"
    },
    "competence_type":"player",
    "transfer_type":"transfer",
    "issfMoveType":"PWP",
    "date_from":"2014-05-09T00:00:00.000Z",
    "date_to":null,
    "_id":"62e6d12c0ae29819010f611f",
    "org_profile_name":"Sportovník klub FC Mukařov",
    "org_name":"Sportovník klub FC Mukařov",
    "source_ppo_name":"TJ Poľana Šiba"
  },
  "issfId":"1208658"
}

The same "_id":"5d6566d086dc8b72382bc376" can appear in multiple records in the array. I download the data from APIs: around 30 JSON files, each containing 10,000 records. Ideally, I would import all documents into MongoDB and then create a pipeline in Compass.

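A likely explanation for the missing documents, offered as an assumption rather than a confirmed diagnosis: with --upsertFields _id, mongoimport replaces an existing document that has the same _id instead of inserting a second one, so only one record per _id survives. The Python sketch below counts how often each _id occurs in an array, which shows how many documents would remain after such an upsert. The function name and the sample records are hypothetical.

```python
from collections import Counter


def summarize_ids(records):
    """Return (total records, distinct _id values, ids that occur more than once)."""
    counts = Counter(doc["_id"] for doc in records)
    repeated = {key: n for key, n in counts.items() if n > 1}
    return len(records), len(counts), repeated


# Hypothetical sample: one player with two transfers, another with one.
sample = [
    {"_id": "a1", "transfer": {"date_from": "2014-05-09T00:00:00.000Z"}},
    {"_id": "a1", "transfer": {"date_from": "2016-07-01T00:00:00.000Z"}},
    {"_id": "b2", "transfer": {"date_from": "2015-01-15T00:00:00.000Z"}},
]

total, distinct, repeated = summarize_ids(sample)
print(f"{total} records, {distinct} distinct _id values")
print("repeated:", repeated)
```

To check a real export, load it with json.load and pass the resulting list to summarize_ids. If the number of distinct _id values matches the document count in the collection, the upsert explanation fits.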

There is 1 solution below

Hextall_25 On

I found a solution to my problem.

I needed to use Python to create a compound_key (a new primary key that serves as a unique identifier for each record in the JSON array).

This code works for me:

import json

# Load the JSON array exported from the API
with open("250000-260000", "r", encoding="utf-8") as f:
    data = json.load(f)

# Keep the original _id as player_id, then build a compound key from the
# player id and the transfer start date and use it as the new _id
for doc in data:
    doc["player_id"] = doc["_id"]
    doc["compound_key"] = doc["player_id"] + "_" + doc["transfer"]["date_from"]
    doc["_id"] = doc["compound_key"]

# Save the modified data to a new JSON file
with open("26.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

Basically, I created a new, modified JSON file and imported it through MongoDB Compass, where the import finished with 0 errors (no duplicate _id error).
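
One caveat with this key: if the same player has two transfers with the same date_from, the combination of player id and date still collides. The record shown in the question also carries a nested transfer._id, which may be a better discriminator, although whether it is unique per transfer is an assumption about the API. Below is a minimal sketch that builds the key either way and reports any remaining collisions. The function names and sample data are hypothetical.

```python
from collections import Counter


def add_compound_id(doc, use_transfer_id=False):
    """Return a copy of doc whose _id combines the player id with a transfer field."""
    transfer = doc["transfer"]
    suffix = transfer["_id"] if use_transfer_id else transfer["date_from"]
    new_doc = dict(doc)  # shallow copy so the input record is left untouched
    new_doc["player_id"] = doc["_id"]
    new_doc["_id"] = f'{doc["_id"]}_{suffix}'
    return new_doc


def find_collisions(docs):
    """Return the sorted _id values that occur more than once."""
    counts = Counter(d["_id"] for d in docs)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical sample: the same player with two transfers on the same date.
sample = [
    {"_id": "p1", "transfer": {"_id": "t1", "date_from": "2014-05-09T00:00:00.000Z"}},
    {"_id": "p1", "transfer": {"_id": "t2", "date_from": "2014-05-09T00:00:00.000Z"}},
]

by_date = [add_compound_id(d) for d in sample]
by_transfer = [add_compound_id(d, use_transfer_id=True) for d in sample]

print("collisions with date key:", find_collisions(by_date))
print("collisions with transfer id key:", find_collisions(by_transfer))
```

Running find_collisions on the full modified list before importing gives an early warning if the chosen key is not unique for the real data.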