Why can MongoDB time series documents have duplicate _id values?

I just realized that in a MongoDB time series collection, two or more documents can have the same _id.

Is that normal?

Answer

I couldn't find this stated explicitly in the documentation, but time series collections do not get the automatically created unique index on the _id field that regular collections have. From the documentation:

Time series collections behave like normal collections. You can insert and query your data as you normally would. MongoDB treats time series collections as writable non-materialized views backed by an internal collection. When you insert data, the internal collection automatically organizes time series data into an optimized storage format. When you create a time series collection, MongoDB automatically creates an internal clustered index on the time field.

Example:

```javascript
db.createCollection(
   "weather",
   {
      timeseries: {
         timeField: "timestamp",
         metaField: "metadata",
         granularity: "hours"
      }
   }
)

db.weather.insertMany( [
   {
      "metadata": "temperature",
      "timestamp": ISODate("2022-05-18T00:00:00.000Z"),
      "temp": 12
   },
   {
      "metadata": "temperature",
      "timestamp": ISODate("2022-05-18T02:00:00.000Z"),
      "temp": 11
   },
   {
      "metadata": "temperature",
      "timestamp": ISODate("2022-05-18T04:00:00.000Z"),
      "temp": 9
   }
])
```

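Now, if you query the internal system.buckets.weather collection, you can see that all three measurements were stored in a single bucket document:

```javascript
db.getCollection('system.buckets.weather').find({})
```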

```
{
    "_id" : ObjectId("62843700f921421b34e56d1f"),
    "control" : {
        "version" : 1,
        "min" : {
            "_id" : ObjectId("63b7a7460a8571fbefcb480b"),
            "timestamp" : ISODate("2022-05-17T17:30:00.000-06:30"),
            "temp" : 9.0
        },
        "max" : {
            "_id" : ObjectId("63b7a7460a8571fbefcb480d"),
            "timestamp" : ISODate("2022-05-17T21:30:00.000-06:30"),
            "temp" : 12.0
        }
    },
    "meta" : "temperature",
    "data" : {
        "timestamp" : {
            "0" : ISODate("2022-05-17T17:30:00.000-06:30"),
            "1" : ISODate("2022-05-17T19:30:00.000-06:30"),
            "2" : ISODate("2022-05-17T21:30:00.000-06:30")
        },
        "_id" : {
            "0" : ObjectId("63b7a7460a8571fbefcb480b"),
            "1" : ObjectId("63b7a7460a8571fbefcb480c"),
            "2" : ObjectId("63b7a7460a8571fbefcb480d")
        },
        "temp" : {
            "0" : 12.0,
            "1" : 11.0,
            "2" : 9.0
        }
    }
}
```
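To make the storage format concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js) that turns a bucket like the one above back into individual measurements. The function name unpackBucket and the simplified string values standing in for ObjectId/ISODate are hypothetical; this illustrates the columnar layout, not MongoDB's actual unpacking code. What it shows is that a measurement's _id is just one more column under data, so nothing in this layout prevents two measurements from carrying the same value.

```javascript
// Hypothetical sketch: rebuild individual measurements from a bucket document
// shaped like the system.buckets.weather output above. NOT MongoDB's internal
// code; ObjectId/ISODate values are replaced by strings so it runs in Node.js.

// Turns a columnar bucket ({ meta, data: { field: { "0": v, "1": v } } })
// into an array of row documents. metaFieldName is the metaField chosen in
// createCollection ("metadata" in the example above).
function unpackBucket(bucket, metaFieldName) {
  // Collect every row index that appears in any column.
  const rowIndexes = new Set();
  for (const column of Object.values(bucket.data)) {
    for (const key of Object.keys(column)) rowIndexes.add(Number(key));
  }

  return [...rowIndexes]
    .sort((a, b) => a - b)
    .map((i) => {
      const doc = { [metaFieldName]: bucket.meta };
      for (const [field, column] of Object.entries(bucket.data)) {
        // A field may be missing from some measurements, so skip absent rows.
        if (Object.prototype.hasOwnProperty.call(column, String(i))) {
          doc[field] = column[String(i)];
        }
      }
      return doc;
    });
}

// The bucket from the output above, simplified to strings and numbers
// (timestamps written in UTC, as they were inserted).
const bucket = {
  _id: "62843700f921421b34e56d1f",
  meta: "temperature",
  data: {
    timestamp: {
      0: "2022-05-18T00:00:00.000Z",
      1: "2022-05-18T02:00:00.000Z",
      2: "2022-05-18T04:00:00.000Z",
    },
    _id: {
      0: "63b7a7460a8571fbefcb480b",
      1: "63b7a7460a8571fbefcb480c",
      2: "63b7a7460a8571fbefcb480d",
    },
    temp: { 0: 12, 1: 11, 2: 9 },
  },
};

const measurements = unpackBucket(bucket, "metadata");
console.log(measurements.length); // 3
console.log(measurements[0]);
```

Note that the bucket's own _id (62843700…) is the only value the clustered primary key sees; the per-measurement _id values live inside data._id alongside temp and timestamp.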

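Note that each measurement's _id is stored only as an ordinary column (data._id) inside the bucket, while the bucket document carries its own _id. A comment on this behavior explains: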

The primary key index of a Time Series collection is an automatically created clustered index on a server generated unique _id value for a group of documents with a unique metaField for a time span. This index and value can be seen in the corresponding system.buckets.foo collection. The _id of the document cannot currently be indexed and cannot be the primary key index for a Time Series collection like a regular collection.
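Because nothing on the server rejects a repeated _id in a time series collection, any uniqueness check has to happen elsewhere. A minimal sketch, assuming you only need to detect duplicates after the fact: the pipeline uses the standard $group and $match stages and could be passed to db.weather.aggregate() in mongosh, while findDuplicateIds performs the same check client-side on documents you have already fetched. Both names are hypothetical, and the helper assumes String(_id) is a faithful key for your _id type.

```javascript
// Aggregation pipeline (standard $group/$match stages) listing every _id that
// occurs more than once, e.g. db.weather.aggregate(duplicateIdPipeline) in
// mongosh. The variable name is hypothetical.
const duplicateIdPipeline = [
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } },
];

// Client-side equivalent for documents already fetched into an array.
// Assumes String(doc._id) uniquely identifies an _id value (true for strings;
// for ObjectId the string form is derived from its hex value).
function findDuplicateIds(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const key = String(doc._id);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Keep only keys seen more than once, in first-seen order.
  return [...counts]
    .filter(([, n]) => n > 1)
    .map(([id, n]) => ({ _id: id, count: n }));
}

const docs = [
  { _id: "a1", temp: 12 },
  { _id: "a1", temp: 11 }, // same _id, which a time series collection would accept
  { _id: "b2", temp: 9 },
];
console.log(findDuplicateIds(docs)); // [ { _id: 'a1', count: 2 } ]
```

Running the pipeline server-side avoids pulling every measurement to the client; the helper is only convenient when the documents are already in memory.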