We version most of our collections in Mongodb. The selected versioning mechanism is as follows:
{ "docId" : 174, "v" : 1, "attr1": 165 } /*version 1 */
{ "docId" : 174, "v" : 2, "attr1": 165, "attr2": "A-1" }
{ "docId" : 174, "v" : 3, "attr1": 184, "attr2" : "A-1" }
So, when we perform our queries we always need to use the aggregation framework in this way to ensure get latest versions of our objects:
db.docs.aggregate( [
{"$sort":{"docId":-1,"v":-1}},
{"$group":{"_id":"$docId","doc":{"$first":"$$ROOT"}}}
{"$match":{<query>}}
] );
The problem with this approach is once you have done your grouping, you have a set of data in memory which has nothing to do with your collection and thus, your indexes cannot be used.
As a result, the more documents your collection has, the slower the query gets.
Is there any way to speed this up?
If not, I will consider to move to one of the approaches defined in this good post: http://www.askasya.com/post/trackversions/
Just in order to complete this question, we went with option 3: one collection to keep latest versions and one collection to keep historical ones. It is introduced here: http://www.askasya.com/post/trackversions/ and some further description (with some nice code snippets) can be found in http://www.askasya.com/post/revisitversions/.
It has been running in production now for 6 months. So far so good. Former approach meant we were always using the aggregate framework which moves away from indexes as soon as you modify the original schema (using $group, $project...) as it doesn't match anymore the original collection. This was making our performance terrible as the data was growing.
With the new approach though the problem is gone. 90% of our queries goes against latest data and this means we target a collection with a simple
ObjectIdas identifier and we do not require aggregate framework anymore, just regular finds.Our queries against historical data always include
idandversionso by indexing these (we include both as_idso we get it out of the box), reads towards those collections are equally fast. This is a point though not to overlook. Read patterns in your application are crucial when designing how your collections/schemas should look like in MongoDB so you must ensure you know them when taking such decisions.