MongoDB Aggregation query running very slow

Asked by jbernal on 03 November 2016 at 17:24

We version most of our collections in MongoDB. The versioning scheme we chose is as follows:

{  "docId" : 174, "v" : 1,  "attr1": 165 }   /*version 1 */
{  "docId" : 174, "v" : 2,  "attr1": 165, "attr2": "A-1" } 
{  "docId" : 174, "v" : 3,  "attr1": 184, "attr2" : "A-1" }

So, whenever we query, we have to go through the aggregation framework as follows to make sure we get the latest version of each object:

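db.docs.aggregate( [
    {"$sort":{"docId":-1,"v":-1}},                           // newest version of each docId first
    {"$group":{"_id":"$docId","doc":{"$first":"$$ROOT"}}},   // keep only that newest version
    {"$match":{<query>}}                                     // fields must be addressed as "doc.<field>"
] );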
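To make the semantics of that pipeline concrete, here is a plain JavaScript model of what it computes, using the three sample versions above plus one hypothetical extra document (docId 175). It is an in-memory illustration only, not MongoDB code: it assumes numeric docId and v values and takes the $match condition as a JavaScript predicate over the grouped {_id, doc} shape.

```javascript
// In-memory model of the pipeline above (illustration only, not MongoDB code).
function latestVersions(docs, predicate) {
  // Like {"$sort": {"docId": -1, "v": -1}}: newest version of each docId first.
  const sorted = [...docs].sort((a, b) => b.docId - a.docId || b.v - a.v);
  // Like {"$group": {"_id": "$docId", "doc": {"$first": "$$ROOT"}}}:
  // keep the first (newest) document seen for each docId.
  const groups = new Map();
  for (const d of sorted) {
    if (!groups.has(d.docId)) {
      groups.set(d.docId, { _id: d.docId, doc: d });
    }
  }
  // Like the final {"$match": ...}: the filter sees {_id, doc}, not the raw
  // document, and only runs after every document has been sorted and grouped.
  return [...groups.values()].filter(predicate);
}

const docs = [
  { docId: 174, v: 1, attr1: 165 },
  { docId: 174, v: 2, attr1: 165, attr2: "A-1" },
  { docId: 174, v: 3, attr1: 184, attr2: "A-1" },
  { docId: 175, v: 1, attr1: 165 }, // hypothetical extra document
];

// docId 174 is excluded: its older versions have attr1 = 165, but its latest has 184.
const hits = latestVersions(docs, (g) => g.doc.attr1 === 165);
console.log(hits.map((g) => g._id)); // [ 175 ]
```

The model also shows where the cost comes from: the whole input is sorted and grouped before the filter ever runs.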

The problem with this approach is that once the grouping is done, the pipeline is working on an in-memory set of documents that no longer corresponds to the collection, so the final $match cannot use any of the collection's indexes.

As a result, every query has to sort and group the whole collection before filtering, so the more documents the collection holds, the slower the query gets.
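One partial mitigation, when the caller already knows which docIds it wants, is to filter on docId before the $sort and $group: docId is the group key, so that filter drops whole groups and never changes which version counts as the latest. The helper below is a hypothetical sketch that only builds the pipeline as plain data (to be passed to db.docs.aggregate()); it also rewrites a filter written against the original document shape into the "doc.<field>" form the final $match needs. It assumes the query has no top-level operators such as $or. With an index on {docId: -1, v: -1}, the leading $match and the $sort should be able to use it, while the final $match still runs unindexed on the grouped output.

```javascript
// Hypothetical helper: builds the "latest version" pipeline as plain data.
function buildLatestPipeline(query, docIdFilter) {
  const pipeline = [];
  // Safe before grouping: docId is the group key, so this removes whole
  // groups and cannot change which version of a docId is the newest.
  if (docIdFilter !== undefined) {
    pipeline.push({ $match: { docId: docIdFilter } });
  }
  pipeline.push(
    { $sort: { docId: -1, v: -1 } },
    { $group: { _id: "$docId", doc: { $first: "$$ROOT" } } }
  );
  // After $group each document is nested under "doc", so prefix every field.
  const nested = {};
  for (const [field, cond] of Object.entries(query)) {
    if (field.startsWith("$")) {
      throw new Error("top-level operator " + field + " is not handled by this sketch");
    }
    nested["doc." + field] = cond;
  }
  pipeline.push({ $match: nested });
  return pipeline;
}

const pipeline = buildLatestPipeline({ attr2: "A-1" }, { $in: [174, 175] });
console.log(JSON.stringify(pipeline[0])); // {"$match":{"docId":{"$in":[174,175]}}}
```

A filter on any other field still has to wait until after the $group, because filtering it earlier could select an older version instead of the latest one.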

Is there any way to speed this up?

If not, I will consider moving to one of the approaches described in this post: http://www.askasya.com/post/trackversions/


Answer by jbernal on 21 January 2018 at 11:47 (best answer)

To close this question: we went with option 3, one collection holding the latest versions and another holding the historical ones. It is introduced at http://www.askasya.com/post/trackversions/ and described further (with some useful code snippets) at http://www.askasya.com/post/revisitversions/.
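The write path of that two-collection scheme can be sketched as follows. This is not the code from the linked posts; it is a hypothetical helper that returns the two writes as plain data, assuming the latest document keeps a numeric v field and that the history collection uses a compound _id of {docId, v}. The caller would issue them as an insertOne on the history collection and a replaceOne on the latest collection. The v check in the replace filter means a concurrent writer's replace matches nothing; the two writes are still not atomic across collections.

```javascript
// Hypothetical helper for the two-collection scheme: archive the current
// version, then replace the latest copy with the next version.
function buildNewVersionWrites(current, changes) {
  const { _id, v, ...fields } = current;
  // History document keyed by (docId, v), so the default _id index serves lookups.
  const historyInsert = { _id: { docId: _id, v: v }, ...fields };
  // Replace only if the stored version is still v (optimistic concurrency check).
  // The replacement omits _id, so the existing _id is kept.
  const latestReplace = {
    filter: { _id: _id, v: v },
    replacement: { ...fields, ...changes, v: v + 1 },
  };
  return { historyInsert, latestReplace };
}

// A string stands in for the ObjectId used in the latest collection.
const current = { _id: "doc-174", v: 2, attr1: 165, attr2: "A-1" };
const writes = buildNewVersionWrites(current, { attr1: 184 });
console.log(JSON.stringify(writes.historyInsert));
// {"_id":{"docId":"doc-174","v":2},"attr1":165,"attr2":"A-1"}
console.log(JSON.stringify(writes.latestReplace.replacement));
// {"attr1":184,"attr2":"A-1","v":3}
```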

It has now been running in production for 6 months, so far without problems. The former approach meant we always went through the aggregation framework, which stops using indexes as soon as a stage reshapes the documents (with $group, $project, ...), because they no longer match the original collection. As the data grew, this made our performance terrible.

With the new approach, the problem is gone. 90% of our queries go against the latest data, which means they target a collection with a simple ObjectId as identifier and no longer need the aggregation framework, just regular finds.

Our queries against historical data always include the id and the version, so by indexing both (we store them together as the _id, so we get the index out of the box), reads against the history collections are equally fast. This point should not be overlooked, though: your application's read patterns are crucial when designing your collections and schemas in MongoDB, so make sure you know them before taking such decisions.
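The two read patterns described above reduce to plain lookups by _id. The helpers below are a hypothetical sketch that builds the find() filters, assuming the history _id is {docId, v} in that field order. The order matters because MongoDB matches an embedded document exactly, field order included, so {v: 2, docId: ...} would not match a stored {docId: ..., v: 2}.

```javascript
// Hypothetical helpers building find() filters for the two-collection scheme.
// Latest data: a plain lookup by _id on the latest collection.
function latestFilter(id) {
  return { _id: id };
}

// Historical data: always build the compound _id in the same field order
// (docId, then v) that was used when the history document was inserted.
function historyFilter(id, version) {
  return { _id: { docId: id, v: version } };
}

console.log(JSON.stringify(historyFilter("doc-174", 2)));
// {"_id":{"docId":"doc-174","v":2}}
```

In the shell these would be passed as, for example, db.docs.findOne(latestFilter(id)) and db.docs_history.findOne(historyFilter(id, 2)), where docs_history is a hypothetical collection name.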
