I am planning to add indexes to make my MongoDB queries faster; I am currently working with Spring Boot.
I have two fields, `userType` and `creationStamp` (the date and time, in epoch milliseconds, at which the document is saved to the collection). Currently my collections have no indexes, all queries filter on these two common fields, and the default sorting is done on `creationStamp`, which makes the queries very slow since all of its values are unique.
I have to introduce a new field, `date` (which won't include a time component). My idea is to use a compound index ordered `{userType, date}`, since range follows equality, but I'm not sure how sorting would be done in this case.
Are the documents sorted in the order they are saved to the collection?
If not, how should I approach this problem if I want the results sorted in descending order of the date and time at which documents were saved to the collection?
Can I fetch the documents based on `userType` and `date` but sort them on `creationStamp`?
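To make that concrete, here is roughly what I have in mind, in mongosh syntax (the collection name `events` and the filter values are just placeholders):

```javascript
// Proposed compound index: equality field first, then the new day-granularity field
db.events.createIndex({ userType: 1, date: 1 })

// What I would like to do: filter on userType and date,
// but sort on the more precise creationStamp
db.events.find({ userType: "ADMIN", date: "2024-01-15" })
  .sort({ creationStamp: -1 })
```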
Thank you.
I think there are a few misconceptions that are making the question more complex than it needs to be. Let's start with the latest comment.
We need to take care here to make sure that we are comparing apples to apples. One of the responsibilities of explain plans is to report how the database will execute a query. But they don't always actually run that plan as part of the command. If there are no indexes on the collection then there aren't any options for execution apart from doing the collection scan.
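To illustrate the distinction (a mongosh sketch, assuming a collection named `events`): the default `queryPlanner` verbosity only reports the chosen plan, while `executionStats` actually executes the winning plan and gathers counters:

```javascript
// Reports the winning plan without executing the query
db.events.find({ userType: "ADMIN" }).explain("queryPlanner")

// Executes the winning plan and reports counters such as
// nReturned, totalKeysExamined, totalDocsExamined, executionTimeMillis
db.events.find({ userType: "ADMIN" }).explain("executionStats")
```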
Relatedly, even when explain is run at an elevated verbosity to gather execution statistics it is still not representative of the overall work that is performed in order to populate a UI with information from the database. Notably absent in that regard is the time it takes to transmit the information across the network from the database to the client.
I doubt that your system is able to scan and sort 5 million documents in 35 milliseconds. Even if so, that's not a scalable approach long term.
You keep focusing on the fact that the `creationStamp` field contains values that are effectively unique for each document in the collection, as if that were bad or inefficient for query execution, but I don't think that's really the case.

Keep in mind that an index is nothing more than an ordered data structure. Moreover, unless it is a partial index, it will have an entry for every single document in the collection. The most notable consideration that the cardinality of the indexed field has here is in regards to the size of the index. Fewer distinct values typically result in prefix compression being more effective, but that consideration is not pertinent to your question about performance.
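If you want to see that size effect for yourself, mongosh can report per-index sizes (again a sketch, assuming a collection named `events`):

```javascript
// An index entry exists for every document regardless of cardinality;
// cardinality mainly influences how well prefix compression works
db.events.createIndex({ userType: 1 })       // few distinct values
db.events.createIndex({ creationStamp: 1 })  // effectively unique values

// Compare the on-disk sizes of the two indexes (in bytes)
db.events.stats().indexSizes
```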
What the database is going to do is walk the index from the root down to leaf nodes, and then sequentially read those leaf nodes until it reaches the end of its range. This is going to happen the same way whether you are using a `date` field that only has granularity of a day or you are using something much more precise. It is the same amount of work both ways.

Furthermore, you want to be able to use the index to sort your query results. If you don't do this then the database will have to do much more work in order to execute your query. The only way to do this is to have the field that you are sorting on as a key in the index definition.
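As a sketch of the difference (same assumed `events` collection): with `{ userType: 1, creationStamp: 1 }` in place, the first query below can satisfy its sort by walking the index in reverse order, while sorting on a field that is not in the index forces an in-memory sort:

```javascript
db.events.createIndex({ userType: 1, creationStamp: 1 })

// Sort is satisfied by reading the index backwards: no SORT stage
db.events.find({ userType: "ADMIN" }).sort({ creationStamp: -1 })

// Hypothetical field not present in any index: the server must sort
// in memory, and the explain output will contain a SORT stage
db.events.find({ userType: "ADMIN" }).sort({ someOtherField: -1 })
```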
Recommendation
All of this said, here is what I would suggest. Create an index on `{ userType: 1, creationStamp: 1 }` and generate the `.explain("executionStats")` output for your query. The `executionStats` section should show that the number of documents returned is the same as the number of index keys examined as well as the number of documents examined. There should also not be a `SORT` stage present. If so, then you have a perfectly optimized query. If not then you should create a new question with that explain output and we could explore it further.
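Concretely, the whole check might look like this in mongosh (the collection name and filter value are placeholders):

```javascript
db.events.createIndex({ userType: 1, creationStamp: 1 })

const stats = db.events.find({ userType: "ADMIN" })
                       .sort({ creationStamp: -1 })
                       .explain("executionStats")
                       .executionStats

// For a fully index-optimized query these three numbers should match,
// and the winning plan should contain no SORT stage
printjson({
  nReturned: stats.nReturned,
  totalKeysExamined: stats.totalKeysExamined,
  totalDocsExamined: stats.totalDocsExamined
})
```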