I am planning to add indexes to make my MongoDB queries faster; I am currently working with Spring Boot.
I have two fields, `userType` and `creationStamp` (the date and time, in epoch milliseconds, at which the document is saved to the collection). Currently my collections have no indexes, all queries filter on these two common fields, and the default sorting is done on `creationStamp`, which makes the queries very slow since all of its values are unique.
I have to introduce a new field, `date` (which won't include a time component). My idea is to use a compound index ordered `{userType, date}`, since range follows equality, but I'm not sure how sorting would be done in this case.
Are the documents sorted in the order they are saved to the collection?
If not, how should I approach this problem if I want the results sorted in descending order of the date and time at which documents were saved to the collection?
Can I fetch the documents based on `userType` and `date` but sort them on `creationStamp`?
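To make that concrete, here is roughly what I have in mind, in mongosh syntax (the collection name `events` and the filter values are just placeholders):

```javascript
// Proposed compound index: equality field first, then the new day-granularity field
db.events.createIndex({ userType: 1, date: 1 })

// What I would like to do: filter on userType and date,
// but sort on the more precise creationStamp
db.events.find({ userType: "ADMIN", date: "2024-01-15" })
  .sort({ creationStamp: -1 })
```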
Thank you.
I think there are a few misconceptions that are making the question more complex than it needs to be. Let's start with the latest comment.
We need to take care here to make sure that we are comparing apples to apples. One of the responsibilities of explain plans is to report how the database will execute a query. But they don't always actually run that plan as part of the command. If there are no indexes on the collection then there aren't any options for execution apart from doing the collection scan.
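To illustrate the distinction (a mongosh sketch, assuming a collection named `events`): the default `queryPlanner` verbosity only reports the chosen plan, while `executionStats` actually executes the winning plan and gathers counters:

```javascript
// Reports the winning plan without executing the query
db.events.find({ userType: "ADMIN" }).explain("queryPlanner")

// Executes the winning plan and reports counters such as
// nReturned, totalKeysExamined, totalDocsExamined, executionTimeMillis
db.events.find({ userType: "ADMIN" }).explain("executionStats")
```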
Relatedly, even when explain is run at an elevated verbosity to gather execution statistics it is still not representative of the overall work that is performed in order to populate a UI with information from the database. Notably absent in that regard is the time it takes to transmit the information across the network from the database to the client.
I doubt that your system is able to scan and sort 5 million documents in 35 milliseconds. Even if so, that's not a scalable approach long term.
You keep focusing on the fact that the `creationStamp` field contains values that are effectively unique for each document in the collection, as if that were bad or inefficient for query execution, but I don't think that's really the case.

Keep in mind that an index is nothing more than an ordered data structure. Moreover, unless it is a partial index, it will have an entry for every single document in the collection. The most notable consideration that the cardinality of the indexed field has here is in regards to the size of the index. Fewer distinct values typically result in prefix compression being more effective, but that consideration is not pertinent to your question about performance.
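If you want to see that size effect for yourself, mongosh can report per-index sizes (again a sketch, assuming a collection named `events`):

```javascript
// An index entry exists for every document regardless of cardinality;
// cardinality mainly influences how well prefix compression works
db.events.createIndex({ userType: 1 })       // few distinct values
db.events.createIndex({ creationStamp: 1 })  // effectively unique values

// Compare the on-disk sizes of the two indexes (in bytes)
db.events.stats().indexSizes
```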
What the database is going to do is walk the index from the root down to leaf nodes, and then sequentially read those leaf nodes until it reaches the end of its range. This is going to happen the same way whether you are using a `date` field that only has granularity of a day or you are using something much more precise. It is the same amount of work both ways.

Furthermore, you want to be able to use the index to sort your query results. If you don't do this then the database will have to do much more work in order to execute your query. The only way to do this is to have the field that you are sorting on as a key in the index definition.
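As a sketch of the difference (same assumed `events` collection): with `{ userType: 1, creationStamp: 1 }` in place, the first query below can satisfy its sort by walking the index in reverse order, while sorting on a field that is not in the index forces an in-memory sort:

```javascript
db.events.createIndex({ userType: 1, creationStamp: 1 })

// Sort is satisfied by reading the index backwards: no SORT stage
db.events.find({ userType: "ADMIN" }).sort({ creationStamp: -1 })

// Hypothetical field not present in any index: the server must sort
// in memory, and the explain output will contain a SORT stage
db.events.find({ userType: "ADMIN" }).sort({ someOtherField: -1 })
```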
Recommendation
All of this said, here is what I would suggest. Create an index on `{ userType: 1, creationStamp: 1 }` and generate the `.explain("executionStats")` output for your query. The `executionStats` section should show that the number of documents returned is the same as the number of index keys examined as well as the number of documents examined. There should also not be a `SORT` stage present. If so, then you have a perfectly optimized query. If not then you should create a new question with that explain output and we could explore it further.
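Concretely, the whole check might look like this in mongosh (the collection name and filter value are placeholders):

```javascript
db.events.createIndex({ userType: 1, creationStamp: 1 })

const stats = db.events.find({ userType: "ADMIN" })
                       .sort({ creationStamp: -1 })
                       .explain("executionStats")
                       .executionStats

// For a fully index-optimized query these three numbers should match,
// and the winning plan should contain no SORT stage
printjson({
  nReturned: stats.nReturned,
  totalKeysExamined: stats.totalKeysExamined,
  totalDocsExamined: stats.totalDocsExamined
})
```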