Does `fetch_page()` just not guarantee a minimum number of results when used with `filter()`?


In the documentation we have:

page_size: At most this many results will be returned.

It looks like when using filter along with fetch_page, it doesn't return a minimum number of results, even though there are more results that actually match the query. Is that really the case?

Is it possible for fetch_page to return zero results, even though, if we keep going by continuing from the returned cursor, we'll eventually find more results?

And, if that's the case, and I need a minimum number of results, does that mean I have to "manually" accumulate results until I reach the desired number of entries? Or is there a feature in NDB that will "automatically" accumulate results until I have a certain minimum number?

Here's the code I'm using:

results, cursor, more = (cls.query(keys_only=True)
                         .filter(cls.user_id == user_id)
                         .filter(cls.expired == False)
                         .order(ordering)
                         .fetch_page(batch_size, start_cursor=start_cursor))

In my test environment most of the entities saved in the datastore don't match the filters, but there are still quite a few that do, and those don't appear in the results.

1 Answer

Jim Morrison

Starting with your query: it either uses a specific composite index or is served by a merge-join algorithm. The performance of merge-join queries is described at https://cloud.google.com/datastore/docs/concepts/optimize-indexes#index_merge_performance.

As noted in that doc, it's possible for the query to match fewer than batch_size results within the RPC deadline, and thus to return with fewer than batch_size results.

If the RPC returns successfully with zero results, more should be False. If the RPC can't find any results but isn't done scanning yet, it may return an error instead.
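
In other words, an empty page alone doesn't tell you the scan is finished; the more flag does, and a deadline expiry surfaces as an exception rather than as an empty page. A minimal sketch of that case analysis, assuming legacy GAE NDB, a query object qry built as in the question, and that timeouts surface as datastore_errors.Timeout:

from google.appengine.api import datastore_errors

try:
    results, cursor, more = qry.fetch_page(batch_size,
                                           start_cursor=start_cursor)
except datastore_errors.Timeout:
    # The deadline expired mid-scan; retry from the last good cursor.
    raise
exhausted = not more  # an empty page with more == False is a true end
if more:
    # A short (possibly empty) page is not the end of the matches;
    # the next call should continue from the returned cursor.
    start_cursor = cursor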

If you really need batch_size results, you should verify that you actually have batch_size results by issuing your query multiple times, updating start_cursor on every call. You should also use as few indexes as possible for serving your query.
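
A minimal sketch of that accumulation loop, assuming legacy GAE NDB and the model/variables from the question; min_results is a hypothetical parameter for the desired minimum, and keys_only is passed here as a fetch option:

def fetch_at_least(cls, user_id, ordering, min_results, batch_size,
                   start_cursor=None):
    """Issue fetch_page() repeatedly, advancing the cursor each time,
    until at least min_results keys are collected or the scan ends."""
    qry = (cls.query()
           .filter(cls.user_id == user_id)
           .filter(cls.expired == False)
           .order(ordering))
    accumulated = []
    cursor = start_cursor
    more = True
    while more and len(accumulated) < min_results:
        results, cursor, more = qry.fetch_page(batch_size,
                                               start_cursor=cursor,
                                               keys_only=True)
        accumulated.extend(results)
    # more goes False once the scan is exhausted, so the loop also stops
    # when fewer than min_results matches exist in total.
    return accumulated, cursor, more

Each iteration re-issues the RPC from the last returned cursor, so even when a slow merge join makes individual calls come back short, the loop still makes forward progress through the index.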

The full document at https://cloud.google.com/datastore/docs/concepts/optimize-indexes should be helpful for you.