I need to bulk-load all entities in a table. (They need to be in memory rather than loaded as-needed, for high-speed on-demand graph-traversal algorithms.)
I need to parallelize this for speed in loading. So, I want to run multiple queries in parallel threads, each pulling approx. 800 entities from the database.
QuerySplitter serves this purpose, but we are running on Flexible Environment and so are using the Appengine SDK rather than the Client libraries.
MapReduce has been mentioned, but that is not aimed at simply loading data into memory. Memcache is somewhat relevant, but for high-speed access I need all these objects in a dense network in the RAM of my own app's JVM.
MultiQueryBuilder might do this; it can run parts of a query in parallel.
Whichever of these three approaches (or some other) is used, the hardest part is defining filters, or some other form of splits, that roughly partition the table (the Kind) into chunks of about 800 entities. I would like to create filters that say "entities 1 through 800", "801 through 1600", ..., but I know that is impractical. So, how does one do it?
I solved a similar problem by partitioning the entities into random groups.
I added a float property to each datastore entity and assigned it a random number between 0 and 1 every time I saved the entity. Then, when launching the N threads to do the work on the datastore entities, I had each thread work over a query of 1/N of the entities. For example, thread 0 would handle all entities whose random property fell between 0 and 1/N, thread 1 would handle all entities whose random property fell between 1/N and 2/N, and so on.

The downside is that this is not entirely deterministic, and you need to add a new property to your datastore entities. The upside is that it easily scales to millions of entities and many threads, and you generally get an even distribution of work across the threads.
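A minimal sketch of the range arithmetic, assuming a hypothetical `randomKey` double property on each entity (the class and method names here are illustrative, not part of any SDK; the actual datastore query is shown only in comments since it needs the App Engine jars):

```java
import java.util.Random;

public class RandomPartition {
    private static final Random RNG = new Random();

    // Called on every save: stamp the entity with a random value in [0, 1).
    // With the App Engine low-level API this would look something like
    //   entity.setProperty("randomKey", RNG.nextDouble());
    public static double newRandomKey() {
        return RNG.nextDouble();
    }

    // The half-open range [lo, hi) of randomKey values that thread i of n owns.
    public static double[] rangeFor(int i, int n) {
        return new double[] { (double) i / n, (double) (i + 1) / n };
    }

    public static void main(String[] args) {
        int n = 4; // number of parallel loader threads
        for (int i = 0; i < n; i++) {
            double[] r = rangeFor(i, n);
            // Each thread would run a query filtered to its own slice, e.g.
            // (App Engine SDK sketch, property name "randomKey" assumed):
            //   Query q = new Query("MyKind").setFilter(
            //       CompositeFilterOperator.and(
            //           new Query.FilterPredicate("randomKey",
            //               Query.FilterOperator.GREATER_THAN_OR_EQUAL, r[0]),
            //           new Query.FilterPredicate("randomKey",
            //               Query.FilterOperator.LESS_THAN, r[1])));
            System.out.printf("thread %d -> [%.2f, %.2f)%n", i, r[0], r[1]);
        }
    }
}
```

With ~800 entities per chunk as the target, you would pick N ≈ (total entity count) / 800; the slices will not be exactly 800 each, but for uniformly random keys they come out roughly even.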