Google Cloud Bigtable Row Count

I have a use case that requires counting the rows in a Bigtable table that share a row-key prefix. I am using the Google Bigtable Java client. With the current implementation, my API takes well over 3 minutes to count 15M records, and I expect that on some days the count could reach 50M. I am looking for a way to optimize the query, or for a better solution.

My sandbox instance has only 1 node and runs on HDD storage. I am planning to switch to SSD storage and add more nodes. I want the count to perform better.

Update:

    BigtableDataClient dataClient
            = BigtableDataClient.create(projectId, instanceId);
    // Limit the number of concurrent shard requests
    Semaphore semaphore = new Semaphore(100);
    // Strip cell values, since only the row keys are needed
    Filters.Filter stripFilter = Filters.FILTERS.value().strip();
    Query myQuery = Query.create(tableId).prefix(prefix).filter(stripFilter);

    List<KeyOffset> keyOffsets = null;
    try {
        keyOffsets = dataClient.sampleRowKeysCallable().call(tableId);

        List<Query> queryShards = myQuery.shard(keyOffsets);
        CountDownLatch taskTracker = new CountDownLatch(queryShards.size());
        List<Throwable> errors = Collections.synchronizedList(new ArrayList<>());
        AtomicLong totalCount = new AtomicLong();

        for (Query subQuery : queryShards) {
            semaphore.acquire();
            dataClient.readRowsAsync(subQuery, new ResponseObserver<>() {
                long subCount = 0;

                @Override
                public void onStart(StreamController controller) {

                }

                @Override
                public void onResponse(Row response) {
                    subCount++;
                }

                @Override
                public void onError(Throwable t) {
                    errors.add(t);
                    taskTracker.countDown();
                    semaphore.release();
                }

                @Override
                public void onComplete() {
                    totalCount.addAndGet(subCount);
                    taskTracker.countDown();
                    semaphore.release();
                }
            });
        }
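        taskTracker.await();
        // Fail loudly if any shard errored, since the total would be incomplete
        if (!errors.isEmpty()) {
            throw new RuntimeException(errors.size() + " shard read(s) failed", errors.get(0));
        }
    } catch (InterruptedException e) {
        // Restore the interrupt flag before propagating
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
    }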
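
The fan-out-and-aggregate pattern in the snippet above can be exercised without a Bigtable instance. What follows is only a minimal sketch using the JDK alone: `ShardedCountSketch`, `countAll`, and the `Callable<Long>` stand-ins for shard scans are hypothetical names and are not part of the Bigtable client. It assumes each shard can report its own row count, limits how many shards run at once, and reports shard failures after all shards have finished.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

final class ShardedCountSketch {

    // Hypothetical helper: each Callable stands in for one shard's row scan
    // and returns the number of rows that shard saw.
    static long countAll(List<Callable<Long>> shardCounters, int maxInFlight)
            throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        CountDownLatch done = new CountDownLatch(shardCounters.size());
        List<Throwable> errors = Collections.synchronizedList(new ArrayList<>());
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (Callable<Long> shard : shardCounters) {
                permits.acquire(); // blocks once maxInFlight shards are running
                pool.execute(() -> {
                    try {
                        total.addAndGet(shard.call());
                    } catch (Throwable t) {
                        errors.add(t); // remember the failure, keep other shards going
                    } finally {
                        done.countDown();
                        permits.release();
                    }
                });
            }
            done.await(); // wait until every shard has finished or failed
        } finally {
            pool.shutdown();
        }
        if (!errors.isEmpty()) {
            // A partial total would be misleading, so report the failure instead
            throw new IllegalStateException(errors.size() + " shard(s) failed", errors.get(0));
        }
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Ten simulated shards holding 1000, 2000, ..., 10000 rows
        List<Callable<Long>> shards = new ArrayList<>();
        for (long i = 1; i <= 10; i++) {
            final long rows = i * 1000;
            shards.add(() -> rows);
        }
        System.out.println(countAll(shards, 4)); // prints 55000
    }
}
```

Swapping the simulated shards for real per-shard scans would keep the same structure. Whether a higher `maxInFlight` helps is likely to depend on node count and storage type, so it is worth measuring rather than assuming.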