I enabled spark.sql.thriftServer.incrementalCollect on my Thrift server (Spark 3.1.2) to prevent OutOfMemoryError crashes. That fixed the memory problem, but queries are now much slower. Checking the logs, I found that the Thrift server fetches results in batches of 10,000 rows:
```
INFO SparkExecuteStatementOperation: Returning result set with 10000 rows from offsets [1260000, 1270000) with 169312d3-1dea-4069-94ba-ec73ac8bef80
```
My hardware could easily handle batches 10x-50x that size.
This issue and this documentation page suggest setting spark.sql.inMemoryColumnarStorage.batchSize, but changing it had no effect on the batch size. Is it possible to configure this value, and if so, how?
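For reference, this is how I enable the setting when starting the Thrift server (other cluster-specific options omitted):

```
./sbin/start-thriftserver.sh \
  --conf spark.sql.thriftServer.incrementalCollect=true
```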
spark.sql.inMemoryColumnarStorage.batchSize controls the batch size of Spark's in-memory columnar cache; it has nothing to do with the number of rows returned per fetch under incrementalCollect. To see where the per-fetch row count actually comes from, read the Thrift server code in the open-source Spark repository (start with SparkExecuteStatementOperation, the class that produced the log line above).
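To illustrate what spark.sql.inMemoryColumnarStorage.batchSize actually does, here is a minimal spark-shell sketch (the local master, app name, and row counts are just placeholders): the setting only takes effect when a Dataset is cached, and never enters the Thrift fetch path.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("batchSize-demo")
  // Number of rows packed into each columnar batch when data is cached.
  .config("spark.sql.inMemoryColumnarStorage.batchSize", "10000")
  .getOrCreate()

// The setting takes effect here, when a Dataset is cached and materialized ...
val df = spark.range(0, 1000000).toDF("id").cache()
df.count()

// ... but it plays no role in how many rows the Thrift server returns
// per fetch when spark.sql.thriftServer.incrementalCollect is enabled.
```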