I enabled spark.sql.thriftServer.incrementalCollect on my Thrift server (Spark 3.1.2) to prevent OutOfMemoryError crashes. That fixed the memory problem, but queries are now much slower. Checking the logs, I found that the Thrift server fetches results in batches of 10,000 rows:
```
INFO SparkExecuteStatementOperation: Returning result set with 10000 rows from offsets [1260000, 1270000) with 169312d3-1dea-4069-94ba-ec73ac8bef80
```
My hardware could easily handle batches 10x-50x that size.
This issue and this documentation page suggest setting spark.sql.inMemoryColumnarStorage.batchSize, but changing it had no effect on the batch size. Is it possible to configure this value, and if so, how?
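For reference, this is how I enable the setting when starting the Thrift server (other cluster-specific options omitted):

```
./sbin/start-thriftserver.sh \
  --conf spark.sql.thriftServer.incrementalCollect=true
```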
spark.sql.inMemoryColumnarStorage.batchSize controls the batch size of Spark's in-memory columnar cache; it has nothing to do with the number of rows returned per fetch under incrementalCollect. To see where the per-fetch row count actually comes from, read the Thrift server code in the open-source Spark repository (start with SparkExecuteStatementOperation, the class that produced the log line above).
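To illustrate what spark.sql.inMemoryColumnarStorage.batchSize actually does, here is a minimal spark-shell sketch (the local master, app name, and row counts are just placeholders): the setting only takes effect when a Dataset is cached, and never enters the Thrift fetch path.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("batchSize-demo")
  // Number of rows packed into each columnar batch when data is cached.
  .config("spark.sql.inMemoryColumnarStorage.batchSize", "10000")
  .getOrCreate()

// The setting takes effect here, when a Dataset is cached and materialized ...
val df = spark.range(0, 1000000).toDF("id").cache()
df.count()

// ... but it plays no role in how many rows the Thrift server returns
// per fetch when spark.sql.thriftServer.incrementalCollect is enabled.
```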