How to determine the practical character limit of a Livy request that submits a Spark batch job from Airflow?


I am trying to figure out the upper limit (practical estimate is fine) of characters/bytes passed on as args to the LivyOperator in Airflow.

The operator eventually uses args (typed as Sequence[str | int | float]) as part of spark_params in the POST request: self.get_hook().post_batch(**self.spark_params).

As there are a lot of steps involved before the Spark cluster receives the original args as arguments, I find it difficult to determine a practical limit when working in Airflow (e.g. when passing serialized JSON as part of args).
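To make the question concrete, here is a minimal sketch of how I would measure the size of the request body before it leaves Airflow. It assumes the body is a JSON object with "file" and "args" keys, as in the Livy POST /batches API; the helper name batch_payload_size, the file path and the sample config are made up for illustration.

```python
import json


def batch_payload_size(file, args):
    """Return the UTF-8 byte size of a Livy-style POST /batches JSON body.

    Assumption: only "file" and "args" are sent; a real request may carry
    more fields (conf, jars, ...), which would add to the size.
    """
    body = {"file": file, "args": [str(a) for a in args]}
    return len(json.dumps(body).encode("utf-8"))


# Hypothetical example: a JSON document passed as one string argument.
config = {"table": "events", "columns": ["id", "ts"] * 500}
args = [json.dumps(config)]

size = batch_payload_size("local:///opt/jobs/job.py", args)
print(f"raw argument: {len(args[0])} characters, request body: {size} bytes")
```

Note that a JSON string nested inside the JSON body gets its quotes escaped, so the body ends up larger than the raw argument.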

Assumptions on available system memory: the server running Airflow has 1-3 GB, the server running Livy has 5-10 GB.

My thoughts on limits involved so far:

  • inside Python: the value whose limit I want to know is a single str -> Python str length limit -> depends on the Python installation, but in this case practically limited by memory
  • server hosting Livy: POST request (see "What is the size limit of an HTTP POST request?") -> best guess: kB range
  • Spark-specific limits (Spark is invoked by Livy) -> ?
  • server-specific command-line argument limits -> best guess: low GB range
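For the last bullet, the operating system limit can be queried directly instead of guessed. The sketch below assumes that the args eventually reach spark-submit as process arguments on a Linux host; the function names and the MAX_ARG_STRLEN constant are my own. On Linux the total is usually in the MB range (it is derived from the stack size limit), and a single argument is additionally capped at 128 KiB with 4 KiB pages.

```python
import os

# Assumption: Linux with 4 KiB pages, where one argument string
# (including its terminating NUL byte) may not exceed 32 pages.
MAX_ARG_STRLEN = 32 * 4096


def arg_limits():
    """Return the OS limits for arguments passed to a new process."""
    # SC_ARG_MAX covers arguments and environment variables together.
    return {
        "total_bytes": os.sysconf("SC_ARG_MAX"),
        "per_arg_bytes": MAX_ARG_STRLEN,
    }


def fits_on_command_line(args, limits=None):
    """Check whether args respect the per-argument and total limits."""
    limits = limits or arg_limits()
    # +1 for the NUL terminator of each argument string.
    sizes = [len(str(a).encode("utf-8")) + 1 for a in args]
    return (
        max(sizes, default=0) <= limits["per_arg_bytes"]
        and sum(sizes) <= limits["total_bytes"]
    )


if __name__ == "__main__":
    print(arg_limits())
    print(fits_on_command_line(["x" * 200_000]))
```

The check ignores the space taken by environment variables and by the other spark-submit arguments, so it is optimistic; it would have to run on the host that launches spark-submit, not on the Airflow server.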

It seems like the POST request is the potential bottleneck, but maybe I'm missing something entirely. Any practical advice is highly appreciated.
