I have used Pandas UDFs and I am aware that they use PyArrow to avoid object serialization between the JVM and the Python interpreter running on the workers. I am also aware of the property spark.sql.execution.arrow.enabled, which can be set to true to optimize conversions between Pandas DataFrames and Spark DataFrames. My questions are:
- Are there still conventional UDFs in PySpark 3.5.0 that do not benefit from PyArrow?
- Is it necessary to set spark.sql.execution.arrow.enabled to true in order to use Pandas UDFs and avoid converting JVM objects <-> Python objects?
- When the property is enabled, does it mean that even with conventional UDFs there is no conversion between Java and Python? Or is the improvement only for Pandas UDFs specifically, and not for general UDFs? (This is basically a rewording of my first question.)