Spark-SQL Query Hints for Join Performance Improvement


I was recently introduced to Spark SQL. We use Spark 2.4. I found out that Spark SQL queries support the following hints for join strategies:

  • BROADCAST hint
  • MERGE hint
  • SHUFFLE_HASH hint

Unfortunately, I have not found any online material that discusses these hints and their application scenarios in detail. I would like to learn when to use each hint in a join to improve query performance.

Can anyone explain with some examples? Any help is appreciated. Thanks.


There is 1 best solution below

Jax Ma On
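  1. A broadcast join is a very fast join strategy: the data of the small table is sent to every executor, so each executor can perform a map-side join without shuffling the large table. The relevant configuration is spark.sql.autoBroadcastJoinThreshold (10 MB by default); tables estimated to be smaller than this size are broadcast automatically.
  2. Sort-merge join is the default join strategy since Spark 2.3. It is typically what you get when neither table is small enough to broadcast.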
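
To make the difference between the two strategies concrete, here is a minimal plain-Python sketch of what each one does conceptually. It is not Spark code and not how Spark implements these joins internally; the function names, the `orders` and `countries` data, and the two-partition layout are all hypothetical and only there for illustration.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows):
    """Map-side join: every partition probes a hash table built from the small side."""
    # Build the lookup table once. In Spark, the small table would be
    # shipped ("broadcast") to every executor instead.
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)

    joined = []
    for partition in large_partitions:  # no shuffle of the large side is needed
        for key, value in partition:
            for match in lookup.get(key, []):
                joined.append((key, value, match))
    return joined


def sort_merge_join(left_rows, right_rows):
    """Sort both sides by key, then walk them with two cursors."""
    left = sorted(left_rows, key=lambda row: row[0])
    right = sorted(right_rows, key=lambda row: row[0])
    joined, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows sharing this key ...
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # ... and pair it with every left row that has the same key.
            while i < len(left) and left[i][0] == left_key:
                for right_row in right[j:j_end]:
                    joined.append((left_key, left[i][1], right_row[1]))
                i += 1
            j = j_end
    return joined


# Hypothetical data: "orders" is the large side (two partitions),
# "countries" is the small side.
orders = [[(1, "order-a"), (2, "order-b")], [(1, "order-c"), (3, "order-d")]]
countries = [(1, "DE"), (2, "FR")]

bhj = broadcast_hash_join(orders, countries)
smj = sort_merge_join([row for part in orders for row in part], countries)
print(sorted(bhj) == sorted(smj))  # both strategies produce the same rows
```

In an actual query you would request the first strategy with a hint such as `SELECT /*+ BROADCAST(c) */ ... FROM orders o JOIN countries c ON ...`, where `c` is the alias of the small table. As far as I know, the MERGE and SHUFFLE_HASH hints are only recognized from Spark 3.0 onward, so on Spark 2.4 only the BROADCAST hint would take effect.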

Here are some posts that I hope will help: Spark SQL Joins, Sort-Merge Join