Airflow: Use LivyBatchOperator to submit PySpark applications on YARN

I have come across something called LivyBatchOperator, but I am unable to find a good example of using it to submit PySpark applications from Airflow. Any information on this would be much appreciated. Thanks in advance.

Abdul On BEST ANSWER

I came across this blog post, which walks through the available options for running Spark from Airflow.

Here is an example of LivyBatchOperator, and here is how to install airflow-livy-operators.

I would recommend the following options:

  1. AWS EMR: use EmrAddStepsOperator.
  2. Regular Spark cluster: use the mechanism above to set up the Livy operators in Airflow. This keeps the configuration on the Airflow servers simple, since Airflow only needs to reach Livy, which sits in front of the Spark cluster and handles the submission.
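
To give an idea of what a Livy batch operator does for you, here is a minimal sketch of the underlying Livy REST calls (POST /batches to submit, GET /batches/{id}/state to poll), using only the Python standard library. This is not the operator's actual implementation. The Livy URL, the HDFS path, and the helper names (build_batch_payload, submit_and_wait) are assumptions made up for illustration, and your Livy server may need extra headers or authentication.

```python
import json
import time
import urllib.request

# Hypothetical values for illustration; replace with your own.
LIVY_URL = "http://livy.example.com:8998"
APP_FILE = "hdfs:///jobs/my_pyspark_app.py"

# Batch states after which Livy will not change the state again.
TERMINAL_STATES = ("success", "dead", "killed", "error")


def build_batch_payload(file, args=None, py_files=None, conf=None,
                        queue=None, name=None):
    """Build the JSON body for Livy's POST /batches endpoint."""
    payload = {"file": file}
    optional = {
        "args": args,          # command-line arguments for the application
        "pyFiles": py_files,   # extra Python files to ship with the job
        "conf": conf,          # Spark configuration properties
        "queue": queue,        # YARN queue to submit to
        "name": name,          # name of the batch session
    }
    # Only send the fields that were actually provided.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def _request_json(url, body=None):
    """Send a POST if a body is given, otherwise a GET; return parsed JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_and_wait(livy_url, payload, poll_seconds=10):
    """Submit a batch to Livy and poll until it reaches a terminal state."""
    batch = _request_json(f"{livy_url}/batches", payload)
    batch_id = batch["id"]
    while True:
        state = _request_json(f"{livy_url}/batches/{batch_id}/state")["state"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

Calling submit_and_wait(LIVY_URL, build_batch_payload(APP_FILE, queue="default")) would submit the script to YARN through Livy and block until the batch finishes. In a DAG you would normally let the Livy operator do this rather than hand-rolling the calls.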

Let me know how it goes!