Schedule an Airflow DAG with a delay


I’m trying to create an Airflow DAG that runs an SQL query to get all of yesterday’s data, but I want the DAG run to start some time after data_interval_end rather than right at it.

The data interval ends at midnight, but it takes a few hours for the data itself to be ready for querying. That is why I want the DAG to run only 4 hours after the interval ends.

For example:

data_interval_start = 2022-01-01 00:00:00
data_interval_end = 2022-01-02 00:00:00
wanted DAG run start = 2022-01-02 04:00:00
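The timing in this example is plain arithmetic: for a daily schedule, Airflow starts a run once its data_interval_end has passed, so the wanted start is data_interval_end plus the delay. A minimal sketch in plain Python, assuming a daily interval and the 4-hour delay from the question; SCHEDULE, DELAY and desired_run_time are illustrative names, not Airflow API:

```python
from datetime import datetime, timedelta

SCHEDULE = timedelta(days=1)  # daily schedule: each interval runs midnight to midnight
DELAY = timedelta(hours=4)    # time the data needs to become queryable (from the question)


def desired_run_time(data_interval_start: datetime) -> datetime:
    """Return when the run should actually start for the interval beginning at data_interval_start."""
    data_interval_end = data_interval_start + SCHEDULE
    return data_interval_end + DELAY


start = datetime(2022, 1, 1)
print(start + SCHEDULE)         # data_interval_end: 2022-01-02 00:00:00
print(desired_run_time(start))  # wanted run start:  2022-01-02 04:00:00
```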

How can I achieve this? Thanks!

So far I have just adjusted the SQL query itself with date_trunc, but I hope there is a solution that lets me keep the query without this function.
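One way to avoid date_trunc is to filter on the half-open interval [data_interval_start, data_interval_end) directly. Delaying when a run starts does not move its data interval, so those bounds stay at midnight even if the query executes at 04:00. In a real DAG the bounds would come from the {{ data_interval_start }} and {{ data_interval_end }} templates; the sketch below only imitates that with string formatting, and the table my_table and column date are hypothetical:

```python
from datetime import datetime


def build_query(data_interval_start: datetime, data_interval_end: datetime) -> str:
    """Build a query for the half-open interval [start, end), with no date_trunc needed.

    Illustration only: real code should use Airflow templating or query parameters
    rather than formatting values into SQL by hand.
    """
    return (
        "SELECT * FROM my_table "
        f"WHERE date >= '{data_interval_start:%Y-%m-%d %H:%M:%S}' "
        f"AND date < '{data_interval_end:%Y-%m-%d %H:%M:%S}'"
    )


# The bounds are the interval's midnights regardless of when the run actually starts.
print(build_query(datetime(2022, 1, 1), datetime(2022, 1, 2)))
```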

Answer by S N:

Instead of delaying by a fixed amount of time, you could use BranchSQLOperator, which has follow_task_ids_if_true and follow_task_ids_if_false parameters. With a fixed time window, the DAG might still run in cases where your data is not ready yet.

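from airflow.providers.common.sql.operators.sql import BranchSQLOperator  # older Airflow 2.x: airflow.operators.sql

# Check whether rows for the run's data interval have landed yet.
# A non-zero count in the first cell follows follow_task_ids_if_true; zero follows follow_task_ids_if_false.
operator = BranchSQLOperator(
    task_id="check_data_presence_task",
    conn_id="sql_connection_id",
    # {{ ds }} renders as the start date of the data interval, e.g. 2022-01-01
    # (assumes my_table has a `date` column)
    sql="SELECT COUNT(1) FROM my_table WHERE date >= '{{ ds }}'",
    follow_task_ids_if_true="success_task_id",
    follow_task_ids_if_false="fail_task_id",
    dag=dag,
)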
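The branching idea can be modelled in plain Python to see how the decision changes as data arrives. This is a simplified, hypothetical stand-in, not the operator's implementation: rows_ready imitates the COUNT query over an in-memory list of row dates, and choose_branch treats a non-zero count as "data present", reusing the task ids from the snippet above:

```python
from datetime import date


def rows_ready(table_dates: list[date], interval_start: date) -> int:
    """Hypothetical stand-in for: SELECT COUNT(1) FROM my_table WHERE date >= interval_start."""
    return sum(1 for d in table_dates if d >= interval_start)


def choose_branch(count: int, if_true: str = "success_task_id", if_false: str = "fail_task_id") -> str:
    """Pick the downstream task: any non-zero count means the interval's data has landed."""
    return if_true if count else if_false


table = [date(2021, 12, 31)]  # only the previous day's data has been loaded so far
print(choose_branch(rows_ready(table, date(2022, 1, 1))))  # fail_task_id

table.append(date(2022, 1, 1))  # the 2022-01-01 data arrives a few hours later
print(choose_branch(rows_ready(table, date(2022, 1, 1))))  # success_task_id
```

Unlike a fixed 4-hour delay, this check depends on whether the data is actually there, which is the point the answer makes; in Airflow the "false" branch could then fail or be retried.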