Automatically remove DAGs from the Airflow UI when they are no longer present in the DagBag

Scenario

I have a Python file that creates multiple DAGs dynamically. The file fetches data from an API and creates one DAG per row of the response, so a response with 100 rows produces 100 DAGs.

Issue

When the API response changes, say to 90 rows, 10 DAGs are removed from the DagBag because the dynamic DAG file no longer creates them. However, those DAGs are still present in the Airflow UI. Sometimes I also see tasks of these DAGs stuck in the scheduled state (the DAG code is no longer in the DagBag, so they can't move to running), and I have to kill them manually and then pause the DAG.

Looking for?

Is there any way (a config option or otherwise) to make sure that a DAG which is not present in the DagBag does not show up in the Airflow UI until its row is added back to the API response, and that its tasks do not distort the stats in Airflow? I am using Airflow 2.3.2.

Hussein Awala On

Every dag_dir_list_interval, the DagFileProcessorManager lists the scripts in the dags folder. If a script is new, or was last processed more than min_file_process_interval ago, the manager creates a DagFileProcessorProcess for that file to process it and generate its DAGs.
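The re-parse rule described above can be sketched as a small function. This is a simplified model for illustration, not Airflow's actual implementation; the function name `should_process` and its arguments are hypothetical, and the 30-second interval is only an example value.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_process(
    last_finished: Optional[datetime],
    now: datetime,
    min_file_process_interval: float,
) -> bool:
    """Return True if a DAG file should be (re)parsed.

    A file is parsed when it has never been processed (it is new), or when
    more than min_file_process_interval seconds have passed since it was
    last processed.
    """
    if last_finished is None:
        # New file: never processed before.
        return True
    return now - last_finished > timedelta(seconds=min_file_process_interval)


now = datetime(2022, 6, 1, 12, 0, 0)
print(should_process(None, now, 30))                          # new file
print(should_process(now - timedelta(seconds=10), now, 30))   # parsed recently
print(should_process(now - timedelta(seconds=45), now, 30))   # interval elapsed
```

For a dynamic DAG file this means the API is called again on each re-parse, which is when a shrinking response makes DAGs drop out of the DagBag.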

At that point, the DagFileProcessorProcess calls the API, gets the DAG IDs, and updates the DagBag.

However, the DAG records (runs, tasks, tags, ...) stay in the metastore. They can be deleted through the UI, the API, or the CLI:

# API
curl -X DELETE <airflow url>/api/v1/dags/<dag id>
# CLI
airflow dags delete <dag id>
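For scripting, the same API call can be built in Python with only the standard library. This is a minimal sketch that assumes the webserver has basic authentication enabled for the REST API; the helper name `build_delete_request`, the base URL, and the credentials are placeholders, not values from the question.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_delete_request(base_url: str, dag_id: str, username: str, password: str):
    """Build (but do not send) a DELETE request for /api/v1/dags/<dag id>.

    Assumes basic auth is enabled on the Airflow REST API.
    """
    # Percent-encode the DAG ID so unusual characters are safe in the path.
    url = f"{base_url.rstrip('/')}/api/v1/dags/{quote(dag_id, safe='')}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Basic {token}"},
    )


# Placeholder values for illustration only.
req = build_delete_request("http://localhost:8080", "my_dag", "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` sends it. Building the request separately from sending it makes it easy to log or review what would be deleted first.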

Why aren't the DAGs deleted automatically when they disappear from the DagBag?

Suppose you have DAGs created dynamically from a config file stored in S3, and there is a network problem, a bug in a new release, or a problem with the volume that contains the DAG files. If the DagFileProcessorManager deleted every DAG that is in the metastore but missing from the local DagBag, you would lose the history of those DAGs because of a temporary failure.
Instead, Airflow keeps the data and lets you decide whether to delete it.

Can you delete the dags dynamically?

You can create an hourly DAG with a task that fills a DagBag locally, loads the list of DAGs from the metastore, and then deletes the DAGs that appear in the metastore but not in the local DagBag.
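The comparison step of such a cleanup task could look like the sketch below. The function names are hypothetical. The `load_ids_from_airflow` helper assumes Airflow 2.x (`DagBag`, `DagModel`, and `create_session`) and is only defined here, not called. The empty-DagBag guard is an added safety assumption, so that a failed API call or parse error does not wipe every DAG.

```python
from typing import Iterable, Set, Tuple


def find_stale_dag_ids(
    metastore_ids: Iterable[str],
    local_ids: Iterable[str],
    protected: Iterable[str] = (),
) -> Set[str]:
    """Return DAG IDs that exist in the metastore but not in the local DagBag."""
    local = set(local_ids)
    if not local:
        # Guard: an empty local DagBag more likely means a parsing or network
        # failure than "every DAG was removed", so report nothing as stale.
        return set()
    return set(metastore_ids) - local - set(protected)


def load_ids_from_airflow() -> Tuple[Set[str], Set[str]]:
    """Collect (metastore IDs, local DagBag IDs). Assumes Airflow 2.x."""
    from airflow.models import DagBag, DagModel
    from airflow.utils.session import create_session

    dagbag = DagBag(include_examples=False)
    if dagbag.import_errors:
        # Do not trust a partially parsed DagBag as the source of truth.
        raise RuntimeError(f"DAG import errors: {dagbag.import_errors}")
    with create_session() as session:
        metastore_ids = {row[0] for row in session.query(DagModel.dag_id)}
    return metastore_ids, set(dagbag.dag_ids)


# Stand-in IDs for illustration.
print(find_stale_dag_ids({"dag_1", "dag_2", "dag_3"}, {"dag_1", "dag_2"}))
```

Each returned ID can then be passed to the CLI or API delete call. Keeping the set difference in a pure function makes it easy to dry-run the task and review the list before anything is removed.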

But do these removed dags remain visible in the UI?

The answer is no: they are marked as deactivated after deactivate_stale_dags_interval, which is 1 minute by default. This activated/deactivated notion solves the problem described above, because only activated DAGs are visible in the UI. When the network or volume issue is resolved, the DagFileProcessorManager creates the DAGs again and marks them as activated in the metastore.

So if your goal is just to hide the removed DAGs from the UI, check the value you have for deactivate_stale_dags_interval and decrease it. If you want to delete the DAGs completely, you need to do it manually or with a DAG that runs the CLI commands or API requests shown above.
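If the setting is managed through environment variables rather than airflow.cfg, Airflow's naming convention is AIRFLOW__{SECTION}__{KEY}. The small helper below (a hypothetical name) derives the variable name, assuming the option lives in the [scheduler] section as it does in Airflow 2.3.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name that overrides an Airflow config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


print(airflow_env_var("scheduler", "deactivate_stale_dags_interval"))
# -> AIRFLOW__SCHEDULER__DEACTIVATE_STALE_DAGS_INTERVAL
```

The value is a number of seconds and is set in the scheduler's environment.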