What Files are Processed by Airflow to Load DAGs

It is important to know how Airflow selects the source files it reads to load DAGs. This selection is implemented by the DagFileProcessorProcess, which is triggered by the DagFileProcessorManager. Let’s go through each step involved in building the list of file paths:

1. First, the directory from which Airflow reads the files to parse for DAGs is resolved. This path can come from two places:
   - If the DagFileProcessorManager runs as a standalone process (outside the scheduler) via the airflow dag-processor command, the directory passed to the -S|--subdir option is used.
   - If --subdir is not passed to airflow dag-processor, or the standalone processor is not used at all (in which case DAG processing happens as part of the scheduler process), the path set for dags_folder under the [core] section of airflow.cfg (the configuration file) is used.
2. Of all the files present in the path resolved in (1), only those with a .py extension are considered, i.e., only Python files are processed.
3. Of the Python files shortlisted in (2), only those whose content contains both of the strings dag AND airflow are considered. This check can be disabled by setting the dag_discovery_safe_mode config option to False.
4. From the final set of shortlisted files, all those that match the (regexp or glob) patterns specified in .airflowignore are excluded from processing.
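
The filtering steps above can be sketched in a few lines of plain Python. This is only an illustration of the selection logic, not Airflow's own code: the names candidate_dag_files, contains_dag_keywords and ignore_globs are made up for this example, the ignore patterns are passed in directly instead of being read from .airflowignore, glob matching is approximated with fnmatch, and the folder is assumed to be searched recursively with a case-insensitive keyword check.

```python
import fnmatch
import tempfile
from pathlib import Path


def contains_dag_keywords(path: Path) -> bool:
    """Step 3: cheap content check for the strings 'dag' and 'airflow'.

    Assumption: the match is case-insensitive.
    """
    content = path.read_bytes().lower()
    return b"dag" in content and b"airflow" in content


def candidate_dag_files(dags_folder, safe_mode=True, ignore_globs=()):
    """Return the .py files under dags_folder that survive steps 2 to 4."""
    root = Path(dags_folder)
    selected = []
    # Step 2: only files with a .py extension (searched recursively here).
    for path in sorted(root.rglob("*.py")):
        if not path.is_file():
            continue
        # Step 3: keyword check, skipped when safe mode is turned off.
        if safe_mode and not contains_dag_keywords(path):
            continue
        # Step 4: drop files matching an ignore pattern (glob flavour only).
        relative = path.relative_to(root).as_posix()
        if any(fnmatch.fnmatch(relative, pattern) for pattern in ignore_globs):
            continue
        selected.append(path)
    return selected


# Small demonstration on a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "etl.py").write_text("from airflow import DAG\n")
    (folder / "helpers.py").write_text("def add(a, b):\n    return a + b\n")
    (folder / "notes.txt").write_text("airflow dag notes\n")
    (folder / "scratch_etl.py").write_text("from airflow import DAG\n")
    picked = candidate_dag_files(folder, ignore_globs=["scratch_*"])
    print([p.name for p in picked])  # prints ['etl.py']
```

Here helpers.py is dropped by the keyword check, notes.txt by the extension check, and scratch_etl.py by the ignore pattern. Calling candidate_dag_files(folder, safe_mode=False, ...) would keep helpers.py, which mirrors what turning off dag_discovery_safe_mode does.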

The steps above produce the list of file paths that Airflow finally reads and processes to create the DAG objects used by all the different components: Scheduler, Web Server and Workers.

Bonus: There are some optimizations that Airflow does to ensure it does “minimal” parsing of files. For instance, what’s the point in parsing a DAG file that has not been modified? Read this article to learn more about them.
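
To make the "has not been modified" idea concrete, here is a minimal sketch of skipping unchanged files by comparing modification times. It is a generic illustration and not Airflow's actual implementation; the class ModifiedFileTracker and its needs_parsing method are hypothetical names invented for this example.

```python
import os


class ModifiedFileTracker:
    """Remembers the last modification time seen for each file."""

    def __init__(self):
        self._last_seen = {}  # path string -> modification time in nanoseconds

    def needs_parsing(self, path) -> bool:
        """Return True if the file is new or changed since the last call."""
        mtime = os.stat(path).st_mtime_ns
        key = str(path)
        if self._last_seen.get(key) == mtime:
            return False  # unchanged since last check, no need to re-parse
        self._last_seen[key] = mtime
        return True
```

A parsing loop could call tracker.needs_parsing(path) for every shortlisted file and only re-parse the ones for which it returns True.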

CodingShower
