What Files are Processed by Airflow to Load DAGs
It is important to know how Airflow selects the source files that it reads to load DAGs. This process is implemented by the DagFileProcessorProcess, triggered by the DagFileProcessorManager. Let’s go through each step involved in building the list of file paths:
1. First, Airflow (or rather the DagFileProcessorProcess) resolves the path or directory from which it reads the files to be parsed for loading DAGs. This path can come from two places:
   - If the DagFileProcessorManager runs as a standalone process (outside the scheduler) via the airflow dag-processor command, the directory passed to the -S|--subdir option is used.
   - If the --subdir option is not passed to airflow dag-processor, or the standalone processor is not used at all (in which case DAG processing happens as part of the scheduler process), the path specified for dags_folder under the [core] section of airflow.cfg (the configuration file) is used.
2. Of all the files present in the path resolved in (1), only files with the .py extension are considered, i.e., only Python files are processed.
3. Of the Python files shortlisted in (2), only those that contain both of the strings dag and airflow are considered. This check can be disabled by setting the dag_discovery_safe_mode config option to False.
4. From the final set of shortlisted files, all those that match the patterns (regexp or glob) specified in .airflowignore are excluded from processing.
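The steps above can be sketched in plain Python. This is a simplified illustration, not Airflow's actual code: the function names, the default /opt/airflow/dags path, the case-insensitive match, and the handling of a single top-level .airflowignore containing regexp patterns are all assumptions made for this example.

```python
import re
from pathlib import Path


def resolve_dags_path(subdir=None, dags_folder="/opt/airflow/dags"):
    """Step 1: prefer the --subdir value, else fall back to [core] dags_folder."""
    return Path(subdir) if subdir else Path(dags_folder)


def might_contain_dag(path):
    """Step 3: safe-mode heuristic, both 'dag' and 'airflow' must appear.

    Lowercasing the content (case-insensitive match) is an assumption here.
    """
    content = path.read_bytes().lower()
    return b"dag" in content and b"airflow" in content


def load_ignore_patterns(root):
    """Step 4: read regexp patterns from a top-level .airflowignore, if present."""
    ignore_file = root / ".airflowignore"
    if not ignore_file.is_file():
        return []
    patterns = []
    for line in ignore_file.read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            patterns.append(re.compile(line))
    return patterns


def list_dag_files(subdir=None, dags_folder="/opt/airflow/dags", safe_mode=True):
    """Return the file paths that survive all four steps, sorted by path."""
    root = resolve_dags_path(subdir, dags_folder)
    ignore = load_ignore_patterns(root)
    selected = []
    for path in sorted(root.rglob("*.py")):  # step 2: Python files only
        if safe_mode and not might_contain_dag(path):
            continue
        relative = path.relative_to(root).as_posix()
        if any(pattern.search(relative) for pattern in ignore):
            continue
        selected.append(path)
    return selected
```

Calling list_dag_files(dags_folder="/path/to/dags") returns only the Python files that mention both strings and are not ignored. Passing safe_mode=False mirrors setting dag_discovery_safe_mode to False, so every non-ignored .py file is returned.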
Together, these steps determine the file paths that Airflow finally reads and processes to create the DAG objects used by all the different components: Scheduler, Web Server and Workers.
Bonus: There are some optimizations that Airflow does to ensure it does “minimal” parsing of files. For instance, what’s the point in parsing a DAG file that has not been modified? Read this article to learn more about them.
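To make the "has not been modified" idea concrete, here is a generic sketch of skipping files whose modification time has not changed since the last check. It is not Airflow's actual mechanism; the ModifiedTimeCache class and its needs_parsing method are hypothetical names used only for illustration.

```python
import os


class ModifiedTimeCache:
    """Remembers the last-seen modification time of each file path."""

    def __init__(self):
        self._seen = {}

    def needs_parsing(self, path):
        """Return True if the file is new or its mtime changed since last call."""
        mtime = os.path.getmtime(path)
        if self._seen.get(path) == mtime:
            return False  # unchanged since the last check, skip parsing
        self._seen[path] = mtime
        return True
```

A parsing loop could call needs_parsing(path) for each shortlisted file and parse only those for which it returns True.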