Airflow Scheduler Parallelism (Concurrency)

Airflow lets you configure and tune task parallelism/concurrency at various levels (such as per DAG) and through different mechanisms (such as Pools). But at the top level, the master knob that controls the overall number of task instances the entire cluster can run simultaneously (in the running state) is the scheduler’s (or, more appropriately, the […]
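For reference, the cluster-wide ceiling is set via the `parallelism` option in the `[core]` section of `airflow.cfg` (the values below are Airflow's defaults and are shown purely for illustration; tune them to your cluster):

```ini
[core]
# Maximum number of task instances allowed to run concurrently
# across the entire Airflow installation (all DAGs, all workers).
parallelism = 32

# Maximum number of task instances allowed to run concurrently
# within a single DAG (formerly dag_concurrency).
max_active_tasks_per_dag = 16
```

Setting `parallelism = 0` removes the cluster-wide cap entirely, so the lower-level limits (per DAG, per Pool) become the only constraints.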

Which Airflow Components Process the DAG Files

For a long time, the component responsible for reading our Python files containing DAGs and turning that code into internal DAG objects for execution was the Scheduler itself. But in Airflow 2.3.0 a lot of this code was refactored, and the entire processing step was separated into a component called DagFileProcessorManager. DagFileProcessorManager […]
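Since the 2.3.0 refactor, DAG file processing can even be moved out of the scheduler process entirely and run as its own long-lived component (the `airflow dag-processor` command), enabled by a scheduler setting:

```ini
[scheduler]
# When True, the scheduler no longer parses DAG files itself;
# a separate "airflow dag-processor" process must be started
# to run the DagFileProcessorManager (Airflow 2.3.0+).
standalone_dag_processor = True
```

By default this is `False`, and the DagFileProcessorManager runs as a sub-process of the scheduler.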

What Files are Processed by Airflow to Load DAGs

It is important to understand how Airflow selects the source files it reads to load DAGs. This process is implemented by the DagFileProcessorProcess, which is triggered by the DagFileProcessorManager. Let’s go through each step involved in building the list of file paths: First, the path or directory is resolved from where Airflow (or […]
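As a rough mental model of that selection process, the sketch below walks a DAG folder, keeps only `.py` and `.zip` files, and skips anything matched by an `.airflowignore` file. This is a deliberately simplified approximation (the function name is made up, and real Airflow supports ignore files in subdirectories plus a choice of regexp or glob syntax), not Airflow's actual implementation:

```python
import os
import re


def list_dag_files(dag_folder):
    """Simplified sketch of DAG file discovery: recursively collect
    .py and .zip files under dag_folder, skipping paths that match a
    regex listed in a top-level .airflowignore file (mirroring
    Airflow's default 'regexp' ignore syntax, heavily simplified)."""
    ignore_patterns = []
    ignore_file = os.path.join(dag_folder, ".airflowignore")
    if os.path.exists(ignore_file):
        with open(ignore_file) as f:
            ignore_patterns = [
                re.compile(line.strip())
                for line in f
                if line.strip() and not line.startswith("#")
            ]

    paths = []
    for root, _dirs, files in os.walk(dag_folder):
        for name in files:
            if not name.endswith((".py", ".zip")):
                continue  # only Python modules and packaged DAG zips
            full = os.path.join(root, name)
            rel = os.path.relpath(full, dag_folder)
            if any(p.search(rel) for p in ignore_patterns):
                continue  # excluded by .airflowignore
            paths.append(full)
    return sorted(paths)
```

The resulting list is what the manager then hands out, file by file, to processor processes for parsing.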