Control Celery Worker Concurrency in Airflow with worker_concurrency
When you start a Celery worker in Airflow with the airflow celery worker command, you can specify the --concurrency (or -c) option to define how many worker processes you need.
$ airflow celery worker --concurrency 6
What does this really mean, though? The option is simply passed through to Celery when you use Celery to execute your Airflow tasks (CeleryExecutor). The value you pass is the number of child processes Celery will fork to process Airflow's tasks. That means in the command above, a single worker can execute up to 6 task instances concurrently at any given time.
If no value is passed to the command above, the default (16) specified for worker_concurrency under the [celery] section of the Airflow configuration file (airflow.cfg) is used. Feel free to change it to a different value to tune how many tasks you want each worker to handle concurrently:
[celery]
worker_concurrency = 32
...
If you don’t have the option to modify the configuration file directly, set the AIRFLOW__CELERY__WORKER_CONCURRENCY environment variable to the required value instead.
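Putting the pieces together, the setting is resolved in order: the environment variable first, then the value in airflow.cfg, then the built-in default of 16. A small sketch of that precedence (get_worker_concurrency is a hypothetical helper written for illustration, not part of Airflow's API):

```python
import os

# Hypothetical helper mimicking Airflow's config precedence for
# [celery] worker_concurrency; not part of Airflow itself.
def get_worker_concurrency(cfg_value=None, default=16):
    # 1. The AIRFLOW__CELERY__WORKER_CONCURRENCY environment variable wins.
    env_value = os.environ.get("AIRFLOW__CELERY__WORKER_CONCURRENCY")
    if env_value is not None:
        return int(env_value)
    # 2. Next comes the value from airflow.cfg, if one is set.
    if cfg_value is not None:
        return int(cfg_value)
    # 3. Otherwise fall back to the built-in default.
    return default

print(get_worker_concurrency(cfg_value="32"))  # 32 when the env var is not set
```

This mirrors Airflow's general rule that an AIRFLOW__{SECTION}__{KEY} environment variable overrides the corresponding entry in airflow.cfg.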
Note: A higher number may require you to provision additional CPU and memory resources for your workers.