Airflow Control Celery Worker Parallelism/Concurrency with worker_concurrency

When you start a Celery worker in Airflow with the airflow celery worker command, you can pass the --concurrency (or -c) option to set how many worker processes it should run.

$ airflow celery worker --concurrency 6

What does this actually mean? When Airflow runs its tasks through Celery (the CeleryExecutor), the option is simply passed on to Celery. The value is the number of child processes Celery forks to run Airflow's tasks. With the command above, a single worker can therefore execute up to 6 task instances concurrently (that is, in parallel) at any given time.
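To illustrate why the value acts as a hard upper bound, here is a minimal sketch that uses Python's multiprocessing pool as a stand-in for Celery's prefork pool. It is not Celery's or Airflow's code: the names fake_task and peak_concurrency are hypothetical, and each "task" just sleeps. It forks a fixed number of child processes, hands them more tasks than there are children, and records the highest number of tasks that were ever running at once. It uses the fork start method, so it assumes a POSIX system such as Linux or macOS.

```python
import multiprocessing as mp
import time

# Shared counters, installed in every child by _init_child (hypothetical names).
_running = None
_peak = None


def _init_child(running, peak):
    global _running, _peak
    _running, _peak = running, peak


def fake_task(task_id):
    """Stand-in for an Airflow task instance: mark itself running, sleep, finish."""
    with _running.get_lock():
        _running.value += 1
        _peak.value = max(_peak.value, _running.value)
    time.sleep(0.1)  # pretend to do some work
    with _running.get_lock():
        _running.value -= 1
    return task_id


def peak_concurrency(concurrency, n_tasks):
    """Run n_tasks on `concurrency` forked children and return the peak seen."""
    ctx = mp.get_context("fork")  # POSIX only; children are forked, as in prefork
    running = ctx.Value("i", 0)
    peak = ctx.Value("i", 0)
    with ctx.Pool(processes=concurrency, initializer=_init_child,
                  initargs=(running, peak)) as pool:
        pool.map(fake_task, range(n_tasks))
    return peak.value


if __name__ == "__main__":
    # 12 tasks, 6 children: at most 6 tasks ever run at the same time.
    print(peak_concurrency(concurrency=6, n_tasks=12))
```

Extra tasks simply wait in the queue until a child process becomes free, which is the same reason a worker started with --concurrency 6 never runs a seventh task instance at the same moment.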

If you omit the option, the worker uses the value of worker_concurrency under the [celery] section of the Airflow configuration file (airflow.cfg), which defaults to 16. Change it to tune how many tasks each worker handles concurrently:

[celery]
worker_concurrency = 32
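As a rough sketch of how that setting is looked up, the snippet below reads worker_concurrency from the [celery] section of airflow.cfg-style text with Python's configparser and falls back to 16 when the option or section is missing. The helper name worker_concurrency_from_cfg is hypothetical; Airflow has its own configuration loader, so treat this only as an illustration of the lookup rule described above.

```python
import configparser

AIRFLOW_DEFAULT_WORKER_CONCURRENCY = 16  # default value quoted in the text above


def worker_concurrency_from_cfg(cfg_text):
    """Return [celery] worker_concurrency from airflow.cfg-style text, or the default."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # fallback covers both a missing [celery] section and a missing option
    return parser.getint("celery", "worker_concurrency",
                         fallback=AIRFLOW_DEFAULT_WORKER_CONCURRENCY)


print(worker_concurrency_from_cfg("[celery]\nworker_concurrency = 32\n"))  # 32
print(worker_concurrency_from_cfg("[core]\nexecutor = CeleryExecutor\n"))  # 16
```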

If you don’t have the option to modify the configuration file directly, set the AIRFLOW__CELERY__WORKER_CONCURRENCY environment variable to the required value.
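The variable name follows Airflow's AIRFLOW__{SECTION}__{KEY} convention (double underscores, upper case), and an environment variable takes precedence over the value in airflow.cfg. An explicit --concurrency flag wins over both, because the flag only falls back to the configured value when it is omitted. The sketch below encodes that order. The helper names airflow_env_var and effective_concurrency are hypothetical, and the sketch deliberately ignores Airflow's other configuration sources (such as the _CMD and _SECRET variants), so it is a simplification rather than Airflow's actual resolution code.

```python
import os


def airflow_env_var(section, key):
    """Build the AIRFLOW__{SECTION}__{KEY} name for an Airflow config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def effective_concurrency(cli_value=None, environ=None, cfg_value=None, default=16):
    """Simplified precedence: CLI flag, then env var, then airflow.cfg, then default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    env_value = environ.get(airflow_env_var("celery", "worker_concurrency"))
    if env_value is not None:
        return int(env_value)
    if cfg_value is not None:
        return cfg_value
    return default


env = {"AIRFLOW__CELERY__WORKER_CONCURRENCY": "24"}
print(airflow_env_var("celery", "worker_concurrency"))  # AIRFLOW__CELERY__WORKER_CONCURRENCY
print(effective_concurrency(environ=env, cfg_value=32))  # 24: env var beats airflow.cfg
print(effective_concurrency(cli_value=6, environ=env, cfg_value=32))  # 6: flag beats both
```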

Note: A higher value means more task processes running at the same time on each worker, so you may need to provision additional CPU and memory for your workers.
