

TL;DR: Airflow has components that can use a message broker to schedule a task by pushing it into a queue, allowing it to be picked up by a task worker. If you are curious about how Airflow executes these tasks, do check out the architecture of Apache Airflow.
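
Conceptually, this hand-off is a producer/consumer pattern: a scheduling component pushes ready tasks onto a queue, and a worker pops them off and runs them. The sketch below is purely illustrative, using the Python standard library rather than Airflow's actual internals, and the task names are made up.

```python
import queue
import threading

# Stand-in for the broker-backed queue that sits between the scheduler and its workers.
task_queue = queue.Queue()

def scheduler():
    # The scheduling side enqueues tasks that are ready to run.
    for task_id in ("extract", "transform", "load"):
        task_queue.put(task_id)
    task_queue.put(None)  # sentinel: nothing left to schedule

def worker():
    # A worker keeps pulling tasks off the queue and executing them.
    while (task_id := task_queue.get()) is not None:
        print(f"worker picked up task: {task_id}")

threading.Thread(target=scheduler).start()
worker()
```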

Airflow is an open-source tool that lets you orchestrate workflows. A workflow in this context is a series of jobs that need to be executed in a certain topological order. A typical example could be a series of interdependent, sequential jobs that ingest data into a datastore; these jobs can also be scheduled to run periodically.

Note that these jobs are different from vanilla CRON jobs because each one depends on the preceding job having executed successfully. Hence, Airflow models these dependencies as a Directed Acyclic Graph (DAG), such as the one shown below. Notice the arrows between the tasks, which signify the dependencies between them.

[Figure: A typical DAG in Airflow with inter-dependent tasks]

A task in Airflow is a basic unit of execution and forms a node of the DAG. One of the key reasons Airflow is so powerful is its abstraction of a task and the ability to stitch tasks together to form dependencies and run them across a cluster of machines. At Locale.ai, we use Airflow extensively to orchestrate our data pipelines and user-facing workflows (shameless plug: check out our new feature, workflows).
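
To make this concrete, here is a minimal sketch of such a DAG. The job names (extract, transform, load), the schedule, and the bash commands are hypothetical, chosen only to show how tasks are stitched together into dependencies.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A hypothetical three-step ingestion workflow that runs once a day.
with DAG(
    dag_id="ingest_datastore",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
):
    extract = BashOperator(task_id="extract", bash_command="echo 'pull data from the source'")
    transform = BashOperator(task_id="transform", bash_command="echo 'clean and reshape'")
    load = BashOperator(task_id="load", bash_command="echo 'write to the datastore'")

    # Each task runs only after the one before it succeeds: extract -> transform -> load.
    extract >> transform >> load
```

Dropped into the DAGs folder, a file like this is picked up by the scheduler, which then queues each task for a worker once its upstream task has succeeded.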
