Developing for Airflow

Developing Locally

A full Airflow deployment may be started locally using docker-compose from the root of the repository:

docker-compose up -d

This will build the single Dockerfile defined in the root of the repository and deploy several copies of the resulting image, each serving a different role. This is done to imitate the deployed configuration, which uses asynchronous workers, as opposed to the standard sequential executor, which is free from many of the caveats of a distributed setup.

The services that are brought up for the local compose are as follows (a sketch of such a compose file follows the list):

  • postgres - Stores task run state and other persistent data

  • redis - Stores the task queue

  • webserver - The web interface for monitoring and administering DAGs

  • scheduler - The Airflow scheduler that periodically scans for tasks to add to the queue

  • worker - The worker that consumes tasks from the queue and performs the actual operations

  • flower - A tool for inspecting the work queue
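As a rough illustration, a compose file for this layout might look like the sketch below. This is not the repository's actual file: the service names match the list above, but the environment values, ports, and entrypoint behavior (the official Airflow image accepts subcommands such as webserver and scheduler) are assumptions.

# Sketch only: one image built from the root Dockerfile, reused for every role.
version: "3"

x-airflow-common: &airflow-common
  build: .                     # the single root Dockerfile
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
  depends_on:
    - postgres
    - redis

services:
  postgres:                    # task run state and other persistence
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow

  redis:                       # the task queue
    image: redis:6

  webserver:                   # web UI, typically on http://localhost:8080
    <<: *airflow-common
    command: webserver
    ports:
      - "8080:8080"

  scheduler:                   # scans for tasks to add to the queue
    <<: *airflow-common
    command: scheduler

  worker:                      # consumes tasks from the queue
    <<: *airflow-common
    command: celery worker

  flower:                      # queue inspection UI, typically on http://localhost:5555
    <<: *airflow-common
    command: celery flower
    ports:
      - "5555:5555"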

Development Connections

DAGs are often pipelines built to integrate various data environments, so access to data sources and destinations is essential to developing and running them. For security reasons, many internal data sources and destinations are not reachable from a developer’s local machine as they would be in a production environment. The developer therefore needs a way to provide data to the DAG and a place to store and analyze the results. A common approach is to point the DAG at substitute sources and destinations that the developer can reach, via altered connections.

Connections required for development, e.g. an API token or local database credentials, may be configured through the .env file. Standard Airflow connections may be defined there as environment variables, and Airflow picks them up at runtime. They have the form:

AIRFLOW_CONN_MY_CONN=https://user:pass@url
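
For example, a connection to a local development database might be defined as follows (the name DEV_DB and the credentials here are illustrative; the URI scheme is the connection type):

AIRFLOW_CONN_DEV_DB=postgres://airflow:airflow@localhost:5432/dev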

Because these connections will often contain secrets, the .env file should not be committed; it is listed in the .gitignore by default.
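
Within a task, a connection defined this way can be looked up by its connection ID, which is the lowercased suffix after AIRFLOW_CONN_. The following is a minimal sketch assuming Airflow 2's import path; the function name is illustrative, and my_conn corresponds to the AIRFLOW_CONN_MY_CONN example above.

# Minimal sketch: retrieve the connection defined via AIRFLOW_CONN_MY_CONN.
from airflow.hooks.base import BaseHook

def fetch_from_source():
    # "my_conn" is the lowercased suffix of the environment variable name.
    conn = BaseHook.get_connection("my_conn")
    # The URI is parsed into fields on the Connection object.
    print(conn.conn_type, conn.host, conn.login)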

Deployment Connections

Although a connection may be configured and used locally, that does not mean it will be available once the DAG is deployed. Work will need to be done with the devops team to ensure that the required connections and credentials are available in the deployment environments.