Developing for Airflow¶
Developing Locally¶
A full Airflow deployment may be started locally using docker-compose in the root of the repository.
docker-compose up -d
This will build the single Dockerfile defined in the root of the repository and deploy several copies of it, each serving a different role. This is done to imitate the deployed configuration, which uses asynchronous workers, as opposed to the standard sequential executor, which is free from many of these caveats.
The services that are brought up for the local compose are as follows:
postgres - Saves task run state and other persistent data
redis - Stores the task queue
webserver - The web interface for monitoring and administrating DAGs
scheduler - The airflow scheduler that will periodically scan for tasks to add to the queue
worker - The worker that will consume tasks from the queue and perform the actual operations
flower - A tool for inspecting the work queue
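Once the stack is running, the webserver is normally reachable in a browser (Airflow defaults to port 8080, though the compose configuration may map it differently), and new DAG files placed in the repository's DAG folder will be picked up by the scheduler and executed by the worker. As a minimal sketch for verifying the local setup, a throwaway DAG such as the following can be added; the dag_id, schedule, and import paths are illustrative and may need adjusting for the Airflow version in use:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# Throwaway smoke-test DAG; the dag_id and schedule are arbitrary.
with DAG(
    dag_id="local_smoke_test",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@once",
    catchup=False,
) as dag:
    # If this task succeeds, the scheduler, queue, and worker are wired together.
    BashOperator(task_id="say_hello", bash_command="echo hello from the worker")

If the task completes successfully in the web interface, the scheduler, redis queue, and worker are all communicating.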
Development Connections¶
DAGs are often pipelines built to integrate various data environments, so having access to data sources and destinations is essential to developing and running them. For security reasons, many internal data sources and destinations are not available from a developer's local machine the way they would be in a production environment. This means the developer needs another way to provide data to the DAG and a place to store and analyze the results. A common approach is to point the DAG at sources and destinations the developer does have access to by using altered connections.
Connections required for development, e.g. an API token or local database credentials, may be configured through the .env file. This is where standard Airflow connections may be defined and picked up by Airflow. They have the form:
AIRFLOW_CONN_MY_CONN=https://user:pass@url
Because these connections will often contain secret information, the .env file should not be committed and is listed in the .gitignore by default.
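Within a DAG or hook, a connection configured this way is looked up by its connection id, which corresponds to the part of the variable name after AIRFLOW_CONN_ (matched case-insensitively). A minimal sketch, assuming the MY_CONN example above and an Airflow 2.x import path:

from airflow.hooks.base import BaseHook

# AIRFLOW_CONN_MY_CONN in .env is resolved as the connection id "my_conn".
conn = BaseHook.get_connection("my_conn")

# The fields of the connection URI are exposed on the connection object.
print(conn.host, conn.login, conn.password)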
Deployment Connections¶
Although a connection may be configured and used locally, that does not mean it will be available once the DAG is deployed. Work will need to be done with the DevOps team to ensure that the required connections and credentials are available in the deployment environments.