Connections

A valuable concept in Airflow is Connections, which allow DAGs to access the various data resources in Kids First.

Bootstrapping Connections

During startup of an Airflow Docker container, a script runs that inspects every environment variable prefixed with AIRFLOW_ whose value looks like a secret path (secret/aws/kf-airflow/*) and attempts to load that path's value from Vault. Each resolved secret is then written back to the original environment variable so that Airflow can access it.

For example:

AIRFLOW_CONN_MY_DB=secret/aws/kf-airflow/my_db

Will, upon container start, be translated to:

AIRFLOW_CONN_MY_DB=postgres://user:pass@myhost:5432/mydb

Or whatever connection string has been stored in Vault at the given path.
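
The resolution logic is roughly equivalent to the sketch below. It assumes the hvac Python client, Vault credentials supplied via VAULT_ADDR and VAULT_TOKEN, and a secret stored under a "value" key; the actual startup script may look different (for example, a shell script that exports the resolved variables before Airflow starts).

import os

import hvac  # assumed Vault client library; the real script may use the vault CLI

SECRET_PREFIX = "secret/aws/kf-airflow/"

def resolve_airflow_secrets():
    """Replace AIRFLOW_* variables whose values look like Vault secret paths
    with the connection string stored at that path (illustrative sketch)."""
    client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])
    for name, value in list(os.environ.items()):
        if name.startswith("AIRFLOW_") and value.startswith(SECRET_PREFIX):
            secret = client.read(value)
            if secret and secret.get("data"):
                # "value" is an assumed key name for the stored connection string
                os.environ[name] = secret["data"]["value"]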

Adding New Connections

If new connections need to be added, the connection string will need to be inserted into Vault under the secret/aws/kf-airflow/ path. That path then needs to be added to the container definitions in the Airflow config repository.
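
As an illustrative sketch, a simple connection string could be written with the hvac client (the path, database values, "value" key, and KV v1 mount are assumptions; the vault CLI can be used instead):

import os

import hvac  # illustrative; the vault CLI could be used instead

client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])

# Assumes the KV v1 engine is mounted at "secret"; the "value" key matches
# the assumed layout used in the bootstrap sketch above.
client.secrets.kv.v1.create_or_update_secret(
    path="aws/kf-airflow/my_new_db",
    secret={"value": "postgres://user:pass@myhost:5432/mynewdb"},
    mount_point="secret",
)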

If the connection that needs to be added does not condense into a simple connection string, some additional work may need to be done to set it up properly in the container.

Using Connections

Airflow will attempt to load a connection from the environment when it is referenced by a conn_id. For instance, if the variable AIRFLOW_CONN_MY_POSTGRES is present in the environment, then the following task will load and use it as the my_postgres connection:

from airflow.operators.postgres_operator import PostgresOperator

task = PostgresOperator(
    task_id="count_users",  # a task_id is required; this name is illustrative
    sql="SELECT count(*) FROM users;",
    postgres_conn_id="my_postgres",
    database="profiles",
    dag=dag,
)
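
Here the my_postgres conn_id corresponds to the AIRFLOW_CONN_MY_POSTGRES variable, which could itself be bootstrapped from Vault as described above (the path below is purely illustrative):

AIRFLOW_CONN_MY_POSTGRES=secret/aws/kf-airflow/my_postgres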