Variables

Variables are useful for sharing parameters between DAGs. Say there is a DAG with a task that stages API output in S3:

from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def upload(**kwargs):
    # Strings inside the callable are not Jinja-templated, so build the
    # dated key from the task context instead.
    s3 = S3Hook()
    s3.load_file('./output.json',
                 f"api/{kwargs['ds']}.json",
                 bucket_name='staging-data')

task = PythonOperator(
    python_callable=upload,
    task_id="stage_api",
    dag=dag)

Another DAG might then contain a task that needs to use this data:

def download(**kwargs):
    s3 = S3Hook()
    # The key must match the one the upstream DAG wrote for this date.
    obj = s3.get_key(f"api/{kwargs['ds']}.json",
                     bucket_name='staging-data')
    obj.download_file('./api_output.json')

task = PythonOperator(
    python_callable=download,
    task_id="download_api_data",
    dag=dag)

The problem introduced here is that if the first DAG ever changes its output location, say from api/{ds}.json to api/users/{ds}.json, the second task would still attempt to load the file from the old location. To avoid having to update every task that relies on this output location, we can make the location a variable instead. That way we only change the variable's value, and every task referencing it will pick up the new location the next time it runs. With variables, our tasks now look something like this:

from airflow.models import Variable
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def upload(**kwargs):
    # Resolve the shared output location at runtime with Variable.get, so a
    # change to the variable takes effect without touching any DAG code.
    prefix = Variable.get('output_key')      # e.g. 'api' or 'api/users'
    bucket = Variable.get('output_bucket')
    s3 = S3Hook()
    s3.load_file('./output.json',
                 f"{prefix}/{kwargs['ds']}.json",
                 bucket_name=bucket)

task = PythonOperator(
    python_callable=upload,
    task_id="stage_api",
    dag=dag)

def download(**kwargs):
    prefix = Variable.get('output_key')
    bucket = Variable.get('output_bucket')
    s3 = S3Hook()
    obj = s3.get_key(f"{prefix}/{kwargs['ds']}.json", bucket_name=bucket)
    obj.download_file('./api_output.json')

task = PythonOperator(
    python_callable=download,
    task_id="download_api_data",
    dag=dag)
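
Variable.get also accepts a default value and can deserialize JSON, which helps when a variable holds structured configuration; a minimal sketch (the api_config name is illustrative):

from airflow.models import Variable

# Fall back to a default if the variable has not been defined yet.
prefix = Variable.get('output_key', default_var='api')

# Variables can also hold JSON; deserialize_json returns it as a dict.
api_config = Variable.get('api_config', deserialize_json=True, default_var={})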

Variable Definitions

Caution

Variables should not contain secrets. Use connections for that.
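
For example, credentials can live in an Airflow Connection and be read at runtime; the my_api connection id below is only a placeholder:

from airflow.hooks.base import BaseHook

# Fetch a secret from a Connection (placeholder id) rather than a Variable.
conn = BaseHook.get_connection('my_api')
api_token = conn.password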

Variables may be set through the Admin UI; however, because Airflow is deployed with near-identical configurations across environments, we require that variables be defined in the files in the variables/ directory. These files are loaded into Airflow when the container starts, depending on the current environment. This keeps variables tracked in our git flow and lets us bootstrap a new deployment without manually configuring all of the settings.
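
As a rough sketch of what that load step could do, assuming the deployment exposes the current environment name in an ENVIRONMENT environment variable and names each environment-specific file after it (only common.json is an established name here):

import json
import os

from airflow.models import Variable

def load_variable_files():
    # Assumed to be set by the deployment to dev, qa, or prd.
    env = os.environ['ENVIRONMENT']
    for path in ('variables/common.json', f'variables/{env}.json'):
        with open(path) as f:
            for key, value in json.load(f).items():
                Variable.set(key, value)

The airflow variables import <file> CLI command can achieve the same thing for a single JSON file.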

Common Variables

Common variables that are consistent across all environments may be set inside the common.json variable file. These are things like Slack channels or external accounts that only ever have one value.
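
For example, common.json might contain entries like these (the key names are illustrative):

{
    "alerts_slack_channel": "#data-alerts",
    "external_api_account": "analytics-bot"
}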

Environment Specific Variables

These variables change depending on the deployment environment: dev, qa, or prd. They are often values such as bucket locations or service names.
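
For example, the dev variable file might point the earlier tasks at a development bucket, while the prd file carries the production values (the contents below are illustrative; output_bucket and output_key are the variables from the example above):

{
    "output_bucket": "staging-data-dev",
    "output_key": "api"
}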