Data Warehouse

The primary purpose of Airflow is to manage data intake into the warehouse. The warehouse serves as a central location for storing data that is important to analytics and reporting.

Warehouse data flow

Staging

The staging step copies data from source systems, with minimal cleaning, into the staging system. Currently, the staging system consists of files in S3 and a relational database. Staging ensures that each source system is queried only once for its data; source systems are often external and can be slower and costlier to query than an internal system. Once data has been staged, other processes can transform it and integrate it with other data to give it the structure needed for analysis.
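The "query the source once, then work from the staged copy" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a local temporary directory stands in for S3, and fetch_from_source, stage, and load_staged are hypothetical names, not part of the actual pipeline.

```python
import json
import tempfile
from pathlib import Path

# Counts how often the (hypothetical) external source system is queried.
source_calls = 0


def fetch_from_source():
    """Hypothetical stand-in for a slow or costly external source system."""
    global source_calls
    source_calls += 1
    return [
        {"order_id": 1, "amount": "19.99 "},
        {"order_id": 2, "amount": "5.00"},
    ]


def stage(staging_dir: Path, name: str) -> Path:
    """Copy source records into the staging area as-is, at most once."""
    path = staging_dir / f"{name}.json"
    if not path.exists():  # only query the source if nothing is staged yet
        path.write_text(json.dumps(fetch_from_source()))
    return path


def load_staged(path: Path):
    """Read records back from the staging area instead of the source."""
    return json.loads(path.read_text())


# A local temporary directory stands in for the S3 staging files (assumption).
staging_dir = Path(tempfile.mkdtemp())
staged_path = stage(staging_dir, "orders")
stage(staging_dir, "orders")  # second call reuses the staged file

# A downstream transform works only from the staged copy.
amounts = [float(record["amount"].strip()) for record in load_staged(staged_path)]
total = round(sum(amounts), 2)
```

In this sketch the cleaning (stripping whitespace, converting to numbers) happens downstream of staging, so the staged file remains a close copy of what the source returned.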

Warehouse

The warehouse is a database that serves the various analytics and monitoring tools. It may be populated with data copied directly from the staging system or derived from it through roll-ups, cleaning, or augmentation with other data.
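As a rough illustration of a roll-up from staging into the warehouse, the sketch below aggregates staged order rows into a daily summary table. The table and column names (staging_orders, warehouse_daily_sales) are hypothetical, and a single in-memory SQLite database stands in for both the staging database and the warehouse.

```python
import sqlite3

# One in-memory SQLite database stands in for both systems (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_orders (
        order_id INTEGER,
        order_date TEXT,
        amount REAL
    );
    CREATE TABLE warehouse_daily_sales (
        order_date TEXT PRIMARY KEY,
        order_count INTEGER,
        total_amount REAL
    );
    """
)

# Hypothetical staged rows, as they might look after the staging step.
conn.executemany(
    "INSERT INTO staging_orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 10.0),
        (2, "2024-01-01", 15.0),
        (3, "2024-01-02", 7.5),
    ],
)

# Roll-up: one warehouse row per day, derived from the staged rows.
conn.execute(
    """
    INSERT INTO warehouse_daily_sales
    SELECT order_date, COUNT(*), SUM(amount)
    FROM staging_orders
    GROUP BY order_date
    """
)
conn.commit()

daily_sales = conn.execute(
    "SELECT order_date, order_count, total_amount "
    "FROM warehouse_daily_sales ORDER BY order_date"
).fetchall()
```

Analytics tools would then query the smaller summary table rather than the raw staged rows.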