Logs

Ingest Log

The ingest pipeline logs messages to the console and writes the same messages to a log file. By default, the log file is stored at: <ingest package dir>/logs/ingest.log

The output directory for logs can be changed by setting the log_dir attribute in the ingest package config file.

By default, overwrite_log=True, and the log is written to a file called ingest.log that is overwritten each time the ingest pipeline runs. Setting the overwrite_log attribute to False writes a new log file for each run instead, with the ingest execution timestamp included in the file name. The file name follows this pattern: ingest_<ISO 8601 timestamp>.log
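The naming rules above can be summarized in a short sketch. This is an illustration only, not the ingest library's implementation: resolve_log_path is a hypothetical helper, and the exact timestamp format the library uses may differ from the isoformat() output assumed here.

```python
from datetime import datetime
from pathlib import Path

DEFAULT_LOG_NAME = "ingest.log"


def resolve_log_path(package_dir, log_dir=None, overwrite_log=True, now=None):
    """Hypothetical helper that mirrors the documented naming rules."""
    # Fall back to <ingest package dir>/logs when log_dir is not set.
    directory = Path(log_dir) if log_dir else Path(package_dir) / "logs"
    if overwrite_log:
        # One file, replaced on every run.
        return directory / DEFAULT_LOG_NAME
    # One file per run. The timestamp format is an assumption.
    timestamp = (now or datetime.now()).isoformat(timespec="seconds")
    return directory / f"ingest_{timestamp}.log"


print(resolve_log_path("my_ingest_package"))
print(resolve_log_path("my_ingest_package", overwrite_log=False))
```

The first call prints the default location; the second prints a timestamped name that changes on every run, so earlier logs are preserved.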

Adding Custom Logs

In addition to the logs generated by the ingest library, you can add custom log messages to your extract configs and transform module. Using a logger instead of print() statements delivers your messages in the same format as the rest of the logs.

To add custom log messages, import Python's logging module and get a logger for your module with logging.getLogger(__name__). Messages sent to that logger are handled by the logging configuration that the ingest library sets up.

For example, suppose you want to log the status of a data manipulation task from do_after_read() in an extract config named my_extract_config.py:

import logging

logger = logging.getLogger(__name__)

def do_after_read(df):
    logger.debug("About to transpose the source data")
    new_df = df.T  # Transpose the source data
    logger.info("Source data transposed")
    return new_df

When --log_level is set to info, only the INFO message from the above operation is written to the terminal and log file, because the DEBUG message falls below the configured level:

2023-02-28 17:02:25,276 - my_extract_config - Thread: MainThread - INFO - Source data transposed
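To see why only one line appears, the behavior can be reproduced with the standard logging module alone. The sketch below is not the ingest library's setup: the format string is an assumption inferred from the sample line above, and the level is set directly on the logger rather than through --log_level.

```python
import io
import logging

# Assumed format, inferred from the sample log line. Not taken from the library.
LOG_FORMAT = (
    "%(asctime)s - %(name)s - Thread: %(threadName)s - "
    "%(levelname)s - %(message)s"
)

# Capture output in memory so it can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

logger = logging.getLogger("my_extract_config")
logger.setLevel(logging.INFO)  # Stands in for --log_level info
logger.addHandler(handler)
logger.propagate = False  # Keep the demo independent of root logger settings

logger.debug("About to transpose the source data")  # Below INFO, dropped
logger.info("Source data transposed")  # At INFO, emitted

output = stream.getvalue()
print(output, end="")
```

Changing the level to logging.DEBUG would make both messages appear.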

Stage Outputs

Ingest stages also write their own output to a directory that follows this path pattern: <ingest package dir>/output/<stage>