Logs
Ingest Log
The ingest pipeline logs messages to the console and also writes the same
messages to a log file. By default the log file is stored at:
<ingest package dir>/logs/ingest.log
The output directory for logs can be changed by setting the log_dir
attribute in the ingest package config file.
The name of the log file can be changed to include the ingest execution
timestamp by setting the overwrite_log attribute to False. The default is
overwrite_log=True, which causes the log to be written to a file called
ingest.log that is overwritten each time the ingest pipeline runs. If
overwrite_log=False, a new log file is written each time the ingest pipeline
runs, and its name follows this pattern:
ingest_<ISO 8601 timestamp>.log
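The naming rules above can be sketched in a few lines of Python. This is only an illustration, not the ingest library's own code: resolve_log_path is a hypothetical helper, the exact ISO 8601 variant used in real file names is not stated here (second precision is assumed), and so is the way a custom log_dir is interpreted.

```python
from datetime import datetime
from pathlib import Path


def resolve_log_path(package_dir, log_dir=None, overwrite_log=True, now=None):
    """Hypothetical helper: return the log file path implied by the settings."""
    # Default location: <ingest package dir>/logs, unless log_dir is set.
    directory = Path(log_dir) if log_dir else Path(package_dir) / "logs"
    if overwrite_log:
        # Same file every run, so each run replaces the previous log.
        return directory / "ingest.log"
    # Assumption: timestamp rendered as ISO 8601 with second precision.
    timestamp = (now or datetime.now()).isoformat(timespec="seconds")
    return directory / f"ingest_{timestamp}.log"


# Default settings: a single, overwritten log file.
print(resolve_log_path("my_package"))

# overwrite_log=False with a custom log_dir: one file per run.
print(
    resolve_log_path(
        "my_package",
        log_dir="custom_logs",
        overwrite_log=False,
        now=datetime(2023, 2, 28, 17, 2, 25),
    )
)
```

With overwrite_log=False, log files accumulate across runs, so the log directory may need occasional cleanup.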
Adding Custom Logs
In addition to the logs generated by the ingest library, you can add custom
log messages to your extract configs and transform module. Unlike print()
statements, these messages are delivered in the same format as the rest of
the logs.
To configure logging, import the logging module and then get the logger
defined by the ingest library. After this, you can log messages.
For example, suppose that in the do_after_read() function of an extract
config, my_extract_config.py, you'd like to log the status of a data
manipulation task:
import logging

logger = logging.getLogger(__name__)


def do_after_read(df):
    logger.debug("About to transpose the source data")
    new_df = df.T  # Transpose the source data
    logger.info("Source data transposed")
    return new_df
When --log_level is set to info, the code above writes the following line to
the terminal and the log file. The debug message does not appear because its
level is below info:
2023-02-28 17:02:25,276 - my_extract_config - Thread: MainThread - INFO - Source data transposed
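The level filtering and line layout shown above can be reproduced with Python's standard logging module alone. The sketch below is not the ingest library's configuration: the LOG_FORMAT string is an assumption inferred from the sample line, and the in-memory stream stands in for the terminal and log file.

```python
import io
import logging

# Assumed format, inferred from the sample log line above.
LOG_FORMAT = (
    "%(asctime)s - %(name)s - Thread: %(threadName)s - "
    "%(levelname)s - %(message)s"
)

# Capture output in memory instead of the terminal or a log file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

logger = logging.getLogger("my_extract_config")
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # Stands in for --log_level info
logger.propagate = False  # Keep this demo isolated from the root logger

logger.debug("About to transpose the source data")  # Below INFO: dropped
logger.info("Source data transposed")  # At INFO: emitted

output = stream.getvalue()
print(output, end="")
```

Lowering the level to logging.DEBUG in this sketch would emit both messages, which is the behavior to expect from a debug log level.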
Stage Outputs
Ingest stages also write their own output to a directory that follows this path
pattern: <ingest package dir>/output/<stage>