Data That Only Makes Sense In Relation To Another File

Scenario

Your source_data_url gives you access to one data file, but you need information from one or more other files to make sense of that source data.

File Fetching Solution

You can call the file fetching methods yourself to load data from as many secondary files as you want. For instance, to load a secondary TSV file into a DataFrame:

from kf_lib_data_ingest.common.file_retriever import FileRetriever
from kf_lib_data_ingest.common.io import read_df

# Fetch the secondary file, then parse it into a pandas DataFrame
file_df = read_df(
    FileRetriever().get("<URL for your TSV file>")
)

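Keep in mind that in_col arguments in the operations list can only read from the primary source data file referenced by source_data_url or from the DataFrame returned by the do_after_read function. To use columns from a secondary file in your operations, combine that file's data with the primary DataFrame inside do_after_read.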
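Here is a minimal sketch of that approach. It assumes a do_after_read(df) function that receives the primary DataFrame and returns the one that in_col will read, and it left-joins a secondary table onto the primary data using a shared key. The sample_id and participant_id columns, the in-memory SECONDARY_TSV, and the load_secondary_df helper are hypothetical stand-ins that let the snippet run on its own with pandas. In a real extract config, you would load the secondary file with read_df and FileRetriever as shown earlier.

```python
import io

import pandas as pd

# Hypothetical stand-in for a secondary TSV file. In a real config you would
# fetch it with read_df(FileRetriever().get("<URL for your TSV file>")).
SECONDARY_TSV = io.StringIO(
    "sample_id\tparticipant_id\n"
    "S1\tP1\n"
    "S2\tP2\n"
)


def load_secondary_df():
    """Load the secondary table (hypothetical helper for this sketch)."""
    SECONDARY_TSV.seek(0)  # allow repeated reads of the in-memory file
    return pd.read_csv(SECONDARY_TSV, sep="\t", dtype=str)


def do_after_read(df):
    """Attach columns from the secondary file to the primary DataFrame.

    in_col only sees the DataFrame returned here, so any secondary
    columns you want to map must be merged in at this step.
    """
    secondary_df = load_secondary_df()
    # Left join on the assumed shared key keeps every primary row;
    # rows with no match in the secondary file get NaN.
    return df.merge(secondary_df, how="left", on="sample_id")


primary = pd.DataFrame({"sample_id": ["S1", "S2", "S3"], "value": ["a", "b", "c"]})
print(do_after_read(primary))
```

A left join keeps all primary rows even when the secondary file has no match for them. If unmatched rows should be dropped instead, how="inner" would be the alternative.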