Multiple Columns of The Same Kind of Thing

Scenario

You want to extract specimen IDs and file names from data that looks like this:

Specimen

BAM File

Index File

S1

S1.bam

S1.index

S2

S2.bam

S2.index

Note that there are two columns with file names in them.

Correct Output

BIOSPECIMEN.ID

GENOMIC_FILE.FILE_NAME

S1

S1.bam

S2

S2.bam

S1

S1.index

S2

S2.index

Solution with Parallel Column Stacking

Use Parallel Column Stacking to send both of the file columns to “File Name”:

operations = [
    keep_map(in_col="Specimen", out_col=CONCEPTS.BIOSPECIMEN.ID),
    keep_map(in_col="BAM File", out_col=CONCEPTS.GENOMIC_FILE.FILE_NAME),
    keep_map(in_col="Index File", out_col=CONCEPTS.GENOMIC_FILE.FILE_NAME)
]

Solution with Multiple Extractions

Do separate extractions. One with:

operations = [
    keep_map(in_col="Specimen", out_col=CONCEPTS.BIOSPECIMEN.ID),
    keep_map(in_col="BAM File", out_col=CONCEPTS.GENOMIC_FILE.FILE_NAME),
]

And one with:

operations = [
    keep_map(in_col="Specimen", out_col=CONCEPTS.BIOSPECIMEN.ID),
    keep_map(in_col="Index File", out_col=CONCEPTS.GENOMIC_FILE.FILE_NAME),
]

And then stack them later in your Transform function.