Values Embedded in Other Values

Scenario

You have a column of file names, and each file name also encodes other values, such as the specimen ID and the sequencing library name.

File Name
-------------------
BSID1234_SL1234.bam
BSID1235_SL1235.bam
BSID1236_SL1236.bam
BSID1237_SL1237.bam

Correct Output

BIOSPECIMEN.ID  SEQUENCING.LIBRARY_NAME  GENOMIC_FILE.FILE_NAME
--------------  -----------------------  ----------------------
BSID1234        SL1234                   BSID1234_SL1234.bam
BSID1235        SL1235                   BSID1235_SL1235.bam
BSID1236        SL1236                   BSID1236_SL1236.bam
BSID1237        SL1237                   BSID1237_SL1237.bam

Solution with Regular Expressions

operations = [
    keep_map(
        in_col="File Name",
        out_col=CONCEPT.GENOMIC_FILE.FILE_NAME
    ),
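    value_map(
        in_col="File Name",
        out_col=CONCEPT.BIOSPECIMEN.ID,
        # Capture everything before the first underscore.
        m=r'^([^_]+).+$'
    ),
    value_map(
        in_col="File Name",
        out_col=CONCEPT.SEQUENCING.LIBRARY_NAME,
        # Capture only the text between the underscore and the first dot,
        # so the ".bam" extension is not part of the library name.
        m=r'^[^_]+_([^.]+)\..+$'
    )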
]
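
Patterns of this kind can be checked outside the ingest configuration with Python's standard `re` module. The following is a minimal sketch, assuming that a regex passed as `m` yields the text of its single capture group. The helper `first_group` and the constant names are hypothetical and not part of any library. The library-name pattern in this sketch ends its capture group at the first dot, so the file extension is excluded.

```python
import re


def first_group(pattern, value):
    """Hypothetical helper: return capture group 1 of a full-string match."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError(f"{value!r} does not match {pattern!r}")
    return match.group(1)


# Everything before the first underscore.
SPECIMEN_PATTERN = r'^([^_]+).+$'
# Text between the first underscore and the first dot after it.
LIBRARY_PATTERN = r'^[^_]+_([^.]+)\..+$'

file_names = [
    "BSID1234_SL1234.bam",
    "BSID1235_SL1235.bam",
    "BSID1236_SL1236.bam",
    "BSID1237_SL1237.bam",
]

for name in file_names:
    specimen_id = first_group(SPECIMEN_PATTERN, name)
    library_name = first_group(LIBRARY_PATTERN, name)
    print(specimen_id, library_name, name)
```

A capture group written as `(.+)` after the underscore would run to the end of the string and include `.bam`, which is why the group here is limited to non-dot characters.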

Solution with Functions

operations = [
    keep_map(
        in_col="File Name",
        out_col=CONCEPT.GENOMIC_FILE.FILE_NAME
    ),
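    value_map(
        in_col="File Name",
        out_col=CONCEPT.BIOSPECIMEN.ID,
        # First underscore-separated piece of the file name.
        m=lambda x: x.split('_')[0]
    ),
    value_map(
        in_col="File Name",
        out_col=CONCEPT.SEQUENCING.LIBRARY_NAME,
        # Second underscore-separated piece, up to its first dot.
        m=lambda x: x.split('_')[1].split('.')[0]
    )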
]
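
The lambdas assume that every file name follows the `<specimen>_<library>.<extension>` layout. When the input is less regular, a named function can make the parsing explicit and fail loudly on unexpected values. The following is a minimal sketch under that assumption. `split_file_name` is a hypothetical helper, not part of any library.

```python
def split_file_name(file_name):
    """Split '<specimen>_<library>.<extension>' into (specimen_id, library_name).

    Hypothetical helper for illustration only.
    """
    # Cut at the first dot; everything after it is treated as the extension.
    stem, dot, _extension = file_name.partition('.')
    # Cut the remaining stem at the first underscore.
    specimen_id, underscore, library_name = stem.partition('_')
    if not (dot and underscore and specimen_id and library_name):
        raise ValueError(f"Unexpected file name layout: {file_name!r}")
    return specimen_id, library_name


for name in ["BSID1234_SL1234.bam", "BSID1235_SL1235.bam"]:
    print(split_file_name(name))
```

With such a helper, the `m` arguments could be written as `lambda x: split_file_name(x)[0]` and `lambda x: split_file_name(x)[1]`. Unlike `x.split('_')[1]`, this version keeps any further underscores as part of the library name, so choose whichever behavior matches the real file names.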