Values Embedded in Other Values
Scenario
You have a column of file names that also encode other values, such as the specimen ID and the sequencing library name.
File Name |
---|
BSID1234_SL1234.bam |
BSID1235_SL1235.bam |
BSID1236_SL1236.bam |
BSID1237_SL1237.bam |
Correct Output
BIOSPECIMEN.ID | SEQUENCING.LIBRARY_NAME | GENOMIC_FILE.FILE_NAME |
---|---|---|
BSID1234 | SL1234 | BSID1234_SL1234.bam |
BSID1235 | SL1235 | BSID1235_SL1235.bam |
BSID1236 | SL1236 | BSID1236_SL1236.bam |
BSID1237 | SL1237 | BSID1237_SL1237.bam |
Solution with Regular Expressions
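operations = [
    # Keep the full file name unchanged.
    keep_map(
        in_col="File Name",
        out_col=CONCEPT.GENOMIC_FILE.FILE_NAME
    ),
    # Capture everything before the first underscore.
    value_map(
        in_col="File Name",
        out_col=CONCEPT.BIOSPECIMEN.ID,
        m=r'^([^_]+).+$'
    ),
    # Capture the text between the first underscore and the first dot,
    # so that the ".bam" extension is left out of the library name.
    value_map(
        in_col="File Name",
        out_col=CONCEPT.SEQUENCING.LIBRARY_NAME,
        m=r'^[^_]+_([^.]+)\..+$'
    )
]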
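To check what each pattern captures outside of an ingest configuration, the patterns can be tried with Python's standard `re` module. This is a minimal sketch, assuming that a regex passed as `m` yields its first capture group as the output value; `first_group` is a hypothetical helper written for illustration, not part of any library. Note that a capture group of `(.+)` placed after the underscore would also include the `.bam` extension, so the sketch uses a group that stops at the first dot.

```python
import re


def first_group(pattern, value):
    """Hypothetical helper: return the first capture group, or None if no match."""
    match = re.search(pattern, value)
    return match.group(1) if match else None


# Everything before the first underscore.
ID_PATTERN = r'^([^_]+).+$'
# Text between the first underscore and the first dot.
LIBRARY_PATTERN = r'^[^_]+_([^.]+)\..+$'
# Everything after the first underscore, extension included.
GREEDY_PATTERN = r'^[^_]+_(.+)$'

name = "BSID1234_SL1234.bam"
print(first_group(ID_PATTERN, name))       # BSID1234
print(first_group(LIBRARY_PATTERN, name))  # SL1234
print(first_group(GREEDY_PATTERN, name))   # SL1234.bam
```

Only the dot-bounded group produces the `SL1234` value shown in the Correct Output table.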
Solution with Functions
operations = [
    keep_map(
        in_col="File Name",
        out_col=CONCEPT.GENOMIC_FILE.FILE_NAME
    ),
    value_map(
        in_col="File Name",
        out_col=CONCEPT.BIOSPECIMEN.ID,
        m=lambda x: x.split('_')[0]
    ),
    value_map(
        in_col="File Name",
        out_col=CONCEPT.SEQUENCING.LIBRARY_NAME,
        m=lambda x: x.split('_')[1].split('.')[0]
    )
]
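The lambdas above assume that every file name contains an underscore; `x.split('_')[1]` raises an `IndexError` otherwise. As one possible alternative, the parsing can be moved into a named function that validates the format first. This is a sketch only: `split_file_name` is a hypothetical helper, and the choice to raise `ValueError` on unexpected names is an assumption rather than documented behavior.

```python
def split_file_name(file_name):
    """Hypothetical helper: return (specimen_id, library_name) from a file name."""
    # Drop everything from the first dot onward, e.g. ".bam".
    stem = file_name.split('.', 1)[0]
    # Split once at the first underscore.
    specimen_id, separator, library_name = stem.partition('_')
    if not separator or not specimen_id or not library_name:
        raise ValueError(f"Unexpected file name format: {file_name!r}")
    return specimen_id, library_name


print(split_file_name("BSID1234_SL1234.bam"))  # ('BSID1234', 'SL1234')
```

Such a helper could then be passed as `m=lambda x: split_file_name(x)[0]` for the specimen ID and `m=lambda x: split_file_name(x)[1]` for the library name. Unlike the lambdas, it keeps any further underscores in the library name, because it splits only at the first one.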