Multiple Meltable Groups

This can get tricky. There are many ways to deal with this, each with their own pros and cons.

Scenario

You want to create KFDRC dataservice Participant and Phenotype entities from data that looks like this:

Participant

Age

Mass

Cleft Ear

Mass/Cleft Age

Tennis Fingers

Tennis Age

P1

70

yes

yes

60

no

65

P2

80

no

yes

80

no

45

Note that there two different groups of phenotype measurements recorded at different ages.

Correct Output

Participant

Age

Phenotype Name

Phenotype Observed

Phenotype Age

P1

70

Mass

yes

60

P2

80

Mass

no

80

P1

70

Cleft Ear

yes

60

P2

80

Cleft Ear

yes

80

P1

70

Tennis Fingers

no

65

P2

80

Tennis Fingers

no

45

Non-Melting Solution

You can avoid melting and use basic Parallel Column Stacking with a triplet of operations for each phenotype. Just be careful about operation order.

operations = [
    # participants
    keep_map(
        in_col="Participant",
        out_col=CONCEPTS.PARTICIPANT.ID
    ),
    keep_map(
        in_col="Age",
        out_col=CONCEPTS.PARTICIPANT.AGE
    ),

    # massless phenotypes
    keep_map(
        in_col="Mass/Cleft Age",
        out_col=CONCEPTS.PHENOTYPE.AGE_AT_OBSERVATION
    ),
    constant_map(
        m="Mass",
        out_col=CONCEPTS.PHENOTYPE.NAME
    ),
    value_map(
        in_col="Mass",
        out_col=CONCEPTS.PHENOTYPE.OBSERVED,
        m={
           "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
           "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
        }
    )

    # cleft elbow phenotypes
    keep_map(
        in_col="Mass/Cleft Age",
        out_col=CONCEPTS.PHENOTYPE.AGE_AT_OBSERVATION
    ),
    constant_map(
        m="Cleft Ear",
        out_col=CONCEPTS.PHENOTYPE.NAME
    ),
    value_map(
        in_col="Cleft Ear",
        out_col=CONCEPTS.PHENOTYPE.OBSERVED,
        m={
           "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
           "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
        }
    )

    # tennis finger phenotypes
    keep_map(
        in_col="Tennis Age",
        out_col=CONCEPTS.PHENOTYPE.AGE_AT_OBSERVATION
    ),
    constant_map(
        m="Tennis Fingers",
        out_col=CONCEPTS.PHENOTYPE.NAME
    ),
    value_map(
        in_col="Tennis Fingers",
        out_col=CONCEPTS.PHENOTYPE.OBSERVED,
        m={
           "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
           "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
        }
    )
]

As long as you consistently put the grouped operations together, the result should be correct. If, however, you were to swap the two value_map operations with each other, then you would be associating the wrong observation to each phenotype.

Melting Solution

Use melt, and wrap the melt groups in Nested Operations Sublists to safely navigate the column-length consequences of having multiple columns melted together and then grouped with another shorter column which is then lengthened by Parallel Column Stacking from another melt group.

operations = [
    # participants
    keep_map(
        in_col="Participant",
        out_col=CONCEPTS.PARTICIPANT.ID
    ),
    keep_map(
        in_col="Age",
        out_col=CONCEPTS.PARTICIPANT.AGE
    ),

    # mass/cleft phenotypes group
    [
        keep_map(
            in_col="Mass/Cleft Age",
            out_col=CONCEPT.PHENOTYPE.EVENT_AGE_DAYS
        ),
        melt_map(
            var_name=CONCEPT.PHENOTYPE.NAME,
            map_for_vars={
                "Mass": "Mass",
                "Cleft Ear": "Cleft Ear"
            },
            value_name=CONCEPT.PHENOTYPE.OBSERVED,
            map_for_values={
                "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
                "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
            }
        )
    ],

    # tennis fingers phenotype group
    [
        keep_map(
            in_col="Tennis Age",
            out_col=CONCEPT.PHENOTYPE.EVENT_AGE_DAYS
        ),
        melt_map(
            var_name=CONCEPT.PHENOTYPE.NAME,
            map_for_vars={
                "Tennis Fingers": "Tennis Fingers"
            },
            value_name=CONCEPT.PHENOTYPE.OBSERVED,
            map_for_values={
                "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
                "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
            }
        )
    ],

Caution

Without nested sublists clustering the operations into groups, the result would be wrong.

Melting Solution With Multiple Smaller Extracts

Nothing says that you need to extract the whole file all at once. You can also choose to virtually divide the source data into simple chunks, which is another way of dealing with the complex length stacking problem:

Participant

Age

P1

70

P2

80

operations = [
    keep_map(
        in_col="Participant",
        out_col=CONCEPTS.PARTICIPANT.ID
    ),
    keep_map(
        in_col="Age",
        out_col=CONCEPTS.PARTICIPANT.AGE
    )
]

and

Participant

Mass/Cleft Age

Mass

Cleft Ear

P1

60

yes

yes

P2

80

no

yes

operations = [
    keep_map(
        in_col="Participant",
        out_col=CONCEPTS.PARTICIPANT.ID
    ),
    keep_map(
        in_col="Mass/Cleft Age",
        out_col=CONCEPT.PHENOTYPE.EVENT_AGE_DAYS
    ),
    melt_map(
        var_name=CONCEPT.PHENOTYPE.NAME,
        map_for_vars={
            "Mass": "Mass",
            "Cleft Ear": "Cleft Ear"
        },
        value_name=CONCEPT.PHENOTYPE.OBSERVED,
        map_for_values={
            "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
            "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
        }
    )
]

and

Participant

Tennis Age

Tennis Fingers

P1

65

no

P2

45

no

operations = [
    keep_map(
        in_col="Participant",
        out_col=CONCEPTS.PARTICIPANT.ID
    ),
    keep_map(
        in_col="Tennis Age",
        out_col=CONCEPT.PHENOTYPE.EVENT_AGE_DAYS
    ),
    melt_map(
        var_name=CONCEPT.PHENOTYPE.NAME,
        map_for_vars={
            "Tennis Fingers": "Tennis Fingers"
        },
        value_name=CONCEPT.PHENOTYPE.OBSERVED,
        map_for_values={
            "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
            "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
        }
    )
]