========================
Multiple Meltable Groups
========================

This can get tricky. There are many ways to deal with this, each with their
own pros and cons.

Scenario
========

You want to create KFDRC dataservice Participant and Phenotype entities from
data that looks like this:

.. csv-table::
    :header: "Participant", "Age", "Mass", "Cleft Ear", "Mass/Cleft Age", "Tennis Fingers", "Tennis Age"

    P1, 70, yes, yes, 60, no, 65
    P2, 80, no, yes, 80, no, 45

Note that there two different groups of phenotype measurements recorded at
different ages.

Correct Output
==============

.. csv-table::
    :header: "Participant", "Age", "Phenotype Name", "Phenotype Observed", "Phenotype Age"

    P1, 70, Mass, yes, 60
    P2, 80, Mass, no, 80
    P1, 70, "Cleft Ear", yes, 60
    P2, 80, "Cleft Ear", yes, 80
    P1, 70, "Tennis Fingers", no, 65
    P2, 80, "Tennis Fingers", no, 45

Non-Melting Solution
====================

You can avoid melting and use basic :ref:`Column-Stacking` with a triplet of
operations for each phenotype. Just be careful about operation order.


.. code-block:: Python

    operations = [
        # participants
        keep_map(
            in_col="Participant",
            out_col=CONCEPTS.PARTICIPANT.ID
        ),
        keep_map(
            in_col="Age",
            out_col=CONCEPTS.PARTICIPANT.AGE
        ),

        # massless phenotypes
        keep_map(
            in_col="Mass/Cleft Age",
            out_col=CONCEPTS.PHENOTYPE.AGE_AT_OBSERVATION
        ),
        constant_map(
            m="Mass",
            out_col=CONCEPTS.PHENOTYPE.NAME
        ),
        value_map(
            in_col="Mass",
            out_col=CONCEPTS.PHENOTYPE.OBSERVED,
            m={
               "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
               "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
            }
        )

        # cleft elbow phenotypes
        keep_map(
            in_col="Mass/Cleft Age",
            out_col=CONCEPTS.PHENOTYPE.AGE_AT_OBSERVATION
        ),
        constant_map(
            m="Cleft Ear",
            out_col=CONCEPTS.PHENOTYPE.NAME
        ),
        value_map(
            in_col="Cleft Ear",
            out_col=CONCEPTS.PHENOTYPE.OBSERVED,
            m={
               "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
               "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
            }
        )

        # tennis finger phenotypes
        keep_map(
            in_col="Tennis Age",
            out_col=CONCEPTS.PHENOTYPE.AGE_AT_OBSERVATION
        ),
        constant_map(
            m="Tennis Fingers",
            out_col=CONCEPTS.PHENOTYPE.NAME
        ),
        value_map(
            in_col="Tennis Fingers",
            out_col=CONCEPTS.PHENOTYPE.OBSERVED,
            m={
               "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
               "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
            }
        )
    ]

As long as you consistently put the grouped operations together, the result
should be correct. If, however, you were to swap the two value_map operations
with each other, then you would be associating the wrong observation to each
phenotype.

Melting Solution
================

Use melt, and wrap the melt groups in :ref:`Nested-Operation-Sublists` to
safely navigate the column-length consequences of having multiple columns
melted together and then grouped with another shorter column which is then
lengthened by :ref:`Column-Stacking` from another melt group.

.. code-block:: Python

    operations = [
        # participants
        keep_map(
            in_col="Participant",
            out_col=CONCEPTS.PARTICIPANT.ID
        ),
        keep_map(
            in_col="Age",
            out_col=CONCEPTS.PARTICIPANT.AGE
        ),

        # mass/cleft phenotypes group
        [
            keep_map(
                in_col="Mass/Cleft Age",
                out_col=CONCEPT.PHENOTYPE.EVENT_AGE_DAYS
            ),
            melt_map(
                var_name=CONCEPT.PHENOTYPE.NAME,
                map_for_vars={
                    "Mass": "Mass",
                    "Cleft Ear": "Cleft Ear"
                },
                value_name=CONCEPT.PHENOTYPE.OBSERVED,
                map_for_values={
                    "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
                    "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
                }
            )
        ],

        # tennis fingers phenotype group
        [
            keep_map(
                in_col="Tennis Age",
                out_col=CONCEPT.PHENOTYPE.EVENT_AGE_DAYS
            ),
            melt_map(
                var_name=CONCEPT.PHENOTYPE.NAME,
                map_for_vars={
                    "Tennis Fingers": "Tennis Fingers"
                },
                value_name=CONCEPT.PHENOTYPE.OBSERVED,
                map_for_values={
                    "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
                    "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
                }
            )
        ],

.. caution::

    Without nested sublists clustering the operations into groups, the result
    would be wrong.

Melting Solution With Multiple Smaller Extracts
===============================================

Nothing says that you need to extract the whole file all at once. You can also
choose to virtually divide the source data into simple chunks, which is another
way of dealing with the complex length stacking problem:

.. csv-table::
    :header: "Participant", "Age"

    P1, 70
    P2, 80

.. code-block:: Python

    operations = [
        keep_map(
            in_col="Participant",
            out_col=CONCEPTS.PARTICIPANT.ID
        ),
        keep_map(
            in_col="Age",
            out_col=CONCEPTS.PARTICIPANT.AGE
        )
    ]

and

.. csv-table::
    :header: "Participant", "Mass/Cleft Age", "Mass", "Cleft Ear"

    P1, 60, yes, yes
    P2, 80, no, yes

.. code-block:: Python

    operations = [
        keep_map(
            in_col="Participant",
            out_col=CONCEPTS.PARTICIPANT.ID
        ),
        keep_map(
            in_col="Mass/Cleft Age",
            out_col=CONCEPT.PHENOTYPE.EVENT_AGE_DAYS
        ),
        melt_map(
            var_name=CONCEPT.PHENOTYPE.NAME,
            map_for_vars={
                "Mass": "Mass",
                "Cleft Ear": "Cleft Ear"
            },
            value_name=CONCEPT.PHENOTYPE.OBSERVED,
            map_for_values={
                "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
                "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
            }
        )
    ]

and

.. csv-table::
    :header: "Participant", "Tennis Age", "Tennis Fingers"

    P1, 65, no
    P2, 45, no

.. code-block:: Python

    operations = [
        keep_map(
            in_col="Participant",
            out_col=CONCEPTS.PARTICIPANT.ID
        ),
        keep_map(
            in_col="Tennis Age",
            out_col=CONCEPT.PHENOTYPE.EVENT_AGE_DAYS
        ),
        melt_map(
            var_name=CONCEPT.PHENOTYPE.NAME,
            map_for_vars={
                "Tennis Fingers": "Tennis Fingers"
            },
            value_name=CONCEPT.PHENOTYPE.OBSERVED,
            map_for_values={
                "yes": constants.PHENOTYPE.OBSERVED.POSITIVE,
                "no": constants.PHENOTYPE.OBSERVED.NEGATIVE
            }
        )
    ]