Design Overview

The ultimate goal for ingest is to take raw investigator data and compose from it a series of simple and unambiguous factual statements of the form:

There is a thing X.
Thing X has properties A, B, and C.
Thing Y comes from thing X.

Example:

There is a participant P1.
Participant P1 has properties age=7, sex=male, and race=unknown.
There is a biospecimen S1.
Biospecimen S1 comes from Participant P1.

For the most part, the investigator’s raw data tables already encode the above statements implicitly through row colinearity: values that share a row are understood to describe related things. When the investigator puts a specimen ID in the same row as a participant ID, that usually means the specimen came from that participant.

For instance, the above statements might have been written like this:

Participant ID	Specimen ID	Participant Age	Participant Sex	Participant Race
P1	S1	7	m	unknown
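
To make the row-to-statement reading concrete, here is a minimal Python sketch that parses the tab-separated table above and spells out the statements each row implies. It is an illustration only, not the ingest library's API: the names `RAW`, `SEX_CODES`, `statements_from_row`, and `statements_from_table` are hypothetical, and the mapping of `m` to `male` is an assumption made so that the output matches the example statements.

```python
import csv
import io

# The example table from the text, as tab-separated values.
RAW = (
    "Participant ID\tSpecimen ID\tParticipant Age\tParticipant Sex\tParticipant Race\n"
    "P1\tS1\t7\tm\tunknown\n"
)

# Assumed mapping from raw codes to standardized values (hypothetical).
SEX_CODES = {"m": "male", "f": "female"}


def statements_from_row(row):
    """Spell out the factual statements implied by one row."""
    participant = row["Participant ID"]
    specimen = row["Specimen ID"]
    # Fall back to the raw value when no standardized code is known.
    sex = SEX_CODES.get(row["Participant Sex"], row["Participant Sex"])
    return [
        f"There is a participant {participant}.",
        f"Participant {participant} has properties "
        f"age={row['Participant Age']}, sex={sex}, "
        f"and race={row['Participant Race']}.",
        f"There is a biospecimen {specimen}.",
        # Sharing a row is what links the specimen to the participant.
        f"Biospecimen {specimen} comes from Participant {participant}.",
    ]


def statements_from_table(text):
    """Read a tab-separated table and collect the statements for every row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    statements = []
    for row in reader:
        statements.extend(statements_from_row(row))
    return statements


for line in statements_from_table(RAW):
    print(line)
```

The only place a relationship is asserted is the last statement of each row, which is why a table whose rows mix unrelated IDs would produce false "comes from" statements.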

The Extract stage exists partly to fix any source data that either does not encode the three fundamental statements by row colinearity or accidentally encodes a relationship that does not actually exist among the things being described.
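
As one hypothetical illustration of source data that does not encode relationships by row colinearity, suppose an investigator writes a participant ID only on the first row of a group and leaves it blank on the following rows. The sketch below repairs that by carrying the last seen ID down the column. The input layout and the `fill_down` helper are assumptions made for illustration; they do not describe how the Extract stage is actually implemented.

```python
def fill_down(rows, column):
    """Return copies of rows with blank cells in `column` filled from the row above."""
    filled = []
    last_seen = None
    for row in rows:
        # Keep the row's own value if present, else reuse the previous one.
        value = row.get(column) or last_seen
        filled.append({**row, column: value})
        last_seen = value
    return filled


# Hypothetical raw rows: S2 has no participant ID on its own row.
rows = [
    {"Participant ID": "P1", "Specimen ID": "S1"},
    {"Participant ID": "", "Specimen ID": "S2"},
    {"Participant ID": "P2", "Specimen ID": "S3"},
]

fixed = fill_down(rows, "Participant ID")
for row in fixed:
    print(row["Specimen ID"], "comes from", row["Participant ID"])
```

After the repair, every row carries both IDs, so each row can again be read as a "comes from" statement on its own.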

Contents