Source Data stored by the Kids First Data Tracker
If you are developing an ingest package for Kids First study data, then the source data files for the study will be managed by the Kids First Data Tracker.
The Extract Stage tutorial uses an unprotected file for example purposes. In a more realistic scenario, the source data files for a study would already have been uploaded to the Data Tracker, and your ingest package would need to be configured to access them with the proper authorization headers.
Note
Uploading files to the Data Tracker is outside of the scope of this tutorial. Read the Study Creator API docs or visit the Data Tracker web app to learn more about how to do that.
Get Your Developer Token
All source data files in the Data Tracker are protected by authentication, and the mechanism depends on the type of user trying to access them. As an ingest package developer, you will need a developer token to access files. The developer token grants authorization to download any source data file.
To generate a token, go to https://kf-ui-data-tracker.kidsfirstdrc.org/tokens
Set Your Environment
Now that you have your token, you’re going to store it in your shell environment so that the ingest app can read it and use it when fetching files through the Study Creator API.
In the configuration for your local shell environment, add the following:
export KF_STUDY_CREATOR_API_TOKEN=<YOUR_TOKEN>
Be careful with this token and make sure to keep it secret.
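Before running ingest, it can help to confirm that the variable is actually visible to Python processes started from your shell. The following is a minimal sketch of such a check. The helper name `read_developer_token` is hypothetical and is not part of the ingest library; only the variable name `KF_STUDY_CREATOR_API_TOKEN` comes from the step above.

```python
import os

# Name of the variable exported in the shell configuration above.
TOKEN_ENV_VAR = "KF_STUDY_CREATOR_API_TOKEN"


def read_developer_token(env=None):
    """Return the developer token from the environment, or raise if unset.

    Hypothetical helper for illustration only; the ingest app has its own
    logic for reading this variable.
    """
    env = os.environ if env is None else env
    # Treat a missing or whitespace-only value as "not set".
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{TOKEN_ENV_VAR} is not set; export it in your shell configuration"
        )
    return token
```

Calling `read_developer_token()` in a fresh terminal should return your token. If it raises instead, the shell configuration was probably not reloaded after you added the `export` line. Avoid printing the returned value, since the token must stay secret.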
Update Your Extract Config to get the file from the Data Tracker
The clinical.tsv and family_and_phenotype.tsv source data files have already been uploaded to the SD_ME0WME0W study via the Data Tracker service.
For example, the Data Tracker URL for the family_and_phenotype.tsv file is:
https://kf-study-creator.kidsfirstdrc.org/download/study/SD_ME0WME0W/file/SF_HH5PMCJX/version/FV_5H7MEHG2
Replace the value of the source_data_url parameter in extract_config.py with the Data Tracker URL above. Note that you will not be able to manually download from that URL without the correct authorization headers.
# family_and_phenotype.tsv
source_data_url = 'https://kf-study-creator.kidsfirstdrc.org/download/study/SD_ME0WME0W/file/SF_HH5PMCJX/version/FV_5H7MEHG2'
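To make "the correct authorization headers" more concrete, here is a rough sketch of how a request to that URL could carry the developer token. It only builds the request and does not send it. The `Token <value>` header scheme and the helper name `build_download_request` are assumptions made for illustration; the ingest library chooses the actual scheme itself, as the `token` authentication message in the log suggests.

```python
import urllib.request

# URL taken from the extract config above.
SOURCE_DATA_URL = (
    "https://kf-study-creator.kidsfirstdrc.org/download/study/"
    "SD_ME0WME0W/file/SF_HH5PMCJX/version/FV_5H7MEHG2"
)


def build_download_request(url, token):
    """Build (but do not send) a GET request that carries the token.

    ASSUMPTION: the "Token <value>" Authorization scheme is a guess used
    for illustration, not a documented detail of the Study Creator API.
    """
    return urllib.request.Request(
        url, headers={"Authorization": f"Token {token}"}
    )


# A placeholder stands in for the real token, which should come from
# the KF_STUDY_CREATOR_API_TOKEN environment variable.
request = build_download_request(SOURCE_DATA_URL, "placeholder-token")
print(request.get_method(), request.full_url)
```

In normal use you do not need to write this yourself: once the environment variable is set, the ingest app attaches the authorization when it fetches the file.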
Try It
Run ingest again. If the file was fetched successfully, you should see something like this in your log:
2019-04-24 11:19:31,719 - FileRetriever - INFO - Selected `token` authentication to fetch https://kf-study-creator.kidsfirstdrc.org/download/study/SD_ME0WME0W/file/SF_HH5PMCJX/version/FV_5H7MEHG2
2019-04-24 11:19:32,269 - kf_lib_data_ingest.network.utils - INFO - Successfully fetched https://kf-study-creator.kidsfirstdrc.org/download/study/SD_ME0WME0W/file/SF_HH5PMCJX/version/FV_5H7MEHG2 with original file name "family_and_phenotype.tsv"
Don’t worry if your ingest package fails validation. You will learn how to fix this in a later section.
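If the log is long, the success lines can be picked out programmatically. The sketch below assumes the message format matches the sample log lines shown above exactly. The pattern and the helper name `find_fetched_files` are illustrative and not part of the ingest library.

```python
import re

# Matches the "Successfully fetched ..." message from the sample log.
# ASSUMPTION: the wording of the message is exactly as in the sample.
SUCCESS_PATTERN = re.compile(
    r'Successfully fetched (?P<url>\S+) '
    r'with original file name "(?P<name>[^"]+)"'
)


def find_fetched_files(log_text):
    """Return (url, original_file_name) pairs for each success message."""
    return [
        (match.group("url"), match.group("name"))
        for match in SUCCESS_PATTERN.finditer(log_text)
    ]
```

Passing the contents of your log file to `find_fetched_files` gives one pair per fetched file. An empty list suggests that no file was fetched, in which case the token and the URL are the first things to check.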