Distributed Data Types
Kids First is responsible for distributing a wide variety of data in many different formats.
Genomic Data
Raw Reads - .fastq, .fq
.fastq and .fq files contain raw read data from the sequencer. Each read is stored as a nucleotide sequence (A, C, G, T, with N marking bases that could not be called) together with a quality score for every base.
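To show how the sequence and quality information sit together, here is a minimal FASTQ reader sketch. It assumes the common layout of exactly four lines per record (header starting with `@`, sequence, a `+` separator, quality string) and Phred+33 quality encoding; the names `read_fastq`, `FastqRecord`, and `phred_scores` are illustrative, not part of any Kids First tool.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record (assumed)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        # Every base has exactly one quality character.
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode ASCII quality characters to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


example = io.StringIO("@read1\nACGTN\n+\nIIII#\n")
for record in read_fastq(example):
    print(record.name, record.sequence, phred_scores(record.quality))
    # prints: read1 ACGTN [40, 40, 40, 40, 2]
```

Production pipelines usually rely on established parsers that also handle compressed (.fastq.gz) input; this sketch only illustrates the record structure.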
Aligned Reads - .bam, .cram
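.bam and .cram files contain sequencer reads that have been aligned to a reference genome. BAM is a compressed binary format, while CRAM achieves smaller files by storing reads as differences from the reference sequence.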
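Because both formats are binary, a quick way to tell them apart is by their leading "magic" bytes. The sketch below assumes a seekable binary stream: CRAM files begin with the ASCII bytes `CRAM`, and BAM files are BGZF-compressed (a gzip-compatible format) whose decompressed content begins with `BAM\x01`. The function name `sniff_alignment_format` is hypothetical, and the usage inputs are synthetic byte strings, not real alignment files.

```python
import gzip
import io
from typing import BinaryIO


def sniff_alignment_format(handle: BinaryIO) -> str:
    """Guess whether a seekable binary stream holds BAM or CRAM data."""
    head = handle.read(4)
    if head == b"CRAM":
        return "cram"  # CRAM files start with the ASCII bytes "CRAM"
    if head[:2] == b"\x1f\x8b":
        # gzip magic bytes: BAM uses BGZF, which Python's gzip module can read.
        handle.seek(0)
        with gzip.GzipFile(fileobj=handle) as decompressed:
            if decompressed.read(4) == b"BAM\x01":
                return "bam"
    return "unknown"


print(sniff_alignment_format(io.BytesIO(b"CRAM\x03\x00")))               # cram
print(sniff_alignment_format(io.BytesIO(gzip.compress(b"BAM\x01rest"))))  # bam
```

For a real file, pass `open(path, "rb")`. Reading the alignments themselves is better left to dedicated libraries.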
Index Files - .bai, .crai, .tbi
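Because genomic files can reach several hundred gigabytes each, they are usually accompanied by an index file that lets tools jump directly to a genomic region instead of reading the whole file. .bai files index .bam files, .crai files index .cram files, and .tbi (tabix) files index bgzip-compressed files such as .vcf.gz.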
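The toy sketch below illustrates why an index reduces seek time. It is deliberately simplified and does not reproduce the .bai, .crai, or .tbi layouts, which use their own binary structures. It assumes a coordinate-sorted text file whose lines begin with an integer position followed by a tab; `build_toy_index` records the byte offset of every Nth line, and `fetch_from` uses a binary search over that index to seek close to the requested position instead of scanning from the start. All names here are hypothetical.

```python
import bisect
import io
from typing import BinaryIO, List, Tuple


def build_toy_index(handle: BinaryIO, every: int = 2) -> List[Tuple[int, int]]:
    """Record (position, byte_offset) for every `every`-th line of a sorted file."""
    index = []
    handle.seek(0)
    line_number = 0
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line_number % every == 0:
            index.append((int(line.split(b"\t", 1)[0]), offset))
        line_number += 1
    return index


def fetch_from(handle: BinaryIO, index: List[Tuple[int, int]], start: int) -> List[bytes]:
    """Return lines with position >= start, seeking via the toy index."""
    positions = [pos for pos, _ in index]
    # Last indexed entry strictly before `start`; no earlier line can match,
    # because the file is sorted by position.
    slot = max(bisect.bisect_left(positions, start) - 1, 0)
    handle.seek(index[slot][1] if index else 0)
    hits = []
    for line in handle:
        if int(line.split(b"\t", 1)[0]) >= start:
            hits.append(line.rstrip(b"\n"))
    return hits


data = io.BytesIO(b"100\ta\n200\tb\n300\tc\n400\td\n500\te\n")
idx = build_toy_index(data, every=2)  # [(100, 0), (300, 12), (500, 24)]
print(fetch_from(data, idx, 350))     # [b'400\td', b'500\te']
```

Indexing only every Nth line keeps the index small at the cost of scanning a few extra lines after each seek, the same size-versus-precision trade-off that real genomic indexes make.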
Variant Calls - .vcf, .gvcf
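.vcf (Variant Call Format) and .gvcf (genomic VCF) files are the output of variant-calling analysis performed on .bam and .cram files. A .vcf file lists the positions where a sample differs from the reference genome; a .gvcf file also records information about the non-variant regions in between.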
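As an illustration of what a variant call looks like, here is a minimal sketch that parses one tab-separated VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It assumes header lines (those starting with `#`) have already been skipped and ignores any per-sample columns; `VcfRecord` and `parse_vcf_line` are hypothetical names. The second usage line mimics a gVCF-style non-variant block, which tools such as GATK commonly write with an `END` key in INFO.

```python
from typing import Dict, List, NamedTuple, Optional, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                # 1-based position on the reference
    variant_id: str
    ref: str
    alt: List[str]          # one entry per alternate allele
    qual: Optional[float]   # None when the file has "."
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the fixed columns of one VCF data line (header lines excluded)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    info_map: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_map[key] = value if sep else True  # flag keys carry no value
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","),
                     None if qual == "." else float(qual), filt, info_map)


print(parse_vcf_line("chr1\t12345\trs1\tA\tG,T\t50.5\tPASS\tDP=20;DB").alt)
print(parse_vcf_line("chr1\t10001\t.\tT\t<NON_REF>\t.\t.\tEND=10468").info)
```

Real VCF files are typically bgzip-compressed and may use multi-sample FORMAT columns, so a dedicated VCF library is a better fit for production use.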
Imaging Data
DICOM
DICOM (Digital Imaging and Communications in Medicine) files use a general-purpose format that can hold many types of data; in the context of Kids First, they typically contain medical images.
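Because a DICOM file may not carry a reliable extension, a simple check on its header can help identify it. The sketch below assumes the DICOM Part 10 file layout: a 128-byte preamble followed by the four ASCII bytes `DICM`. Some older DICOM files omit this preamble, so a `False` result is not conclusive. The name `looks_like_dicom` is hypothetical; for reading the full dataset, a library such as pydicom (`pydicom.dcmread`) is the usual choice.

```python
import io
from typing import BinaryIO

DICOM_PREAMBLE_LENGTH = 128
DICOM_MAGIC = b"DICM"


def looks_like_dicom(handle: BinaryIO) -> bool:
    """Return True if the stream has a 128-byte preamble followed by 'DICM'."""
    header = handle.read(DICOM_PREAMBLE_LENGTH + len(DICOM_MAGIC))
    # A stream shorter than 132 bytes yields a shorter slice and fails the check.
    return header[DICOM_PREAMBLE_LENGTH:] == DICOM_MAGIC


fake = io.BytesIO(b"\x00" * 128 + b"DICM" + b"\x02\x00")  # synthetic header only
print(looks_like_dicom(fake))  # True
```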
Slide Images - .svs
.svs files are high-resolution, whole-slide images of tissue slides. The format is based on TIFF but stores each image as a hierarchy (pyramid) of resolutions, so a viewer can load only the zoom level and region it needs.
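The benefit of the hierarchical structure is that a viewer can pick the coarsest resolution level that still satisfies the requested zoom. The sketch below shows that selection step under stated assumptions: each level is described by a downsample factor relative to full resolution (level 0 has factor 1.0, and factors increase with level). The factors and slide dimensions in the usage lines are hypothetical, since actual values vary from file to file. OpenSlide's Python bindings expose a comparable lookup as `get_best_level_for_downsample`.

```python
from typing import Sequence, Tuple


def best_level_for_downsample(downsamples: Sequence[float], target: float) -> int:
    """Pick the level with the largest downsample factor not exceeding `target`.

    downsamples[i] is the number of full-resolution pixels per pixel at level i,
    assumed to be sorted in increasing order starting at 1.0.
    """
    best = 0
    for level, factor in enumerate(downsamples):
        if factor <= target:
            best = level
    return best


def level_dimensions(width: int, height: int, downsample: float) -> Tuple[int, int]:
    """Approximate pixel dimensions of a level from its downsample factor."""
    return round(width / downsample), round(height / downsample)


levels = [1.0, 4.0, 16.0]  # hypothetical pyramid factors
print(best_level_for_downsample(levels, 10))    # 1
print(level_dimensions(80000, 60000, 16.0))     # (5000, 3750)
```

Choosing the largest factor that does not exceed the target means the viewer never upsamples a coarser level, and only downscales slightly from the chosen one.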