Distributed Data Types

Kids First is responsible for distributing a wide variety of data in many different formats.

Genomic Data

Raw Reads - .fastq, .fq

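FASTQ files (.fastq, .fq) contain the raw, unaligned reads produced by the sequencer. Each read is stored as a record of four lines: an identifier line beginning with @, the nucleotide sequence (A, C, G, T, and N for bases that could not be called), a separator line beginning with +, and a quality string holding one ASCII-encoded Phred quality score per base.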

Read more about FASTQ
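To illustrate the four-line record layout, here is a minimal Python reader. It is a sketch, not a Kids First tool: it assumes unwrapped records (exactly four lines each) and the common Phred+33 quality encoding, and the names `read_fastq` and `FastqRecord` are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str             # identifier line without the leading "@"
    sequence: str         # bases, e.g. "ACGTN"
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ stream (four lines per record)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        # Each quality character encodes one score: its ASCII code minus the offset.
        yield FastqRecord(header[1:].rstrip("\r\n"), sequence,
                          [ord(ch) - offset for ch in quality])


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nII#!\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, record.qualities)
        # prints: read1 ACGT [40, 40, 2, 0]
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so the score 40 above means roughly a 1 in 10,000 chance that the base call is wrong.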

Aligned Reads - .bam, .cram

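BAM (.bam) and CRAM (.cram) files contain sequencer reads that have been aligned to a reference genome. BAM is the compressed binary form of the SAM text format; CRAM compresses further by storing reads as differences from the reference sequence, so the same reference is needed to decode a CRAM file.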

Read more about BAM
Read more about CRAM
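Although both formats hold aligned reads, they are easy to tell apart on disk. The sketch below checks the magic bytes defined in the SAM/BAM and CRAM specifications: a CRAM file begins with the ASCII text "CRAM", while a BAM file is BGZF-compressed (gzip-compatible) and its decompressed stream begins with "BAM" followed by the byte 1. The function name `sniff_alignment_format` is hypothetical, and it only inspects headers; it does not validate the rest of the file.

```python
import gzip


def sniff_alignment_format(path: str) -> str:
    """Return "BAM", "CRAM", or "unknown" based on the file's leading bytes."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head == b"CRAM":
        return "CRAM"  # CRAM files begin with the ASCII magic "CRAM"
    if head[:2] == b"\x1f\x8b":
        # BAM is BGZF-compressed, which the standard gzip module can read;
        # once decompressed, the stream starts with the magic b"BAM\x01".
        with gzip.open(path, "rb") as gz:
            if gz.read(4) == b"BAM\x01":
                return "BAM"
    return "unknown"
```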

Index Files - .bai, .crai, .tbi

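Because a single genomic file can reach several hundred gigabytes, it is usually accompanied by an index file (.bai for BAM, .crai for CRAM, .tbi for bgzip-compressed tab-delimited files such as compressed VCFs) that records where each genomic region is stored. Tools use the index to jump directly to a region of interest instead of reading the file from the beginning.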

Read more about Tabix index files
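As an example of how such an index narrows a search, .bai indexes (and tabix .tbi indexes, which use the same bins) group records into a fixed hierarchy of genomic bins: one root bin, then levels of 64 Mbp, 8 Mbp, 1 Mbp, 128 kbp, and 16 kbp bins. The sketch below is a Python transcription of the `reg2bin` and `reg2bins` functions given in the SAM/BAM specification. A real index also stores a linear index and compressed file offsets for each bin, which this sketch omits; CRAM indexes (.crai) are organized differently and are not covered here.

```python
from typing import List

# (first bin number at this level, bit shift) for the five levels below
# the root bin 0, from coarsest (64 Mbp bins) to finest (16 kbp bins).
LEVELS = [(1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)]


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin that fully contains the 0-based, half-open region [beg, end)."""
    end -= 1
    for first, shift in reversed(LEVELS):  # try the finest level first
        if beg >> shift == end >> shift:
            return first + (beg >> shift)
    return 0  # only the root bin spans the whole region


def reg2bins(beg: int, end: int) -> List[int]:
    """Every bin that may hold records overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for first, shift in LEVELS:
        bins.extend(range(first + (beg >> shift), first + (end >> shift) + 1))
    return bins
```

To answer a region query, a reader computes `reg2bins` for the region and reads only the file chunks listed under those few bins, rather than scanning every record.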

Variant Calls - .vcf, .gvcf

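VCF (.vcf) and gVCF (.gvcf) files contain variant calls: the results of analyzing the aligned reads in .bam and .cram files to find positions where a sample differs from the reference genome. A VCF lists only the variant sites, whereas a gVCF also records blocks of non-variant positions, which makes it possible to tell a site that matches the reference apart from a site with no data.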

Read more about VCF
Read more about GVCF
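Each VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and per-sample columns. The minimal parser below reads the fixed columns only. The `is_reference_block` check is a heuristic assumption: it treats a record as a gVCF non-variant block when its INFO carries an END key and every ALT allele is a symbolic "any other allele" placeholder (`<NON_REF>` as written by GATK, or `<*>` as written by bcftools). Both function names are hypothetical.

```python
from typing import Any, Dict

VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    record: Dict[str, Any] = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])  # 1-based position on CHROM
    record["ALT"] = [] if record["ALT"] == "." else record["ALT"].split(",")
    info: Dict[str, Any] = {}
    if record["INFO"] != ".":
        for entry in record["INFO"].split(";"):
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True  # flag keys carry no value
    record["INFO"] = info
    return record


def is_reference_block(record: Dict[str, Any]) -> bool:
    """Heuristic: a gVCF block spanning POS..END with no real alternate allele."""
    symbolic = {"<NON_REF>", "<*>"}
    alts = record["ALT"]
    return "END" in record["INFO"] and bool(alts) and all(a in symbolic for a in alts)
```

In a gVCF, variant sites usually carry the symbolic allele as well (for example `AC,<NON_REF>`), which is why the check requires all ALT alleles to be symbolic rather than just one.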

Imaging Data

DICOM

DICOM (Digital Imaging and Communications in Medicine) is a general-purpose file format that can hold various types of data; in the context of Kids First, DICOM files typically contain medical images.

Read more about the DICOM format
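A DICOM file that follows the standard's file format (Part 10) starts with a 128-byte preamble followed by the four ASCII characters "DICM". The small check below, with the hypothetical name `is_dicom_part10`, relies only on that prefix; it does not parse the data elements that follow, and it will reject older files written without the preamble.

```python
def is_dicom_part10(path: str) -> bool:
    """True if the file has the 128-byte preamble plus the "DICM" prefix."""
    with open(path, "rb") as fh:
        header = fh.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"
```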

Slide Images - .svs

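SVS files (.svs) are high-resolution images of tissue slides. The format is based on TIFF, but stores each slide as a hierarchy (pyramid) of images at progressively lower resolutions, so a viewer can load just the region and zoom level it needs.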

Read more about the SVS format
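The pyramid is what makes large slides practical to browse: a viewer picks the coarsest level that still has enough detail for the current zoom. The pure-Python sketch below shows that selection step, in the spirit of OpenSlide's `get_best_level_for_downsample`. It assumes the level sizes are already known (for example, read from the file's TIFF directories) and listed from full resolution downward; the function names and the example dimensions are hypothetical.

```python
from typing import List, Sequence, Tuple


def level_downsamples(dimensions: Sequence[Tuple[int, int]]) -> List[float]:
    """Downsample factor of each level relative to level 0 (full resolution)."""
    base_w, base_h = dimensions[0]
    # Average the width and height ratios, since rounding can make them differ slightly.
    return [(base_w / w + base_h / h) / 2 for w, h in dimensions]


def best_level_for_downsample(dimensions: Sequence[Tuple[int, int]],
                              downsample: float) -> int:
    """Index of the coarsest level whose downsample does not exceed the request."""
    factors = level_downsamples(dimensions)
    candidates = [i for i, f in enumerate(factors) if f <= downsample]
    # Level 0 (factor 1.0) is the fallback when the request asks for more detail.
    return max(candidates, key=lambda i: factors[i]) if candidates else 0


if __name__ == "__main__":
    levels = [(40000, 30000), (10000, 7500), (2500, 1875)]  # hypothetical slide
    print(level_downsamples(levels))              # [1.0, 4.0, 16.0]
    print(best_level_for_downsample(levels, 8))   # 1
```

The viewer then reads pixels from the chosen level and scales them by the small remaining factor, instead of shrinking the full-resolution image.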