Fraction of reads mapping to the human genome

Ensure there is adequate mapping of sequenced reads to the human genome.

Theoretical Method

This metric is obtained by iterating through the BAM file and checking, for each read, the SAM flag bit that indicates whether the read is mapped to the HG19 reference.

Technical Methods

Waltz uses a method from the SAMRecord Class of the HTSJDK library:

SAMRecord.getReadUnmappedFlag()

Note: This method is distinct from getProperPairFlag(), which is true only for reads that are mapped in a proper pair.
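The per-read check described above can be sketched in a few lines. This is an illustrative sketch only, not the Waltz implementation: it assumes SAM-format text lines as input (rather than a binary BAM read through HTSJDK), and the names `UNMAPPED_FLAG` and `count_reads` are hypothetical. The only fact taken from the SAM specification is that bit 0x4 of the FLAG field marks a read as unmapped, which is the bit that getReadUnmappedFlag() reports. Whether Waltz treats secondary or supplementary alignments differently is not stated here, so this sketch counts every record.

```python
# Hypothetical sketch: count total and mapped reads from SAM text lines.
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def count_reads(sam_lines):
    """Return (total_reads, mapped_reads) for an iterable of SAM text lines."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        total += 1
        if not flag & UNMAPPED_FLAG:
            mapped += 1
    return total, mapped


# Two mapped reads (flags 99 and 147) and two unmapped reads (flags 77 and 141).
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF",
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tFFFF",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
]
total, mapped = count_reads(example)
print(total, mapped)
```

Flags 99 and 147 also have the proper-pair bit set, but the count here depends only on the unmapped bit, which mirrors the distinction drawn in the note above.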

  • Tools Used

    • waltz.jar CountReads

    • Aggregate_bam_metrics.sh

    • tables_module.py (TotalMapped / TotalReads)

    • plots_module.r

  • Input

    • Standard Bam (tables also produced for U / S / D bams)

  • Output

    • Text file with read count information: “sample_id.bam.read-counts.txt”

Interpretations

Mapping fraction to the human genome should be above 97%. In most cases, a value below that suggests possible contamination from another species.

Note: This metric comes from the standard BAM and is calculated across the entire BAM file (as opposed to pool A or pool B on their own).
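As a rough illustration of the TotalMapped / TotalReads calculation and the 97% guideline, the check might look like the sketch below. The function names and the `threshold` parameter are hypothetical and are not taken from tables_module.py; the layout of the read-counts file is not assumed, so the counts are passed in directly.

```python
# Hypothetical sketch: mapping fraction and the 97% guideline.
def mapping_fraction(total_mapped, total_reads):
    """Return TotalMapped / TotalReads as a float between 0 and 1."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return total_mapped / total_reads


def passes_mapping_qc(total_mapped, total_reads, threshold=0.97):
    """True when the mapping fraction is strictly above the threshold."""
    return mapping_fraction(total_mapped, total_reads) > threshold


# Example with made-up counts: 985 of 1000 reads mapped.
print(mapping_fraction(985, 1000))
print(passes_mapping_qc(985, 1000))
```

A sample that fails this check is not necessarily contaminated; the threshold is only a prompt to look more closely.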
