Coverage vs GC content

Assess coverage bias across GC content, which can reduce the accuracy of downstream sequencing results.

Theoretical Method

Compute the GC content of each target region covered in the bam file, group the regions into 5% GC-content bins, and plot the mean coverage across all regions that fall into each bin.
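The binning step described above can be sketched in a few lines of Python. This is only an illustration, not the code in tables_module.py: the function name `mean_coverage_by_gc_bin` and the input layout (pairs of GC fraction and coverage per region) are assumptions made for the example, and the real intervals file may store these values differently.

```python
from collections import defaultdict


def mean_coverage_by_gc_bin(regions, bin_width=5):
    """Return {bin lower bound in %: mean coverage} for GC-content bins.

    `regions` is a hypothetical iterable of (gc_fraction, coverage) pairs,
    one per target region, with gc_fraction between 0 and 1.
    """
    n_bins = 100 // bin_width
    totals = defaultdict(float)
    counts = defaultdict(int)
    for gc_fraction, coverage in regions:
        # Round away floating-point noise so that e.g. 0.30 lands in 30-35%.
        pct = round(gc_fraction * 100, 6)
        # A region with exactly 100% GC is placed in the last bin.
        idx = min(int(pct // bin_width), n_bins - 1)
        lower = idx * bin_width
        totals[lower] += coverage
        counts[lower] += 1
    return {lower: totals[lower] / counts[lower] for lower in sorted(totals)}


# Made-up values: two regions in the 30-35% bin, one in the 60-65% bin.
example = [(0.32, 100.0), (0.34, 200.0), (0.61, 50.0)]
print(mean_coverage_by_gc_bin(example))  # {30: 150.0, 60: 50.0}
```

Bins with no regions are simply absent from the result, so a plot built from it shows gaps rather than zero coverage where no region has that GC content.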

Technical Methods

  • Tools Used

    • Waltz CountReads

    • aggregate_bam_metrics.sh

    • tables_module.py

    • plots_module.r

  • Input

    • Standard bam

    • Collapsed unfiltered bam

    • ACCESS pool A bed file

  • Output

    • sample_id-intervals.txt

Interpretations

Extreme base compositions, i.e., GC-poor or GC-rich sequences, lead to uneven coverage, or even no coverage, across the genome. This can affect downstream small variant and copy number calling, both of which rely on consistent sequencing depth across all regions. Ideally this plot should be as flat as possible. The above example depicts a slight decrease in coverage in very GC-rich regions, but is a good result for ACCESS.
