Coverage vs GC content
Awareness of possible loss of accuracy in downstream sequencing results due to coverage bias
Theoretical Method
Bin the GC content of each region in the BAM file into 5% intervals, then plot the mean coverage across all regions that fall into each bin.
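The binning step can be illustrated with a minimal Python sketch. This is not the code in tables_module.py or plots_module.r. The function name `mean_coverage_by_gc_bin`, the `(gc_percent, coverage)` input layout, and the example values are all assumptions made for illustration. It assumes that GC content is already available per region as a percentage.

```python
from collections import defaultdict

BIN_WIDTH = 5  # bin width in percent GC, matching the 5% intervals described above


def mean_coverage_by_gc_bin(regions, bin_width=BIN_WIDTH):
    """Return {bin lower bound (% GC): mean coverage} for the given regions.

    `regions` is an iterable of (gc_percent, coverage) pairs, one per region.
    This input layout is hypothetical, not the format of any pipeline file.
    """
    last_bin = 100 // bin_width - 1
    sums = defaultdict(float)
    counts = defaultdict(int)
    for gc_percent, coverage in regions:
        # Floor-divide to find the bin; clamp so 100% GC lands in the last bin.
        idx = min(int(gc_percent // bin_width), last_bin)
        sums[idx] += coverage
        counts[idx] += 1
    # Only bins that contain at least one region are reported.
    return {idx * bin_width: sums[idx] / counts[idx] for idx in sorted(counts)}


# Made-up example values, for illustration only.
example_regions = [
    (32.0, 900.0),
    (34.9, 1100.0),
    (35.0, 1000.0),
    (61.5, 800.0),
    (100.0, 400.0),
]
binned = mean_coverage_by_gc_bin(example_regions)
```

Each key is the lower bound of a 5% bin, so the regions at 32.0% and 34.9% GC are averaged together under key 30. Plotting the keys against the values gives the coverage vs GC content curve.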
Technical Methods
Tools Used:
Waltz CountReads
aggregate_bam_metrics.sh
tables_module.py
plots_module.r
Input
Standard BAM
Collapsed unfiltered BAM
ACCESS pool A BED file
Output
sample_id-intervals.txt
Interpretations
Extreme base compositions, i.e., GC-poor or GC-rich sequences, lead to uneven coverage, or even no coverage, across the genome. This can affect downstream small variant and copy number calling, both of which rely on consistent sequencing depth across all regions. Ideally, this plot should be as flat as possible. The above example depicts a slight decrease in coverage in very GC-rich regions, but it is a good result for ACCESS.