Access Quality Control (v1)

UMI family sizes (Duplex reads)

Understanding the frequency of UMI families of different read counts


Last updated 4 years ago


Theoretical Method

As with the simplex read pairs, we count the number of UMI families of each discrete size for duplex reads. A duplex family consists of fragments with at least one read pair mapping to each of the top and bottom strands.
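The counting step described above can be illustrated with a minimal Python sketch. It is not the Marianas implementation. It assumes each family is available as a hypothetical `(top_strand_pairs, bottom_strand_pairs)` tuple, and that the size of a duplex family is the total number of read pairs across both strands; neither assumption comes from the pipeline documentation.

```python
from collections import Counter


def duplex_family_size_histogram(families):
    """Count duplex families by size.

    `families` is an iterable of (top_strand_pairs, bottom_strand_pairs)
    tuples. This input shape is a hypothetical stand-in for illustration,
    not the format produced by Marianas.
    """
    hist = Counter()
    for top, bottom in families:
        # A family is duplex only if both strands have at least one read pair.
        if top >= 1 and bottom >= 1:
            # Assumption: family size = total read pairs over both strands.
            hist[top + bottom] += 1
    # Return a plain dict ordered by family size for easy inspection.
    return dict(sorted(hist.items()))


# Toy data: two of these six families are single-stranded and are skipped.
example_families = [(3, 2), (1, 0), (4, 4), (0, 2), (2, 3), (1, 1)]
histogram = duplex_family_size_histogram(example_families)
print(histogram)
```

Families with reads on only one strand are ignored here because they belong to the simplex view of the data rather than the duplex one.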

Technical Methods

  • Tools Used:

    • Marianas

    • make_umi_qc_tables.sh

  • Input

    • collapsed_R1_.fastq

    • collapsed_R2_.fastq

    • MSK-ACCESS-v1_0-A-on-target-positions.txt

    • MSK-ACCESS-v1_0-B-on-target-positions.txt

  • Output

    • family-sizes.txt

Interpretations

We expect the duplex family size distribution to peak between 5 and 15 read pairs. A peak in this range gives us confidence that there are enough unique molecules for adequate error correction during the collapsing process.
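As a rough illustration of this check, the sketch below finds the most frequent family size in a histogram and tests whether it lies in the expected 5 to 15 range. The histogram is made-up toy data in a hypothetical `{family_size: family_count}` form; the actual layout of family-sizes.txt is not described here, so parsing that file is left out.

```python
def duplex_peak_in_expected_range(histogram, low=5, high=15):
    """Return (peak_size, is_in_range) for a {family_size: count} mapping.

    The peak is the family size with the highest count; ties are broken
    in favour of the smaller family size. Returns (None, False) if the
    histogram is empty.
    """
    if not histogram:
        return None, False
    peak_size = max(histogram, key=lambda size: (histogram[size], -size))
    return peak_size, low <= peak_size <= high


# Toy histogram, invented for illustration only.
toy_histogram = {1: 120, 2: 200, 5: 640, 8: 910, 12: 700, 20: 90}
peak, ok = duplex_peak_in_expected_range(toy_histogram)
print(f"peak family size: {peak}, within expected range: {ok}")
```

The `low` and `high` defaults mirror the range stated above; a peak outside it would be a prompt to look more closely at the sample, not a pass or fail verdict on its own.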