Access Quality Control (v1)

UMI Family types Composition (Pool A)

Understanding the relative abundance of each fragment subtype


Last updated 4 years ago


Theoretical Method

Marianas groups reads by their 6-base UMI sequence (three bases from each end of the DNA fragment) together with the fragment start position (and possibly the stop position). Read pairs that share the same UMI and the same position are placed in the same UMI "family".
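The grouping idea can be illustrated with a minimal Python sketch. This is not Marianas code: it assumes exact matching on UMI and start position, and the record fields (`name`, `chrom`, `start`, `umi`) and the `group_into_families` helper are hypothetical names chosen for illustration. The real tool may tolerate mismatches or use additional criteria.

```python
from collections import defaultdict


def group_into_families(read_pairs):
    """Group read pairs that share chromosome, start position and UMI.

    Assumption: exact matching only. Each read pair is a dict with the
    hypothetical keys 'name', 'chrom', 'start' and 'umi'.
    """
    families = defaultdict(list)
    for read_pair in read_pairs:
        key = (read_pair["chrom"], read_pair["start"], read_pair["umi"])
        families[key].append(read_pair["name"])
    return dict(families)


# Made-up read pairs: r1 and r2 share UMI and start, r3 differs in UMI.
read_pairs = [
    {"name": "r1", "chrom": "chr1", "start": 100, "umi": "ACG+TTA"},
    {"name": "r2", "chrom": "chr1", "start": 100, "umi": "ACG+TTA"},
    {"name": "r3", "chrom": "chr1", "start": 100, "umi": "GGC+TTA"},
]
families = group_into_families(read_pairs)
```

Here `r1` and `r2` end up in one family, while `r3` forms a family of its own because its UMI differs.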

UMI family types are defined by the following categories:

  • Duplex: both the top and bottom strands were found for this fragment

  • Simplex: only one strand (top or bottom) was sequenced, and at least 3 copies of that strand were found

  • Sub-Simplex: exactly 2 copies of a single strand were found

  • Singletons: exactly 1 copy of a single strand was found
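The four categories above can be expressed as a small decision rule. The sketch below is illustrative only: `classify_family` and its per-strand count arguments are hypothetical and are not taken from Marianas. It assumes any family with at least one read from each strand counts as Duplex.

```python
def classify_family(top_count, bottom_count):
    """Return the family type for the given per-strand copy counts.

    Assumption: a family is Duplex as soon as both strands are present,
    regardless of how many copies of each strand were found.
    """
    if top_count < 0 or bottom_count < 0:
        raise ValueError("copy counts must be non-negative")
    if top_count == 0 and bottom_count == 0:
        raise ValueError("a family needs at least one read")
    if top_count > 0 and bottom_count > 0:
        return "Duplex"
    copies = max(top_count, bottom_count)  # only one strand is present
    if copies >= 3:
        return "Simplex"
    if copies == 2:
        return "Sub-Simplex"
    return "Singleton"


examples = [(4, 2), (5, 0), (0, 2), (1, 0)]
labels = [classify_family(top, bottom) for top, bottom in examples]
```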

Technical Methods

  • Tools Used:

    • Marianas

    • make_umi_qc_tables.sh

    • plots_module.r

  • Input

    • Marianas collapsed fastqs

  • Output

    • family-types-A.txt

Interpretations

Duplex families are valuable because of their low noise rate after collapsing, so we would like to see as high a duplex "saturation" as possible. A low value suggests that we may not have captured enough of the original molecules to find both strands after PCR amplification.
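One simple way to put a number on this is the fraction of all families that fall into each type. The sketch below assumes that reading of "saturation", and it uses made-up counts. The layout of family-types-A.txt is not described here, so the counts are supplied directly as a dictionary rather than parsed from that file, and `family_type_fractions` is a hypothetical helper.

```python
def family_type_fractions(counts):
    """Convert per-type family counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no families to summarize")
    return {family_type: n / total for family_type, n in counts.items()}


# Made-up counts for one sample (illustration only, not real data).
counts = {"Duplex": 500, "Simplex": 300, "Sub-Simplex": 100, "Singletons": 100}
fractions = family_type_fractions(counts)
duplex_fraction = fractions["Duplex"]
```

With these illustrative counts, half of the families are duplex. Comparing this fraction across samples in a run is one way to spot samples with unusually low duplex recovery.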