UMI Family Type Composition (Pool A)

Understanding the relative abundance of each fragment subtype

Theoretical Method

Marianas groups read pairs using the 6-base UMI sequence (three bases from each end of the DNA fragment) together with the fragment start position (and possibly the stop position?). Read pairs that match on both of these are grouped into the same UMI "family".
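The grouping rule can be illustrated with a minimal Python sketch. It is not Marianas' implementation. The `ReadPair` record, its fields, and `group_into_families` are hypothetical names. The sketch also assumes that the `umi` field is already in a strand-independent form, so that top- and bottom-strand reads from the same molecule carry the same key, and it groups on start position only.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    # Hypothetical record for illustration; not Marianas' internal data model.
    umi: str      # 6-base UMI (3 bases from each end), assumed strand-independent
    start: int    # fragment start position
    strand: str   # "top" or "bottom"


def group_into_families(pairs):
    """Group read pairs that share both the UMI and the fragment start position."""
    families = defaultdict(list)
    for pair in pairs:
        families[(pair.umi, pair.start)].append(pair)
    return dict(families)


# Made-up read pairs, for illustration only.
pairs = [
    ReadPair("ACGTTA", 1000, "top"),
    ReadPair("ACGTTA", 1000, "bottom"),
    ReadPair("ACGTTA", 1000, "top"),
    ReadPair("GGCATC", 2500, "top"),
]
families = group_into_families(pairs)
print(len(families))  # 2
```

If the stop position also turns out to be part of the key, it would simply be added to the tuple used as the dictionary key.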

UMI family types are defined by the following categories:

  • Duplex: both the top and bottom strands were found for this fragment

  • Simplex: only one strand (top or bottom) was sequenced, and >=3 copies of that strand were found

  • Sub-Simplex: exactly 2 copies of a single strand were found

  • Singletons: exactly 1 copy of a single strand was found
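The four categories can be expressed as a small decision function over per-strand read-pair counts. This is a minimal sketch. `classify_family` is a hypothetical helper, and it assumes a family counts as Duplex as soon as each strand has at least one copy. Marianas may apply additional per-strand thresholds.

```python
def classify_family(top_count: int, bottom_count: int) -> str:
    """Assign a UMI family type from per-strand read-pair counts (illustrative rule)."""
    if top_count < 0 or bottom_count < 0:
        raise ValueError("counts must be non-negative")
    # Assumption: one or more copies on each strand is enough for Duplex.
    if top_count > 0 and bottom_count > 0:
        return "Duplex"
    # Only one strand is present, so the total equals that strand's copy count.
    copies = top_count + bottom_count
    if copies >= 3:
        return "Simplex"
    if copies == 2:
        return "Sub-Simplex"
    if copies == 1:
        return "Singleton"
    raise ValueError("a family must contain at least one read pair")


print(classify_family(2, 1))  # Duplex
print(classify_family(4, 0))  # Simplex
print(classify_family(0, 2))  # Sub-Simplex
print(classify_family(1, 0))  # Singleton
```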

Technical Methods

  • Tools Used

    • Marianas

    • make_umi_qc_tables.sh

    • plots_module.r

  • Input

    • Marianas collapsed FASTQs

  • Output

    • family-types-A.txt

Interpretations

Duplex families are valuable because of their low noise rate after collapsing, so we aim for as high a duplex "saturation" as possible. A low value suggests that not enough of the original molecules were captured to recover both strands after PCR amplification.
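Relative abundance can be summarized as the fraction of families in each type, and the Duplex fraction is one way to read duplex "saturation". Treating "saturation" as the share of families that are Duplex is an assumption; the QC tables may define it differently, for example at the read level. `family_type_composition` is a hypothetical helper, and the labels below are made up rather than real Pool A results.

```python
from collections import Counter

FAMILY_TYPES = ("Duplex", "Simplex", "Sub-Simplex", "Singleton")


def family_type_composition(types):
    """Return the fraction of families in each type (0.0 for types that are absent)."""
    counts = Counter(types)
    unknown = set(counts) - set(FAMILY_TYPES)
    if unknown:
        raise ValueError(f"unexpected family types: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no families to summarize")
    return {t: counts[t] / total for t in FAMILY_TYPES}


# Hypothetical family labels for illustration only (not real Pool A data).
labels = ["Duplex"] * 3 + ["Simplex"] * 2 + ["Sub-Simplex"] + ["Singleton"] * 4
composition = family_type_composition(labels)
for family_type, fraction in composition.items():
    print(f"{family_type}\t{fraction:.2f}")
# Duplex 0.30, Simplex 0.20, Sub-Simplex 0.10, Singleton 0.40
```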
