UMI Family Types Composition (Pool A)
Understanding the relative abundance of each fragment subtype
Theoretical Method
Marianas groups reads by the 6-base UMI sequence (three bases from each side of the DNA fragment) together with the fragment start position (and possibly the stop position). Read pairs that share the same UMI and the same position are grouped into the same UMI "family".
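The grouping idea can be sketched in a few lines of Python. This is a minimal illustration, not Marianas' actual implementation: the `(umi, start, strand)` tuple layout and the function name `group_into_families` are hypothetical, the example reads are made up, and the sketch assumes the UMI has already been normalized so that reads from both strands of one fragment carry the same key.

```python
from collections import defaultdict


def group_into_families(read_pairs):
    """Group read pairs that share the same UMI and start position.

    read_pairs: iterable of (umi, start, strand) tuples. This layout is a
    hypothetical simplification, not a Marianas data structure.
    Returns a dict mapping (umi, start) -> list of strand labels.
    """
    families = defaultdict(list)
    for umi, start, strand in read_pairs:
        # Read pairs with an identical (UMI, start) key join the same family.
        families[(umi, start)].append(strand)
    return dict(families)


# Made-up example reads, for illustration only.
read_pairs = [
    ("ACGTTA", 1000, "top"),
    ("ACGTTA", 1000, "top"),
    ("ACGTTA", 1000, "bottom"),
    ("GGCATC", 1000, "top"),     # same position, different UMI
    ("ACGTTA", 2500, "bottom"),  # same UMI, different position
]

families = group_into_families(read_pairs)
for key, strands in families.items():
    print(key, strands)
```

Here the first three read pairs fall into one family, while the last two each form a family of their own because either the UMI or the start position differs.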
Each UMI family is assigned one of the following types:
Duplex: both the top and bottom strands were found for this fragment
Simplex: only one strand (top or bottom) was sequenced, and >=3 copies of that strand were found
Sub-Simplex: exactly 2 copies of a single strand were found
Singleton: exactly 1 copy of a single strand was found
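The definitions above translate directly into a small classification rule. The following Python sketch is only an illustration of those definitions: the function name `classify_family` and its per-strand count arguments are assumptions, not part of Marianas.

```python
def classify_family(top_count, bottom_count):
    """Return the family type given the read-pair counts for each strand.

    top_count / bottom_count are hypothetical inputs: the number of copies
    of the top and bottom strand observed in one UMI family.
    """
    if top_count < 0 or bottom_count < 0 or top_count + bottom_count == 0:
        raise ValueError("a family needs at least one read pair")

    # Both strands observed, regardless of how many copies of each.
    if top_count > 0 and bottom_count > 0:
        return "Duplex"

    # Only one strand observed; the type depends on its copy number.
    copies = max(top_count, bottom_count)
    if copies >= 3:
        return "Simplex"
    if copies == 2:
        return "Sub-Simplex"
    return "Singleton"


for top, bottom in [(2, 1), (3, 0), (0, 2), (1, 0)]:
    print(top, bottom, classify_family(top, bottom))
```

Note that under these definitions a family with a single copy of each strand still counts as Duplex, since the only requirement is that both strands were found.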
Technical Methods
Tools Used:
Marianas
make_umi_qc_tables.sh
plots_module.r
Input
Marianas collapsed fastqs
Output
family-types-A.txt
Interpretations
Duplex families are valuable because of their low noise rate after collapsing, so we would like to see as high a duplex "saturation" as possible. If this value is low, we may not have captured enough of the original molecules to find both strands after PCR amplification.
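One simple way to put a number on this is the fraction of all families that are duplex. The sketch below assumes that reading of "saturation", which the page itself does not define. The function name `duplex_fraction` is hypothetical, the counts are invented for illustration rather than taken from real data, and the sketch does not parse family-types-A.txt, whose layout is not described here.

```python
def duplex_fraction(family_counts):
    """Fraction of UMI families that are Duplex.

    family_counts: dict mapping family type name -> number of families.
    Treating this fraction as the duplex "saturation" is an assumption.
    """
    total = sum(family_counts.values())
    if total == 0:
        return 0.0
    return family_counts.get("Duplex", 0) / total


# Invented counts, for illustration only (not real results).
counts = {"Duplex": 400, "Simplex": 300, "Sub-Simplex": 100, "Singleton": 200}
fraction = duplex_fraction(counts)
print(f"Duplex fraction: {fraction:.2f}")
```

Comparing this fraction across samples prepared the same way gives a rough sense of which libraries captured both strands more often.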