UMI Family Types Composition (Pool A)

Understanding the relative abundance of each fragment subtype

Theoretical Method

Marianas groups reads by their 6-base UMI sequence (three bases from each end of the DNA fragment) and by the fragment start position. Read pairs that share both of these values are grouped into the same UMI "family".
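The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `(name, umi, start)` tuples and the `group_into_families` helper are hypothetical, the matching is exact, and none of this is taken from Marianas' actual implementation.

```python
from collections import defaultdict


def group_into_families(read_pairs):
    """Group read pairs that share the same UMI and fragment start position.

    Each read pair is a hypothetical (name, umi, start) tuple. Exact matching
    on both values is an assumption made for this sketch.
    """
    families = defaultdict(list)
    for name, umi, start in read_pairs:
        # The grouping key is the pair of metrics named in the text.
        families[(umi, start)].append(name)
    return dict(families)


# Made-up read pairs, for illustration only.
read_pairs = [
    ("pair1", "ACGTTA", 1000),
    ("pair2", "ACGTTA", 1000),  # same UMI and start as pair1: same family
    ("pair3", "ACGTTA", 1050),  # same UMI, different start: new family
    ("pair4", "GGCATT", 1000),  # different UMI, same start: new family
]

families = group_into_families(read_pairs)
for key, members in families.items():
    print(key, members)
```

Keying a dictionary on the `(umi, start)` tuple keeps the sketch short. A real tool may also need to tolerate sequencing errors in the UMI, which this example does not attempt.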

UMI family types are defined by the following categories:

- Duplex: both the top and bottom strands were found for this fragment
- Simplex: only one strand (top or bottom) was sequenced, and at least 3 copies of that strand were found
- Sub-Simplex: exactly 2 copies of a single strand were found
- Singletons: exactly 1 copy of a single strand was found
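As a rough illustration, the categories above can be written as a small decision function. The `classify_family` name and its per-strand copy-count arguments are assumptions made for this sketch; they are not part of Marianas.

```python
def classify_family(top_copies, bottom_copies):
    """Return the family type for one UMI family.

    top_copies and bottom_copies are the number of copies observed for each
    strand (hypothetical inputs chosen for this sketch).
    """
    if top_copies < 0 or bottom_copies < 0:
        raise ValueError("copy counts must be non-negative")
    if top_copies == 0 and bottom_copies == 0:
        raise ValueError("a family needs at least one read")

    # Both strands observed: duplex, regardless of copy number.
    if top_copies > 0 and bottom_copies > 0:
        return "Duplex"

    # Only one strand observed: the type depends on its copy number.
    copies = top_copies + bottom_copies
    if copies >= 3:
        return "Simplex"
    if copies == 2:
        return "Sub-Simplex"
    return "Singleton"


print(classify_family(2, 1))  # both strands present
print(classify_family(0, 5))  # one strand, at least 3 copies
print(classify_family(2, 0))  # one strand, exactly 2 copies
print(classify_family(0, 1))  # one strand, exactly 1 copy
```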

Technical Methods

Tools Used:
- Marianas
- make_umi_qc_tables.sh
- plots_module.r

Input:
- Marianas collapsed fastqs

Output:
- family-types-A.txt

Interpretations

Duplex families are valuable because of their low noise rate after collapsing, so we would like to see as high a duplex "saturation" as possible. If this value is low, we may not have captured enough of the original molecules to find both strands after PCR amplification.
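One simple way to put a number on this is the fraction of all families that are duplex. The sketch below assumes that reading of "saturation", which this page does not define precisely. The `duplex_fraction` helper and the counts are hypothetical and are not read from family-types-A.txt, whose layout is not described here.

```python
def duplex_fraction(family_counts):
    """Fraction of UMI families that are duplex.

    family_counts maps a family type name to its number of families
    (a hypothetical structure chosen for this sketch).
    """
    total = sum(family_counts.values())
    if total == 0:
        raise ValueError("no families counted")
    return family_counts.get("Duplex", 0) / total


# Made-up counts, for illustration only.
example_counts = {
    "Duplex": 250,
    "Simplex": 400,
    "Sub-Simplex": 150,
    "Singleton": 200,
}

fraction = duplex_fraction(example_counts)
print(f"Duplex fraction: {fraction:.2%}")
```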


