Read Name Information

Last updated 4 years ago

In each bam that has been processed with Marianas, the read name holds information about the original "uncollapsed" reads that were used to generate the consensus read being viewed, including how many there were on each strand. The colon-separated fields are:

Marianas
UMI1+UMI2
read 1 contig
read 1 start
read 1 Positive Strand Read Count
read 1 Negative Strand Read Count

read 2 contig
read 2 start
read 2 Positive Strand Read Count
read 2 Negative Strand Read Count

Here is an example read name:

Marianas:ACT+TTA:2:48033828:4:3:2:48033899:4:3

In this example, read 1 of the consensus pair starts at position 48033828 on contig 2 and was generated from 4 reads on the positive strand and 3 reads on the negative strand. Read 2 starts at position 48033899 on the same contig, with the same strand counts.

[Image: illustration of this example]
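The field layout listed above can also be read programmatically. The following is a minimal sketch, not part of Marianas itself: the names `MateInfo`, `MarianasReadName`, and `parse_marianas_read_name` are hypothetical, and the parser assumes exactly ten colon-separated fields, so it would not handle contig names that themselves contain a colon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MateInfo:
    contig: str
    start: int
    plus_strand_count: int   # original reads on the positive strand
    minus_strand_count: int  # original reads on the negative strand


@dataclass(frozen=True)
class MarianasReadName:
    umi1: str
    umi2: str
    read1: MateInfo
    read2: MateInfo


def parse_marianas_read_name(name: str) -> MarianasReadName:
    """Split a Marianas read name into its documented fields.

    Assumption: exactly 10 colon-separated fields, so contig names
    must not contain ':'.
    """
    fields = name.split(":")
    if len(fields) != 10 or fields[0] != "Marianas":
        raise ValueError(f"not a Marianas read name: {name!r}")

    umi1, umi2 = fields[1].split("+")

    def mate(offset: int) -> MateInfo:
        # Each mate occupies four consecutive fields:
        # contig, start, positive strand count, negative strand count.
        contig, start, plus, minus = fields[offset:offset + 4]
        return MateInfo(contig, int(start), int(plus), int(minus))

    return MarianasReadName(umi1, umi2, mate(2), mate(6))


parsed = parse_marianas_read_name("Marianas:ACT+TTA:2:48033828:4:3:2:48033899:4:3")
print(parsed.read1.start, parsed.read1.plus_strand_count, parsed.read1.minus_strand_count)
# 48033828 4 3
```

Grouping the four per-mate fields into one small record keeps the read 1 and read 2 halves symmetric, which mirrors how the fields are listed above.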

In depth example

Let's explore the transformation from the reads in a fastq to a collapsed bam for a single duplex family that comes from two read pairs on each strand of the original DNA template (four read pairs in total). Note that base qualities and read names have been removed for clarity.

Initial fastqs:

Here are two reads from PCR duplicates of the (+) strand, as they appear in the read 1 fastq (with a space separating the 3bp UMI from the rest of the read for clarity). These two reads have the exact same sequence:

TCC TCAGGCTGCCGTCCCGCAGGAGCGAGTCCCGAGGCGCCGTGCGCATCAAGGTGCTGGACGTGCTGTCCTTTGTGCTGCTCATCAACAGGCAGTTCTATGAGGTGCGTGTCCAGGCGGCCGCAG
+
TCC TCAGGCTGCCGTCCCGCAGGAGCGAGTCCCGAGGCGCCGTGCGCATCAAGGTGCTGGACGTGCTGTCCTTTGTGCTGCTCATCAACAGGCAGTTCTATGAGGTGCGTGTCCAGGCGGCCGCAG
+

Here are the corresponding reads from the (-) strand, also found in the read 1 fastq:

CGA GTAAATAGCCCTGAGCCCCCAGCTGCGGCCGCCTGGACACGCACCTCATAGAACTGCCTGTTGATGAGCAGCACAAAGGACAGCACGTCCAGCACCTTGATGCGCACGGCGCCTCGGGACTCG
+
CGA GTAAATAGCCCTGAGCCCCCAGCTGCGGCCGCCTGGACACGCACCTCATAGAACTGCCTGTTGATGAGCAGCACAAAGGACAGCACGTCCAGCACCTTGATGCGCACGGCGCCTCGGGACTCG
+

Read 2 of the paired fastqs is not shown here, but it also contains four reads, which are the mates of the four reads in this fastq.
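To make the read layout above concrete, here is a small sketch that separates the leading 3bp UMI from the rest of a read 1 sequence. It only illustrates the layout shown in this example and is not Marianas's own clipping step; the function name `split_umi` and the constant `UMI_LENGTH` are hypothetical, and the sequences are the first 20 bases of the reads above, truncated for brevity.

```python
from typing import Tuple

UMI_LENGTH = 3  # the example reads above start with a 3bp UMI


def split_umi(read_sequence: str, umi_length: int = UMI_LENGTH) -> Tuple[str, str]:
    """Return (umi, remainder) for a read whose first bases are the UMI."""
    if len(read_sequence) < umi_length:
        raise ValueError("read is shorter than the UMI")
    return read_sequence[:umi_length], read_sequence[umi_length:]


# First 20 bases of the (+) and (-) strand reads shown above (truncated).
plus_strand_read = "TCCTCAGGCTGCCGTCCCGC"
minus_strand_read = "CGAGTAAATAGCCCTGAGCC"

print(split_umi(plus_strand_read)[0])   # TCC
print(split_umi(minus_strand_read)[0])  # CGA
```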

Collapsed Bams:

After UMI clipping, mapping with BWA, and collapsing, the two collapsed reads from the duplex.bam file look like this:

Marianas:CGA+TCC:16:2112955:2:2:16:2112976:2:2	97	16	2112958	60	116M	=	2112979	136	GCTGCCGTCCCGCAGGAGCGAGTCCCGAGGCGCCGTGCGCATCAAGGTGCTGGACGTGCTGTCCTTTGTGCTGCTCATCAACAGGCAGTTCTATGAGGTGCGTGTCCAGGCGGCCG	{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{	RG:Z:test_sample_1	NM:i:0	MQ:i:60	AS:i:116	XS:i:0
Marianas:CGA+TCC:16:2112955:2:2:16:2112976:2:2	145	16	2112979	60	115M	=	2112958	-136	GTCCCGAGGCGCCGTGCGCATCAAGGTGCTGGACGTGCTGTCCTTTGTGCTGCTCATCAACAGGCAGTTCTATGAGGTGCGTGTCCAGGCGGCCGCAGCTGGGGGCTCAGGGCTA	{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{	RG:Z:test_sample_1	NM:i:0	MQ:i:60	AS:i:115	XS:i:0

Here are the key observations to take away from how the mapping and UMI information is represented:

  1. The original four read pairs, two from the (+) strand and two from the (-) strand, have been collapsed into a single read pair. Thus we've taken the four reads from the "left" cluster of reads and turned them into a single read, and similarly for the "right" cluster.

  2. The two read 1's that mapped in the (+) direction, along with the two read 2's that mapped in the (+) direction, are now represented as a single read that maps in the (+) direction (as designated by the TLEN field of 136). This is also reflected in the SAM flag of 97, which does not have the "read reverse strand" bit set.

  3. Similarly, the two read 2's that mapped in the (-) direction, along with the two read 1's that originally mapped in the (-) direction, are now represented as a single read that maps in the (-) direction (as designated by the TLEN field of -136). This is also reflected in the SAM flag of 145, which has the "read reverse strand" bit set.

  4. The "left" and "right" UMIs from the fastqs (TCC and CGA) have been concatenated with a "+" and placed in the read name of the two collapsed reads. Marianas orders them alphabetically in the read names.
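The flag and UMI observations above can be checked with a short sketch. The flag bit values come from the SAM format specification; the helper names `decode_flag` and `umi_tag` are hypothetical, and `umi_tag` only mimics the alphabetical ordering described above rather than reproducing Marianas's code.

```python
from typing import List

# Flag bits as defined in the SAM format specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "read_reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """List the names of all bits that are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def umi_tag(umi_a: str, umi_b: str) -> str:
    """Join two UMIs with '+', in alphabetical order (as described above)."""
    return "+".join(sorted([umi_a, umi_b]))


print(decode_flag(97))   # ['paired', 'mate_reverse', 'first_in_pair']
print(decode_flag(145))  # ['paired', 'read_reverse', 'second_in_pair']
print(umi_tag("TCC", "CGA"))  # CGA+TCC
```

Flag 97 lacks the `read_reverse` bit while flag 145 has it, which matches the strand assignment of the two collapsed reads.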

Attachment: Illumina_UMI_Adapters_Reads.docx (22KB), UMI Adapter Information