Read Name Information
In each bam that has been processed with Marianas, the read name records how many original "uncollapsed" reads were used to generate the consensus read being viewed. The colon-separated fields are:
Marianas
UMI1+UMI2
read 1 contig
read 1 start
read 1 Positive Strand Read Count
read 1 Negative Strand Read Count
read 2 contig
read 2 start
read 2 Positive Strand Read Count
read 2 Negative Strand Read Count
Here is an example read name:
Marianas:ACT+TTA:2:48033828:4:3:2:48033899:4:3
In this example, we are looking at a read that maps to position 48033828 on contig 2, and this consensus read was generated from 4 reads on the positive strand and 3 reads on the negative strand.
[Image: diagram illustrating this example]
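The field layout described above can be unpacked with a few lines of Python. This is a minimal sketch, not part of Marianas: the `MarianasReadName` type and `parse_marianas_read_name` function are hypothetical names, and the sketch assumes contig names contain no colons.

```python
from typing import NamedTuple


class MarianasReadName(NamedTuple):
    """Fields of a collapsed read name (hypothetical container, not a Marianas type)."""
    umi: str
    read1_contig: str
    read1_start: int
    read1_plus_count: int
    read1_minus_count: int
    read2_contig: str
    read2_start: int
    read2_plus_count: int
    read2_minus_count: int


def parse_marianas_read_name(name: str) -> MarianasReadName:
    # Assumption: contig names do not contain ":", so a plain split yields 10 fields.
    fields = name.split(":")
    if len(fields) != 10 or fields[0] != "Marianas":
        raise ValueError(f"not a Marianas collapsed read name: {name!r}")
    _, umi, c1, s1, p1, n1, c2, s2, p2, n2 = fields
    return MarianasReadName(
        umi, c1, int(s1), int(p1), int(n1), c2, int(s2), int(p2), int(n2)
    )


parsed = parse_marianas_read_name("Marianas:ACT+TTA:2:48033828:4:3:2:48033899:4:3")
print(parsed.read1_contig, parsed.read1_start)            # contig and start of read 1
print(parsed.read1_plus_count, parsed.read1_minus_count)  # strand support for read 1
```

For the example read name, this gives contig `2`, start `48033828`, and strand counts of 4 and 3 for read 1.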

In depth example
Let's explore the transformation from the reads in a fastq to a collapsed bam, for a single duplex family that comes from two read pairs on each strand of the original DNA template (four read pairs total). Note that base qualities and read names have been removed for clarity.
Initial fastqs:
Here are two reads from PCR duplicates of the (+) strand, as they appear in the read 1 fastq (with a space separating the 3bp UMI from the rest of the read for clarity). These two reads have the exact same sequence:
TCC TCAGGCTGCCGTCCCGCAGGAGCGAGTCCCGAGGCGCCGTGCGCATCAAGGTGCTGGACGTGCTGTCCTTTGTGCTGCTCATCAACAGGCAGTTCTATGAGGTGCGTGTCCAGGCGGCCGCAG
+
TCC TCAGGCTGCCGTCCCGCAGGAGCGAGTCCCGAGGCGCCGTGCGCATCAAGGTGCTGGACGTGCTGTCCTTTGTGCTGCTCATCAACAGGCAGTTCTATGAGGTGCGTGTCCAGGCGGCCGCAG
+
Here are the corresponding reads from the (-) strand, also found in the read 1 fastq:
CGA GTAAATAGCCCTGAGCCCCCAGCTGCGGCCGCCTGGACACGCACCTCATAGAACTGCCTGTTGATGAGCAGCACAAAGGACAGCACGTCCAGCACCTTGATGCGCACGGCGCCTCGGGACTCG
+
CGA GTAAATAGCCCTGAGCCCCCAGCTGCGGCCGCCTGGACACGCACCTCATAGAACTGCCTGTTGATGAGCAGCACAAAGGACAGCACGTCCAGCACCTTGATGCGCACGGCGCCTCGGGACTCG
+
Read 2 of the two paired fastqs is not shown here, but it will also have 4 reads that represent the mates of the four reads in this fastq.
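The space shown between the UMI and the rest of each read above corresponds to a simple prefix split. The sketch below only illustrates that presentation: `split_umi` is a hypothetical helper, the 3bp length is taken from the example reads, and the actual clipping step in the pipeline may remove additional bases.

```python
from typing import Tuple

UMI_LENGTH = 3  # assumption: 3bp UMI at the start of each read, as in the example reads


def split_umi(read: str, umi_length: int = UMI_LENGTH) -> Tuple[str, str]:
    """Return (umi, remainder) for a raw read sequence."""
    if len(read) < umi_length:
        raise ValueError("read is shorter than the UMI length")
    return read[:umi_length], read[umi_length:]


# Truncated prefixes of the (+) and (-) strand reads from the read 1 fastq.
plus_read = "TCCTCAGGCTGCCGTCCCGCAGG"
minus_read = "CGAGTAAATAGCCCTGAGCCCCC"

plus_umi, plus_rest = split_umi(plus_read)
minus_umi, minus_rest = split_umi(minus_read)
print(plus_umi, minus_umi)
```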
Collapsed Bams:
After UMI clipping, mapping using BWA, and collapsing, the two collapsed reads from the duplex.bam file look like this:
Marianas:CGA+TCC:16:2112955:2:2:16:2112976:2:2 97 16 2112958 60 116M = 2112979 136 GCTGCCGTCCCGCAGGAGCGAGTCCCGAGGCGCCGTGCGCATCAAGGTGCTGGACGTGCTGTCCTTTGTGCTGCTCATCAACAGGCAGTTCTATGAGGTGCGTGTCCAGGCGGCCG {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{ RG:Z:test_sample_1 NM:i:0 MQ:i:60 AS:i:116 XS:i:0
Marianas:CGA+TCC:16:2112955:2:2:16:2112976:2:2 145 16 2112979 60 115M = 2112958 -136 GTCCCGAGGCGCCGTGCGCATCAAGGTGCTGGACGTGCTGTCCTTTGTGCTGCTCATCAACAGGCAGTTCTATGAGGTGCGTGTCCAGGCGGCCGCAGCTGGGGGCTCAGGGCTA {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{ RG:Z:test_sample_1 NM:i:0 MQ:i:60 AS:i:115 XS:i:0
Here are the key observations to take away from how the mapping and UMI information is represented:
The original four read pairs, two from the (+) strand and two from the (-) strand, have been collapsed into a single read pair. Thus we've taken the four reads from the "left" cluster of reads and turned them into a single read, and similarly for the "right" cluster.
The two read 1's that mapped in the (+) direction, along with the two read 2's that mapped in the (+) direction, are now represented as a single read that maps in the (+) direction (as designated by the TLEN field of 136). This is also represented by the SAM flag of 97, which does not have the "read reverse strand" flag set.
Similarly, the two read 2's that mapped in the (-) direction, along with the two read 1's that originally mapped in the (-) direction, are now represented as a single read that maps in the (-) direction (as designated by the TLEN field of -136). This is also represented by the SAM flag of 145, which has the "read reverse strand" flag set.
The "left" and "right" UMIs from the fastqs (TCC and CGA) have been concatenated with a "+" and placed in the read name of the two collapsed reads. Marianas will order them alphabetically in the read names.
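Two of the observations above can be checked by hand: the strand bits in the SAM flags 97 and 145, and the alphabetical ordering of the UMIs. The following is a small sketch under stated assumptions. The bit meanings follow the SAM specification; `decode_sam_flag` and `duplex_umi_tag` are hypothetical helper names, and the plain lexicographic sort is only an assumption about how the ordering is done, not Marianas code.

```python
# Bit meanings from the SAM specification (FLAG field).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}


def decode_sam_flag(flag: int) -> list:
    """List the names of all bits set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def duplex_umi_tag(umi_a: str, umi_b: str) -> str:
    """Join two UMIs with '+', ordered alphabetically (assumed lexicographic sort)."""
    return "+".join(sorted([umi_a, umi_b]))


print(decode_sam_flag(97))    # flags of the first collapsed read
print(decode_sam_flag(145))   # flags of the second collapsed read
print(duplex_umi_tag("TCC", "CGA"))
```

Flag 97 decodes to paired, mate reverse strand, and first in pair, while 145 decodes to paired, read reverse strand, and second in pair. The UMI pair comes out as `CGA+TCC`, matching the read names in the collapsed bam.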