Quick Usage
Steps to go from FASTQ files to collapsed BAM files
Requirements
Java 8
Marianas jar file from GitHub
HG19 reference fasta
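Before starting, it can help to confirm that the requirements are in place. The following is a minimal sketch, not part of Marianas itself; it assumes Marianas.jar sits in the current directory and uses hg19.fasta as a hypothetical name for the reference, overridable through a REF variable.

```shell
# Report whether a required file is present.
check_file() {
  if [ -e "$1" ]; then echo "found: $1"; else echo "missing: $1"; fi
}

# java -version writes to stderr, so redirect it to show the first line.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "missing: java"
fi

check_file Marianas.jar
check_file "${REF:-hg19.fasta}"   # hypothetical reference file name
```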
Steps
1. UMI clipping
java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.fastqprocessing.ProcessLoopUMIFastq \
read_1.fastq.gz \
read_2.fastq.gz \
3
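This will create a pair of _umi-clipped.fastq.gz files in the current working directory. Step 2 takes a BAM file as input, so these UMI-clipped reads must be aligned to the reference before collapsing.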
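As a quick sanity check after clipping, the two output files should contain the same number of reads. The sketch below is an optional helper, not part of Marianas; it only assumes that the outputs end in _umi-clipped.fastq.gz, as stated above, and that each fastq record spans four lines.

```shell
# Print the number of reads in a gzipped fastq (4 lines per record).
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# List every clipped fastq in the working directory with its read count.
for f in ./*_umi-clipped.fastq.gz; do
  [ -e "$f" ] || continue   # no matches: the glob stays literal
  printf '%s\t%s\n' "$f" "$(count_reads "$f")"
done
```

If the two counts differ, the pair is out of sync and should not be passed on to alignment.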
2. Collapsing and making consensus reads
Collapsing involves three sub-steps: a first pass that collapses the "left" read cluster of each read pair, a sort of the first-pass output, and a second pass that collapses the "right" read cluster of each read pair.
A. First pass
java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.DuplexUMIBamToCollapsedFastqFirstPass \
bam_file \
pileup_file \
1 \
20 \
0 \
2 \
90 \
reference_fasta_file
The pileup file is generated by running the Waltz BAM metrics module. The values shown above are our recommended defaults; for a detailed description of each parameter, refer to the Parameters page.
This step will produce some QC files as well as first-pass.txt in the current working directory. The first-pass.txt file contains the collapsed sequences and base qualities for the first pass. This file must be sorted before running the second pass.
B. Sorting
sort \
-S 8G \
-k 6,6n \
-k 8,8n \
first-pass.txt > first-pass.mate-position-sorted.txt
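To illustrate what the two sort keys do, here is a toy example. The file contents are invented and do not reflect the real first-pass.txt layout; the only point is that column 6 is compared numerically first and column 8 numerically second. The -S 8G buffer option is left out because the toy input is tiny.

```shell
# Toy stand-in for first-pass.txt: 8 whitespace-separated columns.
# All values are made up; only columns 6 and 8 matter here.
cat > toy-first-pass.txt <<'EOF'
r1 a b c d 2 x 500
r2 a b c d 1 x 900
r3 a b c d 10 x 100
r4 a b c d 1 x 80
EOF

# Same keys as the real command: column 6, then column 8, both numeric.
sort -k 6,6n -k 8,8n toy-first-pass.txt > toy-sorted.txt
cat toy-sorted.txt

# sort -c exits non-zero if the file is not in the requested order.
sort -c -k 6,6n -k 8,8n toy-sorted.txt && echo "sorted"
```

The rows come out in the order r4, r2, r1, r3: numeric comparison places 10 after 2 in column 6, and ties in column 6 are broken by column 8. The same sort -c check can be run on first-pass.mate-position-sorted.txt before starting the second pass.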
C. Second pass
java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.DuplexUMIBamToCollapsedFastqSecondPass \
standard_bam_file \
pileup_file \
1 \
20 \
0 \
2 \
90 \
reference_fasta_file \
first-pass.mate-position-sorted.txt
This will produce two files named collapsed_R1_.fastq and collapsed_R2_.fastq, along with some QC files.
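The three sub-steps above can be chained in a small wrapper script. This is only a sketch: the file names standard.bam, pileup.txt and hg19.fasta are hypothetical placeholders, it assumes the same BAM file is passed to both passes, and it prints the commands instead of running them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical input names; override through the environment.
BAM="${BAM:-standard.bam}"
PILEUP="${PILEUP:-pileup.txt}"
REF="${REF:-hg19.fasta}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

PKG=org.mskcc.marianas.umi.duplex

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# $1 = class name; any extra arguments are appended after the reference.
marianas_pass() {
  class="$1"; shift
  run java -server -Xms8g -Xmx8g -cp Marianas.jar "$PKG.$class" \
    "$BAM" "$PILEUP" 1 20 0 2 90 "$REF" "$@"
}

collapse() {
  marianas_pass DuplexUMIBamToCollapsedFastqFirstPass
  if [ "$DRY_RUN" = 1 ]; then
    echo "sort -S 8G -k 6,6n -k 8,8n first-pass.txt > first-pass.mate-position-sorted.txt"
  else
    sort -S 8G -k 6,6n -k 8,8n first-pass.txt > first-pass.mate-position-sorted.txt
  fi
  marianas_pass DuplexUMIBamToCollapsedFastqSecondPass \
    first-pass.mate-position-sorted.txt
}

collapse
```

Because of set -e, a failure in the first pass or in the sort stops the script before the second pass starts.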
3. Re-map the collapsed fastqs to bam format
At this stage, use your preferred sequence aligner to turn the error-corrected fastqs into a BAM file of collapsed reads. It is also advisable to perform a second indel realignment on the error-corrected reads. We recommend BWA mem for mapping, Picard AddOrReplaceReadGroups to sort the SAM file and convert it to BAM, and ABRA2 for indel realignment.
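One possible way to wire the recommended tools together is sketched below. Everything not named in the text above is an assumption: the jar paths, thread count, read-group values and intermediate file names are placeholders, the reference is assumed to be already indexed with bwa index, and the -o option requires a bwa version that supports it. The script prints the commands unless DRY_RUN=0 is set; check each tool's own documentation before relying on these options.

```shell
#!/bin/sh
set -eu

# Hypothetical paths and settings; replace with your own.
REF="${REF:-hg19.fasta}"
PICARD_JAR="${PICARD_JAR:-picard.jar}"
ABRA2_JAR="${ABRA2_JAR:-abra2.jar}"
THREADS="${THREADS:-4}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

remap() {
  # 1. Align the collapsed read pairs.
  run bwa mem -t "$THREADS" -o collapsed.sam "$REF" \
    collapsed_R1_.fastq collapsed_R2_.fastq
  # 2. Add read groups, coordinate-sort, and write a BAM with its index.
  run java -jar "$PICARD_JAR" AddOrReplaceReadGroups \
    I=collapsed.sam O=collapsed.sorted.bam SORT_ORDER=coordinate \
    CREATE_INDEX=true RGID=1 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=sample1
  # 3. Indel realignment with ABRA2.
  run java -Xmx8g -jar "$ABRA2_JAR" --in collapsed.sorted.bam \
    --out collapsed.bam --ref "$REF" --threads "$THREADS" --tmpdir .
}

remap
```

The final output is named collapsed.bam here so that it matches the input of the next step.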
4. Separating simplex and duplex bams from collapsed bam
java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.postprocessing.SeparateBams \
collapsed.bam
This will produce collapsed-duplex.bam and collapsed-simplex.bam (based on the input bam file name) in the current working directory.
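When scripting around this step, the expected output names can be derived from the input name. The helper below is a sketch that assumes the naming pattern follows the example above (the input name without .bam, plus -duplex.bam or -simplex.bam); verify it against the files actually produced.

```shell
# Print the output names expected for a given input BAM, assuming the
# "<input name>-duplex.bam" / "<input name>-simplex.bam" pattern.
expected_outputs() {
  base="$(basename "$1" .bam)"
  printf '%s-duplex.bam\n%s-simplex.bam\n' "$base" "$base"
}

expected_outputs collapsed.bam
```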