Quick Usage

Steps to go from FASTQ files to collapsed BAM files.

Requirements

  • Java 8

  • Marianas jar file from GitHub

  • HG19 reference fasta

Note: The collapsing steps here have been encapsulated in a CWL workflow used by MSK-ACCESS. Documentation and usage for this workflow can be found here.

Steps

1. UMI clipping

java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.fastqprocessing.ProcessLoopUMIFastq \
read_1.fastq.gz \
read_2.fastq.gz \
3

This will create a pair of _umi-clipped.fastq.gz files in the current working directory.
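
Before moving on, it can help to confirm that the clipping step really produced both files and that neither is truncated. The following is a minimal sketch, assuming the outputs end in _umi-clipped.fastq.gz as described above; the function name check_clipped is hypothetical and not part of Marianas.

```shell
# Hypothetical helper: verify that a directory holds exactly two
# *_umi-clipped.fastq.gz files and that each passes a gzip integrity test.
check_clipped() {
  dir=${1:-.}
  count=0
  for f in "$dir"/*_umi-clipped.fastq.gz; do
    [ -e "$f" ] || continue          # glob matched nothing
    gzip -t "$f" || { echo "corrupt: $f" >&2; return 1; }
    count=$((count + 1))
  done
  if [ "$count" -ne 2 ]; then
    echo "expected 2 clipped files, found $count" >&2
    return 1
  fi
  echo "ok: $count clipped files in $dir"
}
```

Calling check_clipped with no argument checks the current working directory; a non-zero exit status means the step should be rerun before collapsing.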

2. Collapsing and making consensus reads

Collapsing involves three sub-steps: a first pass that collapses the "left" read cluster of each read pair, a sort of the first-pass output, and a second pass that collapses the "right" read cluster of each read pair.

A. First pass

java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.DuplexUMIBamToCollapsedFastqFirstPass \
bam_file \
pileup_file \
1 \
20 \
0 \
2 \
90 \
reference_fasta_file

The pileup file is generated by running the Waltz BAM metrics module. The values shown are our recommended defaults; for a detailed description of each parameter, refer to the Parameters page.

This step will produce some QC files as well as first-pass.txt in the current working directory. The first-pass.txt file contains the collapsed sequences and base qualities for the first pass. This file must be sorted before running the second pass.

B. Sorting

sort \
-S 8G \
-k 6,6n \
-k 8,8n \
first-pass.txt > first-pass.mate-position-sorted.txt
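
To see what the two numeric sort keys do, the sketch below applies the same -k 6,6n -k 8,8n ordering to a toy file and then uses sort -c to confirm the result is ordered. The toy rows are hypothetical placeholders with eight tab-separated columns; they do not reproduce the real first-pass.txt layout.

```shell
# Toy stand-in for first-pass.txt (hypothetical values, 8 tab-separated columns).
workdir=$(mktemp -d)
printf 'x\tx\tx\tx\tx\t2\tx\t500\n'  > "$workdir/toy.txt"
printf 'x\tx\tx\tx\tx\t1\tx\t900\n' >> "$workdir/toy.txt"
printf 'x\tx\tx\tx\tx\t1\tx\t100\n' >> "$workdir/toy.txt"

# Same keys as the real command: column 6 first, then column 8, both numeric.
sort -k 6,6n -k 8,8n "$workdir/toy.txt" > "$workdir/toy.sorted.txt"

# sort -c exits non-zero if the file is not ordered by the given keys.
if sort -c -k 6,6n -k 8,8n "$workdir/toy.sorted.txt"; then
  sorted_ok=yes
else
  sorted_ok=no
fi

# Show only the two key columns of the sorted file.
cut -f 6,8 "$workdir/toy.sorted.txt"
```

The same sort -c check can be run against first-pass.mate-position-sorted.txt before starting the second pass.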

C. Second pass

java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.DuplexUMIBamToCollapsedFastqSecondPass \
standard_bam_file \
pileup_file \
1 \
20 \
0 \
2 \
90 \
reference_fasta_file \
first-pass.mate-position-sorted.txt

This will produce two files named collapsed_R1_.fastq and collapsed_R2_.fastq, along with some QC files.

3. Re-map the collapsed fastqs to bam format

At this stage, use your preferred sequence aligner to turn the error-corrected fastqs into a bam file of collapsed reads. It is also advisable to perform a second indel realignment on the error-corrected reads. We recommend BWA mem for mapping, Picard AddOrReplaceReadGroups to sort the sam file and write it out as a compressed bam file, and ABRA2 to perform indel realignment.
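
One way to chain the three recommended tools is sketched below. It is an illustration under stated assumptions, not the exact commands used by the MSK-ACCESS workflow: the jar names (picard.jar, abra2.jar), the targets file, the thread count, and all read-group values are hypothetical placeholders. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Hypothetical paths and read-group values; adjust all of these to your data.
REF=${REF:-reference.fasta}
R1=${R1:-collapsed_R1_.fastq}
R2=${R2:-collapsed_R2_.fastq}
SAMPLE=${SAMPLE:-sample1}
TARGETS=${TARGETS:-targets.bed}
DRY_RUN=${DRY_RUN:-1}    # 1 = print commands only, 0 = run them

run() {
  # Print the command; execute it only when DRY_RUN=0.
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

remap() {
  # 1. Map the collapsed read pairs (bwa mem -o writes the sam file directly).
  run bwa mem -t 4 -o collapsed.sam "$REF" "$R1" "$R2" || return 1
  # 2. Add read groups, coordinate-sort, and write an indexed bam.
  run java -jar picard.jar AddOrReplaceReadGroups \
    I=collapsed.sam O=collapsed.sorted.bam \
    SORT_ORDER=coordinate CREATE_INDEX=true \
    RGID="$SAMPLE" RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM="$SAMPLE" || return 1
  # 3. Indel realignment with ABRA2, restricted to the target regions.
  run java -Xmx16g -jar abra2.jar \
    --in collapsed.sorted.bam --out collapsed.bam \
    --ref "$REF" --targets "$TARGETS" \
    --threads 4 --tmpdir "${TMPDIR:-/tmp}" || return 1
}

remap
```

The final file is named collapsed.bam here only so that it lines up with the input name used in the next step.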

4. Separating simplex and duplex bams from collapsed bam

java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.postprocessing.SeparateBams \
collapsed.bam

This will produce collapsed-duplex.bam and collapsed-simplex.bam (based on the input bam file name) in the current working directory.
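
Because the output names are derived from the input name, a small helper can predict them for use in later scripts. This is a sketch that assumes the pattern seen above (the .bam extension replaced by -duplex.bam and -simplex.bam) holds for other input names; the function expected_outputs is hypothetical.

```shell
# Hypothetical helper: print the output names SeparateBams is expected to
# write in the current working directory for a given input bam.
expected_outputs() {
  base=$(basename "$1" .bam)    # drop any directory and the .bam extension
  printf '%s\n' "${base}-duplex.bam" "${base}-simplex.bam"
}

expected_outputs collapsed.bam
```

For collapsed.bam this prints collapsed-duplex.bam and collapsed-simplex.bam, matching the names given above.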
