Quick Usage

Steps to go from fastq files to collapsed bam files


Last updated 5 years ago


Requirements

  • Java 8

  • Marianas jar file from GitHub

  • HG19 reference fasta

Note: The collapsing steps here have been encapsulated in a CWL workflow used by MSK-ACCESS. Documentation and usage for this workflow can be found here.

Steps

1. UMI clipping

java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.fastqprocessing.ProcessLoopUMIFastq \
read_1.fastq.gz \
read_2.fastq.gz \
3

This will create a pair of _umi-clipped.fastq.gz files in the current working directory.

2. Collapsing and making consensus reads

Collapsing involves three sub-steps: a first pass that collapses the "left" read cluster of each read pair, a sort of the first-pass output, and a second pass that collapses the "right" read cluster of each read pair.

A. First pass

java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.DuplexUMIBamToCollapsedFastqFirstPass \
bam_file \
pileup_file \
1 \
20 \
0 \
2 \
90 \
reference_fasta_file

This step will produce some QC files as well as first-pass.txt in the current working directory. The first-pass.txt file contains the collapsed sequences and base qualities for the first pass. This file must be sorted before running the second pass.

B. Sorting

sort \
-S 8G \
-k 6,6n \
-k 8,8n \
first-pass.txt > first-pass.mate-position-sorted.txt
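Because the second pass depends on this ordering, it can be worth confirming that the sorted file really is in order before moving on. The following is a minimal sketch, not part of Marianas: the function name is_mate_position_sorted is hypothetical, and it simply re-checks the same two numeric keys with sort -c, which exits with a non-zero status when the input is out of order.

```shell
# Hypothetical helper (not part of Marianas): verify that a first-pass file
# is ordered on the same keys used by the sort command above.
is_mate_position_sorted() {
    # sort -c only checks the order; it writes nothing to stdout and
    # returns non-zero on the first out-of-order line.
    sort -c -k 6,6n -k 8,8n "$1" 2>/dev/null
}
```

For example, running is_mate_position_sorted first-pass.mate-position-sorted.txt || echo "not sorted" in the same shell session (and the same locale) as the sort step reports a problem before the second pass is started.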

C. Second pass

java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.DuplexUMIBamToCollapsedFastqSecondPass \
standard_bam_file \
pileup_file \
1 \
20 \
0 \
2 \
90 \
reference_fasta_file \
first-pass.mate-position-sorted.txt

This will produce two files named collapsed_R1_.fastq and collapsed_R2_.fastq, along with some QC files.
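The three sub-steps share the same pileup file, numeric parameters, and reference, so they can be chained from one place to keep both passes consistent. The wrapper below is only a sketch under stated assumptions: the names collapse_bam, marianas_pass, run, MARIANAS_JAR, and DRY_RUN are hypothetical, and it assumes that both passes read the same bam file. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Illustrative wrapper, not part of Marianas. All names here are assumptions.
MARIANAS_JAR="${MARIANAS_JAR:-Marianas.jar}"
DRY_RUN="${DRY_RUN:-1}"        # 1 = print commands only, 0 = execute them
PASS_PARAMS="1 20 0 2 90"      # numeric arguments copied from the commands above

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1 = FirstPass or SecondPass, $2 = bam, $3 = pileup, $4 = reference,
# any remaining arguments are appended (the sorted file for the second pass).
marianas_pass() {
    pass_class="org.mskcc.marianas.umi.duplex.DuplexUMIBamToCollapsedFastq$1"
    bam_file="$2"; pileup_file="$3"; reference_fasta="$4"
    shift 4
    # PASS_PARAMS is intentionally unquoted so it splits into five arguments.
    run java -server -Xms8g -Xmx8g -cp "$MARIANAS_JAR" "$pass_class" \
        "$bam_file" "$pileup_file" $PASS_PARAMS "$reference_fasta" "$@"
}

sort_first_pass() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "sort -S 8G -k 6,6n -k 8,8n first-pass.txt > first-pass.mate-position-sorted.txt"
    else
        sort -S 8G -k 6,6n -k 8,8n first-pass.txt > first-pass.mate-position-sorted.txt
    fi
}

# Usage: collapse_bam <bam> <pileup> <reference_fasta>
collapse_bam() {
    marianas_pass FirstPass "$1" "$2" "$3" &&
    sort_first_pass &&
    marianas_pass SecondPass "$1" "$2" "$3" first-pass.mate-position-sorted.txt
}
```

Calling collapse_bam sample.bam pileup.txt reference.fasta with the default DRY_RUN=1 prints the three commands in order, which makes it easy to review them before running anything.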

3. Re-map the collapsed fastqs to bam format

At this stage, use your preferred sequence aligner to turn the error-corrected fastqs into a bam file of collapsed reads. It is also advisable to perform a second indel realignment on the error-corrected reads. We recommend BWA mem for mapping, Picard AddOrReplaceReadGroups to sort and compress the sam file into a bam file, and ABRA2 for indel realignment.
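As a rough illustration of the recommended tool chain, the sketch below prints one possible command per tool so the commands can be reviewed and adapted before use. Everything specific in it is an assumption rather than the Marianas authors' configuration: the function name print_remap_commands, the jar paths picard.jar and abra2.jar, the thread and memory settings, the intermediate file names, and the read group values (here all set to a sample name). It also assumes the reference has already been indexed for BWA.

```shell
# Hypothetical helper: print example remapping commands for review.
# Usage: print_remap_commands <reference_fasta> <sample_name>
print_remap_commands() {
    ref="$1"
    sample="$2"
    cat <<EOF
bwa mem -t 4 $ref collapsed_R1_.fastq collapsed_R2_.fastq > collapsed.sam
java -jar picard.jar AddOrReplaceReadGroups I=collapsed.sam O=collapsed.bam SORT_ORDER=coordinate CREATE_INDEX=true RGID=$sample RGLB=$sample RGPL=ILLUMINA RGPU=$sample RGSM=$sample
java -Xmx16G -jar abra2.jar --in collapsed.bam --out collapsed.abra.bam --ref $ref --threads 4
EOF
}
```

For instance, print_remap_commands hg19.fasta sample1 > remap.sh writes the three commands to a script that can be edited (read groups, threads, target regions) and then run with sh remap.sh. Consult each tool's own documentation for the options appropriate to your data.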

4. Separating simplex and duplex bams from collapsed bam

java \
-server \
-Xms8g \
-Xmx8g \
-cp Marianas.jar \
org.mskcc.marianas.umi.duplex.postprocessing.SeparateBams \
collapsed.bam

This will produce collapsed-duplex.bam and collapsed-simplex.bam (based on the input bam file name) in the current working directory.
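A downstream script may want to confirm that both files were written before continuing. The sketch below assumes the naming pattern shown above generalizes to other inputs (the input name without .bam, followed by -duplex.bam or -simplex.bam); the function name check_separated_bams is hypothetical.

```shell
# Hypothetical helper: check that both separated bams exist and are non-empty
# in the current working directory. Assumes the <name>-duplex.bam and
# <name>-simplex.bam pattern described above.
check_separated_bams() {
    base=$(basename "$1" .bam)
    status=0
    for kind in duplex simplex; do
        out="${base}-${kind}.bam"
        if [ -s "$out" ]; then
            echo "found $out"
        else
            echo "missing $out" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_separated_bams collapsed.bam after the command above returns a non-zero status if either output is absent or empty.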

The pileup file used in step 2 is generated by running the Waltz bam metrics module. The numeric parameter values shown in the commands above are our recommended defaults; for a detailed description of these parameters, please refer to the Parameters page.
