Quick Usage
Steps to go from FASTQ files to collapsed BAM files
Requirements
Java 8
Marianas jar file from GitHub
HG19 reference fasta
Note: The collapsing steps here have been encapsulated in a CWL workflow used by MSK-ACCESS. Documentation and usage for this workflow can be found here.
Steps
1. UMI clipping
This step creates a pair of _umi-clipped.fastq.gz files in the current working directory.
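To illustrate what UMI clipping means in general, here is a minimal sketch that moves a fixed-length UMI from the start of a read into the read name. It is not the Marianas implementation: the function name `clip_umi`, the default UMI length of 3, and the `:UMI` header suffix are all assumptions made for illustration, and Marianas's actual clipping logic and header format may differ.

```python
def clip_umi(header, seq, qual, umi_len=3):
    """Clip a fixed-length UMI off the 5' end of one FASTQ record.

    Hypothetical helper: umi_len and the ':UMI' header suffix are
    illustrative assumptions, not Marianas's actual conventions.
    """
    umi = seq[:umi_len]
    # Keep any comment after the first space in the header line.
    name, _, comment = header.partition(" ")
    new_header = f"{name}:{umi}" + (f" {comment}" if comment else "")
    # Trim the same number of positions from bases and qualities.
    return new_header, seq[umi_len:], qual[umi_len:]


record = clip_umi("@read1 1:N:0", "ACGTTTGA", "IIIIIIII")
print(record)
```

Keeping the UMI in the read name is what lets a later step group reads by UMI after the bases themselves have been removed from the sequence.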
2. Collapsing and making consensus reads
Collapsing involves three sub-steps: a first pass that collapses the "left" read cluster of each read pair, a sort of the first-pass output, and a second pass that collapses the "right" read cluster of each read pair.
A. First pass
The pileup file is generated by running the Waltz BAM metrics module. The parameters shown are our recommended defaults; for a detailed description of each one, refer to the Parameters page.
This step will produce some QC files as well as first-pass.txt in the current working directory. The first-pass.txt file contains the collapsed sequences and base qualities for the first pass. This file must be sorted before running the second pass.
B. Sorting
C. Second pass
This will produce two files named collapsed_R1_.fastq and collapsed_R2_.fastq, along with some QC files.
3. Re-map the collapsed fastqs to bam format
At this stage, use your preferred sequence aligner to turn the error-corrected FASTQ files into a BAM file of collapsed reads. It is also advisable to perform a second indel realignment on the error-corrected reads. We recommend BWA-MEM for mapping, Picard AddOrReplaceReadGroups to sort and compress the SAM file into a BAM file, and ABRA2 for indel realignment.
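The mapping and sorting steps can be sketched as follows. This is only a sketch under stated assumptions: the helper names, the file names `hg19.fasta`, `picard.jar`, `collapsed.sam`, and `collapsed.bam`, the sample name, the thread count, and the read-group values are all placeholders. The script only builds and prints the commands; it runs nothing. ABRA2 realignment would follow and is omitted here.

```python
import shlex


def bwa_mem_cmd(ref_fasta, fastq_r1, fastq_r2, threads=4):
    """Build a bwa mem command. bwa writes SAM to stdout, so the
    caller has to redirect or capture the output."""
    return ["bwa", "mem", "-t", str(threads), ref_fasta, fastq_r1, fastq_r2]


def add_read_groups_cmd(picard_jar, sam_in, bam_out, sample):
    """Build a Picard AddOrReplaceReadGroups command that sorts by
    coordinate and writes a BAM. Read-group values are placeholders."""
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={sam_in}", f"O={bam_out}",
        "SORT_ORDER=coordinate", "CREATE_INDEX=true",
        f"RGID={sample}", f"RGLB={sample}", "RGPL=ILLUMINA",
        f"RGPU={sample}", f"RGSM={sample}",
    ]


# Placeholder inputs; the FASTQ names are the second-pass outputs.
bwa = bwa_mem_cmd("hg19.fasta", "collapsed_R1_.fastq", "collapsed_R2_.fastq")
picard = add_read_groups_cmd("picard.jar", "collapsed.sam", "collapsed.bam", "sample1")

print(shlex.join(bwa) + " > collapsed.sam")
print(shlex.join(picard))
```

To execute the commands rather than print them, pass each list to `subprocess.run(..., check=True)`, with the bwa call's `stdout` pointed at an open file for the SAM output.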
4. Separating simplex and duplex bams from collapsed bam
This will produce collapsed-duplex.bam and collapsed-simplex.bam (based on the input bam file name) in the current working directory.
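A pipeline wrapper may want to know the expected output names before checking that the step succeeded. The sketch below derives them from the input name, assuming the `-duplex` and `-simplex` suffixes are inserted before the `.bam` extension, as the file names above suggest. The function name `separated_bam_names` is hypothetical.

```python
from pathlib import Path


def separated_bam_names(input_bam):
    """Return the expected (duplex, simplex) output file names.

    Assumes the suffix is inserted before '.bam' and that outputs land
    in the current working directory, so only the base name is kept.
    """
    stem = Path(input_bam).stem  # 'collapsed' for 'collapsed.bam'
    return f"{stem}-duplex.bam", f"{stem}-simplex.bam"


names = separated_bam_names("/data/run1/collapsed.bam")
print(names)
```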