Steps to go from FASTQ files to collapsed BAM files
Java 8
Marianas jar file from GitHub
hg19 reference FASTA
Note: The collapsing steps here have been encapsulated in a CWL workflow used by MSK-ACCESS. Documentation and usage for this workflow can be found here.
The UMI clipping step will create a pair of files ending in _umi-clipped.fastq.gz, one for each read of the pair, in the current working directory.
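To illustrate what UMI clipping does to a read pair, here is a minimal Python sketch. It assumes a fixed-length UMI at the start of each read and records both clipped UMIs in the read name as `UMI1+UMI2` so that mates keep identical names. The function name, the UMI length, and this header layout are illustrative assumptions, not Marianas' exact behavior or output format.

```python
from typing import Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def clip_pair(r1: Record, r2: Record, umi_len: int) -> Tuple[Record, Record]:
    """Trim a UMI of umi_len bases from the 5' end of both mates.

    Hypothetical sketch: both clipped UMIs are appended to each read name as
    "UMI1+UMI2"; this is not Marianas' actual header format.
    """
    for _, seq, qual in (r1, r2):
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        if len(seq) <= umi_len:
            raise ValueError("read is not longer than the UMI")
    tag = f"{r1[1][:umi_len]}+{r2[1][:umi_len]}"

    def clip(record: Record) -> Record:
        header, seq, qual = record
        name, _, comment = header.partition(" ")
        new_header = f"{name}:{tag}" + (f" {comment}" if comment else "")
        # Drop the UMI bases and their qualities from the read itself.
        return new_header, seq[umi_len:], qual[umi_len:]

    return clip(r1), clip(r2)


r1 = ("@read1 1:N:0", "ACGTTTGCA", "IIIIIIIII")
r2 = ("@read1 2:N:0", "GGATTTCCA", "FFFFFFFFF")
print(clip_pair(r1, r2, umi_len=3))
# (('@read1:ACG+GGA 1:N:0', 'TTTGCA', 'IIIIII'), ('@read1:ACG+GGA 2:N:0', 'TTTCCA', 'FFFFFF'))
```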
Collapsing involves three sub-steps: a first pass that collapses the "left" read cluster of each read pair, a sort of the first-pass output, and a second pass that collapses the "right" read cluster of each read pair.
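The idea behind collapsing a read cluster can be sketched as a quality-weighted majority vote over the reads that share a UMI and alignment position. This Python sketch is a simplified stand-in, not the consensus algorithm Marianas implements: it assumes the reads are already grouped into one family, start at the same position, have equal length, and use Phred+33 qualities. `min_fraction` is a hypothetical threshold below which a position is called `N`.

```python
from collections import defaultdict
from typing import List, Tuple


def consensus(reads: List[Tuple[str, str]], min_fraction: float = 0.7) -> str:
    """Collapse one read family into a single consensus sequence.

    reads: (sequence, Phred+33 quality) pairs, all aligned at the same start
    and of equal length (simplifying assumptions for this sketch).
    """
    if not reads:
        raise ValueError("empty read family")
    length = len(reads[0][0])
    if any(len(s) != length or len(q) != length for s, q in reads):
        raise ValueError("all reads and qualities must have the same length")

    out = []
    for i in range(length):
        # Sum base qualities per observed base at this position.
        weights = defaultdict(int)
        for seq, qual in reads:
            weights[seq[i]] += ord(qual[i]) - 33
        total = sum(weights.values())
        best = max(weights, key=weights.get)
        # Call the base only if it carries enough of the total quality weight.
        out.append(best if total and weights[best] / total >= min_fraction else "N")
    return "".join(out)


family = [("ACGT", "IIII"), ("ACGT", "IIII"), ("ACTT", "IIII")]
print(consensus(family))                                # ACNT
print(consensus(family[:2] + [("ACTT", "II#I")]))       # ACGT (low-quality T outvoted)
```

In the second call the disagreeing base has quality 2 instead of 40, so it barely affects the vote; this is why quality weighting is used here rather than a plain count.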
The pileup file is generated by running the Waltz BAM metrics module. The parameters shown are our recommended defaults; for a detailed description of each parameter, please refer to the Parameters page.
This step will produce some QC files as well as first-pass.txt in the current working directory. The first-pass.txt file contains the collapsed sequences and base qualities for the first pass. This file must be sorted before running the second pass.
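The sort between the two passes orders the first-pass records before the second pass collapses the "right" read clusters. A minimal in-memory Python sketch follows. It assumes first-pass.txt is tab-separated and that the sort keys are integer columns; the function name and the `key_columns` indices are hypothetical, so check which columns Marianas expects before relying on them. For real data sets, an external sort such as the Unix `sort` command is more practical than loading the whole file into memory.

```python
from typing import List


def sort_first_pass(in_path: str, out_path: str, key_columns: List[int]) -> None:
    """Sort a tab-separated file numerically by the given 0-based columns.

    key_columns is a hypothetical choice, not Marianas' documented sort key.
    """
    with open(in_path) as fh:
        lines = [line.rstrip("\n") for line in fh if line.strip()]

    def key(line: str):
        fields = line.split("\t")
        return tuple(int(fields[c]) for c in key_columns)

    # Python's sort is stable, so ties keep their original order.
    lines.sort(key=key)
    with open(out_path, "w") as fh:
        for line in lines:
            fh.write(line + "\n")
```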
The second pass will produce two files, collapsed_R1_.fastq and collapsed_R2_.fastq, along with some QC files.
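Paired-end aligners such as BWA-MEM expect the two FASTQ files to list mates in the same order, so a quick sanity check before alignment is that both collapsed files hold the same reads in the same order. The following Python sketch is a hypothetical helper, not part of Marianas; it assumes standard four-line FASTQ records and that mates share a read name apart from an optional /1 or /2 suffix.

```python
from itertools import zip_longest
from typing import Iterator, TextIO


def read_names(fh: TextIO) -> Iterator[str]:
    """Yield the read name of each four-line FASTQ record (without /1 or /2)."""
    while True:
        header = fh.readline()
        if not header:
            return
        seq, plus, qual = fh.readline(), fh.readline(), fh.readline()
        if (not header.startswith("@") or not plus.startswith("+")
                or len(seq.rstrip("\n")) != len(qual.rstrip("\n"))):
            raise ValueError(f"malformed FASTQ record: {header.strip()}")
        name = header[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def check_pairs(r1: TextIO, r2: TextIO) -> int:
    """Return the number of pairs, or raise if mates are missing or out of order."""
    count = 0
    # zip_longest yields None for the shorter file, which triggers a mismatch.
    for n1, n2 in zip_longest(read_names(r1), read_names(r2)):
        if n1 != n2:
            raise ValueError(f"mate mismatch: {n1!r} vs {n2!r}")
        count += 1
    return count
```

In practice the two handles would come from opening collapsed_R1_.fastq and collapsed_R2_.fastq.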
At this stage, use your preferred sequence aligner to map the error-corrected FASTQ files and produce a BAM file of collapsed reads. It is also advisable to perform a second round of indel realignment on the error-corrected reads at this stage. We recommend BWA-MEM for mapping, Picard AddOrReplaceReadGroups to add read groups and convert the SAM file into a sorted, compressed BAM file, and ABRA2 for indel realignment.
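The recommended tools can be chained from Python as sketched below. The output file names, jar paths, thread count, and read-group values are hypothetical placeholders. BWA-MEM writes SAM to standard output, Picard AddOrReplaceReadGroups is run with coordinate sorting and indexing so that ABRA2 receives a sorted, indexed BAM, and ABRA2 is given only its basic input, output, reference, and thread options. Consult each tool's documentation for options suited to your data (for example, ABRA2's `--targets` for a capture panel).

```python
import subprocess
from typing import List, Tuple


def alignment_commands(r1: str, r2: str, ref: str, sample: str,
                       threads: int = 4,
                       picard_jar: str = "picard.jar",
                       abra2_jar: str = "abra2.jar") -> Tuple[str, List[List[str]]]:
    """Build the mapping, read-group/sort, and realignment commands.

    Returns the SAM path that BWA-MEM's stdout should go to, plus the three
    commands in execution order. All file names here are placeholders.
    """
    sam = f"{sample}.collapsed.sam"
    bam = f"{sample}.collapsed.bam"
    realigned = f"{sample}.collapsed.abra2.bam"
    commands = [
        # Map the error-corrected mates; `ref` must already be indexed with `bwa index`.
        ["bwa", "mem", "-t", str(threads), ref, r1, r2],
        # Add read groups and write a coordinate-sorted, indexed BAM.
        ["java", "-jar", picard_jar, "AddOrReplaceReadGroups",
         "-I", sam, "-O", bam,
         "--SORT_ORDER", "coordinate", "--CREATE_INDEX", "true",
         "--RGID", sample, "--RGSM", sample, "--RGLB", "lib1",
         "--RGPL", "ILLUMINA", "--RGPU", "unit1"],
        # Realign indels on the collapsed reads.
        ["java", "-jar", abra2_jar, "--in", bam, "--out", realigned,
         "--ref", ref, "--threads", str(threads)],
    ]
    return sam, commands


def run_alignment(sam: str, commands: List[List[str]]) -> None:
    """Run the commands in order, redirecting BWA-MEM's SAM output to `sam`."""
    with open(sam, "w") as out:
        subprocess.run(commands[0], stdout=out, check=True)
    for cmd in commands[1:]:
        subprocess.run(cmd, check=True)
```

For example, `alignment_commands("collapsed_R1_.fastq", "collapsed_R2_.fastq", "hg19.fasta", "sample1")` returns `sample1.collapsed.sam` together with the three commands, which `run_alignment` then executes; this requires `bwa`, `java`, and both jar files to be available on the machine.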
The final step, which separates the collapsed reads into duplex and simplex BAM files, will produce collapsed-duplex.bam and collapsed-simplex.bam (named after the input BAM file) in the current working directory.
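The distinction between the two outputs can be illustrated with a small Python sketch: a collapsed read counts as duplex when its consensus is supported by reads from both strands of the original molecule, and as simplex when only one strand contributes. The per-strand support counts, the `min_reads` threshold, and the function names are hypothetical; Marianas applies its own criteria when writing collapsed-duplex.bam and collapsed-simplex.bam.

```python
from typing import Dict, List, Tuple

# (read name, supporting reads from the + strand, supporting reads from the - strand)
Family = Tuple[str, int, int]


def classify(plus_reads: int, minus_reads: int, min_reads: int = 1) -> str:
    """Label a collapsed read by which strands support its consensus."""
    plus_ok = plus_reads >= min_reads
    minus_ok = minus_reads >= min_reads
    if plus_ok and minus_ok:
        return "duplex"
    if plus_ok or minus_ok:
        return "simplex"
    return "unsupported"


def separate(families: List[Family], min_reads: int = 1) -> Dict[str, List[str]]:
    """Group collapsed read names into duplex, simplex, and unsupported."""
    groups: Dict[str, List[str]] = {"duplex": [], "simplex": [], "unsupported": []}
    for name, plus_reads, minus_reads in families:
        groups[classify(plus_reads, minus_reads, min_reads)].append(name)
    return groups


print(separate([("m1", 3, 2), ("m2", 4, 0), ("m3", 0, 1)]))
# {'duplex': ['m1'], 'simplex': ['m2', 'm3'], 'unsupported': []}
```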