BAM Collapsing

Inputs Description

Input files and parameters required to run the workflow

Last updated 4 years ago

Common Workflow Language (CWL) execution engines accept input files in two formats, JSON or YAML; please use one of these when generating the input file. For more information, refer to: http://www.commonwl.org/user_guide/yaml/

Parameters Used by Tools

Common Parameters Across Tools

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| reference_sequence | Reference sequence file. Please include ".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac" as secondary files if they are not present in the same location as the ".fasta" file. | |
| validation_stringency | GATK: Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. | |
| create_bam_index | GATK: Generate BAM index file when possible. | |
| sort_order | GATK: The order in which the merged reads should be output. | |
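The secondary-file suffixes listed for reference_sequence follow the CWL secondaryFiles pattern convention, in which a leading "^" removes one extension from the primary file name before the suffix is appended. The following is a minimal sketch, not part of the workflow itself, of how one might check that the index files sit next to the ".fasta" file before a run. The function names are hypothetical and introduced only for illustration.

```python
import os
from pathlib import Path
from typing import List

# Suffix patterns listed for reference_sequence in the table above.
SECONDARY_PATTERNS = [".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac"]


def resolve_secondary(primary: str, pattern: str) -> str:
    """Apply a CWL-style secondaryFiles pattern to a primary file path.

    Each leading '^' strips one extension from the primary path;
    the remainder of the pattern is then appended.
    """
    path = primary
    while pattern.startswith("^"):
        path = os.path.splitext(path)[0]  # drop the last extension
        pattern = pattern[1:]
    return path + pattern


def missing_secondary_files(primary: str) -> List[str]:
    """Return the expected secondary files that are not present on disk."""
    expected = [resolve_secondary(primary, p) for p in SECONDARY_PATTERNS]
    return [f for f in expected if not Path(f).exists()]
```

Under this convention, "chr14_chr16.fasta" pairs with "chr14_chr16.fasta.fai" but with "chr14_chr16.dict", which is why the ".dict" entry carries the caret.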

Fgbio GroupReadsByUmi

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_group_reads_by_umi_input | The input BAM file. | |
| fgbio_group_reads_by_umi_strategy | The UMI assignment strategy (identity, edit, adjacency, paired). | |
| fgbio_group_reads_by_umi_raw_tag | The tag containing the raw UMI. | RX |
| fgbio_group_reads_by_umi_output_file_name | The output BAM file name. | |
| fgbio_group_reads_by_umi_min_umi_length | The minimum UMI length. If not specified then all UMIs must have the same length, otherwise, discard reads with UMIs shorter than this length and allow for differing UMI lengths. | |
| fgbio_group_reads_by_umi_include_non_pf_reads | Include non-PF reads. | False |
| fgbio_group_reads_by_umi_family_size_histogram | Optional output of tag family size counts. | |
| fgbio_group_reads_by_umi_edits | The allowable number of edits between UMIs. | 1 |
| fgbio_group_reads_by_umi_assign_tag | The output tag for UMI grouping. | MI |

Fgbio CollectDuplexSeqMetrics

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_collect_duplex_seq_metrics_intervals | Optional set of intervals over which to restrict analysis. | |
| fgbio_collect_duplex_seq_metrics_output_prefix | Prefix of output files to write. | |
| fgbio_collect_duplex_seq_metrics_min_ba_reads | Minimum BA reads to call a tag family a ‘duplex’. | |
| fgbio_collect_duplex_seq_metrics_min_ab_reads | Minimum AB reads to call a tag family a ‘duplex’. | |
| fgbio_collect_duplex_seq_metrics_mi_tag | The output tag for UMI grouping. | MI |
| fgbio_collect_duplex_seq_metrics_duplex_umi_counts | If true, produce the .duplex_umi_counts.txt file with counts of duplex UMI observations. | |
| fgbio_collect_duplex_seq_metrics_description | Description of data set used to label plots. Defaults to sample/library. | |

Fgbio CallDuplexConsensusReads

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_call_duplex_consensus_reads_trim | If true, quality trim input reads in addition to masking low Q bases. | |
| fgbio_call_duplex_consensus_reads_sort_order | The sort order of the output, if :none: then the same as the input. | |
| fgbio_call_duplex_consensus_reads_read_name_prefix | The prefix for all consensus read names. | |
| fgbio_call_duplex_consensus_reads_read_group_id | The new read group ID for all the consensus reads. | |
| fgbio_call_duplex_consensus_reads_output_file_name | Output SAM or BAM file to write consensus reads. | |
| fgbio_call_duplex_consensus_reads_min_reads | The minimum number of input reads to a consensus read. | |
| fgbio_call_duplex_consensus_reads_min_input_base_quality | Ignore bases in raw reads that have Q below this value. | |
| fgbio_call_duplex_consensus_reads_max_reads_per_strand | The maximum number of reads to use when building a single-strand consensus. If more than this many reads are present in a tag family, the family is randomly downsampled to exactly max-reads reads. | |
| fgbio_call_duplex_consensus_reads_error_rate_pre_umi | The Phred-scaled error rate for an error prior to the UMIs being integrated. | |
| fgbio_call_duplex_consensus_reads_error_rate_post_umi | The Phred-scaled error rate for an error after the UMIs have been integrated. | |

GATK SamToFastq

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_sam_to_fastq_output_name_unpaired | Unpaired fastq output file name. | |
| gatk_sam_to_fastq_output_name_R1 | Read1 fastq.gz output file name. | |
| gatk_sam_to_fastq_output_name_R2 | Read2 fastq.gz output file name. | |
| gatk_sam_to_fastq_include_non_primary_alignments | If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments. | |
| gatk_sam_to_fastq_include_non_pf_reads | Include non-PF reads from the SAM file into the output FASTQ files. PF means 'passes filtering'. Reads whose 'not passing quality controls' flag is set are non-PF reads. See GATK Dictionary for more info. | |

BWA MEM

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bwa_mem_Y | Force soft-clipping rather than default hard-clipping of supplementary alignments. | |
| bwa_mem_T | Don’t output alignment with score lower than INT. This option only affects output. | |
| bwa_mem_P | In the paired-end mode, perform SW to rescue missing hits only but do not try to find hits that fit a proper pair. | |
| bwa_mem_output | Output SAM file name. | |
| bwa_mem_M | Mark shorter split hits as secondary. | |
| bwa_mem_K | Process INT input bases in each batch regardless of the number of threads, to achieve deterministic alignment results (Note: this is a hidden option). | |
| bwa_number_of_threads | Number of threads. | |
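The BWA MEM parameters above correspond to the bwa mem flags -Y, -T, -P, -M, -K and -t. As a rough illustration of that correspondence, the sketch below builds an argument list from an inputs dictionary. The function name and the positional file arguments are hypothetical, and this is an assumption about how such a mapping could look, not the workflow's actual CWL tool definition. Writing to the file named by bwa_mem_output is deliberately left out.

```python
from typing import Any, Dict, List

# Switches that take no value, and options that take one value.
BOOLEAN_FLAGS = (("bwa_mem_Y", "-Y"), ("bwa_mem_P", "-P"), ("bwa_mem_M", "-M"))
VALUE_FLAGS = (("bwa_mem_T", "-T"), ("bwa_mem_K", "-K"), ("bwa_number_of_threads", "-t"))


def bwa_mem_args(inputs: Dict[str, Any], reference: str,
                 fastq_r1: str, fastq_r2: str) -> List[str]:
    """Translate workflow-style inputs into a bwa mem argument list.

    Inputs that are null (None) or absent are skipped, so bwa falls
    back to its own defaults for them.
    """
    args = ["bwa", "mem"]
    for key, flag in BOOLEAN_FLAGS:
        if inputs.get(key):
            args.append(flag)
    for key, flag in VALUE_FLAGS:
        value = inputs.get(key)
        if value is not None:
            args += [flag, str(value)]
    # Positional arguments: reference index prefix, then the two FASTQ files.
    args += [reference, fastq_r1, fastq_r2]
    return args
```

With the values from the template inputs file (bwa_mem_Y true, bwa_mem_T 30, bwa_mem_K 1000000, the rest null), only -Y, -T 30 and -K 1000000 would be emitted.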

Picard AddOrReplaceReadGroups

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| picard_addRG_read_group_sequencing_platform | Read-Group platform (e.g. ILLUMINA, SOLID). | |
| picard_addRG_read_group_sequencing_center | Read-Group sequencing center name. | |
| picard_addRG_read_group_run_date | Read-Group run date (Iso8601Date). | |
| picard_addRG_read_group_platform_unit | Read-Group Platform Unit (e.g. run barcode). | |
| picard_addRG_read_group_library | Read-Group library. | |
| picard_addRG_read_group_identifier | Read-Group ID. | |
| picard_addRG_read_group_description | Read-Group Description. | |
| picard_addRG_output_file_name | Output BAM file name. | |
| picard_addRG_sort_order | Sort order for BAM file. | |
| picard_addRG_read_group_sample_name | Read-Group sample name. | |

GATK MergeBamAlignment

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_merge_bam_alignment_output_file_name | Output BAM file name. | |

bedtools genomecov

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bedtools_genomecov_option_bedgraph | Flag that selects the output file format; -bg refers to bedgraph format. | |

bedtools merge

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bedtools_merge_distance_between_features | Maximum distance between features allowed for features to be merged. | |

ABRA2

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| abra2_window_size | Processing window size and overlap (size,overlap) (default: 400,200). | |
| abra2_soft_clip_contig | Soft clip contig args [maxcontigs,min_base_qual,frac_high_qual_bases,min_soft_clip_len] (default: 16,13,80,15). | |
| abra2_scoring_gap_alignments | Scoring used for contig alignments (match,mismatch_penalty,gap_open_penalty,gap_extend_penalty) (default: 8,32,48,1). | |
| abra2_no_sort | Do not attempt to sort final output. | |
| abra2_no_edge_complex_indel | Prevent output of complex indels at read start or read end. | |
| abra2_maximum_mixmatch_rate | Max allowed mismatch rate when mapping reads back to contigs (default: 0.05). | |
| abra2_maximum_average_depth | Regions with average depth exceeding this value will be downsampled (default: 1000). | |
| abra2_contig_anchor | Contig anchor [M_bases_at_contig_edge,max_mismatches_near_edge] (default: 10,2). | |
| abra2_consensus_sequence | Use positional consensus sequence when aligning high quality soft clipping. | |

Picard FixMateInformation

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| picard_fixmate_information_output_file_name | The output BAM file to write to. | |

Fgbio FilterConsensusReads

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_filter_consensus_read_reverse_per_base_tags_simplex_duplex | Reverse [complement] per base tags on reverse strand reads. - Simplex + Duplex | |
| fgbio_filter_consensus_read_reverse_per_base_tags_duplex | Reverse [complement] per base tags on reverse strand reads. - Duplex | |
| fgbio_filter_consensus_read_require_single_strand_agreement_simplex_duplex | Mask (make N) consensus bases where the AB and BA consensus reads disagree (for duplex-sequencing only). | |
| fgbio_filter_consensus_read_require_single_strand_agreement_duplex | Mask (make N) consensus bases where the AB and BA consensus reads disagree (for duplex-sequencing only). | |
| fgbio_filter_consensus_read_max_base_error_rate_duplex | The maximum error rate for a single consensus base. (Max 3 values) - Duplex | |
| fgbio_filter_consensus_read_max_base_error_rate_simplex_duplex | The maximum error rate for a single consensus base. (Max 3 values) - Simplex + Duplex | |
| fgbio_filter_consensus_read_max_no_call_fraction_duplex | Maximum fraction of no-calls in the read after filtering. - Duplex | |
| fgbio_filter_consensus_read_max_read_error_rate_duplex | The maximum raw-read error rate across the entire consensus read. (Max 3 values) - Duplex | |
| fgbio_filter_consensus_read_max_no_call_fraction_simplex_duplex | Maximum fraction of no-calls in the read after filtering. - Simplex + Duplex | |
| fgbio_filter_consensus_read_max_read_error_rate_simplex_duplex | The maximum raw-read error rate across the entire consensus read. (Max 3 values) - Simplex + Duplex | |
| fgbio_filter_consensus_read_min_base_quality_duplex | Mask (make N) consensus bases with quality less than this threshold. - Duplex | |
| fgbio_filter_consensus_read_min_base_quality_simplex_duplex | Mask (make N) consensus bases with quality less than this threshold. - Simplex + Duplex | |
| fgbio_filter_consensus_read_min_mean_base_quality_duplex | The minimum mean base quality across the consensus read. - Duplex | |
| fgbio_filter_consensus_read_min_mean_base_quality_simplex_duplex | The minimum mean base quality across the consensus read. - Simplex + Duplex | |
| fgbio_filter_consensus_read_min_reads_duplex | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Duplex | |
| fgbio_filter_consensus_read_min_reads_simplex_duplex | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Simplex + Duplex | |
| fgbio_filter_consensus_read_output_file_name_simplex_duplex | Output BAM file name. - Simplex + Duplex | |
| fgbio_filter_consensus_read_output_file_name_duplex_aln_metrics | Output file name for Duplex alignment metrics. | |
| fgbio_filter_consensus_read_output_file_name_simplex_aln_metrics | Output file name for Simplex alignment metrics. | |
| fgbio_filter_consensus_read_output_file_name_duplex | Output BAM file name. - Duplex | |
| fgbio_filter_consensus_read_min_simplex_reads | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Simplex + Duplex | |
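Several of the filtering parameters above accept up to three values. To my understanding of the fgbio FilterConsensusReads documentation (worth verifying against the fgbio version in use), the first value applies to the final consensus read and the remaining two to the single-strand consensus reads, and when fewer than three values are given the last one is repeated. The helper below is a hypothetical sketch of that expansion rule only; it does not call fgbio.

```python
from typing import List, Sequence


def expand_to_three(values: Sequence[float]) -> List[float]:
    """Expand one to three values to exactly three by repeating the last.

    Assumed rule (check the fgbio documentation): [3] -> [3, 3, 3]
    and [2, 1] -> [2, 1, 1]; three values are returned unchanged.
    """
    if not 1 <= len(values) <= 3:
        raise ValueError("expected between one and three values")
    expanded = list(values)
    return expanded + [expanded[-1]] * (3 - len(expanded))
```

The template inputs file supplies all three values explicitly, for example [2, 1, 1] for fgbio_filter_consensus_read_min_reads_duplex, so no expansion would be needed there.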

Fgbio Postprocessing

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_postprocessing_output_file_name_simplex | Output BAM file name. - Simplex | |

Picard CollectAlignmentSummaryMetrics

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_collect_alignment_summary_metrics_output_file_name | Output file name for metrics on collapsed BAM (Duplex+Simplex+Singletons). | |

Template inputs file

inputs.json
{
    "abra2_consensus_sequence": null,
    "abra2_contig_anchor": null,
    "abra2_maximum_average_depth": null,
    "abra2_maximum_mixmatch_rate": null,
    "abra2_no_edge_complex_indel": true,
    "abra2_no_sort": null,
    "abra2_output_bams": "collapsed_abra.bam",
    "abra2_scoring_gap_alignments": null,
    "abra2_soft_clip_contig": null,
    "abra2_window_size": null,
    "bedtools_genomecov_option_bedgraph": true,
    "bedtools_merge_distance_between_features": null,
    "bwa_mem_K": 1000000,
    "bwa_mem_M": null,
    "bwa_mem_P": null,
    "bwa_mem_T": 30,
    "bwa_mem_Y": true,
    "bwa_mem_output": "test_collapsed_alignment.sam",
    "bwa_number_of_threads": null,
    "create_bam_index": true,
    "fgbio_call_duplex_consensus_reads_error_rate_post_umi": null,
    "fgbio_call_duplex_consensus_reads_error_rate_pre_umi": null,
    "fgbio_call_duplex_consensus_reads_max_reads_per_strand": null,
    "fgbio_call_duplex_consensus_reads_min_input_base_quality": null,
    "fgbio_call_duplex_consensus_reads_min_reads": [
        1,
        1,
        0
    ],
    "fgbio_call_duplex_consensus_reads_output_file_name": null,
    "fgbio_call_duplex_consensus_reads_read_group_id": "test",
    "fgbio_call_duplex_consensus_reads_read_name_prefix": null,
    "fgbio_call_duplex_consensus_reads_sort_order": null,
    "fgbio_call_duplex_consensus_reads_trim": null,
    "fgbio_collect_duplex_seq_metrics_description": null,
    "fgbio_collect_duplex_seq_metrics_duplex_umi_counts": true,
    "fgbio_collect_duplex_seq_metrics_intervals": null,
    "fgbio_collect_duplex_seq_metrics_mi_tag": null,
    "fgbio_collect_duplex_seq_metrics_min_ab_reads": null,
    "fgbio_collect_duplex_seq_metrics_min_ba_reads": null,
    "fgbio_collect_duplex_seq_metrics_output_prefix": null,
    "fgbio_filter_consensus_read_max_base_error_rate_duplex": null,
    "fgbio_filter_consensus_read_max_base_error_rate_simplex_duplex": null,
    "fgbio_filter_consensus_read_max_no_call_fraction_duplex": null,
    "fgbio_filter_consensus_read_max_no_call_fraction_simplex_duplex": null,
    "fgbio_filter_consensus_read_max_read_error_rate_duplex": null,
    "fgbio_filter_consensus_read_max_read_error_rate_simplex_duplex": null,
    "fgbio_filter_consensus_read_min_base_quality_duplex": 30,
    "fgbio_filter_consensus_read_min_base_quality_simplex_duplex": 30,
    "fgbio_filter_consensus_read_min_mean_base_quality_duplex": null,
    "fgbio_filter_consensus_read_min_mean_base_quality_simplex_duplex": null,
    "fgbio_filter_consensus_read_min_reads_duplex": [
        2,
        1,
        1
    ],
    "fgbio_filter_consensus_read_min_reads_simplex_duplex": [
        3,
        3,
        0
    ],
    "fgbio_filter_consensus_read_min_simplex_reads": null,
    "fgbio_filter_consensus_read_output_file_name_duplex": "collapsed_duplex.bam",
    "fgbio_filter_consensus_read_output_file_name_duplex_aln_metrics": null,
    "fgbio_filter_consensus_read_output_file_name_simplex_aln_metrics": null,
    "fgbio_filter_consensus_read_output_file_name_simplex_duplex": "collapsed_simplex-duplex.bam",
    "fgbio_filter_consensus_read_require_single_strand_agreement_duplex": true,
    "fgbio_filter_consensus_read_require_single_strand_agreement_simplex_duplex": null,
    "fgbio_filter_consensus_read_reverse_per_base_tags_duplex": true,
    "fgbio_filter_consensus_read_reverse_per_base_tags_simplex_duplex": true,
    "fgbio_group_reads_by_umi_assign_tag": null,
    "fgbio_group_reads_by_umi_edits": null,
    "fgbio_group_reads_by_umi_family_size_histogram": "group_reads_umi.hist.txt",
    "fgbio_group_reads_by_umi_include_non_pf_reads": null,
    "fgbio_group_reads_by_umi_input": {
        "class": "File",
        "path": "/Users/shahr2/Documents/test_reference/Uncollapsed_BAM_Generation_Output/test_fx.bam"
    },
    "fgbio_group_reads_by_umi_min_umi_length": null,
    "fgbio_group_reads_by_umi_output_file_name": null,
    "fgbio_group_reads_by_umi_raw_tag": null,
    "fgbio_group_reads_by_umi_strategy": "paired",
    "fgbio_postprocessing_output_file_name_simplex": "collapsed_simplex.bam",
    "gatk_collect_alignment_summary_metrics_output_file_name": null,
    "gatk_merge_bam_alignment_output_file_name": null,
    "gatk_sam_to_fastq_include_non_pf_reads": null,
    "gatk_sam_to_fastq_include_non_primary_alignments": null,
    "gatk_sam_to_fastq_output_name_R1": "test_fx_group_cons_R1.fastq.gz",
    "gatk_sam_to_fastq_output_name_R2": "test_fx_group_cons_R2.fastq.gz",
    "gatk_sam_to_fastq_output_name_unpaired": null,
    "picard_addRG_sort_order": "queryname",
    "picard_addRG_output_file_name": null,
    "picard_addRG_read_group_description": null,
    "picard_addRG_read_group_identifier": "test",
    "picard_addRG_read_group_library": "test",
    "picard_addRG_read_group_platform_unit": "IDT11",
    "picard_addRG_read_group_run_date": null,
    "picard_addRG_read_group_sample_name": "MSKCC",
    "picard_addRG_read_group_sequencing_center": "ILLUMINA",
    "picard_addRG_read_group_sequencing_platform": "test",
    "picard_fixmate_information_output_file_name": "collapsed_fx.bam",
    "reference_sequence": {
        "class": "File",
        "path": "/Users/shahr2/Documents/test_reference/test_uncollapsed_bam_generation/reference/chr14_chr16.fasta"
    },
    "sort_order": "coordinate",
    "validation_stringency": "LENIENT"
}
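Most entries in the template are null, which in CWL generally means the input is treated as not provided, so a default defined by the tool or workflow (if any) applies. As a convenience, the sketch below shows one way to inspect such a file: it lists only the explicitly set values and collects the File paths that must exist before a run. The function names are hypothetical and this is an illustrative helper, not part of the workflow.

```python
import json
from typing import Any, Dict


def parse_inputs(text: str) -> Dict[str, Any]:
    """Parse the contents of an inputs JSON file into a dictionary."""
    return json.loads(text)


def explicit_values(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only inputs that are set to something other than null."""
    return {key: value for key, value in inputs.items() if value is not None}


def file_paths(inputs: Dict[str, Any]) -> Dict[str, str]:
    """Map each File-class input name to the path it points at."""
    return {
        key: value["path"]
        for key, value in inputs.items()
        if isinstance(value, dict) and value.get("class") == "File"
    }
```

For the template above, file_paths would report fgbio_group_reads_by_umi_input and reference_sequence, the two paths that need to be replaced with local files.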
