Input files and parameters required to run workflow
Common Workflow Language (CWL) execution engines accept input files in one of two formats, JSON or YAML. Please use one of these when generating the input file. For more information, refer to: http://www.commonwl.org/user_guide/yaml/
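As a rough illustration of the JSON option, the sketch below writes a small inputs file using only the Python standard library. The parameter names come from the tables on this page. The values, the paths, the output file name `inputs.json`, and the assumption that `reference_sequence` is a CWL `File` input are hypothetical placeholders, so check them against the workflow's template file.

```python
import json

# Hypothetical example values; replace them with your own sample details.
inputs = {
    "sample": "sample_01",
    "library": "lib_01",
    "read-group-id": "rg_01",
    "platform-unit": "run_barcode_01",
    # CWL describes file inputs as objects with a class and a path.
    # Treating reference_sequence as a File input is an assumption.
    "reference_sequence": {"class": "File", "path": "/path/to/reference.fasta"},
}


def write_inputs(path, values):
    """Write the inputs mapping to a JSON file that a CWL runner can read."""
    with open(path, "w") as handle:
        json.dump(values, handle, indent=2)


write_inputs("inputs.json", inputs)
```

The same mapping could equally be written as YAML; JSON is used here only because it needs no third-party package.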
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| sequencing-center | The sequencing center from which the data originated | MSKCC |
| sample | The name of the sequenced sample. (Required) | |
| run-date | Date the run was produced, to insert into the read group header (Iso8601Date) | |
| read-group-id | Read group ID to use in the file header (Required) | |
| platform-unit | Read-Group Platform Unit (e.g. run barcode) (Required) | |
| platform-model | Platform model to insert into the group header (e.g. miseq, hiseq2500, hiseqX) | novaseq |
| platform | Read-Group platform (e.g. ILLUMINA, SOLID). | ILLUMINA |
| library | The name/ID of the sequenced library. (Required) | |
| description | Description of the read group. | |
| comment | Comments to include in the output file’s header. | |
| validation_stringency | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values: STRICT, LENIENT, or SILENT | LENIENT |
| sort_order | GATK: The order in which the reads should be output. | |
| create_bam_index | GATK: Generate BAM index file when possible | |
| reference_sequence | Reference sequence file. Please include ".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac", ".ann" as secondary files if they are not present in the same location as the ".fasta" file | |
| temporary_directory | Temporary directory to be used for all steps | |
| fgbio_async_io | Fgbio asynchronous execution | |
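The parameters above mix required values with optional ones that carry defaults. A minimal pre-flight sketch of that distinction follows. The defaults and required names are copied from the listing above, while the helper `with_defaults` and the example values are hypothetical. A CWL runner applies defaults itself, so this is only a convenience check before submission.

```python
from datetime import date

# Defaults and required names copied from the parameter listing above.
DEFAULTS = {
    "sequencing-center": "MSKCC",
    "platform-model": "novaseq",
    "platform": "ILLUMINA",
    "validation_stringency": "LENIENT",
}
REQUIRED = ("sample", "read-group-id", "platform-unit", "library")


def with_defaults(user_values):
    """Merge user values over the defaults; raise if a required key is missing."""
    missing = [name for name in REQUIRED if name not in user_values]
    if missing:
        raise ValueError("missing required inputs: " + ", ".join(missing))
    merged = dict(DEFAULTS)
    merged.update(user_values)
    return merged


example = with_defaults({
    "sample": "sample_01",
    "read-group-id": "rg_01",
    "platform-unit": "run_barcode_01",
    "library": "lib_01",
    # run-date expects an ISO 8601 date, for example 2021-03-15.
    "run-date": date(2021, 3, 15).isoformat(),
})
```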
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_fastq_to_bam_umi-tag | Tag in which to store molecular barcodes/UMIs. | |
| fgbio_fastq_to_bam_sort | If true, query-name sort the BAM file; otherwise preserve input order. | |
| fgbio_fastq_to_bam_input | FASTQ files corresponding to each sequencing read (e.g. R1, I1, etc.). Please refer to the template file to get this correct. | |
| read-structures | Read structures, one for each of the FASTQs. Refer to the tool for more details. | |
| fgbio_fastq_to_bam_predicted-insert-size | Predicted median insert size, to insert into the read group header | |
| fgbio_fastq_to_bam_output_file_name | The output SAM or BAM file to be written. | |
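Because one read structure is expected per FASTQ, a quick count check can catch a mismatch before the workflow starts. The sketch below is only an illustration: the helper name, the file names, and the read-structure strings are hypothetical placeholders, and the exact nesting of `fgbio_fastq_to_bam_input` should be taken from the template file.

```python
def check_read_structures(fastq_names, read_structures):
    """Raise if the number of read structures differs from the number of FASTQs."""
    if len(fastq_names) != len(read_structures):
        raise ValueError(
            "expected %d read structures, got %d"
            % (len(fastq_names), len(read_structures))
        )
    # Pair each FASTQ with the read structure at the same position.
    return list(zip(fastq_names, read_structures))


# Placeholder file names and read structures, for illustration only.
pairs = check_read_structures(
    ["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
    ["3M2S+T", "3M2S+T"],
)
```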
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_merge_sam_files_output_file_name | SAM or BAM file to write the merged result to (Required) | |
| merge_sam_files_sort_order | Sort order of output file | queryname |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| unpaired_fastq_file | Unpaired FASTQ output file name | |
| UBG_picard_SamToFastq_R1_output_fastq | Read1 fastq.gz output file name for uncollapsed BAM generation (Required) | |
| UBG_picard_SamToFastq_R2_output_fastq | Read2 fastq.gz output file name for uncollapsed BAM generation (Required) | |
| BC_gatk_sam_to_fastq_output_name_R1 | Read1 fastq.gz output file name for BAM collapsing (Required) | |
| BC_gatk_sam_to_fastq_output_name_R2 | Read2 fastq.gz output file name for BAM collapsing (Required) | |
| gatk_sam_to_fastq_include_non_primary_alignments | If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments. | |
| gatk_sam_to_fastq_include_non_pf_reads | Include non-PF reads from the SAM file in the output FASTQ files. PF means 'passes filtering'. Reads whose 'not passing quality controls' flag is set are non-PF reads. See GATK Dictionary for more info. | |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fastp_unpaired1_output_file_name | For PE input, if read1 passed QC but read2 did not, it will be written to unpaired1. Default is to discard it. | |
| fastp_unpaired2_output_file_name | For PE input, if read2 passed QC but read1 did not, it will be written to unpaired2. If --unpaired2 is the same as --unpaired1 (default mode), both unpaired reads will be written to this same file. | |
| fastp_read1_adapter_sequence | The adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1/R2 are found not to overlap. | GATCGGAAGAGC |
| fastp_read2_adapter_sequence | The adapter for read2 (PE data only). This is used if R1/R2 are found not to overlap. If not specified, it will be the same as the read1 adapter sequence. | AGATCGGAAGAGC |
| fastp_read1_output_file_name | Read1 output file name (Required) | |
| fastp_read2_output_file_name | Read2 output file name (Required) | |
| fastp_minimum_read_length | Reads shorter than length_required will be discarded | 25 |
| fastp_json_output_file_name | The JSON format report file name (Required) | |
| fastp_html_output_file_name | The HTML format report file name (Required) | |
| disable_trim_poly_g | Disable poly-G trimming. | True |
| disable_quality_filtering | Disable base quality filtering. | True |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bwa_mem_Y | Force soft-clipping rather than default hard-clipping of supplementary alignments | True |
| bwa_mem_T | Don’t output alignment with score lower than INT. This option only affects output. | 30 |
| bwa_mem_P | In the paired-end mode, perform SW to rescue missing hits only but do not try to find hits that fit a proper pair. | |
| UBG_bwa_mem_output | Output SAM file name for uncollapsed BAM generation (Required) | |
| BC_bwa_mem_output | Output SAM file name for BAM collapsing (Required) | |
| bwa_mem_M | Mark shorter split hits as secondary | |
| bwa_mem_K | Used to achieve deterministic alignment results (note: this is a hidden option) | 1000000 |
| bwa_number_of_threads | Number of threads | |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| UBG_picard_addRG_output_file_name | Output BAM file name for uncollapsed BAM generation (Required) | |
| BC_picard_addRG_output_file_name | Output BAM file name for BAM collapsing (Required) | |
| picard_addRG_sort_order | Sort order for the BAM file | queryname |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| UBG_gatk_merge_bam_alignment_output_file_name | Output BAM file name for uncollapsed BAM generation (Required) | |
| BC_gatk_merge_bam_alignment_output_file_name | Output BAM file name for BAM collapsing (Required) | |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| optical_duplicate_pixel_distance | The maximum offset between two duplicate clusters in order to consider them optical duplicates. The default is appropriate for unpatterned versions of the Illumina platform. For the patterned flowcell models, 2500 is more appropriate. For other platforms and models, users should experiment to find what works best. | 2500 |
| read_name_regex | Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection, e.g. for RNA-seq or other data where duplicate sets are extremely large and estimating library complexity is not an aim. Note that without optical duplicate counts, library size estimation will be inaccurate. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done, but instead the read name is split on the colon character. For 5 element names, the 3rd, 4th and 5th elements are assumed to be tile, x and y values. For 7 element names (CASAVA 1.8), the 5th, 6th, and 7th elements are assumed to be tile, x and y values. | |
| duplicate_scoring_strategy | The scoring strategy for choosing the non-duplicate among candidates. | |
| gatk_mark_duplicates_output_file_name | The output file to write marked records to (Required) | |
| gatk_mark_duplicates_duplication_metrics_file_name | File to write duplication metrics to (Required) | |
| gatk_mark_duplicates_assume_sort_order | If not null, assume that the input file has this order even if the header says otherwise. | |
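The default read-name handling described for read_name_regex (split on the colon, then pick three fields by position) can be sketched as follows. This is a simplified illustration of the rule as stated above, not the tool's actual implementation; the function name and the example read names are hypothetical.

```python
def tile_x_y(read_name):
    """Split a read name on ':' and return (tile, x, y) as strings, or None."""
    fields = read_name.split(":")
    if len(fields) == 5:
        # 5-element names: 3rd, 4th and 5th elements.
        return tuple(fields[2:5])
    if len(fields) == 7:
        # 7-element names (CASAVA 1.8): 5th, 6th and 7th elements.
        return tuple(fields[4:7])
    # Any other shape is not covered by the rule described above.
    return None


five = tile_x_y("MACHINE:1:1101:1000:2000")
seven = tile_x_y("MACHINE:42:FLOWCELL:1:1101:1000:2000")
```

Read names that do not follow either layout would need a custom regular expression with three capture groups, as the summary notes.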
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bedtools_genomecov_option_bedgraph | Flag that selects the output file format; -bg selects BedGraph format | True |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bedtools_merge_distance_between_features | Maximum distance between features allowed for features to be merged. | 10 |
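To make the distance parameter concrete, the sketch below merges intervals on a single chromosome whenever the gap between them is at most the allowed distance. It is a simplified stand-in for the idea, not the bedtools implementation; the function name and the example coordinates are hypothetical.

```python
def merge_features(features, max_distance=10):
    """Merge (start, end) intervals whose gap is at most max_distance."""
    merged = []
    for start, end in sorted(features):
        if merged and start - merged[-1][1] <= max_distance:
            # Close enough to the previous feature: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


result = merge_features([(0, 100), (105, 200), (300, 400)])
```

With the default of 10, the first two features (5 bases apart) are joined and the third (100 bases away) stays separate.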
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| abra2_window_size | Processing window size and overlap (size,overlap) | "400,200" |
| abra2_soft_clip_contig | Soft clip contig args [maxcontigs,min_base_qual,frac high_qual_bases,min_soft_clip_len] | "16,13,80,15" |
| abra2_scoring_gap_alignments | Scoring used for contig alignments (match, mismatch_penalty, gap_open_penalty, gap_extend_penalty) | "8,32,48,1" |
| abra2_no_sort | Do not attempt to sort final output | True |
| abra2_no_edge_complex_indel | Prevent output of complex indels at read start or read end | True |
| abra2_maximum_mixmatch_rate | Max allowed mismatch rate when mapping reads back to contigs | 0.1 |
| abra2_maximum_average_depth | Regions with average depth exceeding this value will be down-sampled | 1000 |
| abra2_contig_anchor | Contig anchor [M_bases_at_contig_edge,max_mismatches_near_edge] | "10,2" |
| abra2_consensus_sequence | Use positional consensus sequence when aligning high quality soft clipping | |
| BC_abra2_output_bams | The output BAM file to write to (Required) | |
| UBG_abra2_output_bams | The output BAM file to write to (Required) | |
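Several of the ABRA2 defaults above pack multiple integers into one comma-separated string. The sketch below shows how such a string maps onto named fields. The helper `split_option` is hypothetical, and the field names are taken from the summaries above purely for illustration.

```python
def split_option(value, names):
    """Map a comma-separated string of integers onto the given field names."""
    parts = value.split(",")
    if len(parts) != len(names):
        raise ValueError("expected %d values, got %d" % (len(names), len(parts)))
    return dict(zip(names, (int(part) for part in parts)))


window = split_option("400,200", ("size", "overlap"))
anchor = split_option("10,2", ("bases_at_contig_edge", "max_mismatches_near_edge"))
```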
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| UBG_picard_fixmateinformation_output_file_name | The output BAM file to write to for uncollapsed BAM generation (Required) | |
| BC_picard_fixmate_information_output_file_name | The output BAM file to write to for BAM collapsing (Required) | |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_base_recalibrator_known_sites | One or more databases of known polymorphic sites used to exclude regions around known polymorphisms from analysis (Required) | |
| gatk_bqsr_read_filter | Read filters to be applied before analysis | |
| base_recalibrator_output_file_name | The output recalibration table file to create (Required) | |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| apply_bqsr_output_file_name | The output BAM file (Required) | |
| gatk_bqsr_disable_read_filter | Read filters to be disabled before analysis | |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_group_reads_by_umi_input | The input BAM file | |
| fgbio_group_reads_by_umi_strategy | The UMI assignment strategy (identity, edit, adjacency, paired). | paired |
| fgbio_group_reads_by_umi_raw_tag | The tag containing the raw UMI. | RX |
| fgbio_group_reads_by_umi_output_file_name | The output BAM file name (Required) | |
| fgbio_group_reads_by_umi_min_umi_length | The minimum UMI length. If not specified then all UMIs must have the same length, otherwise, discard reads with UMIs shorter than this length and allow for differing UMI lengths. | |
| fgbio_group_reads_by_umi_include_non_pf_reads | Include non-PF reads. | False |
| fgbio_group_reads_by_umi_family_size_histogram | Output of tag family size counts (optional in fgbio, marked Required here). Give a file name, e.g. samplename.hist | |
| fgbio_group_reads_by_umi_edits | The allowable number of edits between UMIs. | 1 |
| fgbio_group_reads_by_umi_assign_tag | The output tag for UMI grouping. | MI |
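To give a feel for the edits setting, the sketch below counts mismatching positions between two equal-length UMIs and compares the count with the allowed number of edits. This is a simplified illustration under the assumption that an edit means a single-base mismatch. It does not reproduce fgbio's grouping strategies, and the function names and UMI strings are hypothetical.

```python
def count_mismatches(umi_a, umi_b):
    """Count positions at which two equal-length UMIs differ."""
    if len(umi_a) != len(umi_b):
        raise ValueError("UMIs must have the same length")
    return sum(1 for a, b in zip(umi_a, umi_b) if a != b)


def within_edits(umi_a, umi_b, edits=1):
    """Return True if the UMIs differ at no more than `edits` positions."""
    return count_mismatches(umi_a, umi_b) <= edits


close = within_edits("ACGT", "ACGA")
far = within_edits("ACGT", "TCGA")
```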
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_collect_duplex_seq_metrics_intervals | Optional set of intervals over which to restrict analysis. | |
| fgbio_collect_duplex_seq_metrics_output_prefix | Prefix of output files to write. | |
| fgbio_collect_duplex_seq_metrics_min_ba_reads | Minimum BA reads to call a tag family a ‘duplex’. | |
| fgbio_collect_duplex_seq_metrics_min_ab_reads | Minimum AB reads to call a tag family a ‘duplex’. | |
| fgbio_collect_duplex_seq_metrics_mi_tag | The output tag for UMI grouping. | MI |
| fgbio_collect_duplex_seq_metrics_duplex_umi_counts | If true, produce the .duplex_umi_counts.txt file with counts of duplex UMI observations. | True |
| fgbio_collect_duplex_seq_metrics_description | Description of data set used to label plots. Defaults to sample/library. | |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_call_duplex_consensus_reads_trim | If true, quality trim input reads in addition to masking low Q bases. | |
| fgbio_call_duplex_consensus_reads_sort_order | The sort order of the output, if :none: then the same as the input. | |
| fgbio_call_duplex_consensus_reads_read_name_prefix | The prefix for all consensus read names | |
| fgbio_call_duplex_consensus_reads_read_group_id | The new read group ID for all the consensus reads. | |
| fgbio_call_duplex_consensus_reads_output_file_name | Output SAM or BAM file to write consensus reads. | |
| fgbio_call_duplex_consensus_reads_min_reads | The minimum number of input reads to a consensus read. | 1 1 0 |
| fgbio_call_duplex_consensus_reads_min_input_base_quality | Ignore bases in raw reads that have Q below this value. | |
| fgbio_call_duplex_consensus_reads_max_reads_per_strand | The maximum number of reads to use when building a single-strand consensus. If more than this many reads are present in a tag family, the family is randomly downsampled to exactly max-reads reads. | |
| fgbio_call_duplex_consensus_reads_error_rate_pre_umi | The Phred-scaled error rate for an error prior to the UMIs being integrated. | |
| fgbio_call_duplex_consensus_reads_error_rate_post_umi | The Phred-scaled error rate for an error after the UMIs have been integrated. | |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_filter_consensus_read_reverse_per_base_tags_simplex_duplex | Reverse [complement] per base tags on reverse strand reads. - Simplex + Duplex | |
| fgbio_filter_consensus_read_reverse_per_base_tags_duplex | Reverse [complement] per base tags on reverse strand reads. - Duplex | |
| fgbio_filter_consensus_read_require_single_strand_agreement_simplex_duplex | Mask (make N) consensus bases where the AB and BA consensus reads disagree (for duplex-sequencing only). | |
| fgbio_filter_consensus_read_require_single_strand_agreement_duplex | Mask (make N) consensus bases where the AB and BA consensus reads disagree (for duplex-sequencing only). | |
| fgbio_filter_consensus_read_max_base_error_rate_duplex | The maximum error rate for a single consensus base. (Max 3 values) - Duplex | |
| fgbio_filter_consensus_read_max_base_error_rate_simplex_duplex | The maximum error rate for a single consensus base. (Max 3 values) - Simplex + Duplex | |
| fgbio_filter_consensus_read_max_no_call_fraction_duplex | Maximum fraction of no-calls in the read after filtering - Duplex | |
| fgbio_filter_consensus_read_max_read_error_rate_duplex | The maximum raw-read error rate across the entire consensus read. (Max 3 values) - Duplex | |
| fgbio_filter_consensus_read_max_no_call_fraction_simplex_duplex | Maximum fraction of no-calls in the read after filtering - Simplex + Duplex | |
| fgbio_filter_consensus_read_max_read_error_rate_simplex_duplex | The maximum raw-read error rate across the entire consensus read. (Max 3 values) - Simplex + Duplex | |
| fgbio_filter_consensus_read_min_base_quality_duplex | Mask (make N) consensus bases with quality less than this threshold. - Duplex | |
| fgbio_filter_consensus_read_min_base_quality_simplex_duplex | Mask (make N) consensus bases with quality less than this threshold. - Simplex + Duplex | |
| fgbio_filter_consensus_read_min_mean_base_quality_duplex | The minimum mean base quality across the consensus read - Duplex | |
| fgbio_filter_consensus_read_min_mean_base_quality_simplex_duplex | The minimum mean base quality across the consensus read - Simplex + Duplex | |
| fgbio_filter_consensus_read_min_reads_duplex | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Duplex | 2, 1, 1 |
| fgbio_filter_consensus_read_min_reads_simplex_duplex | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Simplex + Duplex | 3, 3, 0 |
| fgbio_filter_consensus_read_output_file_name_simplex_duplex | Output BAM file name - Simplex + Duplex (Required) | |
| fgbio_filter_consensus_read_output_file_name_duplex_aln_metrics | Output file name - Duplex alignment metrics | |
| fgbio_filter_consensus_read_output_file_name_simplex_aln_metrics | Output file name - Simplex alignment metrics | |
| fgbio_filter_consensus_read_output_file_name_duplex | Output BAM file name - Duplex (Required) | |
| fgbio_filter_consensus_read_min_simplex_reads | The minimum number of reads supporting a consensus base/read. (Max 3 values) - Simplex + Duplex | |
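Several filter parameters above accept up to three values. As best understood from the fgbio FilterConsensusReads documentation, when fewer than three values are supplied the last one is repeated, with the first value applying to the final consensus and the other two to the single-strand consensuses. The sketch below only illustrates that expansion rule; treat the rule itself as an assumption to confirm against the tool's documentation, and the helper name as hypothetical.

```python
def expand_to_three(values):
    """Pad a list of one to three values to length three by repeating the last."""
    values = list(values)
    if not 1 <= len(values) <= 3:
        raise ValueError("expected between one and three values")
    return values + [values[-1]] * (3 - len(values))


duplex_default = expand_to_three([2, 1, 1])
single_value = expand_to_three([3])
```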
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_postprocessing_output_file_name_simplex | Output BAM file name - Simplex (Required) | |
| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_collect_alignment_summary_metrics_output_file_name | Output file name for metrics on collapsed BAM (Duplex+Simplex+Singletons) | |