Inputs Description
Various parameters required to run the workflow
Common Workflow Language (CWL) execution engines accept two types of input file, JSON or YAML; please use one of these when generating the input file. For more information, refer to the Common Workflow Language documentation.
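
As a rough illustration of generating such an input file, the sketch below writes a few of the arguments listed on this page to a JSON file using only the Python standard library. The values, the value types (for example a boolean for create_bam_index), and the file name `inputs.json` are assumptions made for the example, not defaults of the workflow.

```python
import json
from pathlib import Path

# Keys are argument names from this page; every value is a made-up placeholder.
inputs = {
    "sequencing-center": "EXAMPLE_CENTER",
    "sample": "sample_01",
    "run-date": "2021-01-31",
    "platform": "ILLUMINA",
    "library": "lib_01",
    "validation_stringency": "LENIENT",
    "create_bam_index": True,  # assumed to be a boolean input
}


def write_inputs(path, values):
    """Serialize the inputs mapping as JSON and return the written path."""
    path = Path(path)
    path.write_text(json.dumps(values, indent=2, sort_keys=True) + "\n")
    return path


if __name__ == "__main__":
    out = write_inputs("inputs.json", inputs)
    print(f"wrote {out}")
```

Because JSON is also valid YAML, a file written this way should be usable wherever either format is accepted.
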

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| sequencing-center | The sequencing center from which the data originated. | |
| sample | The name of the sequenced sample. | |
| run-date | Date the run was produced, to insert into the read group header (Iso8601Date). | |
| read-group-id | Read group ID to use in the file header. | |
| platform-unit | Read group platform unit (e.g. run barcode). | |
| platform-model | Platform model to insert into the read group header (e.g. miseq, hiseq2500, hiseqX). | |
| platform | Read group platform (e.g. ILLUMINA, SOLID). | |
| library | The name/ID of the sequenced library. | |
| description | Description of the read group. | |
| comment | Comments to include in the output file’s header. | |
| validation_stringency | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values: STRICT, LENIENT, or SILENT. | |
| sort_order | GATK: The order in which the reads should be output. | |
| create_bam_index | GATK: Generate a BAM index file when possible. | |
| reference_sequence | Reference sequence file. Please include ".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac", ".ann" as secondary files if they are not present in the same location as the ".fasta" file. | |
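
The following sketch shows two checks that can be made on these inputs before a run. The first verifies that run-date parses as an ISO 8601 calendar date and that validation_stringency is one of the three listed values. The second expands the secondary-file patterns listed for reference_sequence, using the CWL convention that each leading `^` removes one file extension before the suffix is appended. The function names are hypothetical, and the date check only covers the plain `YYYY-MM-DD` form, which may be stricter than what the underlying tool accepts.

```python
import os
from datetime import date

ALLOWED_STRINGENCY = {"STRICT", "LENIENT", "SILENT"}
REFERENCE_PATTERNS = (".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac", ".ann")


def check_read_group_inputs(values):
    """Return a list of problems found in a dict of inputs (empty if none)."""
    problems = []
    run_date = values.get("run-date")
    if run_date is not None:
        try:
            date.fromisoformat(run_date)
        except (TypeError, ValueError):
            problems.append(f"run-date is not an ISO 8601 date: {run_date!r}")
    stringency = values.get("validation_stringency")
    if stringency is not None and stringency not in ALLOWED_STRINGENCY:
        problems.append(f"unknown validation_stringency: {stringency!r}")
    return problems


def apply_secondary_pattern(path, pattern):
    """Expand one secondary-file pattern against a primary file path."""
    while pattern.startswith("^"):
        pattern = pattern[1:]
        path, _ext = os.path.splitext(path)  # drop one extension per caret
    return path + pattern


def expected_reference_files(fasta_path):
    """List the secondary file paths expected next to the reference FASTA."""
    return [apply_secondary_pattern(fasta_path, p) for p in REFERENCE_PATTERNS]


if __name__ == "__main__":
    print(check_read_group_inputs({"run-date": "31/01/2021"}))
    print(expected_reference_files("/data/ref.fasta"))
```
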

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fgbio_fastq_to_bam_umi-tag | Tag in which to store molecular barcodes/UMIs. | |
| fgbio_fastq_to_bam_sort | If true, query-name sort the BAM file, otherwise preserve input order. | |
| fgbio_fastq_to_bam_input | | |
| fgbio_fastq_to_bam_predicted-insert-size | Predicted median insert size, to insert into the read group header. | |
| fgbio_fastq_to_bam_output_file_name | The output SAM or BAM file to be written. | |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_merge_sam_files_output_file_name | SAM or BAM file to write the merged result to. | |
| merge_sam_files_sort_order | Sort order of output file. | |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| unpaired_fastq_file | Unpaired FASTQ output file name. | |
| R1_output_fastq | Read1 fastq.gz output file name. | |
| R2_output_fastq | Read2 fastq.gz output file name. | |
| gatk_sam_to_fastq_include_non_primary_alignments | If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments. | |
| gatk_sam_to_fastq_include_non_pf_reads | Include non-PF reads from the SAM file in the output FASTQ files. PF means 'passes filtering'. Reads whose 'not passing quality controls' flag is set are non-PF reads. See the GATK Dictionary for more info. | |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| fastp_unpaired1_output_file_name | For PE input, if read1 passed QC but read2 did not, read1 will be written to unpaired1. Default is to discard it. | |
| fastp_unpaired2_output_file_name | For PE input, if read2 passed QC but read1 did not, read2 will be written to unpaired2. If --unpaired2 is the same as --unpaired1 (default mode), both unpaired reads will be written to this same file. | |
| fastp_read1_adapter_sequence | The adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1/R2 are found not to overlap. | |
| fastp_read2_adapter_sequence | The adapter for read2 (PE data only). This is used if R1/R2 are found not to overlap. If not specified, it will be the same as the read1 adapter (string). | AGATCGGAAGAGC |
| fastp_read1_output_file_name | Read1 output file name. | 1 |
| fastp_read2_output_file_name | Read2 output file name. | |
| fastp_minimum_read_length | Reads shorter than length_required will be discarded. | 15 |
| fastp_json_output_file_name | The JSON format report file name. | |
| fastp_html_output_file_name | The HTML format report file name. | |
| fastp_failed_reads_output_file_name | Specify the file to store reads that cannot pass the filters. | |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bwa_mem_Y | Force soft-clipping rather than default hard-clipping of supplementary alignments. | |
| bwa_mem_T | Don’t output alignments with a score lower than INT. This option only affects output. | |
| bwa_mem_P | In paired-end mode, perform SW to rescue missing hits only, but do not try to find hits that fit a proper pair. | |
| bwa_mem_output | Output SAM file name. | |
| bwa_mem_M | Mark shorter split hits as secondary. | |
| bwa_mem_K | Used to achieve deterministic alignment results (note: this is a hidden option). | |
| bwa_number_of_threads | Number of threads. | |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| picard_addRG_output_file_name | Output BAM file name. | |
| picard_addRG_sort_order | Sort order for the BAM file. | |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| gatk_merge_bam_alignment_output_file_name | Output BAM file name. | |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| optical_duplicate_pixel_distance | The maximum offset between two duplicate clusters in order to consider them optical duplicates. The default is appropriate for unpatterned versions of the Illumina platform. For the patterned flowcell models, 2500 is more appropriate. For other platforms and models, users should experiment to find what works best. | |
| read_name_regex | Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. Set this option to null to disable optical duplicate detection, e.g. for RNA-seq or other data where duplicate sets are extremely large and estimating library complexity is not an aim. Note that without optical duplicate counts, library size estimation will be inaccurate. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done; instead the read name is split on the colon character. For 5-element names, the 3rd, 4th and 5th elements are assumed to be the tile, x and y values. For 7-element names (CASAVA 1.8), the 5th, 6th, and 7th elements are assumed to be the tile, x and y values. | |
| duplicate_scoring_strategy | The scoring strategy for choosing the non-duplicate among candidates. | |
| gatk_mark_duplicates_output_file_name | The output file to write marked records to. | |
| gatk_mark_duplicates_duplication_metrics_file_name | File to write duplication metrics to. | |
| gatk_mark_duplicates_assume_sort_order | If not null, assume that the input file has this order even if the header says otherwise. | |
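
The default read-name handling described for read_name_regex (split on colons, then take the tile, x and y values from 5-element or 7-element names) can be sketched as follows. This is an illustration of the description above, not the tool's own code. The function names and the example read names are made up, and the offset comparison in `within_offset` (same tile, both coordinate differences within the limit) is a simplifying assumption about how optical_duplicate_pixel_distance is applied.

```python
def tile_x_y(read_name):
    """Split a read name on ':' and return (tile, x, y) as integers.

    Returns None when the name has neither 5 nor 7 colon-separated elements.
    """
    fields = read_name.split(":")
    if len(fields) == 5:
        tile, x, y = fields[2:5]  # 3rd, 4th and 5th elements
    elif len(fields) == 7:
        tile, x, y = fields[4:7]  # 5th, 6th and 7th elements (CASAVA 1.8)
    else:
        return None
    return int(tile), int(x), int(y)


def within_offset(a, b, max_offset):
    """Assumed check: same tile and both coordinate offsets <= max_offset."""
    return (
        a[0] == b[0]
        and abs(a[1] - b[1]) <= max_offset
        and abs(a[2] - b[2]) <= max_offset
    )


if __name__ == "__main__":
    first = tile_x_y("INSTR:55:FLOWCELL:1:1101:15589:1332")
    second = tile_x_y("INSTR:55:FLOWCELL:1:1101:15600:3000")
    print(first, second, within_offset(first, second, 2500))
```
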

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bedtools_genomecov_option_bedgraph | Option flag to choose the output file format; -bg selects bedGraph format. | |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| bedtools_merge_distance_between_features | Maximum distance between features allowed for features to be merged. | |

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| abra2_window_size | Processing window size and overlap (size,overlap) (default: 400,200). | |
| abra2_soft_clip_contig | Soft clip contig args [max_contigs,min_base_qual,frac_high_qual_bases,min_soft_clip_len] (default: 16,13,80,15). | |
| abra2_scoring_gap_alignments | Scoring used for contig alignments (match,mismatch_penalty,gap_open_penalty,gap_extend_penalty) (default: 8,32,48,1). | |
| abra2_no_sort | Do not attempt to sort final output. | |
| abra2_no_edge_complex_indel | Prevent output of complex indels at read start or read end. | |
| abra2_maximum_mixmatch_rate | Max allowed mismatch rate when mapping reads back to contigs (default: 0.05). | |
| abra2_maximum_average_depth | Regions with average depth exceeding this value will be downsampled (default: 1000). | |
| abra2_contig_anchor | Contig anchor [M_bases_at_contig_edge,max_mismatches_near_edge] (default: 10,2). | |
| abra2_consensus_sequence | Use positional consensus sequence when aligning high quality soft clipping. | |
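
Several of the ABRA2 arguments above pack multiple numbers into one comma-separated value. The sketch below splits such a value into named fields so that a wrong number of components is caught early. It assumes these inputs are passed as comma-separated strings; the helper name `parse_abra2_arg` is hypothetical, and the field names and defaults are the ones quoted in the summaries above.

```python
# Field names per argument, in the order given in the summaries.
ABRA2_TUPLE_FIELDS = {
    "abra2_window_size": ("size", "overlap"),
    "abra2_soft_clip_contig": (
        "max_contigs", "min_base_qual", "frac_high_qual_bases", "min_soft_clip_len",
    ),
    "abra2_scoring_gap_alignments": (
        "match", "mismatch_penalty", "gap_open_penalty", "gap_extend_penalty",
    ),
    "abra2_contig_anchor": ("M_bases_at_contig_edge", "max_mismatches_near_edge"),
}

# Defaults as quoted in the summaries.
ABRA2_DEFAULTS = {
    "abra2_window_size": "400,200",
    "abra2_soft_clip_contig": "16,13,80,15",
    "abra2_scoring_gap_alignments": "8,32,48,1",
    "abra2_contig_anchor": "10,2",
}


def parse_abra2_arg(name, value=None):
    """Split a comma-separated ABRA2 value into a dict of named integers."""
    fields = ABRA2_TUPLE_FIELDS[name]
    raw = ABRA2_DEFAULTS[name] if value is None else value
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != len(fields):
        raise ValueError(
            f"{name} expects {len(fields)} comma-separated values, got {len(parts)}"
        )
    return dict(zip(fields, (int(part) for part in parts)))


if __name__ == "__main__":
    for arg_name in ABRA2_TUPLE_FIELDS:
        print(arg_name, parse_abra2_arg(arg_name))
```
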

| Argument Name | Summary | Default Value |
| --- | --- | --- |
| picard_fixmate_information_output_file_name | The output BAM file to write to. | |

Parameters not marked as optional are required
FASTQ files corresponding to each sequencing read (e.g. R1, I1, etc.). Please refer to the relevant tool documentation to get this correct.
Read structures, one for each of the FASTQs. Refer to the relevant tool documentation for more details.
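
For orientation, a read structure in fgbio is a string of length/operator pairs such as `3M2S+T`, where T marks template bases, B sample barcode bases, M molecular barcode (UMI) bases and S skipped bases, and the final length may be written as `+` to mean "all remaining bases". The parser below is a simplified sketch of that syntax for sanity-checking an input before a run. The function name is hypothetical, and operators beyond the four listed here are not handled.

```python
import re

# One segment: a positive length (or '+') followed by an operator letter.
_SEGMENT = re.compile(r"(\d+|\+)([TBMS])")


def parse_read_structure(text):
    """Parse a read structure into a list of (length, operator) tuples.

    A length of None stands for '+', i.e. all remaining bases.
    """
    segments = []
    pos = 0
    while pos < len(text):
        match = _SEGMENT.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse {text!r} at offset {pos}")
        length, operator = match.groups()
        if length != "+" and int(length) == 0:
            raise ValueError("segment lengths must be positive")
        segments.append((None if length == "+" else int(length), operator))
        pos = match.end()
    if not segments:
        raise ValueError("empty read structure")
    if any(length is None for length, _ in segments[:-1]):
        raise ValueError("'+' is only allowed on the last segment")
    return segments


if __name__ == "__main__":
    print(parse_read_structure("3M2S+T"))
```

With one structure per FASTQ, a quick check is that the number of parsed structures matches the number of FASTQ files supplied.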