Input files and parameters required to run workflow
| Parameter | Description | Default |
| --- | --- | --- |
| reference_fasta | Reference FASTA file | |
| sample_name | The name of the sample submitted to the workflow | |
The workflow is divided into three parts.
1. VarDict workflow - calls variants with VarDict, then normalizes and concatenates the complex and simple variants in VCF format.
| Parameter | Description | Default |
| --- | --- | --- |
| BedFile | Target file | |
| Vardict_allele_frequency_threshold | VarDict allele frequency threshold | 0.01 |
| Minimum_allele_frequency | | 0.05 |
| input_bam_case | Input CH sample BAM file | |
| ad | Allele depth | 1 |
| totalDepth | Total depth | 20 |
| tnRatio | Tumor-normal variant fraction ratio threshold | 1 |
| variantFraction | Tumor variant fraction threshold | 5.00E-05 |
| minQual | Minimum variant call quality | 0 |
| allow_overlaps | First coordinate of the next file can precede last record of the current file | TRUE |
| stdout | Write to standard output, keep original files unchanged | TRUE |
| check-ref | What to do when an incorrect or missing REF allele is encountered. 's' sets/fixes bad sites. Note that 's' can swap alleles and will update genotypes (GT) and AC counts, but will not attempt to fix PL or other fields. It also will not fix strand issues in your VCF. | s |
| multiallelics | Whether multiallelic sites should be split or joined. '+' denotes that biallelic sites should be joined into a multiallelic record. | + |
| output-type | Output type from BCFtools sort. 'z' denotes compressed VCF | z |
| preset | Input format for indexing | VCF |
| sample-name_vardict_STDFilter.txt | | |
| sample-name_single_filter_vcf | VCF file with filtered SNPs | |
| sample-name_single_filer_complex.vcf | VCF file with filtered complex variants | |
| sample-name_vardict_concatenated.vcf | VCF file with both complex and simple variants | |
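The normalization, sorting, indexing and concatenation options in this part correspond to command-line flags of bcftools, bgzip and tabix. The following is a minimal sketch of how those defaults might translate into command lines. It only builds argument lists and runs nothing. The flags themselves exist in those tools, but the mapping of each parameter to a tool, the order of the steps, and all file names are assumptions made for illustration, not the workflow's actual wiring.

```python
import shlex
from typing import List


def norm_cmd(in_vcf: str, ref_fasta: str, out_vcf: str) -> List[str]:
    """bcftools norm with check-ref 's' and multiallelics '+' (the listed defaults)."""
    return ["bcftools", "norm", "--fasta-ref", ref_fasta,
            "--check-ref", "s", "--multiallelics", "+",
            "-o", out_vcf, in_vcf]


def sort_cmd(in_vcf: str, out_vcf_gz: str) -> List[str]:
    """bcftools sort writing compressed VCF (output-type 'z')."""
    return ["bcftools", "sort", "--output-type", "z", "-o", out_vcf_gz, in_vcf]


def bgzip_cmd(in_vcf: str) -> List[str]:
    """bgzip to standard output, leaving the input file unchanged."""
    return ["bgzip", "--stdout", in_vcf]


def tabix_cmd(vcf_gz: str) -> List[str]:
    """tabix index using the VCF preset (tabix spells it in lower case)."""
    return ["tabix", "--preset", "vcf", vcf_gz]


def concat_cmd(in_vcfs: List[str], out_vcf: str) -> List[str]:
    """bcftools concat with --allow-overlaps; inputs should be compressed and indexed."""
    return ["bcftools", "concat", "--allow-overlaps", "-o", out_vcf, *in_vcfs]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [
        norm_cmd("simple.vcf", "reference.fasta", "simple.norm.vcf"),
        sort_cmd("simple.norm.vcf", "simple.norm.sorted.vcf.gz"),
        tabix_cmd("simple.norm.sorted.vcf.gz"),
        concat_cmd(["simple.norm.sorted.vcf.gz", "complex.norm.sorted.vcf.gz"],
                   "concatenated.vcf"),
    ]
    for argv in steps:
        print(shlex.join(argv))
```

Building the commands as lists rather than strings keeps quoting safe if they are later passed to `subprocess.run`.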
2. Variant Annotation - the VCF file from the previous step is annotated using the resource files listed here.
| Parameter | Description | Default |
| --- | --- | --- |
| retain_info | Comma-delimited names of INFO fields to retain as extra columns in MAF | CNT,TUMOR_TYPE |
| min_hom_vaf | If GT undefined in VCF, minimum allele fraction to call a variant homozygous | 0.7 |
| buffer_size | Number of variants VEP loads at a time; reduce this for low-memory systems | 5000 |
| custom_enst | File listing custom ENST IDs that override canonical selection | |
| input_cosmicCountDB_vcf | VCF file from COSMIC database with overall prevalence for a variant | |
| input_cosmicprevalenceDB_vcf | VCF file from COSMIC database with lineage-specific prevalence for a variant | |
| input_complexity_bed | BED file with complex regions | |
| input_mappability_bed | BED file with un-mappable regions | |
| oncoKbApiToken | OncoKB API token file | |
| input_47kchpd_tsv_file | TSV file with 47k CH-PD variants | |
| input_hotspot_tsv_file | TSV file with hotspots obtained from 47k CH-PD variants | |
| input_panmeloid_tsv_file | TSV file with PAN-myeloid variants | |
| opOncoKbMafName | Output file name for the MAF file produced by OncoKB annotation | |
| output_complexity_filename | Output file name for MAF file annotated with complex regions | |
| output_mappability_filename | Output file name for MAF file annotated with mappable regions | |
| output_vcf2mafName | File name for VCF2MAF conversion | |
| output_maf_name_panmyeloid | Output file name for MAF file annotated with PAN-myeloid dataset | |
| output_47kchpd_maf_name | Output file name for MAF file annotated with 47k CH-PD variations | |
| output_hotspot_maf_name | Output file name for MAF file annotated with hotspot variations | |
| snpsift_countOpName | Output file name for VCF annotated with COSMIC prevalence | |
| snpsift_prevalOpName | Output file name for VCF annotated with COSMIC lineage prevalence | |
| column_name_complexity | Column name in the MAF file where complexity is annotated | |
| column_name_mappability | Column name in the MAF file where mappability is annotated | |
| output_column_name_panmyeloid | Column name in the MAF file where the presence of variants in the PAN-myeloid dataset is annotated | |
| output_column_name_47kchpd | Column name in the MAF file where the presence of variants in the 47k CH-PD dataset is annotated | |
| output_column_name_hotspot | Column name in the MAF file where the presence of variants in the hotspot dataset is annotated | |
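To make the `min_hom_vaf` parameter concrete, here is a small sketch of the kind of decision it controls when a VCF record has no GT: the variant allele fraction is compared against the threshold. The function name, the use of alt and total depth as inputs, and the treatment of a value exactly at the threshold (`>=`) are assumptions for illustration; the annotation tool's own logic may differ in detail.

```python
from typing import Optional

MIN_HOM_VAF = 0.7  # default value of min_hom_vaf


def call_zygosity(alt_depth: int, total_depth: int,
                  min_hom_vaf: float = MIN_HOM_VAF) -> Optional[str]:
    """Guess zygosity from allele fraction when GT is undefined.

    Returns None when there is no coverage, since no fraction can be computed.
    """
    if total_depth <= 0:
        return None
    vaf = alt_depth / total_depth
    # Assumption: a fraction at or above the threshold counts as homozygous.
    return "homozygous" if vaf >= min_hom_vaf else "heterozygous"
```

For example, 8 supporting reads out of 10 give an allele fraction of 0.8, which is above the default threshold of 0.7, while 3 out of 10 fall below it.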
3. CH-specific processing - the MAF file from the previous step is filtered and tagged specifically for CH variants.
| Parameter | Description | Default |
| --- | --- | --- |
| output_maf_name_filer | Output MAF file name after filtering for CMO-CH criteria | |
| output_maf_name_tag | Output MAF file name after tagging for CMO-CH criteria | |
Common Workflow Language (CWL) execution engines accept input files in either JSON or YAML; make sure to use one of these formats when generating the input file. For more information, see http://www.commonwl.org/user_guide/yaml/
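As an illustration, a JSON input file could be generated with a short script such as the one below. The parameter names are taken from the tables in this document, and the `{"class": "File", "path": ...}` form is how CWL represents file inputs. All paths, the sample name and the output file name are hypothetical placeholders, and whether each parameter is a File or a plain value in the actual workflow definition is an assumption that should be checked against the CWL files.

```python
import json


def cwl_file(path: str) -> dict:
    """Return the CWL representation of a file input."""
    return {"class": "File", "path": path}


def build_inputs() -> dict:
    """Assemble a partial inputs object (placeholder paths, assumed types)."""
    return {
        "reference_fasta": cwl_file("/path/to/reference.fasta"),
        "sample_name": "sample-name",
        "input_bam_case": cwl_file("/path/to/sample.bam"),
        "BedFile": cwl_file("/path/to/targets.bed"),
        "min_hom_vaf": 0.7,
        "buffer_size": 5000,
    }


if __name__ == "__main__":
    # "inputs.json" is a hypothetical output name.
    with open("inputs.json", "w") as handle:
        json.dump(build_inputs(), handle, indent=2)
```

The resulting file is passed to the execution engine together with the workflow definition; only a subset of the parameters is shown here.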