Inputs Description
Input files and parameters required to run the workflow
Common Parameters
| Parameter | Description | Default |
| --- | --- | --- |
| reference_fasta | Reference FASTA file | |
| sample_name | The name of the sample submitted to the workflow | |
The entire workflow can be divided into three parts.

1. VarDict workflow: calls variants with VarDict, then normalizes and concatenates the complex and simple variants into a single VCF file.
| Parameter | Description | Default |
| --- | --- | --- |
| BedFile | BED file of target regions | |
| Vardict_allele_frequency_threshold | Allele frequency threshold passed to VarDict | 0.01 |
| Minimum_allele_frequency | Minimum allele frequency | 0.05 |
| input_bam_case | Input CH sample BAM file | |
| ad | Allele depth (AD) threshold | 1 |
| totalDepth | Total depth (DP) threshold | 20 |
| tnRatio | Tumor-normal variant fraction ratio threshold | 1 |
| variantFraction | Tumor variant fraction threshold | 5e-05 |
| minQual | Minimum variant call quality | 0 |
| allow_overlaps | Allow the first coordinate of the next file to precede the last record of the current file (bcftools concat) | TRUE |
| stdout | Write to standard output and keep the original files unchanged | TRUE |
| check-ref | Action to take when an incorrect or missing REF allele is encountered (bcftools norm). `s` sets/fixes bad sites; note that `s` can swap alleles and updates genotypes (GT) and AC counts, but does not attempt to fix PL or other fields, nor strand issues in the VCF. | s |
| multiallelics | Whether multiallelic sites are split or joined (bcftools norm). `+` joins biallelic sites into multiallelic records. | + |
| output-type | Output type for bcftools sort; `z` denotes compressed VCF | z |
| preset | Input format preset for indexing (tabix) | VCF |
| sample-name_vardict_STDFilter.txt | | |
| sample-name_single_filter_vcf | VCF file with filtered SNPs | |
| sample-name_single_filer_complex.vcf | VCF file with filtered complex variants | |
| sample-name_vardict_concatenated.vcf | VCF file with both complex and simple variants | |
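The filter thresholds above (`ad`, `totalDepth`, `tnRatio`, `variantFraction`, `minQual`) act as per-variant cut-offs. The Python sketch below shows one plausible way such cut-offs could be combined; it is not the workflow's actual filter script. The `VarDictCall` record and its field names, the inclusive `>=` semantics of each threshold, and the rule of skipping the tumor-normal ratio when the control fraction is zero are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VarDictCall:
    """Hypothetical per-variant summary; field names are illustrative only."""
    alt_depth: int      # allele depth (AD) supporting the alternate allele
    total_depth: int    # total read depth (DP) at the site
    case_vf: float      # variant fraction in the case (CH) sample
    control_vf: float   # variant fraction in the control/normal sample
    qual: float         # variant call quality


def passes_filter(call: VarDictCall, ad: int = 1, total_depth: int = 20,
                  tn_ratio: float = 1.0, variant_fraction: float = 5e-05,
                  min_qual: float = 0.0) -> bool:
    """Return True if the call meets every threshold.

    Defaults mirror the documented parameter defaults. Assumptions: each
    threshold is an inclusive lower bound, and the tumor-normal ratio is only
    checked when the control sample shows some signal (control_vf > 0).
    """
    if call.alt_depth < ad or call.total_depth < total_depth:
        return False
    if call.case_vf < variant_fraction or call.qual < min_qual:
        return False
    if call.control_vf > 0 and call.case_vf / call.control_vf < tn_ratio:
        return False
    return True


if __name__ == "__main__":
    calls = [
        VarDictCall(5, 100, 0.05, 0.0, 30.0),   # passes all thresholds
        VarDictCall(5, 10, 0.50, 0.0, 30.0),    # fails totalDepth (10 < 20)
        VarDictCall(5, 100, 0.01, 0.02, 30.0),  # fails tnRatio (0.5 < 1)
    ]
    print([passes_filter(c) for c in calls])  # [True, False, False]
```

Keeping each threshold as a keyword argument makes it easy to see how changing a single workflow parameter would change which calls survive.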
2. Variant annotation: the VCF file from the previous step is annotated with the resources listed below and converted to MAF format.
| Parameter | Description | Default |
| --- | --- | --- |
| retain_info | Comma-delimited names of INFO fields to retain as extra columns in the MAF | CNT,TUMOR_TYPE |
| min_hom_vaf | If GT is undefined in the VCF, the minimum allele fraction at which a variant is called homozygous | 0.7 |
| buffer_size | Number of variants VEP loads at a time; reduce this on low-memory systems | 5000 |
| custom_enst | File listing custom ENST IDs that override canonical transcript selection | |
| input_cosmicCountDB_vcf | COSMIC VCF file with the overall prevalence of each variant | |
| input_cosmicprevalenceDB_vcf | COSMIC VCF file with the lineage-specific prevalence of each variant | |
| input_complexity_bed | BED file of complex regions | |
| input_mappability_bed | BED file of unmappable regions | |
| oncoKbApiToken | File containing the OncoKB API token | |
| input_47kchpd_tsv_file | TSV file with the 47k CH-PD variants | |
| input_hotspot_tsv_file | TSV file with hotspots derived from the 47k CH-PD variants | |
| input_panmeloid_tsv_file | TSV file with PAN-myeloid variants | |
| opOncoKbMafName | Output file name for the MAF annotated by OncoKB | |
| output_complexity_filename | Output file name for the MAF annotated with complex regions | |
| output_mappability_filename | Output file name for the MAF annotated with mappability | |
| output_vcf2mafName | Output file name for the MAF produced by the vcf2maf conversion | |
| output_maf_name_panmyeloid | Output file name for the MAF annotated with the PAN-myeloid dataset | |
| output_47kchpd_maf_name | Output file name for the MAF annotated with the 47k CH-PD variants | |
| output_hotspot_maf_name | Output file name for the MAF annotated with hotspot variants | |
| snpsift_countOpName | Output file name for the VCF annotated with COSMIC overall prevalence | |
| snpsift_prevalOpName | Output file name for the VCF annotated with COSMIC lineage-specific prevalence | |
| column_name_complexity | MAF column in which complexity is annotated | |
| column_name_mappability | MAF column in which mappability is annotated | |
| output_column_name_panmyeloid | MAF column indicating whether a variant is present in the PAN-myeloid dataset | |
| output_column_name_47kchpd | MAF column indicating whether a variant is present in the 47k CH-PD dataset | |
| output_column_name_hotspot | MAF column indicating whether a variant is present in the hotspot dataset | |
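The complexity and mappability annotations mark whether each MAF variant falls inside a region listed in the corresponding BED file. As a minimal sketch, assuming a plain tab-separated BED file and MAF rows held as dictionaries keyed by the standard MAF columns `Chromosome`, `Start_Position` and `End_Position`, the overlap test could look like this. The key detail is the coordinate conversion: BED intervals are 0-based and half-open, whereas MAF positions are 1-based and inclusive. The `Yes`/`No` labels and the example column name `complexity` are hypothetical; the workflow's own annotation step may use different values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Regions = Dict[str, List[Tuple[int, int]]]


def load_bed(lines: Iterable[str]) -> Regions:
    """Parse BED lines into per-chromosome lists of (start, end) intervals.

    BED coordinates are 0-based, half-open: [start, end).
    """
    regions: Regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    return regions


def overlaps(regions: Regions, chrom: str, start: int, end: int) -> bool:
    """True if a MAF span [start, end] (1-based, inclusive) hits any BED interval."""
    # Convert the MAF span to 0-based, half-open coordinates: [start - 1, end)
    q_start, q_end = start - 1, end
    return any(b_start < q_end and q_start < b_end
               for b_start, b_end in regions.get(chrom, []))


def annotate_maf_rows(rows: List[dict], regions: Regions,
                      column_name: str) -> List[dict]:
    """Add column_name = 'Yes'/'No' to each MAF row (hypothetical label values)."""
    for row in rows:
        hit = overlaps(regions, row["Chromosome"],
                       int(row["Start_Position"]), int(row["End_Position"]))
        row[column_name] = "Yes" if hit else "No"
    return rows


if __name__ == "__main__":
    bed = load_bed(["chr1\t100\t200\n"])  # covers 1-based positions 101..200
    maf = [{"Chromosome": "chr1", "Start_Position": p, "End_Position": p}
           for p in ("100", "101", "200", "201")]
    print([r["complexity"] for r in annotate_maf_rows(maf, bed, "complexity")])
    # ['No', 'Yes', 'Yes', 'No']
```

A linear scan over intervals is enough for illustration; for genome-scale BED files, dedicated tools such as bedtools intersect are the usual choice.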
3. CH-specific processing: the MAF file from the previous step is filtered and tagged specifically for clonal hematopoiesis (CH) variants.
| Parameter | Description | Default |
| --- | --- | --- |
| output_maf_name_filer | Output MAF file name after filtering for CMO-CH criteria | |
| output_maf_name_tag | Output MAF file name after tagging for CMO-CH criteria | |
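Assuming the usual meaning of the two terms, filtering removes variants that fail the criteria, while tagging keeps every variant and records the result in an extra column. The sketch below illustrates only that distinction. The CMO-CH criteria themselves are not described on this page, so the `in_hotspot` predicate, the `hotspot` column value `Yes`, and the `CH_tag` column name are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def filter_maf(rows: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Filtering: return only the rows that satisfy the criteria."""
    return [row for row in rows if keep(row)]


def tag_maf(rows: List[Row], flag: Callable[[Row], bool],
            column: str = "CH_tag", label: str = "flagged") -> List[Row]:
    """Tagging: keep every row and record the result in an extra column.

    New dictionaries are built, so the input rows are left unchanged.
    """
    return [{**row, column: label if flag(row) else ""} for row in rows]


def in_hotspot(row: Row) -> bool:
    """Hypothetical criterion for illustration; not the actual CMO-CH rule."""
    return row.get("hotspot") == "Yes"


if __name__ == "__main__":
    maf = [
        {"Hugo_Symbol": "GENE_A", "hotspot": "Yes"},
        {"Hugo_Symbol": "GENE_B", "hotspot": "No"},
    ]
    print(len(filter_maf(maf, in_hotspot)))                 # 1
    print([r["CH_tag"] for r in tag_maf(maf, in_hotspot)])  # ['flagged', '']
```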
Common Workflow Language (CWL) execution engines accept input files in either JSON or YAML format; make sure to use one of these formats when generating the input file. For more information, refer to: http://www.commonwl.org/user_guide/yaml/
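Because the input file is plain JSON or YAML, it can also be generated programmatically. The following Python sketch builds a partial input object using parameter names from the tables above and prints it as JSON. File inputs are wrapped in CWL `File` objects (`{"class": "File", "path": ...}`). Which parameters are `File` rather than `string` inputs, the placeholder paths, and the subset of keys shown are assumptions; check them against the workflow's CWL definition before use.

```python
import json


def cwl_file(path: str) -> dict:
    """Wrap a local path in a CWL File object."""
    return {"class": "File", "path": path}


def build_inputs(sample_name: str, reference_fasta: str,
                 bam: str, bed: str) -> dict:
    """Return a partial CWL input object (illustrative subset of parameters).

    Assumption: reference_fasta, input_bam_case and BedFile are File inputs;
    verify the types against the workflow's CWL definition.
    """
    return {
        # Common parameters
        "sample_name": sample_name,
        "reference_fasta": cwl_file(reference_fasta),
        # VarDict workflow inputs
        "input_bam_case": cwl_file(bam),
        "BedFile": cwl_file(bed),
        # Thresholds, set explicitly to the documented defaults
        "Vardict_allele_frequency_threshold": 0.01,
        "Minimum_allele_frequency": 0.05,
        "ad": 1,
        "totalDepth": 20,
        "tnRatio": 1,
        "variantFraction": 5e-05,
        "minQual": 0,
        # Variant annotation settings
        "retain_info": "CNT,TUMOR_TYPE",
        "min_hom_vaf": 0.7,
        "buffer_size": 5000,
    }


if __name__ == "__main__":
    # Placeholder paths; replace them with real files.
    inputs = build_inputs("sample01", "/path/to/reference.fasta",
                          "/path/to/sample01.bam", "/path/to/targets.bed")
    print(json.dumps(inputs, indent=2))
```

Redirecting the output to a file (for example `python make_inputs.py > inputs.json`, where the script name is hypothetical) produces an input file that can be passed to a CWL runner such as `cwltool` together with the workflow's `.cwl` file.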
Example Input YML file to run the CWL