
Inputs Description

Input files and parameters required to run workflow

Note: Common Workflow Language execution engines accept inputs in either JSON or YAML format; please make sure to use one of these when generating the input file. For more information, refer to: http://www.commonwl.org/user_guide/yaml/
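As a quick illustration, a minimal inputs fragment in the JSON form can be produced with Python's standard library. The fragment below is illustrative only; the BAM path is a placeholder, and the key names mirror the parameter list later on this page.

```python
import json

# Illustrative only: a minimal inputs fragment in the same shape as the
# template later on this page (the BAM path is a placeholder).
inputs = {
    "simplex_bam": [{"class": "File", "path": "/path/to/bam"}],
    "sample_name": ["sample_id"],
}

# JSON is one of the two formats CWL engines accept; YAML is the other.
as_json = json.dumps(inputs, indent=2, sort_keys=True)
```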

Parameters Used by Tools

simplex_bam

Simplex BAM file. (Required)

sample_name

The sample name. (Required)

sample_group

The sample group (e.g., the patient ID).

sample_sex

The sample sex (e.g., M). (Required)

pool_a_bait_intervals

The Pool A bait interval file. (Required)

pool_a_target_intervals

The Pool A target interval file. (Required)

pool_b_bait_intervals

The Pool B bait interval file. (Required)

pool_b_target_intervals

The Pool B target interval file. (Required)

noise_sites_bed

BED file containing sites for duplex noise calculation. (Required)

biometrics_vcf_file

VCF file containing sites for genotyping and contamination calculations. (Required)

reference

Reference sequence file. Please include ".fai", "^.dict", ".amb", ".sa", ".bwt", ".pac", and ".ann" as secondary files if they are not present in the same location as the ".fasta" file.

Common Parameters Across Tools

biometrics_plot

Whether to output biometrics plots. Default: true

biometrics_json

Whether to output biometrics results in JSON. Default: true

collapsed_biometrics_coverage_threshold

Coverage threshold for biometrics collapsed BAM calculations. Default: 200

collapsed_biometrics_major_threshold

Major contamination threshold for biometrics collapsed BAM calculations. Default: 1

collapsed_biometrics_min_base_quality

Minimum base quality threshold for biometrics collapsed BAM calculations. Default: 1

collapsed_biometrics_min_coverage

Minimum coverage for a site to be included in biometrics collapsed BAM calculations. Default: 10

collapsed_biometrics_min_homozygous_thresh

Minimum threshold to consider a site as homozygous in biometrics collapsed BAM calculations. Default: 0.1

collapsed_biometrics_min_mapping_quality

Minimum mapping quality for biometrics collapsed BAM calculations. Default: 10

collapsed_biometrics_minor_threshold

Minor contamination threshold used for biometrics collapsed BAM calculations. Default: 0.02

duplex_biometrics_major_threshold

Major contamination threshold for biometrics duplex BAM calculations. Default: 0.6

duplex_biometrics_min_base_quality

Minimum base quality threshold for biometrics duplex BAM calculations. Default: 1

duplex_biometrics_min_coverage

Minimum coverage for a site to be included in biometrics duplex BAM calculations. Default: 10

duplex_biometrics_min_homozygous_thresh

Minimum threshold to consider a site as homozygous in biometrics duplex BAM calculations. Default: 0.1

duplex_biometrics_min_mapping_quality

Minimum mapping quality for biometrics duplex BAM calculations. Default: 1

duplex_biometrics_minor_threshold

Minor contamination threshold used for biometrics duplex BAM calculations. Default: 0.02

hsmetrics_coverage_cap

Read coverage cap for CollectHsMetrics calculations. Default: 30000

hsmetrics_minimum_base_quality

Minimum base quality for CollectHsMetrics calculations. Default: 10

hsmetrics_minimum_mapping_quality

Minimum mapping quality for CollectHsMetrics calculations. Default: 10

sequence_qc_min_basq

Minimum base quality threshold for sequence_qc calculations. Default: 1

sequence_qc_min_mapq

Minimum mapping quality threshold for sequence_qc calculations. Default: 1

sequence_qc_threshold

Noise threshold used for sequence_qc calculations. Default: 0.002

sequence_qc_truncate

Whether to set the truncate parameter to True when using pysam.


uncollapsed_bam

Base-recalibrated uncollapsed BAM file. (Required)

collapsed_bam

Collapsed BAM file. (Required)

group_reads_by_umi_bam

Collapsed BAM file produced by fgbio's GroupReadsByUmi tool. (Required)

duplex_bam

Duplex BAM file. (Required)

Template Inputs File

inputs.yaml
biometrics_bed_file:
  class: File
  path: /path/to/MSK-ACCESS-v1_0-probe-B.sorted.bed
biometrics_json: true
biometrics_plot: true
biometrics_vcf_file:
  class: File
  path: /path/to/MSK-ACCESS-v1_0-TilingaAndFpSNPs.vcf
collapsed_bam:
- class: File
  path: /path/to/bam
collapsed_biometrics_coverage_threshold: null
collapsed_biometrics_major_threshold: null
collapsed_biometrics_min_base_quality: null
collapsed_biometrics_min_coverage: null
collapsed_biometrics_min_homozygous_thresh: null
collapsed_biometrics_min_mapping_quality: null
collapsed_biometrics_minor_threshold: null
duplex_bam:
- class: File
  path: /path/to/bam
duplex_biometrics_major_threshold: null
duplex_biometrics_min_base_quality: null
duplex_biometrics_min_coverage: null
duplex_biometrics_min_homozygous_thresh: null
duplex_biometrics_min_mapping_quality: null
duplex_biometrics_minor_threshold: null
group_reads_by_umi_bam:
- class: File
  path: /path/to/bam
hsmetrics_coverage_cap: 30000
hsmetrics_minimum_base_quality: 1
hsmetrics_minimum_mapping_quality: 1
noise_sites_bed:
  class: File
  path: /path/to/MSK-ACCESS-v1_0-probe-A_no_msi_sorted_deduped.bed
pool_a_bait_intervals:
  class: File
  path: /path/to/MSK-ACCESS-v1_0-probe-A_baits.sorted.interval_list
pool_a_target_intervals:
  class: File
  path: /path/to/MSK-ACCESS-v1_0_panelA_targets.interval_list
pool_b_bait_intervals:
  class: File
  path: /path/to/MSK-ACCESS-v1_0-probe-B_baits.sorted.interval_list
pool_b_target_intervals:
  class: File
  path: /path
reference:
  class: File
  path: /path
sample_group:
- patient_id
sample_name:
- sample_id
sample_sex:
- M
sample_type:
- tumor
sequence_qc_min_basq: 1
sequence_qc_min_mapq: 1
sequence_qc_threshold: null
sequence_qc_truncate: null
simplex_bam:
- class: File
  path: /path
uncollapsed_bam:
- class: File
  path: /path
uncollapsed_bam_base_recal:
- class: File
  path: /path
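Before launching an engine against a filled-in copy of this template, a small standard-library sketch like the one below can catch required keys that are still missing or null, and spell out the secondary files the reference entry expects. The helper names are illustrative, not part of the pipeline.

```python
# Required inputs as documented on this page; the helper functions are
# illustrative and not part of the pipeline itself.
REQUIRED = [
    "simplex_bam", "sample_name", "sample_sex",
    "pool_a_bait_intervals", "pool_a_target_intervals",
    "pool_b_bait_intervals", "pool_b_target_intervals",
    "noise_sites_bed", "biometrics_vcf_file",
    "uncollapsed_bam", "collapsed_bam",
    "group_reads_by_umi_bam", "duplex_bam",
]

def missing_required(inputs):
    """Return the required keys that are absent or left as null."""
    return [key for key in REQUIRED if inputs.get(key) is None]

def expected_secondary_files(fasta_path):
    """Secondary files for the reference: '^.dict' swaps the extension,
    while the other patterns are appended to the .fasta path."""
    stem = fasta_path.rsplit(".", 1)[0]
    appended = [fasta_path + ext
                for ext in (".fai", ".amb", ".sa", ".bwt", ".pac", ".ann")]
    return [stem + ".dict"] + appended
```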

Requirements

Before running the pipeline, make sure your system meets the following requirements:

  • A system with either docker or singularity configured.

  • Python 3.6 (for running cwltool and toil-cwl-runner).

  • A Python virtual environment created using virtualenv or conda.

  • Python packages (installed as part of the pipeline installation):

    • toil[cwl]==5.1.0
    • pytz==2021.1
    • typing==3.7.4.3
    • ruamel.yaml==0.16.5
    • pip==20.2.3
    • bumpversion==0.6.0
    • wheel==0.35.1
    • watchdog==0.10.3
    • flake8==3.8.4
    • tox==3.20.0
    • coverage==5.3
    • twine==3.2.0
    • pytest==6.1.1
    • pytest-runner==5.2
    • coloredlogs==10.0
    • pytest-travis-fold==1.3.0
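The Python 3.6 floor above can be checked before installing anything else; this is a generic sketch, not a script shipped with the pipeline.

```python
import sys

def meets_python_floor(version_info, minimum=(3, 6)):
    """True when the interpreter is at least the documented floor (3.6)."""
    return tuple(version_info[:2]) >= minimum

# Check the interpreter that will run cwltool / toil-cwl-runner.
ok = meets_python_floor(sys.version_info)
```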

Installation and Running

Workflows that generate, aggregate, and visualize quality control files for MSK-ACCESS.

Given the output files from Nucleo, there are workflows to generate the quality control files, aggregate those files across many samples, and visualize them using MultiQC. You can choose to run these workflows whether you have one sample or hundreds. Depending on your use case, there are two main options:

(1) Run qc_generator.cwl followed by aggregate_visualize.cwl. This approach first generates the QC files for one or more samples; you then use the second CWL script to aggregate the QC files and visualize them with MultiQC. This option is useful when you want to generate the QC files for some samples just once and then reuse those samples in multiple MultiQC reports.

(2) Run just access_qc.cwl. This option combines the two steps from the first option into one workflow.
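The two options above can be sketched as command lines. The helper below is hypothetical; the CWL filenames come from this page, while the runner choice and function name are illustrative.

```python
# Hypothetical helper mapping the two documented options to command
# lines; only the CWL filenames are taken from this page.
def qc_commands(option, inputs_file="inputs.yaml", runner="cwltool"):
    if option == 1:
        # Option (1): generate QC files, then aggregate and visualize.
        return [
            [runner, "qc_generator.cwl", inputs_file],
            [runner, "aggregate_visualize.cwl", inputs_file],
        ]
    # Option (2): the combined workflow.
    return [[runner, "access_qc.cwl", inputs_file]]
```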

Warning: Including more than 50 samples in the MultiQC report will cause some figures to lose interactivity. Including more than a few hundred samples may cause MultiQC to fail.

Installation and Usage

You must have run the Nucleo workflow before running any of the MSK-ACCESS QC workflows. Depending on your use case, there are two main sets of workflows you can choose to run: (1) qc_generator.cwl followed by aggregate_visualize.cwl, or (2) access_qc.cwl.

Step 1: Create a virtual environment.

Option (A) - if using cwltool

If you are using cwltool only, please proceed using Python 3.6 as shown below. Here we can use either virtualenv or conda; we will use virtualenv:

    pip3 install virtualenv
    python3 -m venv my_project
    source my_project/bin/activate

Option (B) - recommended for the Juno HPC cluster

If you are using toil, Python 3 is required. Please install using Python 3.6 as shown below. Again, we can use either virtualenv or conda; we will use virtualenv:

    pip install virtualenv
    virtualenv my_project
    source my_project/bin/activate

Note: Once you execute the above commands, your bash prompt will look something like this:

    (my_project)[server]$

Step 2: Clone the repository

    git clone --recursive --branch 0.1.0 https://github.com/msk-access/access_qc_generation.git

Note: Change 0.1.0 to the latest stable release of the pipeline.

Step 3: Install requirements using pip

We have already specified the versions of cwltool and the other packages in the requirements.txt file. Please use this file to install them:

    pip3 install -r requirements.txt

Step 4: Generate an inputs file

Next, you must generate a proper inputs file in either JSON or YAML format. For details on how to create this file, please follow the example on the Inputs Description page (there is a minimal example of what needs to be filled in at the end of that page).

It is also possible to create and fill in a "template" inputs file using this command:

    cwltool --make-template nucleo.cwl > inputs.yaml

Note: To see help for the inputs of the CWL workflow, you can use: toil-cwl-runner nucleo.cwl --help

Step 5: Run the workflow

Once we have successfully installed the requirements, we can run the workflow using cwltool or toil, given a proper inputs file in either JSON or YAML format (see Inputs Description for more details).

Run the workflow with a given set of inputs using cwltool on a single machine

Here we show how to use cwltool to run the workflow on a single machine, such as a laptop. To generate the QC files for one sample:

    cwltool nucleo.cwl inputs.yaml

Run the workflow with a given set of inputs using toil on a single machine

Here we show how to run the workflow using toil-cwl-runner's single-machine interface. To aggregate the QC files across one or more samples and visualize them with MultiQC:

    toil-cwl-runner nucleo.cwl inputs.yaml

Run the workflow with a given set of inputs using toil on JUNO (MSKCC Research Cluster)

Here we show how to run the workflow using toil-cwl-runner on the MSKCC internal compute cluster called JUNO, which uses IBM LSF as its scheduler.

Note the use of --singularity to convert Docker containers into Singularity containers, the TMPDIR environment variable to avoid writing temporary files to shared disk space, the _JAVA_OPTIONS environment variable to set the Java temporary directory to /scratch, the SINGULARITY_BINDPATH environment variable to bind /scratch when running Singularity containers, and TOIL_LSF_ARGS to specify any additional arguments to the bsub commands that the jobs should have (in this case, setting a maximum wall time of 6 hours).

    TMPDIR=$PWD
    TOIL_LSF_ARGS='-W 3600 -P test_nucleo -app anyOS -R select[type==CentOS7]'
    _JAVA_OPTIONS='-Djava.io.tmpdir=/scratch/'
    SINGULARITY_BINDPATH='/scratch:/scratch:rw'
    toil-cwl-runner \
           --singularity \
           --logFile ./example.log  \
           --jobStore ./example_jobStore \
           --batchSystem lsf \
           --workDir ./example_working_directory/ \
           --outdir $PWD \
           --writeLogs ./example_log_folder/ \
           --logLevel DEBUG \
           --stats \
           --retryCount 2 \
           --disableCaching \
           --disableChaining \
           --preserve-environment TOIL_LSF_ARGS TMPDIR \
           --maxLogFileSize 20000000000 \
           --cleanWorkDir onSuccess \
           nucleo.cwl \
           inputs.yaml \
           > toil.stdout \
           2> toil.stderr &

Your workflow should now be running on the specified batch system. See the outputs documentation for a description of the resulting files when it is completed.
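Since the command above redirects stderr to toil.stderr and backgrounds the run, a rough post-run check can scan that file for failure markers. The marker strings below are assumptions rather than guaranteed toil log output, so adjust them to what your toil version actually emits.

```python
# Assumed failure markers -- not guaranteed toil log strings; adjust to
# match the log output of your toil version.
FAILURE_MARKERS = ("Traceback", "ERROR", "Job failed")

def looks_failed(log_lines):
    """Heuristically flag a run whose stderr contains a failure marker."""
    return any(marker in line
               for line in log_lines
               for marker in FAILURE_MARKERS)

# Typical use: looks_failed(open("toil.stderr")) after the run finishes.
```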