Access Quality Control (v2)
MSK-ACCESS QC generation V2

Installation and Usage

You must have run the Nucleo workflow first before running any of the MSK-ACCESS QC workflows. Depending on your use case, there are two main sets of workflows you can choose to run: (1) `qc_generator


Last updated 3 years ago


Step 1: Create a virtual environment.

Option (A) - if using cwltool

If you are using cwltool only, please proceed with Python 3.6 as shown below:

You can use either virtualenv or conda; here we use virtualenv.

python3-virtualenv
pip3 install virtualenv
python3 -m venv my_project
source my_project/bin/activate

Option (B) - recommended for Juno HPC cluster

If you are using toil, Python 3 is required. Please install using Python 3.6 as shown below:

You can use either virtualenv or conda; here we use virtualenv.

python3-virtualenv
pip install virtualenv
virtualenv my_project
source my_project/bin/activate

Once you execute the above commands, your bash prompt will look something like this:

bash-prompt-example
(my_project)[server]$
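
Besides the prompt prefix, the activation can be checked from a script. The following is a minimal sketch, assuming the environment was activated with the standard `activate` script, which sets the `VIRTUAL_ENV` variable; the function name `venv_status` is hypothetical and not part of the pipeline.

```shell
# Report whether a Python virtual environment is active in this shell.
# Assumption: the activate script (virtualenv or venv) has set VIRTUAL_ENV.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: ${VIRTUAL_ENV}"
    else
        echo "inactive"
    fi
}

venv_status
```

If this reports `inactive`, run `source my_project/bin/activate` again before installing the requirements.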

Step 2: Clone the repository

git-clone-with-submodule
git clone --recursive --branch 0.1.0 https://github.com/msk-access/access_qc_generation.git

Note: Change 0.1.0 to the latest stable release of the pipeline
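
One way to decide which release tag to pass to `--branch` is to version-sort the available tags. The sketch below assumes a `sort` that supports `-V` (GNU coreutils); the helper name `latest_tag` and the tag list are hypothetical and used only for illustration, so check the repository's releases for the real tags.

```shell
# Pick the highest version-like tag from a newline-separated list on stdin.
# Assumption: sort supports -V (version sort), as in GNU coreutils.
latest_tag() {
    sort -V | tail -n 1
}

# Hypothetical tag list for illustration only; in practice the list could come
# from `git ls-remote --tags --refs <repository-url>`.
printf '%s\n' 0.1.0 0.2.0 0.10.1 | latest_tag
```

A plain lexical sort would rank `0.2.0` above `0.10.1`, which is why version sort is used here.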

Step 3: Install requirements using pip

We have already specified the version of cwltool and other packages in the requirements.txt file. Please use this to install.

python-package-installation-using-pip
#python3
pip3 install -r requirements.txt

Step 4: Generate an inputs file

For details on how to create this file, please follow this example (there is a minimal example of what needs to be filled in at the end of the page):

It's also possible to create and fill in a "template" inputs file using this command:

$ cwltool --make-template nucleo.cwl > inputs.yaml

Note: To see help for the inputs of the CWL workflow, you can use: toil-cwl-runner nucleo.cwl --help
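
Before launching a run, it can help to confirm that the inputs file is actually in place. This is a minimal sketch, not part of the pipeline: the function name `check_inputs_file` is hypothetical, and it only checks that the file is non-empty and has a yaml or json extension, not that its contents match the workflow.

```shell
# Check that an inputs file exists, is non-empty, and has a yaml/json extension.
# This does not validate the contents against the workflow's inputs.
check_inputs_file() {
    inputs="$1"
    if [ ! -s "$inputs" ]; then
        echo "missing or empty: $inputs"
        return 1
    fi
    case "$inputs" in
        *.yaml|*.yml|*.json) echo "ok: $inputs" ;;
        *) echo "unexpected extension: $inputs"; return 1 ;;
    esac
}

check_inputs_file inputs.yaml || echo "fix the inputs file before running the workflow"
```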

Once the requirements are installed, we can run the workflow using cwltool or toil.

Step 5: Run the workflow

To generate the QC files for one sample:

cwltool-execution
cwltool nucleo.cwl inputs.yaml

To aggregate the QC files across one or more samples and visualize with MultiQC:

cwltool-execution
cwltool nucleo.cwl inputs.yaml
toil-local-execution
toil-cwl-runner nucleo.cwl inputs.yaml
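
Since either runner can execute the workflow, a script may want to detect which one is installed in the active environment. The following is a sketch under the assumption that both tools are invoked by the names used above; the function name `pick_runner` and its preference for `toil-cwl-runner` are illustrative choices, not something the pipeline prescribes.

```shell
# Print the first CWL runner found on PATH, preferring toil-cwl-runner.
# Prints nothing and returns 1 if neither is installed.
pick_runner() {
    for runner in toil-cwl-runner cwltool; do
        if command -v "$runner" >/dev/null 2>&1; then
            echo "$runner"
            return 0
        fi
    done
    return 1
}

runner_found=$(pick_runner) || runner_found="none"
echo "runner: $runner_found"
```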

Note the use of --singularity to convert Docker containers into Singularity containers, the TMPDIR environment variable to avoid writing temporary files to shared disk space, the _JAVA_OPTIONS environment variable to set the Java temporary directory to /scratch, the SINGULARITY_BINDPATH environment variable to bind /scratch when running Singularity containers, and TOIL_LSF_ARGS to specify any additional arguments that the jobs' bsub commands should have (in this case, a maximum wall-time set with -W, which LSF reads in minutes unless it is written as hours:minutes).

toil-lsf-execution
export TMPDIR="$PWD"
export TOIL_LSF_ARGS='-W 3600 -P test_nucleo -app anyOS -R select[type==CentOS7]'
export _JAVA_OPTIONS='-Djava.io.tmpdir=/scratch/'
export SINGULARITY_BINDPATH='/scratch:/scratch:rw'
toil-cwl-runner \
       --singularity \
       --logFile ./example.log  \
       --jobStore ./example_jobStore \
       --batchSystem lsf \
       --workDir ./example_working_directory/ \
       --outdir $PWD \
       --writeLogs ./example_log_folder/ \
       --logLevel DEBUG \
       --stats \
       --retryCount 2 \
       --disableCaching \
       --disableChaining \
       --preserve-environment TOIL_LSF_ARGS TMPDIR \
       --maxLogFileSize 20000000000 \
       --cleanWorkDir onSuccess \
       nucleo.cwl \
       inputs.yaml \
       > toil.stdout \
       2> toil.stderr &
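
Because `bsub -W` takes its limit as `[hour:]minute`, a bare number is read as minutes. The sketch below shows one way to build the wall-time argument from a number of hours; the helper name `walltime_minutes` is hypothetical, and the 6-hour value is only an example to adjust for your jobs.

```shell
# Convert whole hours to the minute count expected by a bare bsub -W value.
walltime_minutes() {
    echo $(( $1 * 60 ))
}

# Build a TOIL_LSF_ARGS value with a 6-hour limit; the project name is the one
# used in the example command and should be replaced with your own.
TOIL_LSF_ARGS="-W $(walltime_minutes 6) -P test_nucleo"
export TOIL_LSF_ARGS
echo "$TOIL_LSF_ARGS"
```

The same limit could also be written as `-W 6:00`.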

Next you must generate a proper input file in either json or yaml format.

Here we show how to use cwltool to run the workflow on a single machine, such as a laptop.

Run the workflow with a given set of inputs using cwltool on a single machine.

Here we show how to run the workflow using toil-cwl-runner through its single-machine interface.

Once the requirements are installed, we can run the workflow using cwltool, provided a proper input file has been generated in either json or yaml format. Please look at Inputs Description for more details.

Run the workflow with a given set of inputs using toil on a single machine.

Here we show how to run the workflow using toil-cwl-runner on the MSKCC internal compute cluster called JUNO, which uses IBM LSF as its scheduler.

Run the workflow with a given set of inputs using toil on JUNO (MSKCC Research Cluster).

Your workflow should now be running on the specified batch system. See outputs for a description of the resulting files when it is completed.
