Installation and Usage

You must run the Nucleo workflow before running any of the MSK-ACCESS QC workflows. Depending on your use case, there are two main sets of workflows you can choose to run: (1) `qc_generator

Step 1: Create a virtual environment.

Option (A) - if using cwltool

If you are using cwltool only, please proceed with Python 3.6 as shown below:

You can use either virtualenv or conda. Here we will use virtualenv.
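Before creating the environment, it can help to confirm which Python interpreter is on your PATH. The following is a minimal sketch, assuming `python3` is the interpreter you intend to use; `is_python36` is a hypothetical helper written for this illustration and is not part of the pipeline.

```shell
# Hypothetical helper: succeeds if the given version string is Python 3.6 or 3.6.x
is_python36() {
    case "$1" in
        3.6|3.6.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch: report the version of python3 if it is available on PATH
if command -v python3 >/dev/null 2>&1; then
    found="$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')"
    if is_python36 "$found"; then
        echo "Python $found matches the 3.6 series"
    else
        echo "Python $found is not in the 3.6 series" >&2
    fi
fi
```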

Option (B) - recommended for Juno HPC cluster

If you are using toil, Python 3 is required. Please install using Python 3.6 as shown below:

You can use either virtualenv or conda. Here we will use virtualenv.

Once you execute the above commands, your bash prompt should look something like this:
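Besides changing the prompt, the activate script sets the `VIRTUAL_ENV` environment variable, which gives a scriptable way to confirm that the environment is active. A small sketch follows; `venv_status` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: report whether a virtual environment is active,
# based on the VIRTUAL_ENV variable set by the activate script
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active virtual environment: ${VIRTUAL_ENV}"
    else
        echo "no virtual environment is active"
    fi
}

venv_status
```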

Step 2: Clone the repository

Note: Change 0.1.0 to the latest stable release of the pipeline
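One way to make the release easy to change is to keep it in a variable. The sketch below only prints the clone command (a dry run) so that nothing is downloaded; `RELEASE` and `REPO_URL` are illustrative variable names, and 0.1.0 remains a placeholder for whichever stable release you choose.

```shell
# RELEASE is a placeholder; replace 0.1.0 with the latest stable release tag
RELEASE="0.1.0"
REPO_URL="https://github.com/msk-access/access_qc_generation.git"

# Dry run: print the clone command instead of executing it
echo git clone --recursive --branch "$RELEASE" "$REPO_URL"

# To list the tags published on the remote, you could run:
#   git ls-remote --tags "$REPO_URL"
```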

Step 3: Install requirements using pip

We have already specified the version of cwltool and other packages in the requirements.txt file. Please use this to install.
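After the install finishes, a quick check that the runners are on your PATH can catch a failed or partial install early. This is a sketch only: `require_cmd` is a hypothetical helper, and depending on which option you chose in Step 1, one of the two tools may legitimately be absent.

```shell
# Hypothetical helper: report whether a command is available on PATH
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check the two runners mentioned in this guide
for tool in cwltool toil-cwl-runner; do
    require_cmd "$tool" || echo "the install step may not have completed for $tool"
done
```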

Step 4: Generate an inputs file

Next, you must generate a proper inputs file in either json or yaml format.
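For orientation, a CWL inputs file in yaml format maps each input name to a value, and file inputs are written as objects with `class: File` and a `path`. The sketch below writes such a file to a temporary location so that no existing file is overwritten. The input names `sample_name` and `example_bam` and the path are hypothetical placeholders; the real input names come from the workflow itself.

```shell
# Hypothetical helper: write an illustrative CWL inputs file to the given path.
# The input names and the BAM path are placeholders, not the workflow's real inputs.
write_example_inputs() {
    cat > "$1" <<'EOF'
sample_name: example_sample
example_bam:
  class: File
  path: /path/to/example.bam
EOF
}

example_file="$(mktemp)"
write_example_inputs "$example_file"
cat "$example_file"
```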

For details on how to create this file, please follow this example (there is a minimal example of what needs to be filled in at the end of the page):

It's also possible to create and fill in a "template" inputs file using this command:

Note: To see help for the inputs of the CWL workflow, you can use: toil-cwl-runner nucleo.cwl --help

Once the requirements are installed, you can run the workflow using either cwltool or toil.

Step 5: Run the workflow

Here we show how to use cwltool to run the workflow on a single machine, such as a laptop.

Run the workflow with a given set of inputs using cwltool on a single machine

To generate the QC files for one sample:

Your workflow should now be running on the specified batch system. See outputs for a description of the resulting files once it has completed.

To aggregate the QC files across one or more samples and visualize with MultiQC:

Here we show how to run the workflow using toil-cwl-runner through its single machine interface.

Once the requirements are installed, you can run the workflow using cwltool, provided you have generated a proper inputs file in either json or yaml format. Please see Inputs Description for more details.

Run the workflow with a given set of inputs using toil on a single machine

Here we show how to run the workflow using toil-cwl-runner on JUNO, the MSKCC internal compute cluster, which uses IBM LSF as its scheduler.

Note the following: --singularity converts Docker containers into Singularity containers; the TMPDIR environment variable avoids writing temporary files to shared disk space; the _JAVA_OPTIONS environment variable sets the Java temporary directory to /scratch; the SINGULARITY_BINDPATH environment variable binds /scratch when running Singularity containers; and TOIL_LSF_ARGS specifies any additional arguments that the bsub commands for the jobs should have (in this case, setting a max wall-time of 6 hours).
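Because a run depends on several environment variables, it can be worth checking that all of them are set before submitting. The sketch below is one possible pre-flight check, not part of the pipeline: `hours_to_lsf_minutes` and `check_toil_env` are hypothetical helpers, the project name is taken from this guide's example, and the conversion assumes that bsub reads a plain -W value in minutes.

```shell
# Hypothetical helper: convert hours to the minutes value used with bsub -W
hours_to_lsf_minutes() {
    echo $(( $1 * 60 ))
}

# Hypothetical helper: fail if any variable used by the cluster run is unset or empty
check_toil_env() {
    missing=0
    for name in TMPDIR TOIL_LSF_ARGS _JAVA_OPTIONS SINGULARITY_BINDPATH; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Export the variables so that child processes can see them
export TMPDIR="$PWD"
export TOIL_LSF_ARGS="-W $(hours_to_lsf_minutes 6) -P test_nucleo"
export _JAVA_OPTIONS='-Djava.io.tmpdir=/scratch/'
export SINGULARITY_BINDPATH='/scratch:/scratch:rw'

check_toil_env && echo "environment looks complete"
```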

Run the workflow with a given set of inputs using toil on JUNO (MSKCC Research Cluster)

cwltool-execution
cwltool nucleo.cwl inputs.yaml
toil-local-execution
toil-cwl-runner nucleo.cwl inputs.yaml
python3-virtualenv
pip3 install virtualenv
python3 -m venv my_project
source my_project/bin/activate
python3-virtualenv
pip install virtualenv
virtualenv my_project
source my_project/bin/activate
bash-prompt-example
(my_project)[server]$
git-clone-with-submodule
git clone --recursive --branch 0.1.0 https://github.com/msk-access/access_qc_generation.git
python-package-installation-using-pip
#python3
pip3 install -r requirements.txt
cwltool --make-template nucleo.cwl > inputs.yaml
toil-lsf-execution
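# Export the variables so toil-cwl-runner and the jobs it submits can see them
export TMPDIR=$PWD
# bsub reads -W in minutes: 360 minutes = 6 hours
export TOIL_LSF_ARGS='-W 360 -P test_nucleo -app anyOS -R select[type==CentOS7]'
export _JAVA_OPTIONS='-Djava.io.tmpdir=/scratch/'
export SINGULARITY_BINDPATH='/scratch:/scratch:rw'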
toil-cwl-runner \
       --singularity \
       --logFile ./example.log  \
       --jobStore ./example_jobStore \
       --batchSystem lsf \
       --workDir ./example_working_directory/ \
       --outdir $PWD \
       --writeLogs ./example_log_folder/ \
       --logLevel DEBUG \
       --stats \
       --retryCount 2 \
       --disableCaching \
       --disableChaining \
       --preserve-environment TOIL_LSF_ARGS TMPDIR \
       --maxLogFileSize 20000000000 \
       --cleanWorkDir onSuccess \
       nucleo.cwl \
       inputs.yaml \
       > toil.stdout \
       2> toil.stderr &
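The command above runs in the background with its output redirected to files. If you would rather have a script block until the run finishes and then act on its exit status, a wrapper along the following lines is one option. This is a sketch: `run_and_wait` is a hypothetical helper, and the `sh -c` command stands in for the real toil-cwl-runner invocation.

```shell
# Hypothetical wrapper: run a command in the background with stdout and stderr
# captured to files, then wait for it and return its exit status
run_and_wait() {
    out_file="$1"
    err_file="$2"
    shift 2
    "$@" > "$out_file" 2> "$err_file" &
    pid=$!
    wait "$pid"
}

# Usage sketch with a stand-in command in place of the workflow run
log_dir="$(mktemp -d)"
if run_and_wait "$log_dir/toil.stdout" "$log_dir/toil.stderr" sh -c 'echo "workflow finished"'; then
    echo "run succeeded; stdout: $(cat "$log_dir/toil.stdout")"
else
    echo "run failed; see $log_dir/toil.stderr" >&2
fi
```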