Installation and Usage
You must run the Nucleo workflow before running any of the MSK-ACCESS QC workflows. Depending on your use case, there are two main sets of workflows you can choose to run: (1) `qc_generator
Step 1: Create a virtual environment.
Option (A) - if using cwltool
If you are using cwltool only, please proceed using Python 3.6 as shown below:
You can use either virtualenv or conda; here we use virtualenv.
```shell
pip3 install virtualenv
python3 -m venv my_project
source my_project/bin/activate
```
Option (B) - recommended for Juno HPC cluster
If you are using toil, Python 3 is required. Please install using Python 3.6 as shown below:
You can use either virtualenv or conda; here we use virtualenv.
```shell
pip install virtualenv
virtualenv my_project
source my_project/bin/activate
```
Step 2: Clone the repository
```shell
git clone --recursive --branch 0.1.0 https://github.com/msk-access/access_qc_generation.git
```
Step 3: Install requirements using pip
We have already pinned the versions of cwltool and the other required packages in the requirements.txt file. Please use it to install:
```shell
# python3
pip3 install -r requirements.txt
```
Step 4: Generate an inputs file
Next, you must generate a proper inputs file in either JSON or YAML format.
For details on how to create this file, please see the page below (a minimal example of what needs to be filled in appears at the end of that page):
Inputs Description

It's also possible to create and fill in a "template" inputs file using this command:
```shell
cwltool --make-template nucleo.cwl > inputs.yaml
```
Once we have successfully installed the requirements, we can run the workflow using cwltool or toil.
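Before launching, it can help to sanity-check the inputs file. The snippet below is a sketch, not part of the pipeline: it writes a toy JSON inputs file (the key shown is a placeholder, not a real Nucleo parameter) and confirms it parses. `cwltool --validate nucleo.cwl` can similarly check the workflow document itself without running it.

```shell
# Placeholder inputs file -- "example_parameter" is NOT a real Nucleo input;
# it only illustrates the JSON shape.
cat > example_inputs.json << 'EOF'
{
  "example_parameter": "example_value"
}
EOF

# Fails with a parse error if the JSON is malformed.
python3 -m json.tool example_inputs.json
```

The same kind of check for a YAML inputs file requires a YAML parser (e.g. PyYAML, which is pulled in with cwltool's dependencies).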
Step 5: Run the workflow
To aggregate the QC files across one or more samples and visualize with MultiQC:
```shell
cwltool nucleo.cwl inputs.yaml
```
Here we show how to run the workflow with toil-cwl-runner using the single-machine interface.
Run the workflow with a given set of inputs using toil on a single machine
```shell
toil-cwl-runner nucleo.cwl inputs.yaml
```
Here we show how to run the workflow using toil-cwl-runner on the MSKCC internal compute cluster, called JUNO, which uses IBM LSF as its scheduler.
Note the use of `--singularity` to convert Docker containers into Singularity containers, the `TMPDIR` environment variable to avoid writing temporary files to shared disk space, the `_JAVA_OPTIONS` environment variable to point the Java temporary directory to `/scratch`, the `SINGULARITY_BINDPATH` environment variable to bind `/scratch` when running Singularity containers, and `TOIL_LSF_ARGS` to pass any additional arguments to the `bsub` commands for the jobs (in this case, a maximum wall time via `-W`, a project name, and a host-type selection).
Run the workflow with a given set of inputs using toil on JUNO (MSKCC Research Cluster)
```shell
export TMPDIR=$PWD
export TOIL_LSF_ARGS='-W 3600 -P test_nucleo -app anyOS -R select[type==CentOS7]'
export _JAVA_OPTIONS='-Djava.io.tmpdir=/scratch/'
export SINGULARITY_BINDPATH='/scratch:/scratch:rw'
toil-cwl-runner \
    --singularity \
    --logFile ./example.log \
    --jobStore ./example_jobStore \
    --batchSystem lsf \
    --workDir ./example_working_directory/ \
    --outdir $PWD \
    --writeLogs ./example_log_folder/ \
    --logLevel DEBUG \
    --stats \
    --retryCount 2 \
    --disableCaching \
    --disableChaining \
    --preserve-environment TOIL_LSF_ARGS TMPDIR \
    --maxLogFileSize 20000000000 \
    --cleanWorkDir onSuccess \
    nucleo.cwl \
    inputs.yaml \
    > toil.stdout \
    2> toil.stderr &
```
Your workflow should now be running on the specified batch system. See outputs for a description of the resulting files when it is completed.
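Because toil-cwl-runner was launched in the background, a simple way to block until it exits and confirm a clean return code is a sketch like the following (the file names match the redirections used above):

```shell
# Post-run checks: 'wait' blocks until the backgrounded toil-cwl-runner
# finishes, then returns its exit code (0 on success).
wait
status=$?
echo "toil exited with status ${status}"
# Show the tail of the engine log, if present.
if [ -f toil.stderr ]; then
    tail -n 20 toil.stderr
fi
```

Because `--stats` was passed above, `toil stats ./example_jobStore` can additionally summarize per-job runtime and memory once the run completes.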