Installation and Usage

If you have paired-end, UMI-tagged FASTQ files, you can run the ACCESS FASTQ-to-BAM workflow with the following steps.

Step 1: Create a virtual environment.

Option (A) - if using cwltool

If you are using cwltool only, please proceed with Python 3.9 as shown below:

You can use either virtualenv or conda for this. Here we will use conda.

Option (B) - recommended for Juno HPC cluster

If you are using toil, Python 3 is required. Please install Python 3.9 as shown below:

You can use either virtualenv or conda for this. Here we will use conda.


Once you execute the above commands, your bash prompt should look something like this:
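Beyond the prompt, the interpreter version can be checked directly. The following is a minimal sketch, not part of the pipeline: the function name `check_python_version` is hypothetical, and it assumes `python3` on your PATH is the interpreter from the activated environment.

```shell
# Hypothetical helper: succeed only if python3 reports the expected major.minor version.
check_python_version() {
    expected="$1"
    # Ask the interpreter itself for its major.minor version string.
    actual="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 2
    if [ "$actual" = "$expected" ]; then
        echo "python3 is $actual"
    else
        echo "python3 is $actual, expected $expected" >&2
        return 1
    fi
}
```

Calling `check_python_version 3.9` inside the activated environment should then return success; a non-zero status suggests the environment is not active or was created with a different Python.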

Step 2: Clone the repository


Note: Replace 3.0.4 with the latest stable release of the pipeline.
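One way to see which release tags exist is to list them from the remote repository. The sketch below is only an illustration: the helper names `list_remote_tags` and `latest_tag` are hypothetical, it assumes a `sort` that supports `-V` (version sort), and the highest tag is not guaranteed to be the latest stable release, so check the project's release notes before relying on it.

```shell
# Hypothetical helper: list the tag names published in a remote git repository.
list_remote_tags() {
    # --refs hides peeled tag entries; sed strips everything up to refs/tags/.
    git ls-remote --tags --refs "$1" | sed 's|.*refs/tags/||'
}

# Hypothetical helper: read tag names on stdin (one per line) and print the highest.
# Relies on version sort (-V); pre-release suffixes are not handled specially.
latest_tag() {
    sort -V | tail -n 1
}
```

For example, `list_remote_tags https://github.com/msk-access/nucleo.git | latest_tag` prints a candidate tag to pass to `git clone --branch`.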

Step 3: Install requirements using pip

We have already specified the version of cwltool and other packages in the requirements.txt file. Please use this to install.

Step 4: Check if you have singularity and nodejs for HPC

HPC systems normally use Singularity to run containers, so please make sure it is installed. On JUNO, you can load it as follows:

We also need to make sure nodejs is installed. It can be installed using conda:
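To confirm that both tools are available before moving on, a small check along these lines may help. This is a sketch under assumptions: the function name `require_commands` is hypothetical, and it assumes the nodejs package exposes its executable as `node`.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns non-zero if at least one command is missing.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

On a cluster node you would call it as `require_commands singularity node` after loading the Singularity module and activating the conda environment.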

Step 5: Generate an inputs file

Next, you must generate a proper inputs file in either JSON or YAML format.

For details on how to create this file, see the Inputs Description page (there is a minimal example of what needs to be filled in at the end of that page).

It's also possible to create and fill in a "template" inputs file using this command:


This command does not always work, and we are not sure why. If it fails, you can use Rabix to generate the template inputs file instead.
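However the inputs file is produced, a quick syntax check can catch editing mistakes before a run. The sketch below covers only the JSON case and only checks that the file parses; it does not verify that the workflow's required inputs are present. The function name `is_valid_json` is hypothetical, and it uses the `json.tool` module from the Python standard library.

```shell
# Hypothetical helper: succeed only if the given file parses as JSON.
# This is a syntax check only; it knows nothing about the workflow's inputs.
is_valid_json() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}
```

For example, `is_valid_json inputs.json && echo "inputs.json parses"` prints the message only when the file is well-formed.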


Note: To see help for the inputs of the CWL workflow, run: toil-cwl-runner nucleo.cwl --help

Once the requirements are installed, we can run the workflow using cwltool or toil.

Step 6: Run the workflow

Here we show how to use cwltool to run the workflow on a single machine, such as a laptop.

Run the workflow with a given set of inputs using cwltool on a single machine

Here we show how to run the workflow using toil-cwl-runner with its single-machine interface.


Your workflow should now be running on the specified batch system. See the outputs page for a description of the resulting files once it has completed.

Once the requirements are installed, we can run the workflow using cwltool, provided you have a proper inputs file in either JSON or YAML format. Please look at Inputs Description for more details.

Run the workflow with a given set of inputs using toil on a single machine

Here we show how to run the workflow using toil-cwl-runner on JUNO, the MSKCC internal compute cluster, which uses IBM LSF as its scheduler.

Note the following: the --singularity flag converts Docker containers into Singularity containers; the TMPDIR environment variable avoids writing temporary files to shared disk space; the _JAVA_OPTIONS environment variable sets the Java temporary directory to /scratch; the SINGULARITY_BINDPATH environment variable binds /scratch when running Singularity containers; and TOIL_LSF_ARGS passes any additional arguments to the bsub commands that submit the jobs (in this case, a maximum wall-time via -W; LSF reads a bare -W value as minutes, so -W 3600 allows 60 hours, while 6 hours would be -W 360).
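These variables only reach toil-cwl-runner and the jobs it launches if they are part of the environment, that is, exported. The sketch below illustrates the difference between a plain assignment and an exported one; the `DEMO_*` names and their values are hypothetical and exist only for this illustration.

```shell
# Plain assignment: visible in the current shell only, not in child processes.
DEMO_UNEXPORTED='some value'
sh -c 'echo "child sees unexported: [${DEMO_UNEXPORTED:-}]"'

# Exported assignment: inherited by child processes, as a launched runner would be.
export DEMO_EXPORTED='some value'
sh -c 'echo "child sees exported: [${DEMO_EXPORTED:-}]"'
```

The first child process sees an empty value and the second sees the assigned one, which is why settings such as TOIL_LSF_ARGS should be exported (or already present in the environment) before the runner is started.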

Run the workflow with a given set of inputs using toil on JUNO (MSKCC Research Cluster)

python3-conda-virtualenv
conda create --name my_project python=3.9
conda activate my_project
python3-conda-virtualenv
conda create --name my_project python=3.9
conda activate my_project
bash-prompt-example
(my_project)[server]$
git-clone-with-submodule
git clone --recursive --branch 3.0.4 https://github.com/msk-access/nucleo.git
python-package-installation-using-pip
#python3
cd nucleo
pip3 install -r requirements.txt
load-singularity-on-juno
module load singularity
conda-install-nodejs
conda install -c conda-forge nodejs
cwltool --make-template nucleo.cwl > inputs.yaml
cwltool-execution
cwltool nucleo.cwl inputs.yaml
toil-local-execution
toil-cwl-runner nucleo.cwl inputs.yaml
toil-lsf-execution
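# Export the variables so that toil-cwl-runner and the jobs it submits can see them
export TMPDIR=$PWD
export TOIL_LSF_ARGS='-W 3600 -P test_nucleo -app anyOS -R select[type==CentOS7]'
export _JAVA_OPTIONS='-Djava.io.tmpdir=/scratch/'
export SINGULARITY_BINDPATH='/scratch:/scratch:rw'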
toil-cwl-runner \
       --singularity \
       --logFile ./example.log  \
       --jobStore ./example_jobStore \
       --batchSystem lsf \
       --workDir ./example_working_directory/ \
       --outdir $PWD \
       --writeLogs ./example_log_folder/ \
       --logLevel DEBUG \
       --stats \
       --retryCount 2 \
       --disableCaching \
       --disableChaining \
       --preserve-environment TOIL_LSF_ARGS TMPDIR \
       --maxLogFileSize 20000000000 \
       --cleanWorkDir onSuccess \
       nucleo.cwl \
       inputs.yaml \
       > toil.stdout \
       2> toil.stderr &