If you have paired-end UMI-tagged FASTQs, you can run the ACCESS FASTQ to BAM workflow with the following steps.
If you are using cwltool only, please proceed using Python 3.9 as shown below:
You can use either virtualenv or conda; here we will use conda.
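A minimal sketch of the conda setup (the environment name `my_project` is just an example):

```bash
# Create and activate a Python 3.9 environment for cwltool
conda create --name my_project python=3.9
conda activate my_project
```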
If you are using toil, Python 3 is required. Please install using Python 3.9 as shown below:
Again, you can use either virtualenv or conda; here we will use conda.
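The same pattern applies; a sketch with a hypothetical environment name:

```bash
# Create and activate a Python 3.9 environment for toil
conda create --name toil_env python=3.9
conda activate toil_env
```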
Once you execute the above command, your bash prompt should look something like this:
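For example (illustrative only; your environment name, user, and host will differ):

```
(my_project) [user@host ~]$
```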
Note: Change 3.0.4 to the latest stable release of the pipeline.
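The note refers to the release tag used when cloning the pipeline; a sketch, assuming the nucleo repository under the msk-access GitHub organization:

```bash
# Clone a tagged release of the pipeline (replace 3.0.4 with the latest stable release)
git clone --depth 1 --branch 3.0.4 https://github.com/msk-access/nucleo.git
cd nucleo
```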
We have already specified the versions of cwltool and the other required packages in the requirements.txt file. Please use it to install them.
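For example, from the root of the cloned repository:

```bash
pip install -r requirements.txt
```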
On HPC systems, Singularity is normally used for containers, so please make sure it is installed. On JUNO, you can do the following:
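A sketch, assuming JUNO exposes Singularity through environment modules (the module name and version may differ):

```bash
module load singularity/3.7.1
```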
We also need to make sure Node.js is installed; this can be done using conda:
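For example:

```bash
conda install -c conda-forge nodejs
```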
Next, you must generate a proper input file in either JSON or YAML format.
For details on how to create this file, please follow this example (there is a minimal example of what needs to be filled in at the end of the page):
It's also possible to create and fill in a "template" inputs file using this command:
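A sketch using cwltool's `--make-template` option, assuming the workflow file is `nucleo.cwl` and the template is written to `inputs.yaml`:

```bash
cwltool --make-template nucleo.cwl > inputs.yaml
```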
This may or may not work (we are not exactly sure why), but you can always use Rabix to generate the template input.
Note: To see help for the inputs of the CWL workflow, you can run: `toil-cwl-runner nucleo.cwl --help`
Once we have successfully installed the requirements, we can run the workflow using cwltool or toil.
Here we show how to run the workflow with toil-cwl-runner using the single-machine interface:
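A minimal sketch, assuming the workflow file `nucleo.cwl` and an inputs file `inputs.yaml` in the current directory (the job store and output paths are placeholders):

```bash
toil-cwl-runner \
    --singularity \
    --jobStore ./jobStore \
    --workDir ./toil_work \
    --outdir ./outputs \
    nucleo.cwl inputs.yaml
```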
You can also run the workflow using cwltool, provided you have a proper input file in either JSON or YAML format. Please look at the Inputs Description for more details.
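A minimal sketch under the same assumptions:

```bash
cwltool --singularity nucleo.cwl inputs.yaml
```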
Here we show how to run the workflow with toil-cwl-runner on the MSKCC internal compute cluster, called JUNO, which uses IBM LSF as its scheduler.
Note the use of `--singularity` to convert Docker containers into Singularity containers, the `TMPDIR` environment variable to avoid writing temporary files to shared disk space, the `_JAVA_OPTIONS` environment variable to point the Java temporary directory to `/scratch`, the `SINGULARITY_BINDPATH` environment variable to bind `/scratch` when running Singularity containers, and `TOIL_LSF_ARGS` to specify any additional arguments the jobs' `bsub` commands should have (in this case, setting a max wall time of 6 hours).
Run the workflow with a given set of inputs using toil on JUNO (MSKCC Research Cluster)
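A sketch of the command described in the note above; every path is a placeholder to adjust to your own scratch, log, and work directories:

```bash
# Environment variables described in the note above; all paths are placeholders
TMPDIR=/path/to/local_tmp \
TOIL_LSF_ARGS='-W 6:00' \
_JAVA_OPTIONS='-Djava.io.tmpdir=/scratch/' \
SINGULARITY_BINDPATH='/scratch:/scratch:rw' \
toil-cwl-runner \
    --singularity \
    --batchSystem lsf \
    --jobStore /path/to/jobStore \
    --workDir /path/to/toil_work \
    --logFile /path/to/toil_log/cwltoil.log \
    --outdir . \
    --retryCount 2 \
    nucleo.cwl inputs.yaml
```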
Your workflow should now be running on the specified batch system. See Outputs for a description of the resulting files when it is completed.