
Nucleo - UMI based BAM generation

Requirements

Before installing the pipeline, make sure your system meets the following requirements for running the workflow:

- A system with either Docker or Singularity configured.
- Python 3.6 (for running cwltool and toil-cwl-runner).
- Python packages, which are installed as part of the pipeline installation.

Installation and Usage

If you have paired-end, UMI-tagged FASTQ files, you can run the ACCESS FASTQ-to-BAM workflow with the following steps.

Step 1: Create a virtual environment.

Option (A) - if using cwltool

If you are using cwltool only, use Python 3.9. Any environment manager can be used; here we use conda.
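A minimal sketch of creating the environment, assuming conda is already installed. The environment name `nucleo_env` and the helper `create_env` are hypothetical and not taken from the pipeline documentation. The same commands apply to Option (B), which also calls for Python 3.9 with conda.

```shell
# Hypothetical environment name; any name works.
ENV_NAME="nucleo_env"
# Python version named in the text above.
PYTHON_VERSION="3.9"

# Create the environment and activate it in the current shell.
# Defined as a function so nothing is installed until you call it.
create_env() {
    conda create --yes --name "$ENV_NAME" "python=$PYTHON_VERSION" &&
        eval "$(conda shell.bash hook)" &&
        conda activate "$ENV_NAME"
}
```

Calling `create_env` from an interactive bash session should leave the new environment active in that session.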

Option (B) - recommended for Juno HPC cluster

If you are using toil, Python 3 is required; please use Python 3.9. As in Option (A), any environment manager can be used; here we use conda.

Once the environment is activated, your bash prompt should be prefixed with the environment name.

Step 2: Clone the repository
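A sketch of the clone step. The repository URL is an assumption (the msk-access organization on GitHub) and should be confirmed before use; the tag 3.0.4 is the one named in this step, and the helper `clone_nucleo` is hypothetical.

```shell
# Assumed repository URL; confirm it on the project's GitHub page.
NUCLEO_REPO_URL="https://github.com/msk-access/nucleo.git"
# Release tag named in this step; replace it with the latest stable release.
NUCLEO_VERSION="3.0.4"

# Clone the tagged release, including any git submodules.
# Defined as a function so nothing is downloaded until you call it.
clone_nucleo() {
    git clone --recursive --branch "$NUCLEO_VERSION" "$NUCLEO_REPO_URL" nucleo
}
```

Running `clone_nucleo && cd nucleo` would then place you at the repository root for the following steps.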

Note: Replace 3.0.4 with the tag of the latest stable release of the pipeline.

Step 3: Install requirements using pip

The versions of cwltool and the other packages are already pinned in the requirements.txt file. Please install from this file.
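A minimal sketch of the install step, assuming it is run from the root of the cloned repository with the environment from Step 1 active. The guard only avoids a confusing pip error when the file is missing.

```shell
# File named in the text above; expected at the root of the cloned repository.
REQUIREMENTS_FILE="requirements.txt"

if [ -f "$REQUIREMENTS_FILE" ]; then
    # Use the interpreter of the active environment so packages land there.
    python3 -m pip install -r "$REQUIREMENTS_FILE"
else
    echo "$REQUIREMENTS_FILE not found; run this from the repository root" >&2
fi
```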

Step 4: Check if you have singularity and nodejs for HPC

On HPC systems, Singularity is normally used to run containers, so please make sure it is installed. On JUNO, Singularity must be made available in your session before running the workflow.

Node.js must also be installed; it can be installed using conda.
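The check could be sketched as below. The helper `have` is hypothetical, and the suggested `module load singularity` assumes the cluster exposes Singularity as an environment module under that name; the exact module name and version on JUNO are not given here.

```shell
# Return success when the named executable (or shell function) is available.
have() {
    command -v "$1" >/dev/null 2>&1
}

if have singularity; then
    singularity --version
else
    echo "singularity not found; on a cluster with environment modules, try: module load singularity" >&2
fi

if have node; then
    node --version
else
    echo "node not found; one option is: conda install -c conda-forge nodejs" >&2
fi
```

The script only reports what is missing and installs nothing itself, so it is safe to rerun after each fix.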

Step 5: Generate an inputs file

Next, you must generate a proper inputs file in either JSON or YAML format.

For details on how to create this file, see Inputs Description; a minimal example of what needs to be filled in is given in the inputs.yaml listing at the end of this page.

It is also possible to generate a "template" inputs file from the workflow definition and then fill it in.
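One way to do this is cwltool's `--make-template` option, sketched here under the assumption that the workflow file is `nucleo.cwl` in the current directory. The output name `inputs_template.yaml` is hypothetical, chosen so that an existing inputs.yaml is not overwritten.

```shell
# Workflow file as referenced elsewhere on this page.
WORKFLOW="nucleo.cwl"
# Hypothetical output name for the generated template.
TEMPLATE="inputs_template.yaml"

if command -v cwltool >/dev/null 2>&1 && [ -f "$WORKFLOW" ]; then
    # --make-template prints a skeleton inputs object for the workflow.
    cwltool --make-template "$WORKFLOW" > "$TEMPLATE"
else
    echo "cwltool or $WORKFLOW not available; run this from the repository root" >&2
fi
```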

Template generation may not always work, for reasons that are not yet understood. If it fails, you can always use Rabix to generate the template inputs file.

Note: To see help for the inputs of the CWL workflow, you can use: toil-cwl-runner nucleo.cwl --help

Once the requirements are installed, we can run the workflow using cwltool or toil.

Step 6: Run the workflow

Here we show how to use cwltool to run the workflow on a single machine, such as a laptop.

Run the workflow with a given set of inputs using cwltool on a single machine
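A minimal sketch of the cwltool invocation, assuming `nucleo.cwl` and a filled-in `inputs.yaml` are in the current directory. The output directory name is hypothetical.

```shell
WORKFLOW="nucleo.cwl"
INPUTS="inputs.yaml"
# Hypothetical directory for the final output files.
OUT_DIR="cwltool_output"

if command -v cwltool >/dev/null 2>&1 && [ -f "$WORKFLOW" ] && [ -f "$INPUTS" ]; then
    mkdir -p "$OUT_DIR"
    # --outdir sets where cwltool places the workflow outputs.
    cwltool --outdir "$OUT_DIR" "$WORKFLOW" "$INPUTS"
else
    echo "cwltool, $WORKFLOW or $INPUTS not available; nothing was run" >&2
fi
```

cwltool runs containers with Docker by default; adding its `--singularity` option switches to Singularity on machines where Docker is not available.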

Here we show how to run the workflow using toil-cwl-runner with the single-machine interface.
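A sketch of a single-machine toil run, under the same assumptions about `nucleo.cwl` and `inputs.yaml`. The run directory layout is hypothetical. No `--batchSystem` option is passed because the single-machine batch system is toil's default.

```shell
WORKFLOW="nucleo.cwl"
INPUTS="inputs.yaml"
# Hypothetical directory holding the job store, work files, outputs and log.
RUN_DIR="toil_run"

if command -v toil-cwl-runner >/dev/null 2>&1 && [ -f "$WORKFLOW" ] && [ -f "$INPUTS" ]; then
    # The work and output directories must exist; the job store must not.
    mkdir -p "$RUN_DIR/work" "$RUN_DIR/output"
    toil-cwl-runner \
        --jobStore "$RUN_DIR/jobstore" \
        --workDir "$RUN_DIR/work" \
        --outdir "$RUN_DIR/output" \
        --logFile "$RUN_DIR/toil.log" \
        "$WORKFLOW" "$INPUTS"
else
    echo "toil-cwl-runner, $WORKFLOW or $INPUTS not available; nothing was run" >&2
fi
```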

Your workflow should now be running on the specified batch system. See Outputs Description for a description of the resulting files once it has completed.

Inputs Description

Input files and parameters required to run the workflow

Common Workflow Language (CWL) execution engines accept inputs in two formats, JSON or YAML; please make sure to use one of these when generating the inputs file. For more information, refer to: http://www.commonwl.org/user_guide/yaml/
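To illustrate the JSON form, the sketch below writes a two-key fragment that uses the `sample` and `reference_sequence` parameter names from the template inputs file. The file name and both values are placeholders, and the fragment is not a complete inputs file for the workflow.

```shell
# Hypothetical file name for the example.
INPUTS_JSON="inputs.example.json"

# Parameter names come from the template inputs file; values are placeholders.
cat > "$INPUTS_JSON" <<'EOF'
{
  "sample": "sample-name",
  "reference_sequence": {
    "class": "File",
    "path": "/path/to/reference.fasta"
  }
}
EOF

# Validate the syntax when a Python interpreter is available.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$INPUTS_JSON" >/dev/null && echo "$INPUTS_JSON is valid JSON"
fi
```

The YAML equivalent expresses the same structure with indentation, as in the `reference_sequence` entry of the inputs.yaml listing.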

Parameters Used by Tools

Common Parameters Across Tools

Uncollapsed BAM Generation

Fgbio

Picard

GATK

Picard

bedtools

Picard

Base Quality Score Recalibration

GATK

Collapsed BAM Generation

Fgbio

Picard

Template Inputs File

Outputs Description

Files present after the workflow has finished

| Output File | Description |
| --- | --- |
| sample-name_fastp_out.html | Trimming metrics from fastp in HTML format |
| sample-name_fastp_out.json | Trimming metrics from fastp in JSON format |
| sample-name_fx.bam | Binary alignment map (BAM) file generated after FixMateInformation |
| sample-name_fx.bai | The binary alignment index (BAI) file associated with the FixMateInformation BAM file |
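A quick way to confirm that a run produced the files listed above is sketched below. The helpers `expected_outputs` and `check_outputs` are hypothetical, and the suffix list covers only the four files named in this section.

```shell
# Print the output file names listed above for a given sample name.
expected_outputs() {
    for suffix in _fastp_out.html _fastp_out.json _fx.bam _fx.bai; do
        echo "${1}${suffix}"
    done
}

# Report which expected files exist in a directory (default: current directory).
check_outputs() {
    sample="$1"
    out_dir="${2:-.}"
    expected_outputs "$sample" | while read -r name; do
        if [ -e "$out_dir/$name" ]; then
            echo "found   $name"
        else
            echo "missing $name"
        fi
    done
}

check_outputs "sample-name"
```

Pass the real sample name and the output directory of your run, for example `check_outputs my_sample ./output`.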

Here we show how to run the workflow using toil-cwl-runner on JUNO, the MSKCC internal compute cluster, which uses IBM LSF as its scheduler.

Note the following settings:

- --singularity converts Docker containers into Singularity containers.
- The TMPDIR environment variable avoids writing temporary files to shared disk space.
- The _JAVA_OPTIONS environment variable sets the Java temporary directory to /scratch.
- The SINGULARITY_BINDPATH environment variable binds /scratch when running Singularity containers.
- The TOIL_LSF_ARGS environment variable specifies any additional arguments that the jobs' bsub commands should have (in this case, a maximum wall-time of 6 hours).

Run the workflow with a given set of inputs using toil on JUNO (MSKCC Research Cluster)
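A sketch of how the settings described above fit together, assuming a node-local /scratch directory and LSF's `[hours:]minutes` format for `-W`. The run directory name and the exact option set are assumptions, not the cluster's documented command.

```shell
WORKFLOW="nucleo.cwl"
INPUTS="inputs.yaml"
# Hypothetical run directory on shared storage.
RUN_DIR="$PWD/toil_juno_run"

# Keep temporary files off shared disk space (assumes /scratch exists).
export TMPDIR="/scratch"
# Point Java's temporary directory at /scratch.
export _JAVA_OPTIONS="-Djava.io.tmpdir=/scratch"
# Bind /scratch read-write inside Singularity containers.
export SINGULARITY_BINDPATH="/scratch:/scratch:rw"
# Extra bsub arguments for every job: a 6-hour wall-time limit.
export TOIL_LSF_ARGS="-W 6:00"

if command -v toil-cwl-runner >/dev/null 2>&1 && [ -f "$WORKFLOW" ] && [ -f "$INPUTS" ]; then
    # The work and output directories must exist; the job store must not.
    mkdir -p "$RUN_DIR/work" "$RUN_DIR/output"
    toil-cwl-runner \
        --singularity \
        --batchSystem lsf \
        --jobStore "$RUN_DIR/jobstore" \
        --workDir "$RUN_DIR/work" \
        --outdir "$RUN_DIR/output" \
        --logFile "$RUN_DIR/toil.log" \
        "$WORKFLOW" "$INPUTS"
else
    echo "toil-cwl-runner, $WORKFLOW or $INPUTS not available; nothing was run" >&2
fi
```

For long runs, the command is usually started in the background with its output redirected to a file so that it survives the terminal session.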

inputs.yaml

BC_abra2_output_bams: null
BC_bwa_mem_output: null
BC_gatk_merge_bam_alignment_output_file_name: null
BC_gatk_sam_to_fastq_output_name_R1: null
BC_gatk_sam_to_fastq_output_name_R2: null
BC_picard_addRG_output_file_name: null
BC_picard_fixmate_information_output_file_name: null
UBG_abra2_output_bams: null
UBG_bwa_mem_output: null
UBG_gatk_merge_bam_alignment_output_file_name: null
UBG_picard_SamToFastq_R1_output_fastq: null
UBG_picard_SamToFastq_R2_output_fastq: null
UBG_picard_addRG_output_file_name: null
UBG_picard_fixmateinformation_output_file_name: null
abra2_bam_index: null
abra2_consensus_sequence: null
abra2_contig_anchor: null
abra2_maximum_average_depth: null
abra2_maximum_mixmatch_rate: null
abra2_no_edge_complex_indel: null
abra2_scoring_gap_alignments: null
abra2_soft_clip_contig: null
abra2_window_size: null
apply_bqsr_output_file_name: null
base_recalibrator_output_file_name: null
bedtools_genomecov_option_bedgraph: null
bedtools_merge_distance_between_features: null
bwa_mem_K: null
bwa_mem_T: null
bwa_mem_Y: null
create_bam_index: null
fastp_html_output_file_name: null
fastp_json_output_file_name: null
fastp_minimum_read_length: null
fastp_read1_adapter_sequence: null
fastp_read1_output_file_name: null
fastp_read2_adapter_sequence: null
fastp_read2_output_file_name: null
fgbio_async_io: null
fgbio_call_duplex_consensus_reads_min_reads: null
fgbio_call_duplex_consensus_reads_output_file_name: null
fgbio_collect_duplex_seq_metrics_duplex_umi_counts: null
fgbio_collect_duplex_seq_metrics_intervals: null
fgbio_collect_duplex_seq_metrics_output_prefix: null
fgbio_fastq_to_bam_input: null
fgbio_filter_consensus_read_min_base_quality_duplex: null
fgbio_filter_consensus_read_min_base_quality_simplex_duplex: null
fgbio_filter_consensus_read_min_reads_duplex: null
fgbio_filter_consensus_read_min_reads_simplex_duplex: null
fgbio_filter_consensus_read_output_file_name_duplex: null
fgbio_filter_consensus_read_output_file_name_duplex_aln_metrics: null
fgbio_filter_consensus_read_output_file_name_simplex_aln_metrics: null
fgbio_filter_consensus_read_output_file_name_simplex_duplex: null
fgbio_filter_consensus_read_reverse_per_base_tags_simplex_duplex: null
fgbio_group_reads_by_umi_family_size_histogram: null
fgbio_group_reads_by_umi_output_file_name: null
fgbio_group_reads_by_umi_strategy: null
fgbio_postprocessing_output_file_name_simplex: null
gatk_base_recalibrator_add_output_sam_program_record: null
gatk_base_recalibrator_known_sites:
  - class: File
    metadata: {}
    path: >-
      /Users/shahr2/Documents/test_reference/test_fastq_to_bam/known_sites/dbsnp_137_14_16.b37.vcf
    secondaryFiles:
      - class: File
        path: >-
          /Users/shahr2/Documents/test_reference/test_nucleo/known_sites/dbsnp_137_14_16.b37.vcf.idx
  - class: File
    metadata: {}
    path: >-
      /Users/shahr2/Documents/test_reference/test_fastq_to_bam/known_sites/Mills_and_1000G_gold_standard-14_16.indels.b37.vcf
    secondaryFiles:
      - class: File
        path: >-
          /Users/shahr2/Documents/test_reference/test_fastq_to_bam/known_sites/Mills_and_1000G_gold_standard-14_16.indels.b37.vcf.idx
gatk_collect_alignment_summary_metrics_output_file_name: null
gatk_mark_duplicates_duplication_metrics_file_name: null
gatk_mark_duplicates_output_file_name: null
gatk_merge_sam_files_output_file_name: null
library: null
merge_sam_files_sort_order: null
optical_duplicate_pixel_distance: null
picard_addRG_sort_order: null
platform: null
platform-model: null
platform-unit: null
read-group-id: null
read-structures: null
reference_sequence:
  class: File
  metadata: {}
  path: /Users/shahr2/Documents/test_reference/fasta/chr14_chr16.fasta
  secondaryFiles:
    - class: File
      path: ../../test_reference/fasta/chr14_chr16.fasta.amb
    - class: File
      path: ../../test_reference/fasta/chr14_chr16.fasta.ann
run-date: null
sample: null
sequencing-center: null
sort_order: null
temporary_directory: null
validation_stringency: null
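When filling in a template like the one above, it can help to list which top-level parameters are still null. The helper `unset_params` below is a hypothetical sketch; it matches only lines of the exact form `name: null` and ignores nested entries.

```shell
# List top-level parameters whose value is still null in an inputs file.
unset_params() {
    grep -E '^[A-Za-z0-9_-]+: null$' "$1" | cut -d: -f1
}

if [ -f inputs.yaml ]; then
    unset_params inputs.yaml
fi
```

Parameters that are legitimately optional can stay null; the list is only a reminder of what has not been set.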
