# Nextflow Workflow

This guide covers running py-gbcms as a Nextflow workflow for processing multiple samples in parallel, particularly on HPC clusters.

## Overview

The Nextflow workflow provides:

- Automatic parallelization across samples
- SLURM/HPC integration with resource management
- Containerization with Docker/Singularity
- Resume capability for failed runs
- Reproducible pipelines
## Prerequisites

- Nextflow >= 21.10.3
- One of:
    - Docker (for local runs)
    - Singularity (for HPC)

Install Nextflow:

```bash
curl -s https://get.nextflow.io | bash
mv nextflow ~/bin/  # or any directory in your PATH
```

## Quick Start
### 1. Prepare Samplesheet
Create a CSV file with your samples:
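For example (a sketch: the `bam` and `bai` column names follow the notes below, but the exact header, including the `sample` column, is an assumption — check `nextflow/README.md` for the authoritative schema):

```shell
# Write a minimal samplesheet; the bai value may be left empty,
# in which case <bam>.bai is auto-discovered next to the BAM.
cat > samplesheet.csv <<'EOF'
sample,bam,bai
sample1,/data/sample1.bam,/data/sample1.bam.bai
sample2,/data/sample2.bam,
EOF
cat samplesheet.csv
```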
Or with per-sample suffix (for multiple BAM types):
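For instance (again a sketch; the `_tumor`/`_normal` suffix values are purely illustrative):

```shell
# A per-row suffix distinguishes multiple BAM types for the same sample,
# so their outputs do not collide.
cat > samplesheet.csv <<'EOF'
sample,bam,suffix
sample1,/data/sample1.tumor.bam,_tumor
sample1,/data/sample1.normal.bam,_normal
EOF
cat samplesheet.csv
```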
Notes:

- The `bai` column is optional; `<bam>.bai` is auto-discovered if not provided
- The `suffix` column is optional; a per-row suffix overrides the global `--suffix` parameter
- BAI files must exist, or the workflow fails early with a clear error
### 2. Run the Workflow
Local with Docker:
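A sketch, assuming the pipeline entrypoint is `nextflow/main.nf` (adjust the path to wherever `main.nf` lives in your checkout):

```bash
nextflow run nextflow/main.nf \
    -profile docker \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta reference.fasta \
    --outdir results
```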
SLURM cluster with Singularity:
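Same invocation with the SLURM profile, which submits each task as a SLURM job inside a Singularity container (entrypoint path again assumed):

```bash
nextflow run nextflow/main.nf \
    -profile slurm \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta reference.fasta \
    --outdir results
```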
## Parameters

### Required

| Parameter | Description |
|---|---|
| `--input` | Path to samplesheet CSV |
| `--variants` | Path to VCF/MAF variants file |
| `--fasta` | Reference FASTA (with `.fai` index) |

### Output Options

| Parameter | Default | Description |
|---|---|---|
| `--outdir` | `results` | Output directory |
| `--format` | `vcf` | Output format (`vcf` or `maf`) |
| `--suffix` | `''` | Suffix for output filenames |
### Filtering Options

| Parameter | Default | Description |
|---|---|---|
| `--min_mapq` | `20` | Minimum mapping quality |
| `--min_baseq` | `0` | Minimum base quality |
| `--filter_duplicates` | `true` | Filter duplicate reads |
| `--filter_secondary` | `false` | Filter secondary alignments |
| `--filter_supplementary` | `false` | Filter supplementary alignments |
| `--filter_qc_failed` | `false` | Filter QC-failed reads |
| `--filter_improper_pair` | `false` | Filter improperly paired reads |
| `--filter_indel` | `false` | Filter reads with indels |

### Resource Limits

| Parameter | Default | Description |
|---|---|---|
| `--max_cpus` | `16` | Maximum CPUs per job |
| `--max_memory` | `128.GB` | Maximum memory per job |
| `--max_time` | `240.h` | Maximum runtime per job |
## Execution Profiles

### Docker (Local)

- Uses Docker containers
- Best for local development
- Requires Docker installed

### Singularity (HPC)

- Uses Singularity images
- Best for HPC without SLURM
- Requires Singularity installed

### SLURM (HPC Cluster)

- Submits jobs to SLURM
- Uses Singularity containers
- Queue: `cmobic_cpu` (customizable)
## Customizing for Your Cluster

Edit `nextflow/nextflow.config` to customize the SLURM profile:
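An illustrative sketch of the relevant block (the queue name comes from this guide; the profile structure and the `clusterOptions` value are assumptions — match them to the file's actual contents):

```groovy
profiles {
    slurm {
        process {
            executor       = 'slurm'
            queue          = 'cmobic_cpu'            // change to your cluster's partition
            clusterOptions = '--account=my_account'  // hypothetical site-specific sbatch flags
        }
        singularity.enabled = true
    }
}
```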
Common customizations include the queue name, site-specific `clusterOptions` (account, QOS), and `singularity.cacheDir` for where pulled images are stored.
## Output Structure

Results are organized in `${outdir}/`:
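A typical layout might look like the following (a sketch only: the per-sample output names, the `pipeline_info/` reports directory, and the default `vcf` format are assumptions):

```text
results/
├── sample1.vcf
├── sample2.vcf
└── pipeline_info/
    ├── execution_report.html
    ├── execution_timeline.html
    └── execution_trace.txt
```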
## Advanced Usage

### Resume Failed Runs

Nextflow caches completed tasks. Resume from where a run failed:
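Re-run the original command with Nextflow's `-resume` flag (entrypoint path `nextflow/main.nf` assumed, as elsewhere in this guide):

```bash
nextflow run nextflow/main.nf \
    -profile slurm \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta reference.fasta \
    -resume
```

Only tasks whose inputs changed are re-executed; everything else is restored from the cache.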
### Custom Suffix

Add a suffix to output filenames:
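For example (the `_genotyped` value is illustrative):

```bash
nextflow run nextflow/main.nf -profile slurm \
    --input samplesheet.csv --variants variants.vcf --fasta reference.fasta \
    --suffix _genotyped
```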
### MAF Output

Generate MAF instead of VCF:
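Pass `--format maf` (entrypoint path assumed):

```bash
nextflow run nextflow/main.nf -profile slurm \
    --input samplesheet.csv --variants variants.maf --fasta reference.fasta \
    --format maf
```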
### Strict Filtering

Enable all filters for high-quality genotyping:
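A sketch combining the filtering parameters from the table above (the raised `--min_mapq`/`--min_baseq` values are illustrative, and passing booleans as explicit `true` is an assumption about the flag syntax):

```bash
nextflow run nextflow/main.nf -profile slurm \
    --input samplesheet.csv --variants variants.vcf --fasta reference.fasta \
    --min_mapq 30 \
    --min_baseq 20 \
    --filter_duplicates true \
    --filter_secondary true \
    --filter_supplementary true \
    --filter_qc_failed true \
    --filter_improper_pair true \
    --filter_indel true
```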
## Monitoring

### View Running Jobs
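On a SLURM cluster, the jobs Nextflow submits show up in the regular queue:

```bash
squeue -u $USER   # SLURM jobs submitted on your behalf by Nextflow
```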
### Check Progress

Nextflow prints real-time progress to the console, showing per-process task counts as they are submitted, run, and completed.
### Execution Report

After completion, view the HTML report:
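One way to produce the report is Nextflow's built-in `-with-report` flag (the output filenames here are arbitrary, and the entrypoint path is assumed):

```bash
nextflow run nextflow/main.nf -profile slurm \
    --input samplesheet.csv --variants variants.vcf --fasta reference.fasta \
    -with-report report.html -with-timeline timeline.html

xdg-open report.html   # macOS: open report.html
```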
## Troubleshooting

### Job Failed with Error

Check the work directory shown in the error message:
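Each Nextflow task runs in its own hashed work directory, and the `.command.*` files there record exactly what ran (the hash path below is a placeholder):

```bash
cd work/3f/9a1b2c...      # path from the error message
cat .command.err          # task stderr
cat .command.log          # combined stdout/stderr
cat .command.sh           # the exact command Nextflow ran
bash .command.run         # optionally re-run the task interactively
```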
### Out of Memory

Increase memory in the config:
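A sketch of a process-level override in `nextflow/nextflow.config`; the selector name `GBCMS` is an assumption — match it to the actual process name in the workflow:

```groovy
process {
    withName: 'GBCMS' {
        memory        = 64.GB    // raise from the default
        errorStrategy = 'retry'  // retry tasks killed for exceeding memory
        maxRetries    = 2
    }
}
```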
### Wrong Queue

Update the queue name in `nextflow/nextflow.config`:
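For example (the surrounding profile structure is assumed; edit the existing `slurm` block rather than duplicating it):

```groovy
profiles {
    slurm {
        process.queue = 'your_partition'   // replaces the default cmobic_cpu
    }
}
```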
### Missing Container

Pull the container manually:
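The exact image URI is defined in `nextflow/nextflow.config`; the registry path below is a placeholder, not the real image name:

```bash
singularity pull docker://<registry>/py-gbcms:<tag>
```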
## Comparison with CLI

| Feature | CLI | Nextflow |
|---|---|---|
| Multiple samples | Sequential | Parallel |
| Resource management | Manual | Automatic |
| Retry failed jobs | Manual | Automatic |
| HPC integration | Manual scripts | Built-in |
| Resume capability | No | Yes |

When to use the CLI instead: see Usage Patterns.
## Next Steps

- See Usage Patterns for a comparison with CLI usage
- See `nextflow/README.md` for additional workflow documentation