
CMO Cell-Free DNA Informatics (CCI)

The Team

The OnBoarding

The Formal Stuff

Applications

Applications/Tools the CCI is responsible for at MSKCC in CMO

Repo locations: https://github.com/msk-access

Framework: https://github.com/mskcc-omics-workflows/modules/

Containers: https://github.com/mskcc-omics-workflows/containers

Application
Location
Description

ACCESS-Pipeline

https://github.com/mskcc/ACCESS-Pipeline

This has workflows for BAM generation, small variant calling, microsatellite instability calling, copy number variant calling, and structural variant calling for version 1 of the MSK-ACCESS assay

ACCESS_SV

https://github.com/mskcc/ACCESS_SV

This has the core workflow used for structural variant calling in MSK-ACCESS assay

ADMIE

https://github.com/mskcc/ADMIE

This is the algorithm used for calling MSI status for samples associated with the MSK-ACCESS assay

Nucleo

https://github.com/msk-access/nucleo

This is the BAM generation workflow, based on Fgbio, for any assay that uses Unique Molecular Indexes (UMIs)

ACCESS Quality Control (For version 1 of the Assay)

https://github.com/msk-access/access_qc_generation

This is version 2 of ACCESS QC generation; you can read more about it here

ACCESS data analysis

https://github.com/msk-access/access_data_analysis

This repo helps with downstream analysis of MSK-ACCESS data; you can read more about it here:

Biometrics

https://github.com/msk-access/biometrics

Python package to calculate various sample contamination metrics.

sequence_qc

https://github.com/msk-access/sequence_qc

Package for doing various ad-hoc quality control steps from MSK-ACCESS generated FASTQ or BAM files

Krewlyzer

https://github.com/msk-access/krewlyzer

Krewlyzer is a high-performance toolkit for extracting biological features from cell-free DNA (cfDNA) sequencing data. It is designed for cancer genomics, liquid biopsy research, and clinical bioinformatics.

Kreview

https://github.com/msk-access/kreview

kreview is a production-grade, notebook-first (nbdev) evaluation engine designed for high-throughput cancer liquid biopsy fragmentomics feature analysis. Developed at Memorial Sloan Kettering (MSKCC), it processes cohorts containing tens of thousands of samples using an embedded DuckDB query engine with chunked I/O and automatic retry logic.

gbcms

https://github.com/msk-access/gbcms

A high-performance orientation-aware genotype counting system for genomic variants

STRiDE

https://github.com/msk-access/STRiDE

Microsatellite Instability prediction for MSK-ACCESS cfDNA sequencing. MSK-ACCESS MSI calling tool

General

What to do first?

CMO onboarding

The first thing to do once you have joined is to visit https://mskcc.github.io/on-boarding/, finish the tasks necessary for compliance, and initiate the process of getting access to the various systems.

Cluster Guide

To learn more about the cluster and its resources visit the MIRO board

CCI specific onboarding

Guides on Confluence

You need to be on the internal network to access msk-confluence. You can request access to it on The Spot.

CMO Cell-Free DNA Informatics (CCI) Wiki

This wiki will help you get insights into the CMO Cell-Free DNA Informatics (CCI) team at MSKCC

Welcome aboard!

Welcome to the CCI wiki! Here you'll find everything you need to know about CCI.

You can read more about the Center for Molecular Oncology (CMO) here:

Analysis

This is a wiki for analysis of MSK-ACCESS data

Access Data Analysis:

https://github.com/msk-access/access_data_analysis

Workflows V1

Workflows associated with version 1 of the Assay

BAM Generation & Quality Control

Overview of the BAM Generation and Quality Control workflow

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/ACCESS_pipeline.cwl

Tools Used:

  • BWA

  • Trimgalore

  • GATK

  • Picard Tools

  • ABRA2

  • Marianas

Variant Calling

Small Variants

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/snps_and_indels.cwl

Tools Used:

  • VardictJava

  • MuTect

  • VCF2MAF

Copy Number Variant (CNV)

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/call_cnv.cwl

Refer to the "Bioinformatics Pipeline to Detect CNAs" section in this paper for details:

Dara S. Ross, Ahmet Zehir, Donavan T. Cheng, Ryma Benayed, Khedoudja Nafa, Jaclyn F. Hechtman, Yelena Y. Janjigian, Britta Weigelt, Pedram Razavi, David M. Hyman, José Baselga, Michael F. Berger, Marc Ladanyi, Maria E. Arcila, Next-Generation Assessment of Human Epidermal Growth Factor Receptor 2 (ERBB2) Amplification Status: Clinical Validation in the Context of a Hybrid Capture-Based, Comprehensive Solid Tumor Genomic Profiling Assay, The Journal of Molecular Diagnostics, Volume 19, Issue 2, 2017, Pages 244-254, ISSN 1525-1578, https://doi.org/10.1016/j.jmoldx.2016.09.010.

Structural Variant (SV)

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/manta.cwl

Tools Used:

  • Manta

  • iAnnotateSV

Microsatellite Instability Status (MSI)

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/msi.cwl

Tool Used:

  • ADMIE

BAM files used for the workflows

Configurations

Voyager has all our configurations in the jinja template, which includes the paths for the files and tools associated with the workflows. All locations are on JUNO:

  • snps_indels: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

  • CNV: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

  • Fastq_to_bam: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

  • MSI: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

  • SV: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

Meet the Team!

Meet some fabulous people who make things happen

Ronak H Shah

👋 Lead Scientist - 💌 shahr2@mskcc.org - 🇺🇸 New York

Bio

I am responsible for leading a team of Computational Biologists and Bioinformatics Software Engineers who develop, maintain, and operate bioinformatics pipelines and databases in the Center for Molecular Oncology. We also perform collaborative research with other labs and clinicians both within MSKCC and in the broader research community. On a daily basis, we analyze blood samples from patients with tissue-based cancers, using their circulating tumor DNA extracted from the blood, thereby avoiding the need for tumor biopsies. More specifically, I lead the team in designing, developing, and implementing software tools to process and analyze high-throughput next-generation sequencing data for liquid biopsy applications (MSK-ACCESS & Clonal Hematopoiesis Panel). Previously at MSK, I helped develop the workflow for analyzing MSK-IMPACT data for both clinical and research implementation, which is still in use. You can read more about my background here. Also, here is the link to my Google Scholar profile.

Karthigayini Sivaprakasam

👋 Senior Computational Biologist - 💌 sivaprk@mskcc.org - 🇺🇸 Texas

Carmelina Charalambous

👋 Computational Biologist - 💌 charalk@mskcc.org - 🇺🇸 New York

Bio

I am a Computational Biologist in the Center for Molecular Oncology Informatics cfDNA group. I am responsible for the development of new and existing tools, maintaining the cfDNA pipeline, and processing samples for both clinical and research work. Previously, I worked at the NHS trust Cambridge University Hospitals as a Bioinformatician with primary responsibility for the development of the Hemato-Oncology clinical assay. You can read more about my educational background, work experience, and research interests here, and you can access my LinkedIn profile.

Eric Buehler

👋 Bioinformatics Engineer II - 💌 buehlere@mskcc.org - 🇺🇸 New York

Flagship Projects

(Research)

Developed by scientists in the CMO Technology Innovation Lab and Department of Pathology, this high-sensitivity assay is offered by the CMO to MSK researchers for profiling circulating tumor DNA derived from blood plasma. The inclusion of matched buffy coat DNA enables the identification and elimination of germline variants and mutations associated with clonal hematopoiesis, a significant confounder of most commercial assays. The assay is available for clinical use in the Molecular Diagnostics Service and for research projects in the Integrated Genomics Operations (IGO). CCI supports the data processing and analysis of research projects utilizing MSK-ACCESS in IGO and leads the ongoing development of the MSK-ACCESS pipeline for all applications. The current version of the pipeline is available here: mskcc/ACCESS-Pipeline: cfDNA Sequencing Pipeline with UMI (github.com), and more details about the assay and analysis are described in this paper below as well as here:

Brannon, A. R. et al. Enhanced specificity of clinical high-sensitivity tumor mutation profiling in cell-free DNA via paired normal sequencing using MSK-ACCESS. Nat Commun 12, 3770 (2021).

MSK-ACCESS Version 1 assay overview

CMO-CH

Developed by scientists in the CMO Technology Innovation Lab in collaboration with CCI, the Clonal Hematopoiesis (CH) program, Diagnostic Molecular Pathology, and the Precision Interception and Prevention Initiative, this assay utilizes the same barcoding and ultra-deep sequencing technology as MSK-ACCESS to detect CH mutations in white blood cells at high sensitivity. CMO-CH is offered by the CMO to MSK researchers for profiling white blood cell DNA to detect mutations in the most commonly altered CH-associated genes. The assay is run in IGO, and CCI supports the data processing and analysis. You can learn more about it here.

For both projects, additional analysis packages and development versions of the workflows can be found here:

here
CMO Cell-Free DNA (cfDNA) Informatics Team (github.com)
MSK-ACCESS

Mission and Values

Our Mission

The CMO Cell-Free DNA Informatics (CCI) group’s mission is to develop and apply computational methods to organize, analyze and understand genomic data generated from cfDNA assays such as MSK-ACCESS. The group is responsible for all computational infrastructure needed to deploy, run, and deliver results for research cfDNA assays in a production setting for CMO.

CCI's Mission

Our Values

Be Compassionate

We treat everyone we encounter with compassion, seeing the humanity behind their problems and experiences.

Be Mindful

We do not take advantage of our users' attention and adopt mindful working practices so that we can create safe spaces both in our working environment and in our products themselves.

Research First

We challenge our own and others' assumptions through qualitative and quantitative research. Not sure about an idea? Test it.

MSK-ACCESS V2

Things to know for MSK-ACCESS V1 for Research

What is MSK-ACCESS?

It is a hybrid capture panel designed for Analysis of Circulating cfDNA to Evaluate Somatic Status, using Unique Molecular Indexes (UMIs) for high sensitivity. MSK-ACCESS is 13% the size of MSK-IMPACT yet captures 47% of all mutations detected by MSK-IMPACT.

Brannon, A. R. et al. Enhanced specificity of clinical high-sensitivity tumor mutation profiling in cell-free DNA via paired normal sequencing using MSK-ACCESS. Nat Commun 12, 3770 (2021).

Duplex UMIs for Error Correction

MSK-ACCESS v2: Expanding Genomic Profiling Capabilities

Memorial Sloan Kettering Cancer Center is currently validating MSK-ACCESS v2 (2021). Building on the foundation of our original 2017 assay, Version 2 streamlines processing into a single probe pool while significantly expanding our genomic target territory.

Version Comparison: Advancing from v1 to v2

The updated panel increases our gene coverage and bait territory while simplifying the workflow.

| Feature | Version 1 (2017) | Version 2 (2021) |
|---|---|---|
| Gene Count | 129 genes | ~147 genes |
| Bait Territory | 391 kb | 447 kb |
| Probe Pools | Two probe pools | One probe pool |

Key Additions in Version 2

To ensure we capture the most clinically relevant genomic data, several key upgrades have been integrated into v2:

  • Expanded Target List: 153 additional OncoKB targets.

  • Refined Regions: Updated hotspots and high TMB (Tumor Mutational Burden) regions.

  • New Tumor Suppressor Genes (TSGs): 5 additional TSGs have been included: CHEK2, ERCC2, PALB2, BAP1, and CDK12.

Detailed Panel Composition

The MSK-ACCESS v2 panel consists of a carefully curated set of targets designed to provide comprehensive genomic insights. Below is the breakdown of targets, probes, and cumulative territory by category:

| Cat. | Description | Targets | Probes | Cumulative Territory |
|---|---|---|---|---|
| A | OncoKB L1, 2, 3, 4, R1, R2* | 505 | 1021 | 123 kb |
| B | Hotspots (40+) | 187 | 410 | 154 kb |
| C | High TMB exons | 78 | 131 | 157 kb |
| D | Targetable kinase domains | 171 | 276 | 180 kb |
| E | TSGs (30 genes) | 569 | 1204 | 223 kb |
| F | Additional requests (AR) | 13 | 39 | 227 kb |
| G | Additional in ACCESS-v1 | 29 | 55 | 233 kb |
| H | Microsatellite regions | 171 | 178 | 255 kb |
| I | Fingerprint SNPs | 42 | 42 | 260 kb |
| J | Clonal hematopoiesis genes | 50 | 151 | 278 kb |
| K | Copy number SNPs | 285 | 285 | 312 kb |
| L | Introns for SVs (10 genes) | 40 | 828 | 411 kb |
| M | Tiling SNPs | 300 | 300 | 447 kb |

*Note on Category A Exclusions: The OncoKB targets explicitly exclude Heme and L1 prostate markers (BARD1, BRIP1, CHEK1, RAD51B, RAD51C, RAD51D).
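The panel composition table reports territory as a running total, so the size each category contributes is the difference between consecutive rows. The following is a minimal sketch of that arithmetic, assuming the categories were added in alphabetical order (A through M) and that each value is cumulative over all preceding categories. The function name is illustrative and not part of any CCI tool.

```python
# Cumulative bait territory (kb) per category, copied from the panel table.
CUMULATIVE_KB = {
    "A": 123, "B": 154, "C": 157, "D": 180, "E": 223, "F": 227, "G": 233,
    "H": 255, "I": 260, "J": 278, "K": 312, "L": 411, "M": 447,
}


def incremental_territory(cumulative):
    """Return the territory (kb) each category adds on top of the previous ones."""
    increments = {}
    previous = 0
    # Dicts preserve insertion order, so categories are visited A..M.
    for category, total in cumulative.items():
        increments[category] = total - previous
        previous = total
    return increments


increments = incremental_territory(CUMULATIVE_KB)
for category, added in increments.items():
    print(f"{category}: +{added} kb")
```

Under these assumptions the increments add back up to the 447 kb bait territory quoted for Version 2, and the SV introns (category L) are the largest addition after the OncoKB targets.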

Workflows V2

Workflows associated with version 2 of the Assay

BAM Generation & Quality Control

Github Location -> https://github.com/msk-access/nucleo

Github Location -> https://github.com/msk-access/nucleo_qc

Tools Used:

  • BWA

  • fastp

  • fgbio

  • Multi-QC

  • GATK

  • Picard Tools

  • ABRA2

Variant Calling

Small Variants

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/snps_and_indels.cwl

Tools Used:

  • VardictJava

  • MuTect

  • VCF2MAF

Copy Number Variant (CNV)

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/call_cnv.cwl

Refer to the "Bioinformatics Pipeline to Detect CNAs" section in this paper for details:

Dara S. Ross, Ahmet Zehir, Donavan T. Cheng, Ryma Benayed, Khedoudja Nafa, Jaclyn F. Hechtman, Yelena Y. Janjigian, Britta Weigelt, Pedram Razavi, David M. Hyman, José Baselga, Michael F. Berger, Marc Ladanyi, Maria E. Arcila, Next-Generation Assessment of Human Epidermal Growth Factor Receptor 2 (ERBB2) Amplification Status: Clinical Validation in the Context of a Hybrid Capture-Based, Comprehensive Solid Tumor Genomic Profiling Assay, The Journal of Molecular Diagnostics, Volume 19, Issue 2, 2017, Pages 244-254, ISSN 1525-1578, https://doi.org/10.1016/j.jmoldx.2016.09.010.

Structural Variant (SV)

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/manta.cwl

Tools Used:

  • Manta

  • iAnnotateSV

Microsatellite Instability Status (MSI)

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/msi.cwl

Tool Used:

  • ADMIE

BAM files used for the workflows

Configurations

Voyager has all our configurations in the jinja template, which includes the paths for the files and tools associated with the workflows. All locations are on JUNO:

  • snps_indels: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

  • CNV: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

  • Fastq_to_bam: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

  • MSI: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

  • SV: beagle/input_template.json.jinja2 at master · mskcc/beagle (github.com)

CMO-CH V1

This wiki explains the CMO-CH V1 assay

What is CMO-CH assay?

CMO-CH is offered by the Center for Molecular Oncology (CMO) to MSK researchers for profiling white blood cell DNA to detect mutations in the most commonly altered clonal hematopoiesis (CH) associated genes.

What is the panel design?

  • 596 targets capturing 58% of CH and 90.4% of CH-PD mutations identified in the latest CH dataset from 40K patients

  • Total size = 1,143 probes (0.14 Mb)

  • Full gene coverage for TP53, TET2, ASXL1, DNMT3A, PPM1D, CHEK2, ATM, SF3B1, SRSF2, U2AF1, and U2AF2

  • Additional targets with hotspot positions from IMPACT heme assay

  • SNP tiling around TP53, CBL, MPL, JAK2, EZH2, TET2, RUNX1, and ATM (+/-10kb) to identify allelic imbalances

  • 40 fingerprint SNPs that are shared with all other NGS assays (IMPACT, ACCESS, WES etc.) to detect sample mismatches
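The CMO-CH panel design mentions fingerprint SNPs shared across assays to detect sample mismatches. The idea can be illustrated with a small sketch that compares genotype calls at shared SNPs and reports the fraction that disagree. This is an illustrative assumption of how such a check could work, not the method used by the Biometrics package. The SNP identifiers, genotype strings, and the 5% threshold are all hypothetical.

```python
def discordance_rate(genotypes_a, genotypes_b):
    """Fraction of shared fingerprint SNPs whose genotype calls differ."""
    shared = sorted(set(genotypes_a) & set(genotypes_b))
    if not shared:
        raise ValueError("no shared fingerprint SNPs to compare")
    mismatches = sum(genotypes_a[snp] != genotypes_b[snp] for snp in shared)
    return mismatches / len(shared)


def looks_like_same_individual(genotypes_a, genotypes_b, max_discordance=0.05):
    """Flag a pair as matching when discordance stays under a chosen threshold.

    The default threshold is a placeholder, not a validated cutoff.
    """
    return discordance_rate(genotypes_a, genotypes_b) <= max_discordance


# Hypothetical genotype calls keyed by hypothetical SNP identifiers.
sample_a = {"snp_1": "AA", "snp_2": "AG", "snp_3": "GG", "snp_4": "CT"}
sample_b = {"snp_1": "AA", "snp_2": "AG", "snp_3": "GG", "snp_4": "CT"}
sample_c = {"snp_1": "AG", "snp_2": "GG", "snp_3": "GG", "snp_4": "CC"}

print(looks_like_same_individual(sample_a, sample_b))
print(looks_like_same_individual(sample_a, sample_c))
```

A production check would also need to account for coverage, sequencing error, and contamination, which this sketch deliberately ignores.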

    Computational Biologist

    Please visit this page once you have done things necessary here: https://mskcc.github.io/on-boarding/

    • The best place to start is to learn more about MSK-ACCESS; for that, please read the paper:

      • Brannon, A. R. et al. Enhanced specificity of clinical high-sensitivity tumor mutation profiling in cell-free DNA via paired normal sequencing using MSK-ACCESS. Nat Commun 12, 3770 (2021).

    • Learn more about the current collapsing method, Marianas

    • Learn more about the new collapsing (Nucleo) method using Fgbio

    • Learn more about Quality Control V1

    • Understand the updated version of the above using Quality Control V2

    • Learn about the ACCESS Data analysis scripts that help with downstream analysis

    • Learn about IGV for viewing BAM files to distinguish real variants from artifacts

    Assay/Team
    Link to the tool
    Assay/Purpose/Team/Tool
    URL
    Analysis Type
    JUNO Location
    Analysis Type
    JUNO Location
    Analysis Type
    JUNO Location
    Resource Type
    JUNO Location
    • admie - Files used for microsatellite instability detection tool ADMIE for MSK-ACCESS

    • cosmic - VCF file of cosmic used in MSK-ACCESS workflows

    • dbSNP

    Collaborations

    This will help you find the right people to connect with in each subgroup.

    CCI works closely with CMO project managers to track the progress of cfDNA projects submitted to CMO by MSK researchers. CMO project managers support CCI by facilitating and coordinating project initiation, sample collection, and metadata collection.

    Quality Control for ACCESS V1

    Downstream analysis of ACCESS Data

    Fingerprinting using Biometrics

    High Performance Computing

    Nucleo (Fgbio)

    Quality Control for ACCESS V2

    Structural Variant (SV)

    /data1/core006/access/production/data/structural_variants/{cmo_patient_id}/{cmo_sample_id}/current/

    Copy Number Variants (CNV)

    /data1/core006/access/production/data/copy_number_variants/{cmo_patient_id}/{cmo_sample_id}/current/

    Copy Number Variants (CNV)

    /data1/core006/accessH/production/data/copy_number_variants/{cmo_patient_id}/{cmo_sample_id}/current/

    CMO-CH

    /data1/core006/cch/resources

    Berger Lab

    /data1/bergerm1

    - VCF file of dbSNP used in MSK-ACCESS workflows
  • exac - VCF file of ExAC used in MSK-ACCESS workflows

  • mills-and-1000g - VCF file of mills-and-1000g used in MSK-ACCESS

  • reference - reference genome file used in MSK-ACCESS workflows

  • tools - general packages used in MSK-ACCESS workflows

  • msk-access - Data-specific resources for MSK-ACCESS workflows. This includes the following:

    • hiseq4000_curated_duplex_bams_dmp - curated DMP duplex BAMS from HiSeq 4000

    • novaseq_curated_simplex_bams_dmp - curated DMP simplex BAM from NovaSeq.

    • hiseq4000_curated_simplex_bams_dmp - curated DMP simplex BAM from HiSeq 4000

    • novaseq_curated_standard_bams_dmp - curated DMP standard BAM from NovaSeq

    • hiseq4000_curated_standard_bams_dmp - curated DMP standard BAM from HiSeq 4000

    • novaseq_curated_unfiltered_bams_dmp - curated DMP unfiltered BAM from NovaSeq

    • hiseq4000_curated_unfiltered_bams_dmp - curated DMP unfiltered BAM from HiSeq 4000

    • novaseq_unmatched_normal_plasma_duplex_bams_dmp - DMP unmatched normal plasma duplex BAM from NovaSeq

    • hiseq4000_unmatched_normal_plasma_duplex_bams_dmp - DMP unmatched normal plasma duplex BAM from HiSeq 4000

    • novaseq_unmatched_normal_plasma_standard_bams_dmp - DMP unmatched normal plasma standard BAM from NovaSeq

    • hiseq4000_unmatched_normal_plasma_standard_bams_dmp - DMP unmatched normal plasma standard BAM from HiSeq 4000

    • novaseq_curated_duplex_bams_dmp - curated DMP duplex BAMS from NovaSeq

    • regions_of_interest - Different interval files describing regions of interest for MSK-ACCESS assay

  • MSK-ACCESS

    Link to AirTable

    CMO-CH

    Link to AirTable

    CMO Cell-Free Informatics (CCI)

    MSK-ACCESS V1 (Marianas)

    https://github.com/mskcc/ACCESS-Pipeline

    CCI organization on Github

    https://github.com/msk-access

    cBioPortal DMP data

    BAM

    /data1/core006/access/production/data/bams/{cmo_patient_id}/{cmo_sample_id}/current/

    Small Variant (SNV’s/INDEL’s)

    /data1/core006/access/production/data/small_variants/{cmo_patient_id}/{cmo_sample_id}/current/

    Microsatellite Instability(MSI)

    BAM

    /data1/core006/accessH/production/data/bams/{cmo_patient_id}/{cmo_sample_id}/current/

    Small Variant (SNV’s/INDEL’s)

    /data1/core006/accessH/production/data/small_variants/{cmo_patient_id}/{cmo_sample_id}/current/

    Structural Variant (SV)

    BAM

    /data1/core006/cch/production/data/bams/{cmo_patient_id}/{cmo_sample_id}/current/

    Small Variant (SNV’s/INDEL’s)

    /data1/core006/cch/production/data/small_variants/{cmo_patient_id}/{cmo_sample_id}/current/
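The per-sample output locations listed on this page all follow one template, with the assay directory, analysis type, patient ID, and sample ID as the varying parts. The sketch below builds such a path from those parts. The function and dictionary names are hypothetical, the assay/analysis combinations are limited to the ones this page lists, and the IDs in the usage line are placeholders rather than real CMO identifiers.

```python
# Analysis directories listed on this page for each assay directory.
ANALYSES_BY_ASSAY = {
    "access": {
        "bams", "small_variants", "microsatellite_instability",
        "structural_variants", "copy_number_variants",
    },
    "accessH": {
        "bams", "small_variants", "structural_variants", "copy_number_variants",
    },
    "cch": {"bams", "small_variants"},
}

TEMPLATE = (
    "/data1/core006/{assay}/production/data/{analysis}/"
    "{cmo_patient_id}/{cmo_sample_id}/current/"
)


def sample_output_dir(assay, analysis, cmo_patient_id, cmo_sample_id):
    """Build the 'current' output directory for one sample and analysis type."""
    if assay not in ANALYSES_BY_ASSAY:
        raise ValueError(f"unknown assay directory: {assay}")
    if analysis not in ANALYSES_BY_ASSAY[assay]:
        raise ValueError(f"{analysis} is not listed for {assay}")
    return TEMPLATE.format(
        assay=assay,
        analysis=analysis,
        cmo_patient_id=cmo_patient_id,
        cmo_sample_id=cmo_sample_id,
    )


# Placeholder IDs for illustration only.
print(sample_output_dir("access", "bams", "PATIENT_ID", "SAMPLE_ID"))
```

Rejecting combinations that the page does not list (for example, MSI under `cch`) keeps a typo from silently producing a path that does not exist.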

    NYS validation data

    /data1/core006/access/production/runs/NYS_validation/current

    CMO-ACCESS-Heme

    /data1/core006/accessH/resources

    CMO-ACCESS Resources

    Below are resources that would be handy for you to learn more about all the tools described in the paper.

    Project Management

    Functional Resources

    Paths to know

    CMO-ACCESS

    Data on IRIS for CMO-ACCESS samples

    CMO-ACCESS-Heme

    Data on IRIS for CMO-ACCESS-Heme samples

    CMO-CH

    Data on IRIS for CMO-CH samples

    Resources

    Details of CMO ACCESS Resources

    If adding data or tools to the above-mentioned location can be justified, please contact Ronak Shah.

    Nucleo
    Fgbio
    Quality Control V1
    Quality Control V2
    ACCESS Data analysis
    IGV
    https://mskcc.github.io/on-boarding/

    - Request access once you have your msk email id

    InternalLink:

    /data1/core006/access/production/data/microsatellite_instability/{cmo_patient_id}/{cmo_sample_id}/current/

    /data1/core006/accessH/production/data/structural_variants/{cmo_patient_id}/{cmo_sample_id}/current/

    /work/access/production/resources/

    Jennifer Milbank

    CCI works with IGO to make sure that the data generated for the cfDNA assays is of the highest standard.

    • Neeman Mohibullah

    • Ruchi Patel

    • Marisa Dunigan

    CCI works closely with the CMO Software Engineering (CSE) team within CI to support the development of Voyager & Hermes. The CSE/CAS team supports the development, integration, and processing of CCI’s workflows implemented in Voyager.

    • Christopher Allan Bolipata

    • Sinisa Ivkovic

    • Nikhil Kumar

    • Timothy Song

    • Stephen Kelly

    • Suleman Vural

    CCI collaborates with scientists in various CMO groups to improve the existing workflows and to analyze the data in a consistent manner.

    • Kanika Arora

    • Shalabh Suman

    • Chaitanya Bandlamundi

    • Brian Loomis

    • David Brown

    • Mark Donoghue

    • Yixiao Gong

    CCI works closely with the ClinBx group in the Molecular Diagnostics Service to deploy core workflows and maintain consistency among analyses performed on research and clinical cfDNA samples. We work together to improve the workflows in sync and to learn from one another about how to identify artifacts and interpret the data. CCI is also working with the ClinBx Software group to port an instance of the mPATH system used by the ClinBx team to sign-out cases for research.

    Rose Brannon

    Aijazuddin Syed

    Ryan Ptashkin

    Anoop Balakrishnan

    CCI works with HPC to request resources and support for the JUNO cluster and the virtual machines that enable various aspects of our goals.

    Neeraj Girija

    Lohit Valleru

    Slack Channel

    hpc-request@cbio.mskcc.org

    Every group includes many amazing individuals; only a few are listed here.

    Confluence knowledgebase across collaborators

    CMO Project Managers

    People to connect

    Kirsten Fuller
    Casey Savin

    Integrated Genomics Operations (IGO)

    People to connect

    CMO Informatics (CI)

    People to connect

    CMO Software Engineers (CSE)

    CMO Analysis Systems (CAS)

    Berger Lab/Technology Innovation Lab/CMO Computational Science (CCS)

    People to connect

    Berger Lab

    Technology Innovation Lab

    CMO Computational Sciences (CCS)

    Clinical Bioinformatics (ClinBx)

    People to know

    High Performance Computing (HPC)

    Glossary

    Terms you might encounter

    SMILE -> information associated with a sample. This is a current work-in-progress written in Java/Neo4J that is supposed to be the one source of truth for metadata associated with samples. Currently, these responsibilities are managed by Beagle.

    LIMS -> Laboratory information management system. When sequencing is complete, metadata and file information on the sequences is first input into this system.

    Beagle -> triggers and monitors workflows. There's also a part of it that tracks files and metadata associated with those files, which may be splintered out in the future. A request typically comes from LIMS, which begins the workflow process.

    Hermes -> a nicer interface for Beagle (sample workflow tracking) and soon the MDB (make updates to metadata)

    Ridgeback -> An HTTP API for Toil which Beagle is reliant on for interacting with workflows.

    Toil -> A workflow engine that interfaces well with LSF.

    Voyager -> The suite of applications built by the Voyager team, including Ridgeback, Beagle, and Hermes.

    HPC -> Distributed high-performance computing that's managed in-house. Most data processing and files are housed here.

    LSF -> IBM's Platform Computing (Load Sharing Facility) tool for scheduling workflows on HPC. It has a specific suite of commands for managing workloads.

    Common workflow language -> a YAML-like language for defining workflows.

    Nextflow -> Reactive workflow framework and a programming DSL that eases the writing of data-intensive computational pipelines.

    DNA-SEQ -> Sequencing of the Deoxyribonucleic acid (DNA)

    RNA-SEQ -> Sequencing of the Ribonucleic acid (RNA)

    MSK-ACCESS -> Cell-Free DNA assay for patients with solid tumors

    CMO-CH -> Assay for profiling clonal hematopoiesis mutations from blood

    MSK-IMPACT -> Assay for profiling patients' tissue DNA for solid tumors

    MSK-IMPACT Heme -> Assay for profiling patients' blood DNA for Heme malignancies

    Facets -> Allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing

    WES -> Whole Exome Sequencing

    WGS -> Whole Genome Sequencing

    WTS -> Whole Transcriptome Sequencing

    Illumina -> one of the companies that provides sequencing machines

    Novaseq -> Type of sequencer from Illumina

    Nextseq -> Type of sequencer from Illumina

    Miseq -> Type of sequencer from Illumina

    Hiseq -> Type of sequencer from Illumina

    PacBio -> a company that sells instruments for long-read sequencing based on zero-mode waveguide

    Oxford Nanopore -> a company that sells instruments for long-read sequencing based on Nanopore technology

    MSK-ACCESS V1

    Things to know for MSK-ACCESS V1 for Research

    It is a hybrid capture panel designed for Analysis of Circulating cfDNA to Evaluate Somatic Status, using Unique Molecular Indexes (UMIs) for high sensitivity. MSK-ACCESS is 13% the size of MSK-IMPACT yet captures 47% of all mutations detected by MSK-IMPACT.

    Selected exons of 129 genes for mutation detection

    SMILE
    Beagle
    Hermes
    Ridgeback
    Toil
    HPC
    LSF
    suite of commands
    Common workflow language
    Nextflow
    DSL
    DNA-SEQ
    DNA
    RNA-SEQ
    RNA
    MSK-ACCESS
    MSK-IMPACT
    MSK-IMPACT Heme
    Facets
    WES
    WGS
    WTS
    Illumina
    Novaseq
    Nextseq
    Miseq
    Hiseq
    PacBio
    zero-mode waveguide
    Oxford Nanopore
    Nanopore

    Filing Expenses

    To get reimbursed for your expenses, just fill in the expense report online.

    You need to be on the MSK network to access these resources

    OncoKB Level 1-4
  • Hotspot sites

  • High rates of mutations

  • Protein kinase domains

  • Tumor suppressor genes

  • Microsatellite regions

    SNPs of zygosity & copy number of 12 genes

    Common SNPs for genome-wide copy number

    Introns for structural variants of 10 genes

    Clonal hematopoiesis genes

    • Matched cfDNA-WBC (”tumor-normal”) assay to detect somatic alterations

    • Sensitivity for mutation calling depends on ‘duplex’ collapsed coverage

    • Different sensitivities for different classes of alterations

      • Genotyping +++++

      • De novo mutations, indels ++++

      • MSI +++

      • Rearrangements +++

      • Copy number ++

      • Tumor mutation burden ○

    What is MSK-ACCESS?

    What is the panel design?

    cfDNA
    Brannon, A. R. et al. Enhanced specificity of clinical high-sensitivity tumor mutation profiling in cell-free DNA via paired normal sequencing using MSK-ACCESS. Nat Commun 12, 3770 (2021).
    Duplex UMIs for Error Correction

    What are the Design Implications?

    + -> sensitivity for that event type

    ○ -> cannot be calculated

    David Mcmanamon
    https://clickup.com/
    https://github.mskcc.org/knowledgesystems/dmp-2022
    https://cmo-ci.gitbook.io/cmo-access-data-analysis/
    https://cmo-ci.gitbook.io/biometrics/
    https://github.mskcc.org/pages/hpc/userdocs/
    https://github.com/msk-access/nucleo
    https://cmo-ci.gitbook.io/access-quality-control-v2/

    Software Engineers

    • The best place to start would be to understand the data flow:

      • LIMS/SMILE -> Voyager -> JUNO

    • Learn more about the codebases listed below that form Voyager

  • Learn about the codebases that form mPATH

  • Resource
    URL

    ACCESS/Voyager team uses this to manage sprints/stories. In addition, ACCESS uses this to keep track of the statuses of samples.

    Auto Track Sample Status

    Internal Gitlab where MPath and other software is hosted.

    Application
    URL

    LIMS

    For Authentication ask your Manager:

    Beagle API

    • http://silo:4001 (Staging)

    • http://voyager:5001 (Production)

    Ridgeback API

    Projects, who to speak with, slack channel

    • MPath & CVR (Aijazuddin Syed/Anoop Balakrishnan Rema)

    • Voyager Projects [Beagle/Seqosystem/Ridgeback] (Sinisa Ivkovic/Nikhil Kumar/Allan Bolipata)

      • #voyager

    • LIMS (David Mcmanamon)

    • MDB (Angelica Ochoa/Benjamin Gross/Allan Bolipata)

      • #metadb-informatics

    • Toil/CWL (Nikhil Kumar)

    • ACCESS (Ronak Shah)

      • #msk-access

    • ACCESS Servers (HPC Request/Neeraj Paramasivam)

    Please visit this page once you have done things necessary here: https://mskcc.github.io/on-boarding/

    Sites

    Code/Project Management

    Applications

    People

    (forked from mpath repositories)

    Code specific to ACCESS team pipelines.

    Code for other CMO/MSK teams

    Information on HPC/LSF

    Beagle

    Hermes

    Ridgeback

    • http://silo:4003 (Staging)

    • http://voyager:5003 (Production)
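When scripting against these services, it helps to keep the staging and production base URLs in one place rather than hard-coding them. The sketch below is one way to do that. It assumes the `:4003`/`:5003` pair belongs to the Ridgeback API, which the page layout suggests but does not state outright. The function name is hypothetical, and no API routes are shown because this page does not document any.

```python
# Base URLs as listed on this page; the Ridgeback mapping is an assumption.
ENDPOINTS = {
    "beagle": {
        "staging": "http://silo:4001",
        "production": "http://voyager:5001",
    },
    "ridgeback": {
        "staging": "http://silo:4003",
        "production": "http://voyager:5003",
    },
}


def base_url(service, environment="staging"):
    """Look up the base URL for a service, defaulting to staging for safety."""
    try:
        return ENDPOINTS[service][environment]
    except KeyError:
        raise ValueError(
            f"no URL recorded for service={service!r}, environment={environment!r}"
        ) from None


print(base_url("beagle"))
print(base_url("ridgeback", "production"))
```

Defaulting to staging means a script only touches production when the caller asks for it explicitly.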

    Flower (Celery Task UI) for Voyager, for Production only

    http://voyager:4001

    ELK for Voyager

    MPath (for Clinical)

    MPath (for Research)

    http://access01:7331/api/ui/

    CVR (to be replaced by MPath)

    cvr.mskcc.org:8083/

    David Mcmanamon
    HPC Request/Neeraj Paramasivam
    clickup.com
    jira.mskcc.org:8090
    https://igolims.mskcc.org:8443/LimsRest/swagger-ui.html
    http://crux.mskcc.org:8929/mpath
    http://crux.mskcc.org:8929/access
    github.com/mskcc-access
    github.com/mskcc
    https://mskcchpc.org/
    https://github.com/mskcc/beagle
    https://github.com/mskcc/hermes
    https://github.com/mskcc/ridgeback
    bic-dockerapp01.mskcc.org:5601/
    https://mpath.mskcc.org/

    Requesting Time Off

    To request time off, submit the request in MSK TIME and also email your manager about it.

    You need to be on the MSK network to access these resources

    Access Quality Control (v1)
    Quality control generation
    Marianas
    sequence_qc
    MSK-ACCESS QC generation V2
    Biometrics
    https://cmo.mskcc.org
    Internal CMO Website
    https://mskconfluence.mskcc.org/x/9zZ2B
    Link to CCI Knowledgebase for internal process
    https://mskconfluence.mskcc.org/x/vzV2B
    Link to Knowledgebase for information across groups
    Marie-Josée and Henry R. Kravis Center for Molecular Oncology | Memorial Sloan Kettering Cancer Center (www.mskcc.org)
    External CMO Website