CMO Cell-Free DNA Informatics (CCI)

CMO Cell-Free DNA Informatics (CCI) Wiki

This wiki gives you an overview of the CMO Cell-Free DNA Informatics (CCI) team at MSKCC.

Welcome aboard!

Welcome to the CCI wiki! Here you'll find everything you need to know about CCI.

You can read more about the Center for Molecular Oncology (CMO) here:

The Team

Mission and Values

Our Mission

The CMO Cell-Free DNA Informatics (CCI) group’s mission is to develop and apply computational methods to organize, analyze and understand genomic data generated from cfDNA assays such as MSK-ACCESS. The group is responsible for all computational infrastructure needed to deploy, run, and deliver results for research cfDNA assays in a production setting for CMO.

Our Values

Be Compassionate

We treat everyone we encounter with compassion, seeing the humanity behind their problems and experiences.

Be Mindful

We do not take advantage of our users' attention, and we adopt mindful working practices so that we can create safe spaces both in our working environment and in our products themselves.

Research First

We challenge our own and others' assumptions through qualitative and quantitative research. Not sure about an idea? Test it.

Meet the Team!

Meet some fabulous people who make things happen

Ronak H Shah

👋 Lead Scientist — 💌 shahr2@mskcc.org — 🇺🇸 New York

Bio

I am responsible for leading a team of Computational Biologists and Bioinformatics Software Engineers who develop, maintain, and operate bioinformatics pipelines and databases in the . We also perform collaborative research with other labs and clinicians, both within MSKCC and in the broader research community. On a daily basis, we analyze blood samples from patients with tissue-based cancers, using patients’ circulating tumor DNA extracted from blood, avoiding the need for tumor biopsies. More specifically, I lead the team in designing, developing, and implementing software tools for processing and analyzing high-throughput, next-generation sequencing data, specifically for liquid biopsy applications ( & Clonal Hematopoiesis Panel). Earlier in my time at MSK, I helped develop the workflow to analyze data for both clinical and research implementation, which is still being used. You can read more about my background , also here is the link to my

Karthigayini Sivaprakasam

👋 Senior Computational Biologist — 💌 sivaprk@mskcc.org — 🇺🇸 Texas

Bio

Good to know: Encourage employees to write a succinct bio that can help new hires learn about them and how they like to work.

Carmelina Charalambous

👋 Computational Biologist — 💌 charalk@mskcc.org — 🇺🇸 New York

Bio

I am a Computational Biologist in the Center for Molecular Oncology cfDNA Informatics group. I am responsible for developing new and existing tools, maintaining the cfDNA pipeline, and processing samples for both clinical and research work. Previously, I worked at the NHS trust Cambridge University Hospitals as a Bioinformatician, with primary responsibility for developing the Hemato-Oncology clinical assay. You can read more about my educational background, work experience, and research interests and you can access my LinkedIn .

Eric Buehler

👋 Bioinformatics Engineer II — 💌 buehlere@mskcc.org — 🇺🇸 New York

Bio

Good to know: Encourage employees to write a succinct bio that can help new hires learn about them and how they like to work.

Alyssa Vann

👋 Bioinformatics Engineer IV — 💌 vanna1@mskcc.org — 🇺🇸 New York

Bio

Good to know: Encourage employees to write a succinct bio that can help new hires learn about them and how they like to work.

Flagship Projects

MSK-ACCESS (Research)

Developed by scientists in the CMO Technology Innovation Lab and Department of Pathology, this high-sensitivity assay is offered by the CMO to MSK researchers for profiling circulating tumor DNA derived from blood plasma. The inclusion of matched buffy coat DNA enables the identification and elimination of germline variants and mutations associated with clonal hematopoiesis, a significant confounder of most commercial assays. The assay is available for clinical use in the Molecular Diagnostics Service and for research projects in the Integrated Genomics Operations (IGO). CCI supports the data processing and analysis of research projects utilizing MSK-ACCESS in IGO and leads the ongoing development of the MSK-ACCESS pipeline for all applications. The current version of the pipeline is available here: mskcc/ACCESS-Pipeline: cfDNA Sequencing Pipeline with UMI (github.com), and more details about the assay and analysis are described in this paper below as well as here:

CMO-CH

Developed by scientists in the CMO Technology Innovation Lab in collaboration with CCI, the Clonal Hematopoiesis (CH) program, Diagnostic Molecular Pathology, and the Precision Interception and Prevention Initiative, this assay uses the same barcoding and ultra-deep sequencing technology as MSK-ACCESS to detect CH mutations in white blood cells at high sensitivity. CMO-CH is offered by the CMO to MSK researchers for profiling white blood cell DNA to detect mutations in the most commonly altered CH-associated genes. The assay is run in IGO, and CCI supports the data processing and analysis. You can learn more about it

For both projects, additional analysis packages and development versions of the workflows can be found here:

Applications

Applications/Tools the CCI is responsible for at MSKCC in CMO

Application

Location

Description

ACCESS-Pipeline

Workflows for BAM generation based on , , small variant calling, microsatellite instability calling, copy number variant calling, and structural variant calling for version 1 of the MSK-ACCESS assay

ACCESS_SV

The core workflow used for structural variant calling in the MSK-ACCESS assay

The OnBoarding

General

What to do first?

CMO onboarding

The first thing to do once you have joined is to visit https://mskcc.github.io/on-boarding/, complete the tasks required for compliance, and start the process of getting access to the various systems.

Cluster Guide

To learn more about the cluster and its resources visit the

CCI specific onboarding

Guides on Confluence

You need to be on the internal network to access msk-confluence. You can request access to it on The Spot.

Computational Biologist

Please visit this page once you have completed the required steps here: https://mskcc.github.io/on-boarding/

The best place to start is to learn more about MSK-ACCESS and for that please read the paper:
Learn more about the current collapsing method
Learn more about the new collapsing () method using
Learn more about the
Understand the updated version for the above using these
Learn about scripts that help with downstream analysis
Learn about for viewing BAM files to distinguish real variants from artifacts

Below are resources that would be handy for you to learn more about all the tools described in the paper.

Project Management

Assay/Team

Link to the tool

Functional Resources

Assay/Purpose/Team/Tool

URL

Paths to know

CMO-ACCESS

Data on JUNO for CMO-ACCESS samples

Analysis Type

JUNO Location

Resources

Resource Type

JUNO Location

Details of CMO ACCESS Resources

admie - Files used for microsatellite instability detection tool ADMIE for MSK-ACCESS
cosmic - VCF file of cosmic used in MSK-ACCESS workflows
dbSNP

If you can justify adding data or tools to the above-mentioned location, please contact .

MSK-ACCESS V1

Things to know for MSK-ACCESS V1 for Research

What is MSK-ACCESS?

MSK-ACCESS is a hybrid-capture panel designed for the Analysis of Circulating cfDNA to Evaluate Somatic Status, using Unique Molecular Indexes (UMIs) for high sensitivity. The panel is 13% as large as MSK-IMPACT, yet it captures 47% of all mutations detected by MSK-IMPACT.

Brannon, A. R. et al. Enhanced specificity of clinical high-sensitivity tumor mutation profiling in cell-free DNA via paired normal sequencing using MSK-ACCESS. Nat Commun 12, 3770 (2021).
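To make the role of UMIs more concrete, here is a minimal sketch of UMI-based read collapsing. It assumes reads have been simplified to (start position, UMI, sequence) tuples of equal length. The function name `collapse_by_umi` and the per-base majority vote are illustrative assumptions, not the collapsing method implemented in ACCESS-Pipeline.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads):
    """Group reads by (start position, UMI) and build one consensus per group.

    `reads` is an iterable of (start, umi, sequence) tuples. Sequences within
    a group are assumed to have equal length (a simplification for this sketch).
    """
    families = defaultdict(list)
    for start, umi, seq in reads:
        families[(start, umi)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        # Majority vote at each base position across the read family.
        consensus[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Hypothetical toy input: three reads share a UMI, one carries a likely error.
reads = [
    (100, "ACGT", "TTGACA"),
    (100, "ACGT", "TTGACA"),
    (100, "ACGT", "TTGCCA"),
    (100, "GGTA", "TTGACA"),
]
collapsed = collapse_by_umi(reads)
```

In this toy input the three reads tagged `ACGT` collapse into a single consensus in which the minority base is outvoted, which is the intuition behind using UMIs to suppress sequencing errors.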

What is the panel design?

Selected exons of 129 genes for mutation detection

Level 1-4
High rates of mutations

SNPs of & of 12 genes

Common for

Introns for of 10 genes

genes

What are the Design Implications?

Matched cfDNA-WBC ("tumor-normal") assay to detect somatic alterations
Sensitivity for mutation calling depends on ‘duplex’ collapsed coverage
Different sensitivities for different classes of alterations

+ -> sensitivity for that event type

○ -> cannot be calculated
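As a rough illustration of why sensitivity tracks collapsed coverage, the sketch below computes the probability of observing at least a minimum number of variant-supporting molecules under a simple binomial sampling model. The model itself, the function name `detection_probability`, and the example depths, allele fraction, and threshold are all assumptions chosen for illustration. They are not the calling thresholds used by the MSK-ACCESS workflows.

```python
from math import comb


def detection_probability(depth, vaf, min_alt):
    """P(at least `min_alt` alt-supporting molecules) under Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt)
    )
    return 1.0 - p_below


# Hypothetical numbers: a variant at 0.1% allele fraction, requiring at least
# two supporting molecules, evaluated at three collapsed depths.
for depth in (500, 1000, 2000):
    print(depth, round(detection_probability(depth, 0.001, 2), 3))
```

Under this model the detection probability rises with collapsed depth for a fixed allele fraction, which is why low-coverage regions limit the sensitivity that can be claimed for an event.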

Workflows V1

Workflows associated with version 1 of the Assay

BAM Generation & Quality Control

Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/ACCESS_pipeline.cwl

Tools Used:

Variant Calling

Small Variants

Github Location ->

Tools Used:

Copy Number Variant (CNV)

Github Location ->

Refer to Bioinformatics Pipeline to Detect CNA's section in this paper for details:

Structural Variant (SV)

Github Location ->

Tool Used:

Microsatellite Instability Status (MSI)

Github Location ->

Tool Used:

Configurations

Voyager stores all of our configurations in Jinja templates, which include the paths for the various files and tools associated with the workflows. All locations are on JUNO:

snps_indels:

CNV:

Fastq_to_bam:

MSI:

SV :

CMO-CH V1

This wiki explains the CMO-CH V1 assay

What is the CMO-CH assay?

CMO-CH is offered by the Center for Molecular Oncology (CMO) to MSK researchers for profiling white blood cell DNA to detect mutations in the most commonly altered clonal hematopoiesis (CH) associated genes.

What is the panel design?

596 targets capturing 58% of CH and 90.4% of CH-PD mutations identified in the latest CH dataset from 40K patients
Total size = 1,143 probes (0.14 Mb)
Full gene coverage for TP53, TET2, ASXL1, DNMT3A, PPM1D, CHEK2, ATM, SF3B1, SRSF2, U2AF1, and U2AF2
Additional targets with hotspot positions from IMPACT heme assay

Analysis

This is a wiki for analysis of MSK-ACCESS data

This pages

Software Engineers

Please visit this page once you have completed the required steps here: https://mskcc.github.io/on-boarding/

The best place to start would be to understand the data flow:
- LIMS/SMILE -> Voyager -> JUNO
Learn more about the codebases listed below that form Voyager
Learn about the codebases that form mPATH

Sites

Code/Project Management

Resource

URL

Applications

Application

URL

People

Projects, who to speak with, slack channel

MPath & CVR ()
Voyager Projects [Beagle/Seqosystem/Ridgeback] ()
- #voyager

Collaborations

This page will help you find the right people to connect with in each subgroup.

Every group includes many amazing individuals; only a few are listed here.

Confluence knowledge base across collaborators

CMO Project Managers

CCI works closely with CMO project managers to track the progress of cfDNA projects submitted to CMO by MSK researchers. CMO project managers support CCI by facilitating and coordinating project initiation, sample collection, and metadata collection.

People to connect

Integrated Genomics Operations (IGO)

CCI works with IGO to make sure that the data generated for the cfDNA assays meets the highest standards.

People to connect

CMO Informatics (CI)

CCI works closely with the CMO Software Engineering (CSE) team within CI to support the development of Voyager & Hermes. The CSE/CAS team supports the development, integration, and processing of CCI’s workflows implemented in Voyager.

People to connect

CMO Software Engineers (CSE)

CMO Analysis Systems (CAS)

Berger Lab/Technology Innovation Lab/CMO Computational Science (CCS)

CCI collaborates with scientists in various CMO groups to improve the existing workflows and to analyze the data in a consistent manner.

People to connect

Berger Lab

Technology Innovation Lab

CMO Computational Sciences (CCS)

Clinical Bioinformatics (ClinBx)

CCI works closely with the ClinBx group in the Molecular Diagnostics Service to deploy core workflows and maintain consistency among analyses performed on research and clinical cfDNA samples. We work together to improve the workflows in sync and to learn from one another about how to identify artifacts and interpret the data. CCI is also working with the ClinBx Software group to port an instance of the mPATH system, which the ClinBx team uses to sign out cases, for research use.

People to know

High Performance Computing (HPC)

CCI works with HPC to request resources and support for the JUNO cluster and the virtual machines that enable various aspects of our goals.

Slack Channel

hpc-request@cbio.mskcc.org

Glossary

Terms you might encounter

SMILE -> information associated with a sample. This is a current work-in-progress written in Java/Neo4J that is supposed to be the one source of truth for metadata associated with samples. Currently, these responsibilities are managed by Beagle.

LIMS -> Laboratory information management system -- when sequencing is complete, metadata and file information on the sequences is first input into this system.

Beagle -> triggers and monitors workflows. There's also a part of it that tracks files and metadata associated with those files, which may be split out in the future. A request typically comes from LIMS, which begins the workflow process.

Hermes -> a nicer interface for Beagle (sample workflow tracking) and soon the MDB (make updates to metadata)

Ridgeback -> An HTTP API for Toil which Beagle is reliant on for interacting with workflows.

Toil -> A workflow engine that interfaces well with LSF.

Voyager -> The suite of applications built by the Voyager team, including Ridgeback, Beagle, and Hermes.

JUNO -> Distributed high-performance computing that's managed in-house. Most data processing and files are housed here.

CWL -> a YAML-like language for defining workflows.

-> Reactive workflow framework and a programming that eases the writing of data-intensive computational pipelines.

-> Sequencing of deoxyribonucleic acid (DNA)

-> Sequencing of ribonucleic acid (RNA)

MSK-ACCESS -> Cell-free DNA assay for patients with solid tumors

CMO-CH -> Assay for profiling clonal hematopoiesis mutations from blood

MSK-IMPACT -> Assay for profiling patients' tissue DNA for solid tumors

-> Assay for profiling patients' blood DNA for heme malignancies

-> Allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing

WES -> Whole Exome Sequencing

WGS -> Whole Genome Sequencing

-> Whole Transcriptome Sequencing

-> one of the companies that provides sequencing machines

-> Type of sequencer from Illumina

-> a company that sells instruments for long-read sequencing based on

-> a company that sells instruments for long-read sequencing based on technology

The Formal Stuff

Requesting Time Off

To request time off, fill in the details at , and also email your manager about the request.

You need to be on the MSK network to access these resources

Filing Expenses

To get reimbursed for your expenses, just fill in the expense report .

You need to be on the MSK network to access these resources

LSF -> IBM's Platform Computing (Load Sharing Facility) tool for scheduling workflows on HPC. It has a specific to managing workloads.