The CMO Cell-Free DNA Informatics (CCI) group’s mission is to develop and apply computational methods to organize, analyze and understand genomic data generated from cfDNA assays such as MSK-ACCESS. The group is responsible for all computational infrastructure needed to deploy, run, and deliver results for research cfDNA assays in a production setting for CMO.
We treat everyone we encounter with compassion, seeing the humanity behind their problems and experiences.
We do not take advantage of our users' attention and adopt mindful working practices so that we can create safe spaces both in our working environment and in our products themselves.
We challenge our own and others' assumptions through qualitative and quantitative research. Not sure about an idea? Test it.
What to do first?
The first thing to do once you have joined is to visit https://mskcc.github.io/on-boarding/, finish the tasks necessary for compliance, and initiate the process of getting access to various systems.
To learn more about the cluster and its resources visit the MIRO board
You need to be on the internal network to access msk-confluence; you can request access to it on The Spot.
Developed by scientists in the CMO Technology Innovation Lab and Department of Pathology, this high-sensitivity assay is offered by the CMO to MSK researchers for profiling circulating tumor DNA derived from blood plasma. The inclusion of matched buffy coat DNA enables the identification and elimination of germline variants and mutations associated with clonal hematopoiesis, a significant confounder of most commercial assays. The assay is available for clinical use in the Molecular Diagnostics Service and for research projects in the Integrated Genomics Operations (IGO). CCI supports the data processing and analysis of research projects utilizing MSK-ACCESS in IGO and leads the ongoing development of the MSK-ACCESS pipeline for all applications. The current version of the pipeline is available here: mskcc/ACCESS-Pipeline: cfDNA Sequencing Pipeline with UMI (github.com), and more details about the assay and analysis are described in this paper below as well as here:
Developed by scientists in the CMO Technology Innovation Lab in collaboration with the Clonal Hematopoiesis (CH) program, Diagnostic Molecular Pathology, the Precision Interception and Prevention Initiative, and CCI, this assay utilizes the same barcoding and ultra-deep sequencing technology as MSK-ACCESS to detect CH mutations in white blood cells at high sensitivity. CMO-CH is offered by the CMO to MSK researchers for profiling white blood cell DNA to detect mutations in the most commonly altered CH-associated genes. The assay is run in IGO, and CCI supports the data processing and analysis. You can learn more about it here
For both projects, additional analysis packages and development versions of the workflows can be found here: CMO Cell-Free DNA (cfDNA) Informatics Team (github.com)
This wiki explains the CMO-CH V1 assay
CMO-CH is offered by the Center for Molecular Oncology (CMO) to MSK researchers for profiling white blood cell DNA to detect mutations in the most commonly altered clonal hematopoiesis (CH) associated genes.
596 targets capturing 58% of CH and 90.4% of CH-PD mutations identified in the latest CH dataset from 40K patients
Total size = 1,143 probes (0.14 Mb)
Full gene coverage for TP53, TET2, ASXL1, DNMT3A, PPM1D, CHEK2, ATM, SF3B1, SRSF2, U2AF1, and U2AF2
Additional targets with hotspot positions from IMPACT heme assay
SNP tiling around TP53, CBL, MPL, JAK2, EZH2, TET2, RUNX1, and ATM (+/-10kb) to identify allelic imbalances
40 fingerprint SNPs that are shared with all other NGS assays (IMPACT, ACCESS, WES etc.) to detect sample mismatches
Applications/Tools the CCI is responsible for at MSKCC in CMO
Workflows associated with version 1 of the Assay
Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/ACCESS_pipeline.cwl
Tools Used:
Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/snps_and_indels.cwl
Tools Used:
Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/call_cnv.cwl
Refer to the "Bioinformatics Pipeline to Detect CNAs" section in this paper for details:
Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/manta.cwl
Tool Used:
Github Location -> https://github.com/mskcc/ACCESS-Pipeline/blob/master/workflows/subworkflows/msi.cwl
Tool Used:
Voyager keeps all our configurations in a Jinja template, which includes the paths for the various files and tools associated with the workflows. All locations are on JUNO:
Meet some fabulous people who make things happen
👋 Lead Scientist — 💌 shahr2@mskcc.org — 🇺🇸 New York
I am responsible for leading a team of Computational Biologists and Bioinformatics Software Engineers who develop, maintain, and operate bioinformatics pipelines and databases in the Center for Molecular Oncology. We also perform collaborative research with other labs and clinicians both within MSKCC and in the broader research community. On a daily basis, we analyze blood samples from patients with tissue-based cancers, using patients’ circulating tumor DNA extracted from blood, avoiding the need for tumor biopsies. More specifically, I lead the team in designing, developing, and implementing software tools for processing and analyzing high-throughput, next-generation sequencing data, specifically for liquid biopsy applications (MSK-ACCESS & Clonal Hematopoiesis Panel). Previously, during my time at MSK, I helped develop the workflow to analyze MSK-IMPACT data for both clinical and research implementation, which is still in use. You can read more about my background here; here is also the link to my Google Scholar profile.
👋 Senior Computational Biologist — 💌 sivaprk@mskcc.org — 🇺🇸 Texas
👋 Computational Biologist — 💌 charalk@mskcc.org — 🇺🇸 New York
I am a Computational Biologist in the Center for Molecular Oncology cfDNA Informatics group. I am responsible for the development of new and existing tools, maintaining the cfDNA pipeline, and processing samples for both clinical and research work. Previously, I worked at the NHS trust Cambridge University Hospitals as a Bioinformatician with primary responsibility for the development of the Hemato-Oncology clinical assay. You can read more about my educational background, work experience, and research interests here, and you can access my LinkedIn profile.
👋 Bioinformatics Engineer II — 💌 buehlere@mskcc.org — 🇺🇸 New York
👋 Bioinformatics Engineer IV — 💌 vanna1@mskcc.org — 🇺🇸 New York
Things to know for MSK-ACCESS V1 for Research
It is a hybrid capture panel designed for Analysis of Circulating cfDNA to Evaluate Somatic Status, using Unique Molecular Indexes (UMIs) for high sensitivity. MSK-ACCESS is 13% as large as MSK-IMPACT and captures 47% of all mutations detected by MSK-IMPACT.
Selected exons of 129 genes for mutation detection
OncoKB Level 1-4
High rates of mutations
SNPs of zygosity & copy number of 12 genes
Common SNPs for genome-wide copy number
Introns for structural variants of 10 genes
Clonal hematopoiesis genes
Matched cfDNA-WBC ("tumor-normal") assay to detect somatic alterations
Sensitivity for mutation calling depends on ‘duplex’ collapsed coverage
Different sensitivities for different classes of alterations
Genotyping +++++
De novo mutations, indels ++++
MSI +++
Rearrangements +++
Copy number ++
Tumor mutation burden ○
+ -> sensitivity for that event type
○ -> cannot be calculated
ACCESS-Pipeline
This has workflows for BAM generation based on , , small variant calling, microsatellite instability calling, copy number variant calling, and structural variant calling for version 1 of the MSK-ACCESS assay
ACCESS_SV
This has the core workflow used for structural variant calling in the MSK-ACCESS assay
ADMIE
This is the algorithm used for calling MSI status for samples associated with the MSK-ACCESS assay
Nucleo
This is the BAM generation workflow for any assay that deals with Unique Molecular Indexes (UMIs) based on
ACCESS Quality Control (For version 1 of the Assay)
This is the version 2 of the ACCESS QC generation and you can read more about it here
ACCESS data analysis
This repository helps with downstream data analysis of MSK-ACCESS data; you can read more about it here:
Biometrics
Python package to calculate various sample contamination metrics.
sequence_qc
Package for doing various ad-hoc quality control steps from MSK-ACCESS generated FASTQ or BAM files
This is a wiki for analysis of MSK-ACCESS data
This pages
To request time off, fill in the request at MSK TIME and also email your manager about it.
You need to be on the MSK network to access these resources
This will help you to find correct people to connect with from the subgroups.
In all the groups there are multiple amazing individuals involved, but listing just a few
CCI works closely with CMO project managers to track the progress of cfDNA projects submitted to CMO by MSK researchers. CMO project managers support CCI by facilitating and coordinating project initiation, sample collection, and metadata collection.
CCI works with to make sure that the data generated for the cfDNA assays is of the highest standards.
CCI works closely with the CMO Software Engineering (CSE) team within CI to support the development of Voyager & Hermes. The CSE/CAS team supports the development, integration, and processing of CCI’s workflows implemented in Voyager.
CCI collaborates with scientists in various CMO groups to improve the existing workflows and to analyze the data in a consistent manner.
CCI works closely with the ClinBx group in the Molecular Diagnostics Service to deploy core workflows and maintain consistency among analyses performed on research and clinical cfDNA samples. We work together to improve the workflows in sync and to learn from one another about how to identify artifacts and interpret the data. CCI is also working with the ClinBx Software group to port an instance of the mPATH system used by the ClinBx team to sign-out cases for research.
CCI works with HPC to request resources and support for the JUNO cluster and the virtual machines that enable various aspects of our goals.
Slack Channel
hpc-request@cbio.mskcc.org
Please visit this page once you have done things necessary here: https://mskcc.github.io/on-boarding/
The best place to start would be to understand the data flow:
LIMS/SMILE -> Voyager -> JUNO
Learn more about the codebases listed below that form Voyager
Learn about the codebases that form mPATH
ACCESS/Voyager team uses this to manage sprints/stories. In addition, ACCESS uses this to keep track of the statuses of samples.
Auto Track Sample Status
Internal Gitlab where MPath and other software is hosted.
Code specific to ACCESS team pipelines.
Code for other CMO/MSK teams
Information on HPC/LSF
Beagle
Hermes
Ridgeback
LIMS
Beagle API
http://silo:4001 (Staging)
http://voyager:5001 (Production)
Ridgeback API
http://silo:4003 (Staging)
http://voyager:5003 (Production)
Flower (Celery Task UI) for Voyager, for Production only
http://voyager:4001
ELK for Voyager
MPath (for Clinical)
MPath (for Research)
http://access01:7331/api/ui/
CVR (to be replaced by MPath)
cvr.mskcc.org:8083/
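The Beagle, Ridgeback, and Flower addresses listed above can be kept in one small lookup so that scripts never hard-code a host and port. This is only a sketch: the dictionary layout, the `base_url` helper, and the choice to default to staging are assumptions made for illustration, not part of Voyager itself. Only the URLs come from the list above.

```python
# Base URLs copied from the list above. The (service, environment) keys
# and the helper below are illustrative, not an official Voyager API.
VOYAGER_ENDPOINTS = {
    ("beagle", "staging"): "http://silo:4001",
    ("beagle", "production"): "http://voyager:5001",
    ("ridgeback", "staging"): "http://silo:4003",
    ("ridgeback", "production"): "http://voyager:5003",
    # Flower (Celery task UI) is listed for production only.
    ("flower", "production"): "http://voyager:4001",
}


def base_url(service, environment="staging"):
    """Return the base URL for a service, defaulting to staging."""
    key = (service.lower(), environment.lower())
    if key not in VOYAGER_ENDPOINTS:
        raise ValueError(
            f"No endpoint listed for service={service!r}, environment={environment!r}"
        )
    return VOYAGER_ENDPOINTS[key]


if __name__ == "__main__":
    print(base_url("beagle"))
    print(base_url("ridgeback", "production"))
```

Defaulting to staging means a script has to ask for production explicitly, which lowers the chance of touching production by accident. These hosts resolve only from inside the MSK network.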
Projects, who to speak with, slack channel
MPath & CVR (Aijazuddin Syed/Anoop Balakrishnan Rema)
Voyager Projects [Beagle/Seqosystem/Ridgeback] (Sinisa Ivkovic/Nikhil Kumar/Allan Bolipata)
#voyager
LIMS (David Mcmanamon)
MDB (Angelica Ochoa/Benjamin Gross/Allan Bolipata)
#metadb-informatics
Toil/CWL (Nikhil Kumar)
ACCESS (Ronak Shah)
#msk-access
ACCESS Servers (HPC Request/Neeraj Paramasivam)
Please visit this page once you have done things necessary here: https://mskcc.github.io/on-boarding/
The best place to start is to learn more about MSK-ACCESS and for that please read the paper:
Learn more about the current collapsing method Marianas
Learn more about the Quality Control V1
Understand the updated version for the above using these Quality Control V2
Learn about ACCESS Data analysis scripts that help with downstream analysis
Learn about IGV for viewing BAM files to distinguish real variants from artifacts
Below are resources that would be handy for you to learn more about all the tools described in the paper.
MSK-ACCESS
CMO-CH
CMO Cell-Free Informatics (CCI)
MSK-ACCESS V1 (Marianas)
CCI organization on Github
cBioPortal DMP data
Quality Control for ACCESS V1
Downstream analysis of ACCESS Data
Fingerprinting using Biometrics
High Performance Computing
Nucleo (Fgbio)
Quality Control for ACCESS V2
BAM
/juno/work/access/production/data/bams/{cmo_patient_id}/{cmo_sample_id}/current/
Small Variant (SNVs/INDELs)
/juno/work/access/production/data/small_variants/{cmo_patient_id}/{cmo_sample_id}/current/
Microsatellite Instability (MSI)
/juno/work/access/production/data/microsatellite_instability/{cmo_patient_id}/{cmo_sample_id}/current/
Structural Variant (SV)
/juno/work/access/production/data/structural_variants/{cmo_patient_id}/{cmo_sample_id}/current/
Copy Number Variants (CNV)
/juno/work/access/production/data/copy_number_variants/{cmo_patient_id}/{cmo_sample_id}/current/
NYS validation data
/work/access/production/runs/NYS_validation/current
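The per-sample result locations above all follow one template: a result-type directory, then the CMO patient ID, then the CMO sample ID, then `current`. Below is a minimal sketch of building those paths in Python. The short keys (`bam`, `msi`, and so on), the `result_dir` helper, and the IDs in the example are hypothetical and used only for illustration. The directory names and the root come from the list above.

```python
from pathlib import PurePosixPath

# Root and sub-directory names are taken from the locations listed above.
PRODUCTION_DATA_ROOT = PurePosixPath("/juno/work/access/production/data")

# The short keys on the left are illustrative aliases, not official names.
RESULT_DIRS = {
    "bam": "bams",
    "small_variants": "small_variants",
    "msi": "microsatellite_instability",
    "sv": "structural_variants",
    "cnv": "copy_number_variants",
}


def result_dir(result_type, cmo_patient_id, cmo_sample_id):
    """Build the 'current' result directory for one sample."""
    if result_type not in RESULT_DIRS:
        raise ValueError(
            f"Unknown result type {result_type!r}; expected one of {sorted(RESULT_DIRS)}"
        )
    return (
        PRODUCTION_DATA_ROOT
        / RESULT_DIRS[result_type]
        / cmo_patient_id
        / cmo_sample_id
        / "current"
    )


if __name__ == "__main__":
    # PATIENT_ID and SAMPLE_ID are placeholders, not real identifiers.
    print(result_dir("bam", "PATIENT_ID", "SAMPLE_ID"))
```

`PurePosixPath` only assembles the path string, so the helper can be used and tested on a machine that has no access to JUNO.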
CMO-ACCESS
/work/access/production/
/work/access/production/resources/
CMO-CH
/work/ch/
Berger Lab
/work/bergerm1/bergerlab
admie
- Files used for microsatellite instability detection tool ADMIE for MSK-ACCESS
cosmic
- VCF file of cosmic used in MSK-ACCESS workflows
dbSNP
- VCF file of dbSNP used in MSK-ACCESS workflows
exac
- VCF file of ExAC used in MSK-ACCESS workflows
mills-and-1000g
- VCF file of mills-and-1000g used in MSK-ACCESS
reference
- reference genome file used in MSK-ACCESS workflows
tools
- general packages used in MSK-ACCESS workflows
msk-access
- Data-specific resources for MSK-ACCESS workflows. This includes the following:
hiseq4000_curated_duplex_bams_dmp
- curated DMP duplex BAMS from HiSeq 4000
novaseq_curated_simplex_bams_dmp
- curated DMP simplex BAM from NovaSeq.
hiseq4000_curated_simplex_bams_dmp
- curated DMP simplex BAM from HiSeq 4000
novaseq_curated_standard_bams_dmp
- curated DMP standard BAM from NovaSeq
hiseq4000_curated_standard_bams_dmp
- curated DMP standard BAM from HiSeq 4000
novaseq_curated_unfiltered_bams_dmp
- curated DMP unfiltered BAM from NovaSeq
hiseq4000_curated_unfiltered_bams_dmp
- curated DMP unfiltered BAM from HiSeq 4000
novaseq_unmatched_normal_plasma_duplex_bams_dmp
- DMP unmatched normal plasma duplex BAM from NovaSeq
hiseq4000_unmatched_normal_plasma_duplex_bams_dmp
- DMP unmatched normal plasma duplex BAM from HiSeq 4000
novaseq_unmatched_normal_plasma_standard_bams_dmp
- DMP unmatched normal plasma standard BAM from NovaSeq
hiseq4000_unmatched_normal_plasma_standard_bams_dmp
- DMP unmatched normal plasma standard BAM from HiSeq 4000
novaseq_curated_duplex_bams_dmp
- curated DMP duplex BAMS from NovaSeq
regions_of_interest
- Different interval files describing regions of interest for MSK-ACCESS assay
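The BAM resource directories listed above follow a regular naming pattern: sequencer, then sample set (`curated` or `unmatched_normal_plasma`), then BAM type, then the suffix `_bams_dmp`. The sketch below rebuilds those names and rejects combinations that are not in the list. The function names and the terms "sequencer" and "sample set" are assumptions for illustration. The allowed values are read directly from the directory names above.

```python
# Values below are read off the directory names listed above.
SEQUENCERS = ("hiseq4000", "novaseq")
BAM_TYPES_BY_SAMPLE_SET = {
    "curated": ("duplex", "simplex", "standard", "unfiltered"),
    "unmatched_normal_plasma": ("duplex", "standard"),
}


def resource_dir_name(sequencer, sample_set, bam_type):
    """Compose a resource directory name, validating each component."""
    if sequencer not in SEQUENCERS:
        raise ValueError(f"Unknown sequencer: {sequencer!r}")
    if sample_set not in BAM_TYPES_BY_SAMPLE_SET:
        raise ValueError(f"Unknown sample set: {sample_set!r}")
    if bam_type not in BAM_TYPES_BY_SAMPLE_SET[sample_set]:
        raise ValueError(f"No {bam_type!r} BAMs are listed for {sample_set!r}")
    return f"{sequencer}_{sample_set}_{bam_type}_bams_dmp"


def all_resource_dir_names():
    """Return every listed directory name in sorted order."""
    return sorted(
        resource_dir_name(sequencer, sample_set, bam_type)
        for sequencer in SEQUENCERS
        for sample_set, bam_types in BAM_TYPES_BY_SAMPLE_SET.items()
        for bam_type in bam_types
    )


if __name__ == "__main__":
    for name in all_resource_dir_names():
        print(name)
```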
If you need to add data or tools to the above-mentioned location and can justify the addition, please contact Ronak Shah.
Terms you might encounter
(forked from mpath repositories)
For Authentication ask your Manager:
- Request access once you have your msk email id
InternalLink:
To get reimbursed for your expenses, just fill in the expense report online.
You need to be on the MSK network to access these resources