CMO ACCESS Data Analysis

Filter Calls

Step 2 -- filtering

Last updated 4 years ago

The second step takes all the genotypes generated in the first step and organizes them into a patient-level variants table with VAFs and a call status for each variant in each sample.

Each call is subjected to:

  1. Read depth filter (hotspot vs non-hotspot)

  2. Systematic artifact filter

  3. Germline filters

    1. If any normals exist (buffy coat and DMP normal): 2:1 rule

    2. If not: ExAC frequency < 0.01% and VAF < 30%

  4. CH tag
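
The germline branch of the filters above can be sketched as follows. This is a minimal Python illustration, not the R code in filter_calls.R. The function name and parameters are hypothetical, and the reading of the "2:1 rule" as "plasma VAF at least twice the highest normal VAF" is an assumption, since the page does not define it.

```python
def passes_germline_filter(plasma_vaf, normal_vafs, exac_freq,
                           ratio=2.0, max_exac_freq=0.0001, max_vaf=0.30):
    """Return True when a call looks somatic under the germline rules above.

    normal_vafs: VAFs of the same variant in the patient's normals
    (buffy coat, DMP normal); empty when no normal sample exists.
    """
    if normal_vafs:
        # ASSUMPTION: "2:1 rule" is read as plasma VAF >= 2 x highest normal VAF.
        return plasma_vaf >= ratio * max(normal_vafs)
    # No normal available: fall back to the population-frequency and VAF limits.
    # 0.01% expressed as a fraction is 0.0001; 30% is 0.30.
    return exac_freq < max_exac_freq and plasma_vaf < max_vaf
```

With a normal present, only the ratio test applies; the ExAC and VAF limits are used only when the patient has no normal sample.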

Usage

Rscript R/filter_calls.R -h                                         
usage: R/filter_calls.R [-h] [-m MASTERREF] [-o RESULTSDIR] [-dmpk DMPKEYPATH]
                        [-ch CHLIST] [-c CRITERIA]

optional arguments:
  -h, --help            show this help message and exit
  -m MASTERREF, --masterref MASTERREF
                        File path to master reference file
  -o RESULTSDIR, --resultsdir RESULTSDIR
                        Output directory
  -ch CHLIST, --chlist CHLIST
                        List of signed out CH calls [default]
  -c CRITERIA, --criteria CRITERIA
                        Calling criteria [default]

Default

Default options can be found here

What filter_calls.R does

Generates a reference of systematic artifacts: any call that occurs in 2 or more donor samples, where an occurrence is defined as 2 or more duplex reads supporting the call.

We suggest that you filter out anything with duplex_support_num >= 2
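
The artifact definition above can be illustrated with a short Python sketch. It is not the R implementation; the function name, the tuple layout of the input, and the variant key format are hypothetical. Only the two thresholds (2 duplex reads, 2 donor samples) come from the text.

```python
from collections import Counter


def build_artifact_reference(donor_calls, min_duplex_reads=2, min_donor_samples=2):
    """Return the set of variants treated as systematic artifacts.

    donor_calls: iterable of (sample_id, variant_key, duplex_alt_reads).
    """
    occurrences = set()
    for sample_id, variant_key, duplex_alt_reads in donor_calls:
        # A variant "occurs" in a donor sample when it has enough duplex reads.
        if duplex_alt_reads >= min_duplex_reads:
            occurrences.add((sample_id, variant_key))
    # Count distinct donor samples per variant (the set removes repeats).
    per_variant = Counter(variant_key for _, variant_key in occurrences)
    return {v for v, n in per_variant.items() if n >= min_donor_samples}
```

Storing occurrences as (sample, variant) pairs means a variant reported twice in the same donor sample still counts as one occurrence.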

For each patient

  1. Read in sample sheets -- reference for downstream analysis

  2. Generate a preliminary patient level variants table

  3. Read in and merge in hotspots, DMP signed-out calls, and occurrence in donor samples

  4. Call status annotation

    1. All calls passing the read depth/genotype filter are annotated as 'Called' or 'Genotyped'

    2. Any call not satisfying the germline filters is overwritten with 'Not Called'

      1. Calls with zero coverage in the plasma sample are also annotated as 'Not Covered'

  5. Final processing

    1. Combine duplex and simplex read counts

  6. Write out table
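
The call status annotation in step 4 can be sketched as a small decision function. This is a hypothetical Python illustration, not the R code. The order of the checks and the choice to label calls that fail the depth filter as 'Not Called' are assumptions, and the sketch uses only the 'Called' label for passing calls because the page does not say when a different label applies.

```python
def annotate_call_status(passes_depth, passes_germline, plasma_total_depth):
    """Return the status label for one variant in one plasma sample."""
    # ASSUMPTION: zero coverage takes precedence over every other label.
    if plasma_total_depth == 0:
        return "Not Covered"
    # Failing the germline filters overwrites any earlier 'Called' status.
    if not passes_germline:
        return "Not Called"
    # ASSUMPTION: calls failing the read depth filter are also 'Not Called'.
    return "Called" if passes_depth else "Not Called"
```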

Example of the patient level table:

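Each row below is one column of the patient-level table, shown with its value for a single example variant.

| Column | Example value |
| --- | --- |
| Hugo_Symbol | KRAS |
| Start_position | xxxxxx |
| Variant_Classification | Missense Mutation |
| Other variant descriptions | ... |
| ... | ... |
| C-xxxxxx-L001-d___duplex.called | Called |
| C-xxxxxx-L001-d___duplex.total | 15/1500(0.01) |
| C-xxxxxx-L002-d___duplex.called | Not Called |
| C-xxxxxx-L002-d___duplex.total | 0/1800(0) |
| C-xxxxxx-N001-d___unfilterednormal | 0/200(0) |
| P-xxxxxxx-T01-IM6___DMP_Tumor | 200/800(0.25) |
| P-xxxxxxx-T01-IM6___DMP_Normal | 1/700(0.001) |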
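
For downstream use, the count cells in the example above (such as 15/1500(0.01)) can be split back into numbers. The Python sketch below assumes the cell layout is "supporting reads/total reads(VAF)", which is inferred from the example values (15/1500 = 0.01) rather than stated on the page. The function name is hypothetical.

```python
import re

# ASSUMED layout: <supporting reads>/<total reads>(<VAF>), e.g. "15/1500(0.01)".
_COUNT_CELL = re.compile(r"^\s*(\d+)/(\d+)\(([^)]+)\)\s*$")


def parse_count_cell(cell):
    """Split a count cell into (supporting_reads, total_reads, vaf)."""
    match = _COUNT_CELL.match(cell)
    if match is None:
        # Status cells such as 'Called' are not count cells.
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))
```

Status columns (the ones holding 'Called' or 'Not Called') do not match this layout, so the function raises ValueError for them instead of guessing.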
