Step 2 -- filtering
The second step takes all the genotypes generated in the first step and organizes them into a patient-level variants table with VAFs and call status for each variant in each sample.
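As a rough illustration of that reshaping, the sketch below pivots long-format genotype records into one row per variant with one cell per sample. It is not the code in filter_calls.R: the record fields (`gene`, `position`, `alt`, `sample`, `alt_count`, `total_count`) and both function names are hypothetical, and the cell format `alt/total(VAF)` with three-decimal rounding is inferred from the example values shown at the end of this section.

```python
from collections import defaultdict


def format_cell(alt_count, total_count):
    """Render a genotype as 'alt/total(VAF)', e.g. '15/1500(0.01)'."""
    # Guard against zero coverage so the division never fails.
    vaf = alt_count / total_count if total_count else 0.0
    # Rounding to 3 decimals is an assumption based on the example cells.
    return f"{alt_count}/{total_count}({round(vaf, 3):g})"


def build_patient_table(genotypes):
    """Pivot long-format genotype records into one row per variant."""
    table = defaultdict(dict)
    for g in genotypes:
        variant = (g["gene"], g["position"], g["alt"])
        table[variant][g["sample"]] = format_cell(g["alt_count"], g["total_count"])
    return dict(table)


# Hypothetical input: one variant genotyped in a plasma and a normal sample.
genotypes = [
    {"gene": "KRAS", "position": 100, "alt": "T",
     "sample": "plasma_1", "alt_count": 15, "total_count": 1500},
    {"gene": "KRAS", "position": 100, "alt": "T",
     "sample": "normal_1", "alt_count": 0, "total_count": 200},
]
table = build_patient_table(genotypes)
```

Keying rows by variant and columns by sample keeps every sample's evidence for a variant side by side, which is what the later filters compare.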
Each call is subjected to:
Read depth filter (hotspot vs non-hotspot)
Systematic artifact filter
Germline filters
If any normal exists (buffy coat or DMP normal) -- 2:1 rule
If not -- ExAC frequency < 0.01% and VAF < 30%
CH tag
Default options can be found here
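The germline branch above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation. The text does not define the 2:1 rule, so the sketch assumes it means the plasma VAF must be at least twice the highest VAF seen in any matched normal. The function name and parameter names are hypothetical; the 0.01% and 30% thresholds come from the list above.

```python
def passes_germline_filter(plasma_vaf, normal_vafs=(), exac_freq=0.0,
                           ratio=2.0, max_exac_freq=0.0001, max_vaf=0.30):
    """Return True if a call looks somatic under the germline filters."""
    if normal_vafs:
        # ASSUMPTION: "2:1 rule" is read as plasma VAF >= 2 x normal VAF,
        # checked against the highest VAF among the available normals.
        return plasma_vaf >= ratio * max(normal_vafs)
    # No matched normal: fall back to population frequency and VAF cutoffs
    # (ExAC frequency < 0.01% and VAF < 30%).
    return exac_freq < max_exac_freq and plasma_vaf < max_vaf


# Plasma VAF 0.01 with normals at 0 and 0.001 clears the assumed 2:1 rule.
with_normals = passes_germline_filter(0.01, normal_vafs=[0.0, 0.001])
# Without a normal, a 45% VAF is treated as likely germline.
without_normal = passes_germline_filter(0.45)
```

Taking the maximum over normals is the conservative choice: a variant visible in either the buffy coat or the DMP normal is enough to fail the call.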
filter_calls.R
Generate a reference of systematic artifacts -- any call that occurs in 2 or more donor samples (an occurrence is defined as 2 or more duplex reads)
We suggest that you filter out anything with duplex_support_num >= 2
Read in sample sheets -- reference for downstream analysis
Generate a preliminary patient level variants table
Read in and merge hotspots, DMP signed-out calls, and occurrence in donor samples
Call status annotation
All calls passing the read depth/genotype filter are annotated as 'Called' or 'Genotyped'
Any call that does not satisfy the germline filters is overwritten with 'Not Called'
Calls with zero coverage in the plasma sample are annotated as 'Not Covered'
Write out table
| Hugo_Symbol | Start_position | Variant_Classification | Other variant descriptions | ... | C-xxxxxx-L001-d___duplex.called | C-xxxxxx-L001-d___duplex.total | C-xxxxxx-L002-d___duplex.called | C-xxxxxx-L002-d___duplex.total | C-xxxxxx-N001-d___unfilterednormal | P-xxxxxxx-T01-IM6___DMP_Tumor | P-xxxxxxx-T01-IM6___DMP_Normal |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KRAS | xxxxxx | Missense Mutation | ... | ... | Called | 15/1500(0.01) | Not Called | 0/1800(0) | 0/200(0) | 200/800(0.25) | 1/700(0.001) |
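Two of the steps listed under filter_calls.R, building the systematic artifact reference and annotating call status, can be sketched as below. This is an illustration under stated assumptions rather than the script's actual logic. The input layout (`donor_calls` as a mapping from donor sample to per-variant duplex read counts), the function names, the `genotyped` flag used to choose between 'Called' and 'Genotyped', and the precedence of 'Not Covered' over 'Not Called' are all hypothetical. The thresholds of 2 reads and 2 donor samples come from the text.

```python
from collections import Counter


def build_artifact_reference(donor_calls, min_reads=2, min_samples=2):
    """Collect variants seen with >= min_reads duplex reads in >= min_samples donors.

    donor_calls: {donor_sample: {variant_id: duplex_read_count}}
    """
    occurrences = Counter()
    for counts in donor_calls.values():
        for variant, reads in counts.items():
            if reads >= min_reads:  # one "occurrence" in this donor sample
                occurrences[variant] += 1
    return {variant for variant, n in occurrences.items() if n >= min_samples}


def annotate_call_status(passes_depth, passes_germline, plasma_total,
                         genotyped=False):
    """Assign a call status string to one variant in one plasma sample."""
    if plasma_total == 0:
        return "Not Covered"  # zero coverage in the plasma sample
    if not (passes_depth and passes_germline):
        return "Not Called"   # failed read depth or germline filters
    # ASSUMPTION: a flag distinguishes genotyped from de novo called variants.
    return "Genotyped" if genotyped else "Called"


# Hypothetical donor data: only the KRAS variant recurs with enough reads.
donor_calls = {
    "donor_1": {"KRAS:100:T": 3, "TP53:200:A": 1},
    "donor_2": {"KRAS:100:T": 2},
    "donor_3": {"TP53:200:A": 5},
}
artifacts = build_artifact_reference(donor_calls)
```

Counting each donor sample at most once per variant keeps a single deeply sequenced donor from flagging a variant as systematic on its own.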