Sample Level Noise

Minimizing noise is important for the accuracy of post-collapsing results.

Theoretical Method

Noise is calculated in the following manner:

$$
\begin{aligned}
&total\_depth_i = \sum_{n \in \{A,C,G,T\}} count_i(n) \\
&genotype_i = \arg\max_{n \in \{A,C,G,T\}} count_i(n) \\
&alt\_count_i = \sum_{n \in \{A,C,G,T\},\ n \neq genotype_i} count_i(n) \\
&noise = 100 \cdot \frac{\sum_j alt\_count_j}{\sum_j total\_depth_j} \\
&\text{where } j \text{ ranges over positions for which } \frac{count_j(n)}{total\_depth_j} < threshold \text{ for every } n \in \{A,C,G,T\},\ n \neq genotype_j
\end{aligned}
$$

Here count_i(n) is the number of reads supporting base n at position i, and the genotype is the base with the highest count.
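The formula can be expressed as a short Python sketch. It assumes each position is supplied as a hypothetical dict of per-base read counts (e.g. {"A": 1000, "C": 1, "G": 0, "T": 0}); this is an illustration, not the Waltz or calculate_noise.sh implementation. Only the A/C/G/T keys are read, so deletion or N counts present in the dict are ignored; inserted bases are not modeled. Ties for the genotype go to the first base in A, C, G, T order, which is an assumption.

```python
from typing import Dict, Iterable, Tuple

BASES = ("A", "C", "G", "T")


def summarize_position(counts: Dict[str, int]) -> Tuple[int, str]:
    """Return (total_depth, genotype); ties resolve to the first base in BASES."""
    total_depth = sum(counts[b] for b in BASES)
    genotype = max(BASES, key=lambda b: counts[b])
    return total_depth, genotype


def is_low_alt_position(counts: Dict[str, int], threshold: float = 0.02) -> bool:
    """True if every non-genotype base is below `threshold` of the total depth."""
    total_depth, genotype = summarize_position(counts)
    if total_depth == 0:
        return False  # assumption: uncovered positions are skipped
    return all(counts[b] / total_depth < threshold for b in BASES if b != genotype)


def compute_noise(positions: Iterable[Dict[str, int]], threshold: float = 0.02) -> float:
    """Noise (%) = 100 * sum(alt_count_j) / sum(total_depth_j) over kept positions j."""
    alt_sum = 0
    depth_sum = 0
    for counts in positions:
        if not is_low_alt_position(counts, threshold):
            continue
        total_depth, genotype = summarize_position(counts)
        alt_sum += total_depth - counts[genotype]  # all non-genotype reads
        depth_sum += total_depth
    return 100.0 * alt_sum / depth_sum if depth_sum else 0.0


# Hypothetical example: the third position (10% G) exceeds 2% and is excluded.
example = [
    {"A": 1000, "C": 1, "G": 0, "T": 0},
    {"A": 0, "C": 500, "G": 0, "T": 0},
    {"A": 900, "C": 0, "G": 100, "T": 0},
]
print(f"{compute_noise(example):.4f}%")  # prints 0.0666%
```

Using a per-position filter followed by pooled sums mirrors the formula: noise is a read-weighted ratio over all kept positions, not an average of per-position error rates.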

The threshold is currently set to 2%. As a result, some noisy positions (where a non-genotype base reaches 2% or more) may be wrongly excluded, and some sites carrying low-level true mutations (below 2%) may be wrongly included in the calculation.
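Because noise is computed only over positions that pass the threshold, its value depends on the threshold chosen. The sketch below recomputes noise at several thresholds for three made-up 1000x positions, using the same hypothetical dict-of-counts input as an assumption rather than the real pileup format. The counts are illustrative only and far noisier than real duplex data.

```python
BASES = ("A", "C", "G", "T")


def noise_at_threshold(positions, threshold):
    """Noise (%) over positions whose non-genotype base fractions are all < threshold."""
    alt_sum = depth_sum = 0
    for counts in positions:
        depth = sum(counts[b] for b in BASES)
        if depth == 0:
            continue  # assumption: uncovered positions are skipped
        genotype = max(BASES, key=lambda b: counts[b])
        alt_fracs = [counts[b] / depth for b in BASES if b != genotype]
        if all(f < threshold for f in alt_fracs):
            alt_sum += depth - counts[genotype]
            depth_sum += depth
    return 100.0 * alt_sum / depth_sum if depth_sum else 0.0


# Hypothetical positions, 1000x each:
positions = [
    {"A": 999, "C": 1, "G": 0, "T": 0},   # background error, 0.1% alt
    {"A": 985, "C": 0, "G": 15, "T": 0},  # low-level true mutation, 1.5% alt
    {"A": 970, "C": 30, "G": 0, "T": 0},  # noisy position, 3% alt
]

for t in (0.01, 0.02, 0.05):
    print(f"threshold={t:.0%}: noise={noise_at_threshold(positions, t):.3f}%")
```

At 2% the 1.5% true mutation is kept and inflates the estimate (0.800%), while the 3% noisy position is dropped; at 1% only the background position remains (0.100%), and at 5% all three are counted (1.533%).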

In addition, inserted bases are included in this calculation, but deletions and masked bases (N) are neither counted as alt alleles nor counted towards the total depth.

Note: Duplex BAMs are used for this calculation, and only positions within the Pool A target regions are considered.

Technical Methods

  • Tools Used:

    • Marianas

    • Waltz PileupMetrics

    • calculate_noise.sh

  • Input

    • sample_id-duplex-pileup.txt (for duplex noise calculation)

    • MSK-ACCESS-v1_0-A-good-positions.txt (Pool A BED file with MSI regions removed)

  • Output

    • noise.txt
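Restricting a pileup to the Pool A good positions could look roughly like the following sketch. The pileup column names ("Chromosome", "Position", "A", "C", "G", "T") are hypothetical and should be checked against the actual Waltz PileupMetrics header; the good-positions file is assumed to follow standard BED conventions (0-based, half-open intervals) and the pileup to use 1-based positions. This is not the calculate_noise.sh logic.

```python
import csv
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical pileup column names; verify against the real file header.
CHROM_COL, POS_COL = "Chromosome", "Position"
BASES = ("A", "C", "G", "T")


def load_bed_positions(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Expand BED intervals (0-based, half-open) into 1-based (chrom, pos) pairs."""
    positions: Set[Tuple[str, int]] = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        positions.update((chrom, p) for p in range(int(start) + 1, int(end) + 1))
    return positions


def iter_pileup_counts(lines: Iterable[str],
                       good: Set[Tuple[str, int]]) -> Iterator[Dict[str, int]]:
    """Yield A/C/G/T counts for tab-separated pileup rows on a good position."""
    for row in csv.DictReader(lines, delimiter="\t"):
        if (row[CHROM_COL], int(row[POS_COL])) in good:
            yield {b: int(row[b]) for b in BASES}
```

Both functions accept any iterable of lines, so an open file handle or a list of strings works. Expanding intervals into a set is simple and adequate for a targeted panel; very large regions would call for an interval lookup instead.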

Interpretations

Noise level can be influenced by a number of factors, including sequencing depth (and therefore coverage), duplex family saturation, and tumor content. For duplex BAMs in the Pool A regions, we normally observe noise levels below 0.001% (when using the 2% threshold to select positions for the calculation). This expected noise level is indicated by the yellow dotted line in the graph. Noise above this value may indicate a sample processing issue.
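When reviewing a batch, the 0.001% level can serve as a simple screening rule. The sketch below assumes noise values (in percent) have already been collected per sample into a dict keyed by hypothetical sample IDs; flagged samples warrant review rather than automatic failure, since depth, duplex family saturation, and tumor content also affect noise.

```python
from typing import Dict, List

# Typical upper bound for duplex Pool A noise, in percent, per the text above.
EXPECTED_MAX_NOISE_PERCENT = 0.001


def flag_noisy_samples(noise_by_sample: Dict[str, float],
                       limit: float = EXPECTED_MAX_NOISE_PERCENT) -> List[str]:
    """Return sample IDs whose noise (%) exceeds `limit`, sorted for stable output."""
    return sorted(s for s, noise in noise_by_sample.items() if noise > limit)


# Hypothetical batch: only sample_B is above the expected level.
print(flag_noisy_samples({"sample_A": 0.0006, "sample_B": 0.0021}))
```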
