Major Contamination

Theoretical Method

Major contamination plot is a bar plot of the fraction of heterozygous positions per sample and is done to see if a patient’s sample is contaminated with DNA from an unrelated individual. This analysis also done using the ‘fingerprint’ SNPs in the panel. A SNP is considered heterozygous if the minor allele fraction is > 0.1.

The fraction of heterozygous positions in the sample is found using the formula below:

These calculations were done using All Unique (unfiltered) bams. Allele counts are measured from waltz pileups from Pool A and B

Technical Methods

  • Tool Used:

    • Waltz PileupMetrics

    • fingerprinting.py

  • Input

    • output_dir : Directory to write the Output files to

    • waltz_dir_A: Directory with waltz pileup files for target set A

    • waltz_dir_B: Directory with waltz pileup files for target set B

    • waltz_dir_A_duplex: Directory with waltz pileup files for Duplex target set A

    • waltz_dir_B_duplex: Directory with waltz pileup files for Duplex target set B

    • fp_config: File with information about the SNPs for analysis (MSK-ACCESS-v1_0-TilingaAndFpSNPs.txt)

    • title_file: Title File for the run

  • Output

    • FPResults/majorContamination.txt

    • MajorContaminationRate.pdf

Interpretations

The fraction of heterozygous positions should be around 0.5. If the fraction is greater than 0.6, it is is considered to have major contamination.

Last updated