Major Contamination

Theoretical Method

The major contamination plot is a bar plot of the fraction of heterozygous positions per sample. It is used to check whether a patient's sample is contaminated with DNA from an unrelated individual. This analysis is also done using the 'fingerprint' SNPs in the panel. A SNP is considered heterozygous if its minor allele fraction is > 0.1.

The fraction of heterozygous positions in the sample is found using the formula below:

$$\text{Fraction of heterozygous positions} = \frac{\text{Number of heterozygous sites}}{\text{Total number of fingerprint SNPs}}$$

These calculations are done using the All Unique (unfiltered) BAMs. Allele counts are measured from the Waltz pileups for Pool A and Pool B.
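The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the code in fingerprinting.py: the function names, the dictionary of per-SNP allele counts, and the choice to keep zero-coverage SNPs in the denominator as non-heterozygous are all assumptions made for the example. Only the 0.1 threshold and the formula come from the text.

```python
from typing import Dict, Tuple

# From the text: a SNP is heterozygous if its minor allele fraction is > 0.1.
HET_THRESHOLD = 0.1


def minor_allele_fraction(count_a: int, count_b: int) -> float:
    """Fraction of reads supporting the less frequent of the two alleles."""
    total = count_a + count_b
    if total == 0:
        # Assumption: a SNP with no coverage is treated as non-heterozygous.
        return 0.0
    return min(count_a, count_b) / total


def fraction_heterozygous(snp_counts: Dict[str, Tuple[int, int]]) -> float:
    """Number of heterozygous sites divided by the total number of fingerprint SNPs."""
    if not snp_counts:
        raise ValueError("no fingerprint SNPs supplied")
    n_het = sum(
        1
        for count_a, count_b in snp_counts.values()
        if minor_allele_fraction(count_a, count_b) > HET_THRESHOLD
    )
    return n_het / len(snp_counts)


# Hypothetical allele counts for four fingerprint SNPs in one sample.
example_counts = {
    "snp1": (50, 50),   # minor allele fraction 0.50, heterozygous
    "snp2": (100, 0),   # minor allele fraction 0.00, homozygous
    "snp3": (95, 5),    # minor allele fraction 0.05, homozygous
    "snp4": (80, 20),   # minor allele fraction 0.20, heterozygous
}
print(fraction_heterozygous(example_counts))
```

In this sketch the allele counts would come from the Waltz pileups; how they are parsed from those files is not shown here.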

Technical Methods

  • Tools Used:

    • Waltz PileupMetrics

    • fingerprinting.py

  • Input

    • output_dir: Directory to write the output files to

    • waltz_dir_A: Directory with waltz pileup files for target set A

    • waltz_dir_B: Directory with waltz pileup files for target set B

    • waltz_dir_A_duplex: Directory with waltz pileup files for Duplex target set A

    • waltz_dir_B_duplex: Directory with waltz pileup files for Duplex target set B

    • fp_config: File with information about the SNPs for analysis (MSK-ACCESS-v1_0-TilingaAndFpSNPs.txt)

    • title_file: Title File for the run

  • Output

    • FPResults/majorContamination.txt

    • MajorContaminationRate.pdf

Interpretations

The fraction of heterozygous positions should be around 0.5. If the fraction is greater than 0.6, the sample is considered to have major contamination.
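As a rough illustration of this rule, the snippet below flags samples whose fraction exceeds 0.6. The function name, the sample names, and the input dictionary are hypothetical. In practice the per-sample fractions would be read from FPResults/majorContamination.txt, whose exact layout is not described here.

```python
from typing import Dict

# From the text: a fraction greater than 0.6 indicates major contamination.
MAJOR_CONTAMINATION_THRESHOLD = 0.6


def flag_major_contamination(fractions: Dict[str, float]) -> Dict[str, bool]:
    """Map each sample to True if its heterozygous fraction exceeds the threshold."""
    return {
        sample: fraction > MAJOR_CONTAMINATION_THRESHOLD
        for sample, fraction in fractions.items()
    }


# Hypothetical per-sample fractions of heterozygous positions.
flags = flag_major_contamination({"sample_1": 0.48, "sample_2": 0.72})
print(flags)
```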
