Sample mix-up
Theoretical Method
The sample mix-up heatmap is used to identify potentially mispaired samples within the run. The analysis uses more than 300 ‘fingerprint’ single nucleotide polymorphisms (SNPs) distributed throughout the genome. These include the 31 SNPs in Target Pool A and more than 250 SNPs located in the tiling probes of Target Pool B. Pairwise comparisons at these SNP sites are performed between all samples in the run. For each pair, the sites where both samples are homozygous are identified, and percent discordance is calculated using the formula below:
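$$\text{Discordance (\%)} = \frac{\text{Homozygous mismatches}}{\text{Common homozygous sites}} \times 100$$
where homozygous mismatches are sites that are homozygous in both the Reference and Query samples but carry different alleles, and common homozygous sites are all sites that are homozygous in both samples.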
If there are fewer than 10 common homozygous sites, the discordance rate is not calculated. Such a low count strongly indicates that coverage is too low and that the samples have likely failed other QC metrics.
Any sample pair with a discordance rate of 5% or higher is considered a mismatch.
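The discordance rule and its two thresholds can be sketched in Python. This is a minimal sketch under stated assumptions, not the fingerprinting.py implementation: it assumes each sample's genotypes have already been reduced to a mapping from SNP ID to a homozygous allele (or None when the site is not homozygous), and the names `Genotypes`, `discordance_percent`, `classify`, `compare_all` and the status labels "Match", "Mismatch" and "Not calculated" are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical representation: SNP ID -> homozygous allele ("A", "C", "G", "T"),
# or None when the site is heterozygous or uncalled in that sample.
Genotypes = Dict[str, Optional[str]]

MIN_COMMON_HOM_SITES = 10  # fewer common homozygous sites: not calculated
MISMATCH_THRESHOLD = 5.0   # percent discordance at or above this is a mismatch


def discordance_percent(reference: Genotypes, query: Genotypes) -> Optional[float]:
    """Percent of common homozygous sites whose alleles differ, or None
    when there are too few common homozygous sites to judge."""
    common = [
        snp for snp in reference.keys() & query.keys()
        if reference[snp] is not None and query[snp] is not None
    ]
    if len(common) < MIN_COMMON_HOM_SITES:
        return None
    hom_mismatches = sum(1 for snp in common if reference[snp] != query[snp])
    return 100.0 * hom_mismatches / len(common)


def classify(discordance: Optional[float]) -> str:
    """Map a discordance value to a (hypothetical) status label."""
    if discordance is None:
        return "Not calculated"
    return "Mismatch" if discordance >= MISMATCH_THRESHOLD else "Match"


def compare_all(
    samples: Dict[str, Genotypes],
) -> Iterator[Tuple[str, str, Optional[float], str]]:
    """Yield (sample_a, sample_b, discordance, status) for each unordered pair."""
    for a, b in combinations(sorted(samples), 2):
        d = discordance_percent(samples[a], samples[b])
        yield a, b, d, classify(d)
```

Because discordance is symmetric in this sketch, each unordered pair is compared once; a heatmap would simply mirror each value across the diagonal.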
The discordance calculations use All Unique (unfiltered) BAMs, and allele counts are taken from Waltz pileups for Pool A and Pool B.
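Turning Waltz allele counts into the homozygous sites described above requires a calling rule that this page does not specify. The sketch below assumes a simple rule with hypothetical thresholds: a site is called homozygous for its most frequent allele when total depth is at least `MIN_DEPTH` and that allele accounts for at least `HOMOZYGOUS_FRACTION` of reads. It takes per-allele counts as a plain dict and does not parse Waltz pileup files, whose column layout is not described here; `homozygous_call` and `call_sample` are illustrative names.

```python
from typing import Dict, Optional

# Hypothetical thresholds; the actual values used by fingerprinting.py
# are not documented on this page.
MIN_DEPTH = 20
HOMOZYGOUS_FRACTION = 0.9


def homozygous_call(
    counts: Dict[str, int],
    min_depth: int = MIN_DEPTH,
    hom_fraction: float = HOMOZYGOUS_FRACTION,
) -> Optional[str]:
    """Return the allele if the site looks homozygous, else None."""
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # too little coverage to make a call
    allele, top = max(counts.items(), key=lambda kv: kv[1])
    return allele if top / depth >= hom_fraction else None


def call_sample(pileup: Dict[str, Dict[str, int]]) -> Dict[str, Optional[str]]:
    """Apply homozygous_call to every SNP site of one sample."""
    return {snp: homozygous_call(counts) for snp, counts in pileup.items()}
```

With any `hom_fraction` above 0.5, a tie between two alleles can never pass the fraction test, so the arbitrary tie-breaking in `max` does not affect the result.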
Technical Methods
Tools Used:
Waltz PileupMetrics
fingerprinting.py
Input
output_dir: Directory to write the output files to
waltz_dir_A: Directory with waltz pileup files for target set A
waltz_dir_B: Directory with waltz pileup files for target set B
waltz_dir_A_duplex: Directory with waltz pileup files for Duplex target set A
waltz_dir_B_duplex: Directory with waltz pileup files for Duplex target set B
fp_config: File with information about the SNPs for analysis (MSK-ACCESS-v1_0-TilingaAndFpSNPs.txt)
title_file: Title File for the run
Output
GenoMatrix.pdf
Geno_compare.txt (all pairwise genotype comparison results for the samples in the run, along with each pair's status)
Interpretations
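Dark blue indicates a match. Samples from the same patient are expected to match; a mismatch between them, or a match between samples from different patients, suggests a possible sample mix-up.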
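Since samples from the same patient are expected to match, one way to act on the heatmap is to flag pairs whose observed result contradicts that expectation. A minimal sketch, assuming pairwise discordance values in percent (None when not calculated) and a hypothetical sample-to-patient mapping, for example one built from the title file; `unexpected_pairs` and its flag messages are illustrative, not part of fingerprinting.py.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def unexpected_pairs(
    results: Iterable[Tuple[str, str, Optional[float]]],
    patient_of: Dict[str, str],
    threshold: float = 5.0,  # percent; at or above this is a mismatch
) -> List[Tuple[str, str, str]]:
    """Return pairs whose match status disagrees with patient identity."""
    flagged = []
    for a, b, discordance in results:
        if discordance is None:
            continue  # too few common homozygous sites to judge this pair
        same_patient = patient_of[a] == patient_of[b]
        is_match = discordance < threshold
        if same_patient and not is_match:
            flagged.append((a, b, "same patient, discordant"))
        elif not same_patient and is_match:
            flagged.append((a, b, "different patients, concordant"))
    return flagged
```

For example, with `s1` and `s2` from patient P1 and `s3` from patient P2, a 2% discordance between `s1` and `s3` would be flagged as "different patients, concordant".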