Sample mix-up

Theoretical Method

The sample mix-up heatmap is used to identify potentially mispaired samples within the run. The analysis uses the >300 ‘fingerprint’ single nucleotide polymorphisms (SNPs) that are distributed throughout the genome. These include the 31 SNPs in Target Pool A and >250 SNPs located in the tiling probes in Target Pool B. These SNP sites are compared pairwise across all samples in the run. For each pair, sites where both samples are homozygous are identified, and the percent discordance is calculated using the formula below:

$$\text{Discordance Rate} = \frac{\text{Number of homozygous mismatches}}{\text{Number of SNP sites homozygous in Reference}}$$

where homozygous mismatches are sites that are homozygous in both Reference and Query but do not match each other.

If there are <10 common homozygous sites, the discordance rate cannot be calculated. So few shared sites strongly indicates that coverage is too low and that the samples failed other QC.

Any pair of samples with a discordance rate of 5% or higher is considered a mismatch.
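The calculation described above can be sketched in Python as follows. This is a minimal illustration, not the code in fingerprinting.py: the function names, the genotype representation (a two-letter string per SNP id), and the return values are assumptions made for this example. The denominator follows the formula as written, counting all sites that are homozygous in the Reference.

```python
from typing import Dict, Optional

MIN_COMMON_HOM_SITES = 10   # from the text: fewer than 10 common sites, no rate
MISMATCH_THRESHOLD = 0.05   # from the text: 5% or higher is a mismatch


def is_homozygous(genotype: str) -> bool:
    """True for two-letter genotypes with identical alleles, e.g. "AA"."""
    return len(genotype) == 2 and genotype[0] == genotype[1]


def discordance_rate(reference: Dict[str, str],
                     query: Dict[str, str]) -> Optional[float]:
    """Return the discordance rate, or None when it cannot be calculated.

    Both arguments map a SNP id to a genotype string (hypothetical layout).
    """
    # Sites that are homozygous in both Reference and Query.
    common = [snp for snp, gt in reference.items()
              if is_homozygous(gt) and is_homozygous(query.get(snp, ""))]
    if len(common) < MIN_COMMON_HOM_SITES:
        return None
    mismatches = sum(1 for snp in common if reference[snp] != query[snp])
    hom_in_reference = sum(1 for gt in reference.values() if is_homozygous(gt))
    return mismatches / hom_in_reference


def classify(rate: Optional[float]) -> str:
    """Label a pair as "match", "mismatch", or "not calculated"."""
    if rate is None:
        return "not calculated"
    return "mismatch" if rate >= MISMATCH_THRESHOLD else "match"
```

For example, a pair with 20 homozygous Reference sites, all also homozygous in the Query, and one disagreeing site has a rate of 1/20 = 5%, which is at the threshold and is therefore labelled a mismatch.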

These calculations are done using All Unique (unfiltered) BAMs. Allele counts are measured from Waltz pileups for Pool A and Pool B.
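To illustrate how allele counts from a pileup could be turned into the homozygous or heterozygous calls used above, here is a small Python sketch. The source does not state how genotypes are called, so the function name, the input layout (a dict of base to read count), and the 10% minor-allele cutoff are all hypothetical choices for this example and should not be read as the pipeline's actual settings.

```python
from typing import Dict

# Hypothetical cutoff: a site is called heterozygous when the second most
# common allele makes up at least this fraction of the reads.
HET_MIN_FRACTION = 0.1


def call_genotype(counts: Dict[str, int],
                  het_min_fraction: float = HET_MIN_FRACTION) -> str:
    """Call a genotype such as "AA" or "AG" from allele counts at one site.

    Returns "NA" when the site has no coverage.
    """
    total = sum(counts.values())
    if total == 0:
        return "NA"
    # Rank alleles by descending count, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    major = ranked[0][0]
    if len(ranked) > 1 and ranked[1][1] / total >= het_min_fraction:
        minor = ranked[1][0]
        return "".join(sorted(major + minor))
    return major + major
```

With this cutoff, counts of 98 A and 2 G reads give a homozygous call, while an even split gives a heterozygous one. Only homozygous calls would then enter the discordance calculation.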

Technical Methods

  • Tools Used

    • Waltz PileupMetrics

    • fingerprinting.py

  • Input

    • output_dir: Directory to write the output files to

    • waltz_dir_A: Directory with waltz pileup files for target set A

    • waltz_dir_B: Directory with waltz pileup files for target set B

    • waltz_dir_A_duplex: Directory with waltz pileup files for Duplex target set A

    • waltz_dir_B_duplex: Directory with waltz pileup files for Duplex target set B

    • fp_config: File with information about the SNPs for analysis (MSK-ACCESS-v1_0-TilingaAndFpSNPs.txt)

    • title_file: Title File for the run

  • Output

    • GenoMatrix.pdf

    • Geno_compare.txt (All pair-wise genotyping comparison results for the samples in the run, along with their status)

Interpretations

Dark blue indicates a match. Samples from the same patient are expected to match.
