Access Quality Control (v1)
  • Introduction
  • Meta information per sample
  • Raw read-pair counts (standard BAM)
  • On Target Coverage
  • Fraction of reads mapping to the human genome
  • “On Bait” reads localized to ACCESS panel
  • Coverage vs GC content
  • Insert Size Distribution
  • Distribution of ACCESS panel A coverage values
  • Average Coverage, Sample Level, Pool A Targets
  • UMI Family types Composition (Pool A)
  • Average Coverage, Sample Level, Pool B Targets
  • UMI Family types Composition (Pool B)
  • Base Quality Recalibration Scores
  • UMI family sizes (Simplex reads)
  • UMI family sizes (Duplex reads)
  • Sample Level Noise
  • Noise by Substitution Type
  • Contributing Sites for Noise
  • Hotspots In Normals
  • Sample mix-up
  • (Un)expected (Mis)matches Tables
  • Major Contamination
  • Minor Contamination
  • Duplex Minor Contamination
  • Sex Mismatch
  • FAQ
Powered by GitBook
On this page

Was this helpful?

Export as PDF

Sample mix-up

PreviousHotspots In NormalsNext(Un)expected (Mis)matches Tables

Last updated 4 years ago

Was this helpful?

Theoretical Method

The sample mix-up heatmap is used to identify any potential mispaired samples within the run. The analysis makes use of the >300 ‘fingerprint’ single nucleotide polymorphisms (SNPs) that are distributed throughout the genome. These SNPs included the 31 SNPs that are in Target Pool A and >250 SNPs located in the tiling probes in Target Pool B. Pairwise comparisons of these SNP sites are done against all samples in the run. Sites, where both samples are homozygous are identified and percent discordance is calculated using the formula below:

DiscordanceRate=Numberofhomozygousmismatches/NumberofSNPsiteshomozygousinReferenceDiscordance Rate= Number of homozygous mismatches / Number of SNP sites homozygous in ReferenceDiscordanceRate=Numberofhomozygousmismatches/NumberofSNPsiteshomozygousinReference

where homozygous mismatches are sites that are homozygous in both Reference and Query but do not match each other.

If there are <10 common homozygous sites, the discordance rate can not be calculated since this is a strong indication that coverage is too low and the samples failed other QC.

Any samples with a discordance rate of 5% or higher are considered mismatches.

These calculations were done using All Unique (unfiltered) bams. Allele counts are measured from waltz pileups from Pool A and B

Technical Methods

  • Tool Used:

    • Waltz PileupMetrics

    • fingerprinting.py

  • Input

    • output_dir : Directory to write the Output files to

    • waltz_dir_A: Directory with waltz pileup files for target set A

    • waltz_dir_B: Directory with waltz pileup files for target set B

    • waltz_dir_A_duplex: Directory with waltz pileup files for Duplex target set A

    • waltz_dir_B_duplex: Directory with waltz pileup files for Duplex target set B

    • fp_config: File with information about the SNPs for analysis (MSK-ACCESS-v1_0-TilingaAndFpSNPs.txt)

    • title_file: Title File for the run

  • Output

    • GenoMatrix.pdf

    • Geno_compare.txt (All pair-wise genotyping comparison results for the samples in the run, along with their status)

Interpretations

Dark blue indicates a match. Samples from the same patients are expected to match.

Heatmap comparing discordance rate across all the samples in a given batch