Fingerprinting

Detecting sample swaps.

Introduction

This section contains a table showing the samples clustered into groups, where each row in the table corresponds to one sample. The table shows whether your samples are grouping together in unexpected ways, which can indicate a sample swap or mislabelling.

Methods

Tool used: biometrics
BAM type: Collapsed BAM
Regions: MSK-ACCESS-v1_0-curatedSNPs.vcf

Producing the table is a two-step process: (1) extract SNP genotypes from each sample using the biometrics extract command, and (2) perform a pairwise comparison of all samples to determine sample relatedness using the biometrics genotype command. Please see the biometrics documentation for further details on the methods.
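The pairwise comparison step can be illustrated with a minimal sketch. This is not the biometrics implementation: the genotype calls, the sample and group names, the simplified discordance formula (fraction of sites whose calls differ), and the `MATCH_THRESHOLD` cutoff are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical genotype calls at the same SNP sites; not real data.
genotypes = {
    "P1-tumor":  ["AA", "AB", "BB", "AA", "BB"],
    "P1-normal": ["AA", "AB", "BB", "AA", "BB"],
    "P2-normal": ["BB", "AA", "AB", "BB", "AA"],
}
# Hypothetical expected groups, as a user might supply them.
expected_group = {"P1-tumor": "P1", "P1-normal": "P1", "P2-normal": "P2"}

MATCH_THRESHOLD = 0.05  # assumed cutoff, for illustration only


def discordance(a, b):
    """Simplified discordance: fraction of SNP sites where the calls differ."""
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing / len(a)


def classify(s1, s2):
    """Label a pair as an expected/unexpected match/mismatch."""
    is_match = discordance(genotypes[s1], genotypes[s2]) < MATCH_THRESHOLD
    same_group = expected_group[s1] == expected_group[s2]
    if is_match:
        return "expected match" if same_group else "unexpected match"
    return "unexpected mismatch" if same_group else "expected mismatch"


# Compare every pair of samples once.
results = {pair: classify(*pair) for pair in combinations(genotypes, 2)}
for pair, label in results.items():
    print(pair, label)
```

In this sketch, a pair counts as a match when its discordance falls below the assumed cutoff, and the match is "expected" only when both samples share the same expected group.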

Interpretation

Below is a description of all the columns.

Column Name                  Description
sample_name                  The sample name.
expected_sample_group        The expected group for the sample based on user input.
predicted_sample_group       The predicted group for the sample based on the clustering results.
cluster_index                The integer cluster index. All rows with the same cluster_index are in the same cluster.
cluster_size                 The size of the cluster this sample is in.
avg_discordance              The average discordance between this sample and all other samples in the cluster.
count_expected_matches       The count of expected matches when comparing the sample to all others in the cluster.
count_unexpected_matches     The count of unexpected matches when comparing the sample to all others in the cluster.
count_expected_mismatches    The count of expected mismatches when comparing the sample to all other samples (inside and outside its cluster).
count_unexpected_mismatches  The count of unexpected mismatches when comparing the sample to all other samples (inside and outside its cluster).
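One way to screen the table for possible swaps is sketched below. It assumes the table has been exported as CSV with the column names described above, and that the expected and predicted group columns use comparable labels. The rows and the `flag_suspect_samples` helper are hypothetical, not part of biometrics.

```python
import csv
import io

# Made-up rows using the column names described above; not real output.
TABLE = """sample_name,expected_sample_group,predicted_sample_group,count_unexpected_matches,count_unexpected_mismatches
S1,P1,P1,0,0
S2,P1,P2,1,1
S3,P2,P2,1,0
"""


def flag_suspect_samples(handle):
    """Return names of samples whose rows hint at a swap or mislabelling."""
    flagged = []
    for row in csv.DictReader(handle):
        # Assumption: the two group columns can be compared directly.
        group_differs = row["expected_sample_group"] != row["predicted_sample_group"]
        unexpected = (int(row["count_unexpected_matches"])
                      + int(row["count_unexpected_mismatches"]))
        if group_differs or unexpected > 0:
            flagged.append(row["sample_name"])
    return flagged


suspects = flag_suspect_samples(io.StringIO(TABLE))
print(suspects)
```

A flagged sample is only a candidate for review. A single swap typically produces unexpected counts for every sample it was compared against, so the flagged rows should be read together rather than one at a time.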
