CMO ACCESS Data Analysis

VAF Overview Plot Script

Overview

The vaf_overview_plot.R script generates Variant Allele Frequency (VAF) overview plots from clinical and variant data. It writes the plots in both PDF and HTML formats, showing VAF trends, treatment durations, and reasons for stopping treatment, with a configurable number of patients per plot.

Features

  • Input Parsing: Accepts clinical and variant data files as input.

  • Data Validation: Ensures required columns are present in the input files.

  • Data Processing:

    • Merges clinical and variant data.

    • Filters and categorizes data based on assay type.

    • Calculates VAF statistics (mean, max, relative VAF).

  • Visualization:

    • Generates plots for initial VAF, VAF trends, treatment duration, and reasons for stopping treatment.

    • Combines plots into a grid for each patient chunk.

  • Output:

    • Saves plots in both PDF and HTML formats.

    • Exports VAF statistics as a tab-delimited text file.

Requirements

R Packages

The script requires the following R packages:

  • ggplot2

  • gridExtra

  • tidyr

  • dplyr

  • sqldf

  • RSQLite

  • readr

  • argparse

  • plotly

  • htmlwidgets

  • purrr

Install the required packages using the following command:

install.packages(c("ggplot2", "gridExtra", "tidyr", "dplyr", "sqldf", "RSQLite", "readr", "argparse", "plotly", "htmlwidgets", "purrr"))
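Before installing, it can be useful to see which of the listed packages are already available. The following is a minimal sketch in base R; find_missing_packages is a hypothetical helper introduced here for illustration and is not part of vaf_overview_plot.R.

```r
# Packages listed in the Requirements section above.
required_pkgs <- c("ggplot2", "gridExtra", "tidyr", "dplyr", "sqldf", "RSQLite",
                   "readr", "argparse", "plotly", "htmlwidgets", "purrr")

# Hypothetical helper: return the subset of `pkgs` that is not in `installed`.
find_missing_packages <- function(pkgs,
                                  installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install only what is missing:
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```

Note that the argparse R package wraps Python's argparse module, so a Python interpreter also needs to be available on the system.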

Usage

Command-Line Arguments

The script accepts the following arguments:

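| Argument | Type | Description | Default Value |
|---|---|---|---|
| -o, --resultsdir | character | Output directory where plots and statistics will be saved. | None |
| -v, --variants | character | File path to the variant data (MAF file). | None |
| -c, --clinical | character | File path to the clinical data file. | None |
| -y, --yaxis | character | Y-axis metric for VAF plots (mean, max, or relative). | mean |
| -n, --num_patients | integer | Number of patients to include in each plot. | 10 |
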
Example Command

Rscript vaf_overview_plot.R -o /path/to/output -v /path/to/variants.maf -c /path/to/clinical.tsv -y mean -n 10

Input File Requirements

Clinical Data File

The clinical data file must be a tab-delimited file containing the following columns:

  • cmoSampleName

  • cmoPatientId

  • PatientId

  • collection_date

  • collection_in_days

  • timepoint

  • treatment_length

  • treatmentName

  • reason_for_tx_stop

Variant Data File

The variant data file must be a tab-delimited file containing the following columns:

  • Hugo_Symbol

  • HGVSp_Short

  • Tumor_Sample_Barcode

  • t_alt_freq

  • covered (optional)
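The column requirements above can be checked before any processing starts. This is a sketch in base R under stated assumptions: check_required_columns is a hypothetical helper, the error wording is invented, and the one-row variant table is toy data. The actual script may validate its inputs differently.

```r
# Required columns as listed above; "covered" is optional and so not checked.
clinical_required <- c("cmoSampleName", "cmoPatientId", "PatientId",
                       "collection_date", "collection_in_days", "timepoint",
                       "treatment_length", "treatmentName", "reason_for_tx_stop")
variant_required <- c("Hugo_Symbol", "HGVSp_Short", "Tumor_Sample_Barcode",
                      "t_alt_freq")

# Hypothetical helper: stop with a readable message if any column is missing.
check_required_columns <- function(df, required, label) {
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop(sprintf("%s is missing required column(s): %s",
                 label, paste(missing_cols, collapse = ", ")), call. = FALSE)
  }
  invisible(TRUE)
}

# Toy tab-delimited variant table, read the same way a file would be.
variant_text <- paste0(
  "Hugo_Symbol\tHGVSp_Short\tTumor_Sample_Barcode\tt_alt_freq\n",
  "TP53\tp.R175H\tS1\t0.12\n"
)
variants <- read.delim(text = variant_text, stringsAsFactors = FALSE)
check_required_columns(variants, variant_required, "Variant data file")
```

Calling the same helper on the clinical table with clinical_required gives one consistent error message format for both inputs.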

Outputs

  1. Plots:

    • PDF files: One file per patient chunk (e.g., VAF_overview_chunk_1.pdf).

    • HTML files: Interactive plots for each patient chunk (e.g., VAF_overview_chunk_1.html).

  2. Statistics:

    • A tab-delimited text file (vaf_statistics.txt) containing VAF statistics for all patients.

Script Workflow

  1. Input Parsing:

    • Reads the clinical and variant data files.

    • Validates the presence of required columns.

  2. Data Processing:

    • Merges clinical and variant data.

    • Filters and categorizes variants based on assay type.

    • Calculates VAF statistics (mean, max, relative VAF).

  3. Visualization:

    • Splits data into chunks based on the number of patients specified.

    • Generates the following plots for each chunk:

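      • Initial VAF

      • VAF trends over time

      • Treatment duration

      • Reasons for stopping treatment

    • Combines the plots into a grid and saves them as PDF and HTML files.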

  4. Output:

    • Saves the combined plots and VAF statistics.
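The chunking step in the workflow above can be illustrated with a short base R sketch. chunk_patients is a hypothetical helper and the patient IDs are toy values. Writing the files directly into the results directory is an assumption; only the file name pattern comes from the Outputs section.

```r
# Hypothetical helper: split unique patient IDs into groups of at most
# `num_patients`, preserving first-seen order.
chunk_patients <- function(patient_ids, num_patients) {
  ids <- unique(patient_ids)
  split(ids, ceiling(seq_along(ids) / num_patients))
}

patient_ids <- sprintf("P%02d", 1:25)  # toy IDs
chunks <- chunk_patients(patient_ids, num_patients = 10)

# Output names following the pattern shown in the Outputs section.
resultsdir <- "results"  # stands in for the -o / --resultsdir value
pdf_paths <- file.path(resultsdir,
                       sprintf("VAF_overview_chunk_%d.pdf", seq_along(chunks)))
html_paths <- file.path(resultsdir,
                        sprintf("VAF_overview_chunk_%d.html", seq_along(chunks)))

print(lengths(chunks))  # number of patients in each chunk
```

With 25 patients and 10 patients per plot, this yields three chunks, the last one holding the remaining 5 patients.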

Error Handling

The script includes error handling for the following scenarios:

  • Missing required columns in the input files.

  • Empty data frames after filtering.

  • Invalid Y-axis metric.

  • Number of patients per plot exceeding the total number of unique patients.
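The last two checks in the list above could look roughly like the following base R sketch. validate_plot_args is a hypothetical helper and the messages are invented. Whether the script stops or only warns in these cases is not stated here, so stopping is an assumption.

```r
# Metrics accepted by -y / --yaxis, per the arguments table.
valid_metrics <- c("mean", "max", "relative")

# Hypothetical helper: validate the Y-axis metric and the chunk size.
validate_plot_args <- function(yaxis, num_patients, patient_ids) {
  if (!(yaxis %in% valid_metrics)) {
    stop(sprintf("Invalid Y-axis metric '%s'; expected one of: %s",
                 yaxis, paste(valid_metrics, collapse = ", ")), call. = FALSE)
  }
  n_unique <- length(unique(patient_ids))
  if (num_patients > n_unique) {
    # Assumption: treated as an error rather than a warning.
    stop(sprintf("num_patients (%d) exceeds the number of unique patients (%d)",
                 num_patients, n_unique), call. = FALSE)
  }
  invisible(TRUE)
}

# Passes: valid metric, 2 patients per plot, 3 unique patients.
validate_plot_args("mean", 2L, c("P1", "P1", "P2", "P3"))
```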

Example Outputs

PDF Plot

The PDF plot contains the following panels for each patient:

  1. Initial VAF: Bar plot showing the initial VAF.

  2. VAF Trends: Line plot showing VAF trends over time.

  3. Treatment Duration: Bar plot showing the treatment duration in days.

  4. Reason for Stopping Treatment: Tile plot showing the reason for stopping treatment.

HTML Plot

The HTML plot is an interactive version of the PDF plot, allowing users to explore the data dynamically.

VAF Statistics

The vaf_statistics.txt file contains the following columns:

  • cmoSampleName

  • cmoPatientId

  • collection_in_days

  • PatientId

  • treatment_length

  • reason_for_tx_stop

  • AverageVAF

  • MinVAF

  • SDVAF

  • MaxVAF
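To show how statistics columns like these can be derived, here is a base R sketch on toy data. It assumes that cmoSampleName in the clinical file matches Tumor_Sample_Barcode in the variant file and that statistics are computed per sample over t_alt_freq. Both are assumptions, and the script itself may compute these values differently. Only a subset of the clinical columns is carried through to keep the example short.

```r
# Toy inputs; values are made up for illustration.
clinical <- data.frame(
  cmoSampleName = c("S1", "S2"),
  cmoPatientId = c("P1", "P1"),
  collection_in_days = c(0, 30),
  stringsAsFactors = FALSE
)
variants <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2"),
  Hugo_Symbol = c("TP53", "KRAS", "TP53", "KRAS"),
  t_alt_freq = c(0.10, 0.30, 0.05, 0.15),
  stringsAsFactors = FALSE
)

# Assumed join key: cmoSampleName (clinical) == Tumor_Sample_Barcode (variants).
merged <- merge(clinical, variants,
                by.x = "cmoSampleName", by.y = "Tumor_Sample_Barcode")

# Summary statistics for one sample's VAF values.
summarise_vaf <- function(x) {
  c(AverageVAF = mean(x), MinVAF = min(x), SDVAF = sd(x), MaxVAF = max(x))
}
stats <- aggregate(
  t_alt_freq ~ cmoSampleName + cmoPatientId + collection_in_days,
  data = merged, FUN = summarise_vaf
)
# aggregate() returns the statistics as a matrix column; flatten it.
vaf_stats <- cbind(
  stats[, c("cmoSampleName", "cmoPatientId", "collection_in_days")],
  as.data.frame(stats$t_alt_freq)
)

# Write a tab-delimited file (to a temporary directory in this sketch).
write.table(vaf_stats, file = file.path(tempdir(), "vaf_statistics.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
```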

Contact

For questions or issues, please contact:

  • Authors: Carmelina Charalambous, Alexander Ham

  • Date: 11/30/2023
