Manifest Update Script

Overview

This Python script processes and updates an ACCESS manifest file by generating paths for various data types (e.g., BAM, MAF, CNA, SV files) and saves the updated manifest in both Excel and CSV formats. It supports both legacy and modern input formats and includes options for handling Protected Health Information (PHI).

Features

  • Input Validation:

    • Ensures required columns are present in the input manifest.

    • Validates date formats and handles missing values.

  • Path Generation:

    • Automatically generates paths for BAM, MAF, CNA, and SV files based on sample type and assay type.

  • PHI Handling:

    • Optionally removes collection dates to comply with privacy regulations.

  • Output:

    • Saves the updated manifest in both Excel and CSV formats.

    • Supports custom output file prefixes.

  • Legacy Support:

    • Handles legacy input file formats with specific path requirements.

Requirements

Python Packages

The script requires the following Python packages:

  • pandas

  • typer

  • rich

  • arrow

  • numpy

  • openpyxl (for Excel file handling)

Install the required packages using the following command:

pip install pandas typer rich arrow numpy openpyxl

Usage

Commands

The script provides two main commands:

  1. make-manifest: Processes the input manifest file to generate paths for various data types and saves the updated manifest.

  2. update-manifest: Updates a legacy ACCESS manifest file with specific paths.

Command-Line Arguments

make-manifest

Argument
Type
Description
Default Value

-i, --input

Path

Path to the input manifest file.

None

-o, --output

str

Prefix name for the output files (without extension).

None

--remove-collection-date

bool

Remove collection date from the output manifest (PHI).

False

-a, --assay-type

str

Assay type, either XS1 or XS2.

XS2

update-manifest

Argument
Type
Description
Default Value

-i, --input

Path

Path to the input manifest file.

None

-o, --output

str

Prefix name for the output files (without extension).

None

Example Commands

make-manifest

python manifest.py make-manifest -i input_manifest.xlsx -o updated_manifest --remove-collection-date -a XS2

update-manifest

python manifest.py update-manifest -i legacy_manifest.xlsx -o updated_legacy_manifest

Input File Requirements

Required Columns

The input manifest file must contain the following columns:

  • CMO Patient ID

  • CMO Sample Name

  • Sample Type

For legacy input files, the following additional columns are required:

  • cmo_patient_id

  • cmo_sample_id_normal

  • cmo_sample_id_plasma

Date Format

The script supports the following date formats:

  • MM/DD/YY

  • M/D/YY

  • MM/D/YYYY

  • YYYY/MM/DD

  • YYYY-MM-DD

Invalid or missing dates will raise an error unless the --remove-collection-date option is used.

Outputs

The script generates two output files:

  1. Excel File: <output_prefix>.xlsx

  2. CSV File: <output_prefix>.csv

Both files contain the updated manifest with the following columns:

  • cmo_patient_id

  • cmo_sample_id_plasma

  • cmo_sample_id_normal

  • bam_path_normal

  • bam_path_plasma_duplex

  • bam_path_plasma_simplex

  • maf_path

  • cna_path

  • sv_path

  • paired

  • sex

  • collection_date

  • dmp_patient_id

Script Workflow

  1. Input Validation:

    • Checks for required columns and missing values.

    • Validates date formats.

  2. Path Generation:

    • Generates paths for BAM, MAF, CNA, and SV files based on sample type and assay type.

  3. DataFrame Creation:

    • Creates separate DataFrames for normal and non-normal samples.

    • Merges the DataFrames to include paired and unpaired samples.

  4. Output:

    • Saves the updated manifest in Excel and CSV formats.

Error Handling

The script includes error handling for the following scenarios:

  • Missing required columns.

  • Missing or invalid date values.

  • File read/write errors.

Example Workflow

  1. Prepare Input Manifest: Ensure the input manifest file contains the required columns and valid date formats.

  2. Run make-manifest:

    python manifest.py make-manifest -i input_manifest.xlsx -o updated_manifest --remove-collection-date -a XS2
  3. Check Outputs: Verify the generated Excel and CSV files in the specified output directory.

Contact

For questions or issues, please contact:

  • Author: Carmelina Charalambous, Ronak Shah (@rhshah)

  • Date: June 21, 2024

Was this helpful?