Architecture

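py-gbcms uses a hybrid Python/Rust architecture: Python handles the CLI, orchestration, and file I/O, while Rust (exposed through PyO3 as gbcms._rs) does the performance-critical BAM counting and strand-bias statistics.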

System Overview

flowchart TB
    subgraph Python["🐍 Python Layer"]
        CLI[CLI<br/>cli.py] --> Pipeline[Orchestration<br/>pipeline.py]
        Pipeline --> Reader[Input Adapters<br/>VcfReader, MafReader]
        Pipeline --> Writer[Output Writers<br/>VcfWriter, MafWriter]
    end
    
    subgraph Rust["🦀 Rust Layer (gbcms._rs)"]
        Counter[count_bam<br/>counting.rs] --> CIGAR[CIGAR Parser]
        Counter --> Stats[Strand Bias<br/>stats.rs]
    end
    
    Pipeline -->|"PyO3"| Counter
    Counter -->|"BaseCounts"| Pipeline
    
    style Python fill:#3776ab,color:#fff
    style Rust fill:#dea584,color:#000

Data Flow
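
Going by the system overview, a run reads variants through an input adapter, hands them to the Rust counter, and passes the resulting counts to an output writer. The sketch below mirrors that flow in pure Python. Every name in it (`Variant`, `BaseCounts` fields, `run_pipeline`, the callables) is a hypothetical stand-in for illustration, not the actual py-gbcms API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; pos is 0-based (internal convention)."""
    chrom: str
    pos: int
    ref: str
    alt: str


@dataclass(frozen=True)
class BaseCounts:
    """Hypothetical per-variant result; the field names are assumptions."""
    rd: int  # reference allele read count
    ad: int  # alternate allele read count


def run_pipeline(
    variants: Iterable[Variant],
    count_fn: Callable[[Variant], BaseCounts],
    write_fn: Callable[[List[Tuple[Variant, BaseCounts]]], None],
) -> List[Tuple[Variant, BaseCounts]]:
    """Reader -> counter -> writer, in the order the flowchart shows."""
    # count_fn stands in for the Rust counting call reached through PyO3.
    results = [(variant, count_fn(variant)) for variant in variants]
    # write_fn stands in for an output writer such as a VCF or MAF writer.
    write_fn(results)
    return results
```

Passing the counter and writer in as callables keeps the orchestration step independent of the file formats on either side, which matches the adapter/writer split in the diagram.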


Coordinate System

All coordinates are normalized to 0-based, half-open internally:

| Format    | Coordinate system | Example  |
|-----------|-------------------|----------|
| VCF input | 1-based           | chr1:100 |
| Internal  | 0-based           | chr1:99  |
| Output    | 1-based           | chr1:100 |
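
The conversion between the 1-based input/output positions and the 0-based, half-open internal form is a shift by one. A minimal sketch follows; the function names are hypothetical and are not taken from the py-gbcms source.

```python
from typing import Tuple


def vcf_to_internal(pos_1based: int) -> int:
    """Convert a 1-based VCF position to a 0-based internal position."""
    if pos_1based < 1:
        raise ValueError("1-based positions start at 1")
    return pos_1based - 1


def internal_to_output(pos_0based: int) -> int:
    """Convert a 0-based internal position back to 1-based for output."""
    if pos_0based < 0:
        raise ValueError("0-based positions start at 0")
    return pos_0based + 1


def internal_interval(pos_1based: int, ref_len: int) -> Tuple[int, int]:
    """Return the 0-based, half-open [start, end) span covered by REF."""
    start = vcf_to_internal(pos_1based)
    return start, start + ref_len
```

For example, a single-base variant at chr1:100 in a VCF maps to internal position 99 and covers the half-open interval [99, 100).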


Formulas

Variant Allele Frequency (VAF)

$$\text{VAF} = \frac{AD}{AD + RD}$$

Where:

  • AD = Alternate allele read count

  • RD = Reference allele read count
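
As a small illustration, VAF can be computed as the share of alternate-supporting reads among all reference- and alternate-supporting reads. The helper name is hypothetical, and returning 0.0 when no reads cover the site is an assumption made for this sketch, not documented py-gbcms behaviour.

```python
def variant_allele_frequency(ad: int, rd: int) -> float:
    """Fraction of alt-supporting reads among ref + alt reads."""
    if ad < 0 or rd < 0:
        raise ValueError("read counts must be non-negative")
    total = ad + rd
    if total == 0:
        # Assumption: report 0.0 rather than raising when depth is zero.
        return 0.0
    return ad / total
```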

Strand Bias (Fisher's Exact Test)

A low p-value (< 0.05) indicates a potential strand-bias artifact.
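
To make the test concrete, here is a standard-library sketch of a two-sided Fisher's exact test on a 2×2 table. The table layout (reference/alternate reads by forward/reverse strand) is an assumption about how strand bias is tabulated, and the function is not the stats.rs implementation.

```python
from math import comb


def strand_bias_p_value(ref_fwd: int, ref_rev: int,
                        alt_fwd: int, alt_rev: int) -> float:
    """Two-sided Fisher's exact p-value for a 2x2 strand table."""
    ref_total = ref_fwd + ref_rev
    alt_total = alt_fwd + alt_rev
    fwd_total = ref_fwd + alt_fwd
    n = ref_total + alt_total
    if n == 0:
        return 1.0  # no reads, so no evidence of bias

    denom = comb(n, fwd_total)

    def table_prob(k: int) -> float:
        # Hypergeometric probability of k forward reads among the ref reads,
        # with all row and column totals held fixed.
        return comb(ref_total, k) * comb(alt_total, fwd_total - k) / denom

    observed = table_prob(ref_fwd)
    lowest = max(0, fwd_total - alt_total)
    highest = min(ref_total, fwd_total)
    # Sum every table at least as unlikely as the observed one
    # (small relative tolerance guards against float rounding).
    p = sum(
        prob
        for prob in (table_prob(k) for k in range(lowest, highest + 1))
        if prob <= observed * (1 + 1e-7)
    )
    return min(1.0, p)
```

A balanced table such as 10/10 reference and 10/10 alternate reads yields a p-value of 1, while a table where all reference reads are forward and all alternate reads are reverse yields a very small one.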


Module Structure


Configuration

All settings are defined on GbcmsConfig, a Pydantic model; see models/core.py for the field definitions.
