The sub-command pv maf allows users to perform post-processing on MAF files. It has six sub-commands: annotate, concat, filter, mergetsv, subset, and tag.
At a minimum, each of these commands assumes a MAF file to be a well-defined object with the following characteristics:
a delimited file where the delimiter is either a tab ('\t') or a comma (',')
the file uses one of the following extensions: '.maf', '.txt', '.csv', '.tsv'
the delimited file includes, at a minimum, the following columns: "Chromosome", "Start_Position", "End_Position", "Reference_Allele", "Tumor_Seq_Allele2"
the minimum listed columns can be combined into a unique ID for each row
However, some commands and their sub-commands may require additional columns and may use specific rules in their processing of the MAF file.
The output is a MAF file modified according to the operation of each command.
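The minimum requirements above can be illustrated with a short Python sketch. It is not the package's implementation: the function name read_maf, the id key, and the "_" joiner are assumptions made for illustration; only the column names come from the list above.

```python
import csv
import io

# Minimum columns listed above.
REQUIRED_COLUMNS = [
    "Chromosome",
    "Start_Position",
    "End_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
]


def read_maf(handle, delimiter="\t"):
    """Read a delimited MAF and attach a hypothetical `id` key to each row."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"MAF is missing required columns: {missing}")
    rows = []
    for row in reader:
        # Join the minimum columns into one identifier (the "_" joiner is an assumption).
        row["id"] = "_".join(row[c] for c in REQUIRED_COLUMNS)
        rows.append(row)
    return rows


example = io.StringIO(
    "Chromosome\tStart_Position\tEnd_Position\tReference_Allele\tTumor_Seq_Allele2\n"
    "1\t100\t100\tA\tT\n"
)
maf_rows = read_maf(example)
print(maf_rows[0]["id"])  # prints: 1_100_100_A_T
```

For a comma-delimited file, the same sketch would be called with delimiter=",".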
For specifics on these criteria and rules, please find additional documentation on these commands below:
maf concat examples:
pv maf concat -f path/to/maf1.maf -f path/to/maf2.maf -o output_maf
pv maf concat -f path/to/maf1.maf -f path/to/maf2.maf -o output_maf -h header.txt
where header.txt is a header file listing the column names on which the MAFs will be concatenated row-wise. See resources/header.txt for an example.
pv maf concat -p path/to/paths.txt -o output/path/file
where path/to/paths.txt is a text file listing the paths of the MAF files. See resources/paths.txt for an example.
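As a rough illustration of row-wise concatenation on a shared header, here is a minimal Python sketch. It assumes that only the header columns are kept and that deduplication compares whole rows; the function name concat_mafs is hypothetical, and the default column list is taken from the --header help text in the CLI reference.

```python
import csv
import io

# Default concatenation columns, as listed in the -h/--header help text.
DEFAULT_HEADER = [
    "Hugo_Symbol",
    "Chromosome",
    "Start_Position",
    "End_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
]


def concat_mafs(handles, header=None, deduplicate=False, delimiter="\t"):
    """Stack rows from several delimited MAF handles, keeping only `header` columns."""
    columns = list(header or DEFAULT_HEADER)
    combined, seen = [], set()
    for handle in handles:
        for row in csv.DictReader(handle, delimiter=delimiter):
            record = tuple(row[col] for col in columns)
            if deduplicate and record in seen:
                continue  # skip rows that were already emitted
            seen.add(record)
            combined.append(dict(zip(columns, record)))
    return combined


HEADER_LINE = "\t".join(DEFAULT_HEADER)
maf1 = HEADER_LINE + "\tExtra\nTP53\t17\t7577120\t7577120\tC\tT\tx\n"
maf2 = HEADER_LINE + "\nTP53\t17\t7577120\t7577120\tC\tT\nKRAS\t12\t25398284\t25398284\tC\tA\n"

all_rows = concat_mafs([io.StringIO(maf1), io.StringIO(maf2)])
unique_rows = concat_mafs([io.StringIO(maf1), io.StringIO(maf2)], deduplicate=True)
print(len(all_rows), len(unique_rows))  # prints: 3 2
```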
maf annotate examples:
pv maf annotate mafbybed -m path/to/maf.maf -b path/to/maf.bed -o output/path/file -c annotation
pv maf annotate mafbytsv -m /path/to/maf.(tsv/csv/maf) -t path/to/tsv.tsv -sep tsv -oc hotspot -v "Yes" "No"
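To make the idea of annotating a MAF by a BED file concrete, the following Python sketch marks each MAF row by whether it overlaps any BED interval. This is an assumption about the general technique, not the package's code: the function name annotate_by_bed and the yes/no values are hypothetical, and the sketch assumes 1-based inclusive MAF positions and 0-based half-open BED intervals.

```python
def annotate_by_bed(maf_rows, bed_intervals, cname="annotation", values=("yes", "no")):
    """Return copies of maf_rows with `cname` set according to BED overlap."""
    annotated = []
    for row in maf_rows:
        start, end = int(row["Start_Position"]), int(row["End_Position"])
        # MAF [start, end] is 1-based inclusive; BED [bed_start, bed_end) is 0-based half-open.
        hit = any(
            chrom == row["Chromosome"] and start <= bed_end and end > bed_start
            for chrom, bed_start, bed_end in bed_intervals
        )
        annotated.append({**row, cname: values[0] if hit else values[1]})
    return annotated


bed = [("1", 100, 101)]  # covers the single 1-based position 101 on chromosome 1
maf = [
    {"Chromosome": "1", "Start_Position": "101", "End_Position": "101"},
    {"Chromosome": "1", "Start_Position": "100", "End_Position": "100"},
    {"Chromosome": "2", "Start_Position": "101", "End_Position": "101"},
]
result = annotate_by_bed(maf, bed)
print([row["annotation"] for row in result])  # prints: ['yes', 'no', 'no']
```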
maf tag examples:
pv maf tag cmo_ch -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf tag common_variant -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf tag germline_status -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf tag prevalence_in_cosmicDB -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf tag truncating_mut_in_TSG -m path/to/maf.maf -o output/path/file -sep "tsv"
maf filter examples:
pv maf filter cmo_ch -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter hotspot -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter mappable -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter non_common_variant -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter non_hotspot -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter not_complex -m path/to/maf.maf -o output/path/file -sep "tsv"
This package provides a variety of commands for manipulating common outputs (e.g. MAF, VCF, and TXT files) from bioinformatic variant callers such as MuTect and VarDict.
Supported File Types:
For general use, you can run: pip install postprocessing_variant_calls
or install a tagged version with: pip install git+https://github.com/msk-access/postprocessing_variant_calls.git@<version>
For setting up a development environment, please see the Setting up a Dev Environment section.
See CLI for command-line usage of the package.
Have an environment with Python >= 3.8 installed.
Install Poetry:
Then install project dependencies with Poetry.
To access the environment after initial setup, run:
The Gitbook for this repository is configured so that changes are written in Gitbook and synced with the docs.
To contribute to the documentation, you can write your changes in Gitbook, request a review, and merge the changes. Keep in mind that you will need access to the organization to contribute.
Each supported file type should have a section in the Gitbook detailing the implementation of the file type and a justification of its operations. For example, the maf file type has its own section, which includes a description of how a MAF is defined internally in the package, a justification of its operations, and how to use them.
Beyond file-type sections, you will also notice a section called cli, which lists all commands in the postprocessing_variant_calls package. Do not manually edit this section. It is created using the typer-cli package, which uses the typer help parameters specified in typer commands to generate documentation. It is automatically updated by the git-action, .github/workflows/document_package.yml, upon a push to the main branch. To make sure the cli.md document updates to include newly added commands, specify all relevant typer help parameters.
If you'd like to see a mock of your typer commands as they'll appear in the cli.md document, you can run: poetry run typer postprocessing_variant_calls.main utils docs > docs/cli.md in a properly configured dev environment. Note that this file should not be included in your PRs. The cli.md should only be updated through the git-action, .github/workflows/document_package.yml.
This hosts multiple scripts necessary for filtering and processing variant calls in the VCF/TXT files generated by callers.
pv is the main command for the postprocessing_variant_calls package. See pv --help for the supported variant caller commands.
The sub-command pv vardict allows users to perform post-processing on VarDictJava output. The two supported inputs to pv vardict from VarDictJava are single and case-control.
To specify which input type will be used, pass one of the following sub-commands to pv vardict:
pv vardict single for single-sample VCFs
pv vardict case-control for case-control VCFs
Next, the user can specify what post-processing should be done. Right now, postprocessing_variant_calls supports filtering:
pv vardict single filter
pv vardict case-control filter
Finally, we can specify the paths and options for our filtering and run our command. Here is an example using the test data provided in this repository:
pv vardict single filter --inputVcf data/Myeloid200-1.vcf --tsampleName Myeloid200-1 -ad 1 -o data/single
There are various options and input specifications for filtering, so see pv vardict single filter --help or pv vardict case-control filter --help for help.
See example_calls.sh for more example calls.
Template used: https://github.com/yxtay/python-project-template
Use Conda to create a virtual environment and activate it for the project.
Then install project dependencies with Poetry.
To update the environment after initial setup, run:
instead of conda create, and then re-run make deps-install.
Leveraging the PyVCF package, the following filtering is performed:
Case 1: Single sample mode
Case 2: Case-control mode
TVF - Tumor Variant Fraction
NVF - Normal Variant Fraction
tmq - tumor minimum quality
nmq - normal minimum quality
tdp - total depth
tad - total allele depth
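The exact rules for each case are not reproduced in this section, so the following Python sketch only illustrates how the quantities defined above might be combined with the filter thresholds. The function names and the way the conditions are combined are assumptions; the default threshold values are taken from the pv vardict filter options in the CLI reference, and the allele depth of 1 follows the example call.

```python
def passes_single(tdp, tad, tvf, tmq, total_depth=20, allele_depth=1,
                  variant_fraction=5e-05, min_qual=0):
    """Assumed single-sample rule: every tumor metric must meet its threshold."""
    return (
        tmq >= min_qual
        and tdp >= total_depth
        and tad >= allele_depth
        and tvf >= variant_fraction
    )


def passes_case_control(tdp, tad, tvf, tmq, nvf, nmq, tn_ratio=1, **thresholds):
    """Assumed case-control rule: single-sample checks plus a tumor/normal comparison."""
    if nmq < thresholds.get("min_qual", 0):
        return False
    if not passes_single(tdp, tad, tvf, tmq, **thresholds):
        return False
    # Keep the call when the normal shows no variant reads, or when the tumor
    # variant fraction exceeds the normal variant fraction scaled by tn_ratio.
    return nvf == 0 or tvf > nvf * tn_ratio


print(passes_single(tdp=50, tad=5, tvf=0.10, tmq=30))  # prints: True
print(passes_case_control(tdp=50, tad=5, tvf=0.10, tmq=30, nvf=0.20, nmq=30))  # prints: False
```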
--install-completion
: Install completion for the current shell.
--show-completion
: Show completion for the current shell, to copy it or customize the installation.
--help
: Show this message and exit.
maf
: operations for manipulating maf files...
mutect1
: post-processing commands for MuTect...
mutect2
: post-processing commands for MuTect...
vardict
: post-processing commands for VarDict...
main maf
operations for manipulating maf files based on a given input.
--help
: Show this message and exit.
annotate
: annotate maf files based on a given input.
concat
: row-wise concatenation for maf files.
filter
: filter maf files based on a given input.
mergetsv
: merge a tsv file onto a maf by a shared id...
subset
: subset maf files.
tag
: tag maf files based on a given input.
main maf annotate
annotate maf files based on a given input.
--help
: Show this message and exit.
extract_blocklist
: Extract values from an optional blocklist...
mafbybed
: annotate a maf column by a bed file.
mafbytsv
: annotate a maf column by a tsv file.
main maf annotate extract_blocklist
Extract values from an optional blocklist file if provided. Used in SNVs/indels workflow.
-b, --blocklist_file FILE
: Blocklist text file to extract values from. Needs to be in TSV format [required]
-m, --maf FILE
: MAF file to subset [required]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf annotate mafbybed
annotate a maf column by a bed file.
-m, --maf FILE
: input maf file [required]
-b, --bed FILE
: bed file to annotate maf [required]
-o, --output TEXT
: output maf file [default: output.maf]
-c, --cname TEXT
: name for annotation column [default: annotation]
--help
: Show this message and exit.
main maf annotate mafbytsv
annotate a maf column by a tsv file.
-m, --maf FILE
: MAF file to subset [required]
-t, --tsv FILE
: tsv file to annotate maf [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
-oc, --outcome_column TEXT
: name for outcome column [default: hotspot]
-v, --values <TEXT TEXT>...
: name for annotation column. Defaults to (Yes, No) [default: yes, no]
--help
: Show this message and exit.
main maf concat
row-wise concatenation for maf files.
-f, --files PATH
: MAF file to concatenate. Default assumes MAFs are tsv. MAF inputs are specified here, or using paths parameter
-p, --paths PATH
: A text file containing paths of maf files to concatenate. Default assumes MAFs are tsv. MAF files are specified here, or using files parameter.
-o, --output PATH
: Maf output file name. [default: output.maf]
-h, --header PATH
: A header file containing the columns to concatenate input mafs on. It must be a subset of: Hugo_Symbol, Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2. These are also the default columns used for concatenation
-de, --deduplicate
: deduplicate outputted maf file.
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf filter
filter maf files based on a given input.
--help
: Show this message and exit.
access_filters
: Filter a MAF file based on all the...
access_remove_variants
: Filter a MAF file based on all the...
cmo_ch
: Filter a MAF file based on all the parameters
hotspot
: filter a MAF file based on the presence of...
mappable
: Filter a MAF file to retain only mappable...
non_common_variant
: Filter a MAF file for common variants and...
non_hotspot
: filter a MAF file based on the presence of...
not_complex
: Filter a MAF file for complex variants...
main maf filter access_filters
Filter a MAF file based on all the parameters listed in ACCESS filters python script
-f, --fillout_maf FILE
: Fillout MAF file to subset (direct output from traceback subworkflow) [required]
-a, --anno_maf FILE
: Annotated MAF file to subset (direct input file from beginning of traceback subworkflow) [required]
-o, --output PATH
: Maf output file name. [default: output]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
-bl, --blocklist TEXT
: Optional input blocklist file for access filtering criteria. [default: tsv]
-ts, --tumor_samplename TEXT
: Name of Tumor Sample [required]
-ns, --normal_samplename TEXT
: Name of MATCHED normal sample [required]
--tumor_detect_alt_thres TEXT
: The Minimum Alt depth required to be considered detected in fillout [default: 2]
--tumor_detect_alt_thres TEXT
: The Minimum Alt depth required to be considered detected in fillout [default: 2]
--curated_detect_alt_thres TEXT
: The Minimum Alt depth required to be considered detected in fillout [default: 2]
--plasma_detect_alt_thres TEXT
: The Minimum Alt depth required to be considered detected in fillout [default: 2]
--tumor_TD_min TEXT
: The Minimum Total Depth required in tumor to consider a variant Likely Germline [default: 20]
--normal_TD_min TEXT
: The Minimum Total Depth required in Matched Normal to consider a variant Germline [default: 20]
--tumor_vaf_germline_thres TEXT
: The threshold for variant allele fraction required in Tumor to consider a variant Likely Germline [default: 0.4]
--tumor_vaf_germline_thres TEXT
: The threshold for variant allele fraction required in Matched Normal to consider a variant Germline [default: 0.4]
--tier_one_alt_min TEXT
: The Minimum Alt Depth required in hotspots [default: 3]
--tier_two_alt_min TEXT
: The Minimum Alt Depth required in non-hotspots [default: 5]
--min_n_curated_samples_alt_detected TEXT
: The Minimum number of curated samples variant is detected to be flagged [default: 2]
--tn_ratio_thres TEXT
: Tumor-Normal variant fraction ratio threshold [default: 5]
--help
: Show this message and exit.
main maf filter access_remove_variants
Filter a MAF file based on all the parameters satisfied by the remove variants by annotations CWL script in the ACCESS pipeline
-m, --maf FILE
: MAF file to subset [required]
-i, --intervals FILE
: Intervals file containing rows of criteria to tag input MAF by [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf filter cmo_ch
Filter a MAF file based on all the parameters
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf filter hotspot
filter a MAF file based on the presence of Hotspot variants
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf filter mappable
Filter a MAF file to retain only mappable variants
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf filter non_common_variant
Filter a MAF file for common variants and retain only uncommon variants
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf filter non_hotspot
filter a MAF file based on the presence of Hotspot variants
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf filter not_complex
Filter a MAF file for complex variants and retain only simple variants
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf mergetsv
merge a tsv file onto a maf by a shared id column.
-ma, --mafa FILE
: MAF file to subset
-mb, --mafb FILE
-o, --output PATH
: Maf output file name. [default: merged.maf]
-id, --merge_id TEXT
: id to merge mafs on. [default: id]
-h, --how TEXT
: Type of merge to be performed on mafs. Defaults to left. [default: left]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
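The default left merge described by the mergetsv options can be pictured with a small Python sketch. It is only an approximation under stated assumptions: the function name merge_on_id is hypothetical, only the first TSV row per id is used, columns already present in the MAF are left untouched, and unmatched rows receive empty strings.

```python
def merge_on_id(maf_rows, tsv_rows, merge_id="id"):
    """Left-merge tsv_rows onto maf_rows by a shared id column."""
    lookup = {}
    for row in tsv_rows:
        lookup.setdefault(row[merge_id], row)  # keep the first row per id
    # Columns contributed by the TSV, in first-seen order.
    extra_cols = []
    for row in tsv_rows:
        for col in row:
            if col != merge_id and col not in extra_cols:
                extra_cols.append(col)
    merged = []
    for row in maf_rows:
        match = lookup.get(row[merge_id], {})
        added = {col: match.get(col, "") for col in extra_cols if col not in row}
        merged.append({**row, **added})
    return merged


maf = [{"id": "a", "Hugo_Symbol": "TP53"}, {"id": "b", "Hugo_Symbol": "KRAS"}]
tsv = [{"id": "a", "score": "0.9"}]
merged = merge_on_id(maf, tsv)
print(merged)
```

Every MAF row is kept, which is what distinguishes a left merge from an inner merge.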
main maf subset
subset maf files.
-m, --maf FILE
: MAF file to subset
-i, --ids PATH
: List of ids to search for in the 'Tumor_Sample_Barcode' column. Header of this file is 'sample_id'
--sid TEXT
: Identifiers to search for in the 'Tumor_Sample_Barcode' column. Can be given multiple times
-o, --output TEXT
: Name of the output file [default: output_subset.maf]
-c, --cname TEXT
: Name of the column header to be used for sub-setting [default: Tumor_Sample_Barcode]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
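A minimal Python sketch of the subsetting described above, assuming an exact match between the supplied ids and the values in the chosen column. The function names read_ids and subset_maf are hypothetical; the sample_id header and the Tumor_Sample_Barcode default come from the option descriptions.

```python
import csv
import io


def read_ids(handle):
    """Read sample ids from a file whose header is 'sample_id'."""
    return [row["sample_id"] for row in csv.DictReader(handle)]


def subset_maf(maf_rows, ids, cname="Tumor_Sample_Barcode"):
    """Keep only the rows whose `cname` value is one of the requested ids."""
    wanted = set(ids)
    return [row for row in maf_rows if row[cname] in wanted]


# Ids from an --ids style file, plus one id given directly (as with --sid).
ids = read_ids(io.StringIO("sample_id\nS1\nS3\n")) + ["S4"]
maf = [
    {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "TP53"},
    {"Tumor_Sample_Barcode": "S2", "Hugo_Symbol": "KRAS"},
    {"Tumor_Sample_Barcode": "S4", "Hugo_Symbol": "EGFR"},
]
kept = subset_maf(maf, ids)
print([row["Tumor_Sample_Barcode"] for row in kept])  # prints: ['S1', 'S4']
```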
main maf tag
tag maf files based on a given input.
--help
: Show this message and exit.
access
: Tag a variant in a MAF file based on...
by_rules
: Tag a variant in a MAF file based on...
by_variant_classification
: Tag filtered MAF file by variant...
cmo_ch
: Tag a variant in MAF file based on all the...
common_variant
: Tag a variant in a MAF file as common...
germline_status
: Tag a variant in a MAF file as germline...
hotspots
: Tag a variant in a MAF file based on...
prevalence_in_cosmicDB
: Tag a variant in a MAF file with...
traceback
: Generate combined count columns between...
truncating_mut_in_TSG
: Tag a truncating mutating variant in a MAF...
main maf tag access
Tag a variant in a MAF file based on criteria stated by the SNV/indels ACCESS pipeline workflow
-m, --maf FILE
: MAF file to tag [required]
-r, --rules FILE
: Intervals JSON file containing criteria to tag input MAF by [required]
-h, --hotspots FILE
: Text file containing hotspots to tag input MAF by [required]
-o, --output PATH
: Maf output file name. [default: output_tagged.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf tag by_rules
Tag a variant in a MAF file based on criteria stated by an input rules.json JSON file
-m, --maf FILE
: MAF file to tag [required]
-r, --rules FILE
: Intervals JSON file containing criteria to tag input MAF by [required]
-o, --output PATH
: Maf output file name. [default: output_tagged.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf tag by_variant_classification
Tag filtered MAF file by variant classifications and subset into individual text files.
-m, --maf FILE
: filtered MAF file to split by annotations with [required]
-tx_ref, --canonical_tx_ref FILE
: Reference canonical transcript file [required]
-o, --output_dir PATH
: Output Directory to export individual text files to. [default: output_dir]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf tag cmo_ch
Tag a variant in MAF file based on all the parameters listed
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf tag common_variant
Tag a variant in a MAF file as common variant based on GNOMAD AF
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf tag germline_status
Tag a variant in a MAF file as germline based on VAF value
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf tag hotspots
Tag a variant in a MAF file based on hotspots file
-m, --maf FILE
: MAF file to tag [required]
-h, --hotspots FILE
: Text file containing hotspots to tag input MAF by [required]
-o, --output PATH
: Maf output file name. [default: output_tagged.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf tag prevalence_in_cosmicDB
Tag a variant in a MAF file with prevalence in COSMIC DB
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main maf tag traceback
Generate combined count columns between standard and simplex/duplex mafs
-m, --maf FILE
: MAF file to tag [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
-sheet, --samplesheet PATH
: Samplesheets in nucleovar formatting. See the README for more info: https://github.com/mskcc-omics-workflows/nucleovar/blob/main/README.md. Used to add fillout type information to the maf. The sample_id and type columns must be present.
--help
: Show this message and exit.
main maf tag truncating_mut_in_TSG
Tag a truncating mutation in a MAF file based on its presence in a Tumor Suppressor Gene
-m, --maf FILE
: MAF file to subset [required]
-o, --output PATH
: Maf output file name. [default: output.maf]
-sep, --separator TEXT
: Specify a separator for delimited data. [default: tsv]
--help
: Show this message and exit.
main mutect1
post-processing commands for MuTect version 1.1.5 VCFs.
--help
: Show this message and exit.
case-control
: Post-processing commands for case-control...
main mutect1 case-control
Post-processing commands for case-control filtering of MuTect version 1.1.5 VCF input file.
--help
: Show this message and exit.
filter
: This tool helps to filter MuTect version...
main mutect1 case-control filter
This tool helps to filter MuTect version 1.1.5 VCFs for case-control calling
-i, --inputVcf FILE
: Input vcf generated by MuTect which needs to be processed [required]
-i, --inputTxt FILE
: Input Txt file generated by MuTect which needs to be processed [required]
--refFasta FILE
: Input reference fasta [required]
--tsampleName TEXT
: Name of the tumor sample. [required]
-dp, --totalDepth INTEGER RANGE
: Tumor total depth threshold [default: 20; x>=0]
-ad, --alleledepth INTEGER RANGE
: [default: 1; x>=0]
-tnr, --tnRatio INTEGER RANGE
: Tumor-Normal variant fraction ratio threshold [default: 1; x>=0]
-vf, --variantFraction FLOAT RANGE
: Tumor variant fraction threshold [default: 5e-05; x>=0]
-o, --outDir TEXT
: Full Path to the output dir
--help
: Show this message and exit.
main mutect2
post-processing commands for MuTect version 2 VCFs.
--help
: Show this message and exit.
case-control
: Post-processing commands for filtering of...
main mutect2 case-control
Post-processing commands for filtering of MuTect version 2 VCF input file.
--help
: Show this message and exit.
filter
: This tool helps to filter MuTect version 2...
main mutect2 case-control filter
This tool helps to filter MuTect version 2 VCFs for case-control calling
-i, --inputVcf FILE
: Input vcf generated by MuTect2 which needs to be processed [required]
-it, --inputTxt FILE
: Input Txt generated by MuTect which needs to be processed. NOTE, a Txt file will not be used for Mutect2 filtering as it is not provided in standard output. [default: /dev/null]
--refFasta FILE
: Input reference fasta [default: /dev/null]
--tsampleName TEXT
: Name of the tumor sample. [required]
-dp, --totalDepth INTEGER RANGE
: Tumor total depth threshold [default: 20; x>=0]
-ad, --alleleDepth INTEGER RANGE
: [default: 1; x>=0]
-tnr, --tnRatio INTEGER RANGE
: Tumor-Normal variant fraction ratio threshold [default: 1; x>=0]
-vf, --variantFraction FLOAT RANGE
: Tumor variant fraction threshold [default: 5e-05; x>=0]
-o, --outDir TEXT
: Full Path to the output dir
--help
: Show this message and exit.
main vardict
post-processing commands for VarDict version 1.4.6 VCFs.
--help
: Show this message and exit.
case-control
: Post-processing commands for a...
single
: Post-processing commands for a single...
main vardict case-control
Post-processing commands for a case-controlled VarDict version 1.4.6 VCFs
--help
: Show this message and exit.
filter
: This tool helps to filter vardict version...
main vardict case-control filter
This tool helps to filter vardict version 1.4.6 VCFs for case-control calling
-i, --inputVcf FILE
: Input vcf generated by vardict which needs to be processed [required]
--tsampleName TEXT
: Name of the tumor Sample [required]
-dp, --totalDepth INTEGER RANGE
: Tumor total depth threshold [default: 20; x>=20]
-ad, --alleledepth INTEGER RANGE
: [x>=1]
-tnr, --tnRatio INTEGER
: Tumor-Normal variant fraction ratio threshold [default: 1]
-vf, --variantFraction FLOAT
: Tumor variant fraction threshold [default: 5e-05]
-mq, --minQual INTEGER
: Minimum variant call quality [default: 0]
-fg, --filterGermline
: Whether to remove calls without 'somatic' status
-o, --outDir TEXT
: Full Path to the output dir
--help
: Show this message and exit.
main vardict single
Post-processing commands for a single sample VarDict version 1.4.6 VCFs
--help
: Show this message and exit.
filter
: This tool helps to filter vardict version...
main vardict single filter
This tool helps to filter vardict version 1.4.6 VCFs for single sample calling
-i, --inputVcf FILE
: Input vcf generated by vardict which needs to be processed [required]
--tsampleName TEXT
: Name of the tumor Sample [required]
-dp, --totalDepth INTEGER RANGE
: Tumor total depth threshold [default: 20; x>=20]
-ad, --alleledepth INTEGER RANGE
: [x>=1]
-tnr, --tnRatio INTEGER
: Tumor-Normal variant fraction ratio threshold [default: 1]
-vf, --variantFraction FLOAT
: Tumor variant fraction threshold [default: 5e-05]
-mq, --minQual INTEGER
: Minimum variant call quality [default: 0]
-fg, --filterGermline
: Whether to remove calls without 'somatic' status
-o, --outDir TEXT
: Full Path to the output dir
--help
: Show this message and exit.