This repository hosts multiple scripts for filtering and processing variant calls in the VCF and text files generated by variant callers.
pv is the main command for the postprocessing_variant_calls package. Run pv --help to see the supported variant caller commands.
The sub-command pv vardict allows users to perform post-processing on VarDictJava output. pv vardict supports two kinds of VarDictJava input: single and case-control VCFs.
To tell pv vardict which input type will be used, use one of the following sub-commands:
pv vardict single, for single-sample VCFs
pv vardict case-control, for case-control VCFs
Next, the user can specify what post-processing should be done. Right now, postprocessing_variant_calls supports filtering:
pv vardict single filter
pv vardict case-control filter
Finally, we can specify the paths and options for our filtering and run our command. Here is an example using the test data provided in this repository:
pv vardict single filter --inputVcf data/Myeloid200-1.vcf --tsampleName Myeloid200-1 -ad 1 -o data/single
There are various options and input specifications for filtering; see pv vardict single filter --help or pv vardict case-control filter --help for details.
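When several single-sample VCFs need the same filter, the documented call can be generated per file. The following is a minimal dry-run sketch, not part of the package: the helper name build_filter_cmd is hypothetical, the flags are copied from the example above, and it assumes that the sample name equals the VCF file name without its .vcf extension (as it does for data/Myeloid200-1.vcf). It only prints the commands so they can be reviewed before running.

```shell
# Sketch: print one `pv vardict single filter` call per VCF in a directory.
# Assumption: the sample name equals the VCF file name without ".vcf".
set -eu

build_filter_cmd() {
    local vcf="$1" outdir="$2"
    local sample
    sample="$(basename "$vcf" .vcf)"
    # Flags are copied from the documented example; confirm them with --help.
    printf 'pv vardict single filter --inputVcf %s --tsampleName %s -ad 1 -o %s\n' \
        "$vcf" "$sample" "$outdir"
}

# Dry run: print the commands instead of executing them.
for vcf in data/*.vcf; do
    [ -e "$vcf" ] || continue   # skip when the glob matches nothing
    build_filter_cmd "$vcf" data/single
done
```

Once the printed commands look right, they can be piped to sh or pasted into a script. Paths that contain spaces would need quoting before doing so.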
See example_calls.sh for more example calls.
Template used: https://github.com/yxtay/python-project-template
Conda
Docker
Make
Use Conda to create a virtual environment and activate it for the project.
Then install project dependencies with Poetry.
To update the environment after initial setup, update the existing Conda environment instead of running conda create, and then re-run make deps-install.
Leveraging the PyVcf package, the following filtering is performed:
Case 1: Single sample mode
Case 2: Case-control mode
Abbreviations
TVF - Tumor Variant Fraction
NVF - Normal Variant Fraction
tmq - tumor minimum quality
nmq - normal minimum quality
tdp - total depth
tad - total allele depth
The sub-command pv maf allows users to perform post-processing on MAF files. It has six sub-commands: annotate, concat, filter, mergetsv, subset, and tag.
At a minimum, each of these commands assumes a MAF file to be a well-defined object with the following characteristics:
It is a delimited file where the delimiter is either a tab ('\t') or a comma (',').
The file uses one of the following extensions: '.maf', '.txt', '.csv', '.tsv'.
The delimited file includes at least the following columns: "Chromosome", "Start_Position", "End_Position", "Reference_Allele", "Tumor_Seq_Allele2".
The minimum listed columns can be combined into a unique ID for each row.
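The minimum requirements above can be checked before calling any pv maf command. The following is a rough sketch using awk, not code from the package: the function names check_maf_columns and maf_row_ids are hypothetical, the header is assumed to be the first line of the file, and the colon-joined ID format is only one illustrative way to combine the five columns.

```shell
# Sketch: verify the minimum MAF columns and build a per-row ID.
# Both helpers take FILE [DELIMITER]; the delimiter defaults to a tab.

check_maf_columns() {
    local delim="${2:-}"
    [ -n "$delim" ] || delim='\t'
    [ -s "$1" ] || { echo "empty or missing file: $1"; return 1; }
    awk -v FS="$delim" '
        BEGIN {
            n = split("Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2", required, " ")
        }
        NR == 1 {
            # Record every header name, then report the required ones that are absent.
            for (i = 1; i <= NF; i++) seen[$i] = 1
            for (j = 1; j <= n; j++) {
                if (!(required[j] in seen)) {
                    printf "missing column: %s\n", required[j]
                    missing = 1
                }
            }
            exit (missing ? 1 : 0)
        }
    ' "$1"
}

maf_row_ids() {
    local delim="${2:-}"
    [ -n "$delim" ] || delim='\t'
    awk -v FS="$delim" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            # Illustrative ID format: the five minimum columns joined with ":".
            print $(col["Chromosome"]) ":" $(col["Start_Position"]) ":" \
                  $(col["End_Position"]) ":" $(col["Reference_Allele"]) ":" \
                  $(col["Tumor_Seq_Allele2"])
        }
    ' "$1"
}
```

For a comma-delimited file, pass the delimiter explicitly, for example check_maf_columns path/to/maf.csv ','. Run check_maf_columns first, since maf_row_ids does not validate the header itself.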
However, some commands and their sub-commands may require additional columns and may apply specific rules when processing the MAF file.
The output is a MAF file modified according to the operation of each command.
For specifics on these criteria and rules, see the documentation for each command below:
maf concat examples:
pv maf concat -f path/to/maf1.maf -f path/to/maf2.maf -o output_maf
pv maf concat -f path/to/maf1.maf -f path/to/maf2.maf -o output_maf -h header.txt
where header.txt is a header file with names by which the MAFs will be row-wise concatenated. See resources/header.txt for an example.
pv maf concat -p path/to/paths.txt -o output/path/file
where path/to/paths.txt is a text file with MAF path locations. See resources/paths.txt for an example.
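A paths file can be generated rather than written by hand. The following is a small sketch under two assumptions that should be checked against resources/paths.txt and the command's --help: that the file holds one MAF path per line, and that the -p option belongs to pv maf concat, as the heading of this section suggests. The helper name make_maf_paths is hypothetical.

```shell
# Sketch: write a paths file listing every .maf file under a directory,
# one path per line (assumed format; compare with resources/paths.txt).
make_maf_paths() {
    local maf_dir="$1" out="$2"
    # Sort so the order of paths is stable between runs.
    find "$maf_dir" -type f -name '*.maf' | sort > "$out"
}

# Usage (not executed here):
#   make_maf_paths path/to/mafs path/to/paths.txt
#   pv maf concat -p path/to/paths.txt -o output/path/file
```

Sorting only makes the listing reproducible; whether the order of paths affects the concatenated output is not stated in this document.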
maf annotate examples:
pv maf annotate mafbybed -m path/to/maf.maf -b path/to/maf.bed -o output/path/file -c annotation
pv maf annotate mafbytsv -m /path/to/maf.(tsv/csv/maf) -t path/to/tsv.tsv -sep tsv -oc hotspot -v "Yes" "No"
maf tag examples:
pv maf tag cmoch -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf tag common_variant -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf tag germline_status -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf tag prevalence_in_cosmicDB -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf tag truncating_mut_in_TSG -m path/to/maf.maf -o output/path/file -sep "tsv"
maf filter examples:
pv maf filter cmo_ch -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter hotspot -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter mappable -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter non_common_variant -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter non_hotspot -m path/to/maf.maf -o output/path/file -sep "tsv"
pv maf filter not_complex -m path/to/maf.maf -o output/path/file -sep "tsv"