Run create_report.R
This script runs the create_report.R script for multiple patients.
Requirements
access_data_analysis>=0.1.2 # works with this repo tag
typer==0.3.2
typing_extensions==3.10.0.0
pandas==1.2.5
rich==12.1.0
run_create_report
Main Script (run_create_report.py)
Usage: run_create_report.py [OPTIONS]
Options:
-r, --repo PATH Base path to where the git repository is
located for access_data_analysis
-s, --script PATH Path to the create_report.R script, fall
back if `--repo` is not given
-t, --template PATH Path to the template.Rmd or
template_days.Rmd to be used with
create_report.R when `--repo` is not given
-m, --manifest FILE File containing meta information per sample.
Require following columns in the header:
cmo_patient_id, sample_id, dmp_patient_id,
collection_date or collection_day,
timepoint. If dmp_sample_id column is given
and has information that will be used to run
facets. If dmp_sample_id is not given and
dmp_patient_id is given than it will be used
to get the Tumor sample with lowest number.
If dmp_sample_id or dmp_patient_id is not
given then it will run without the facet maf
file [required]
-v, --variant-results DIRECTORY
Base path for all results of small variants
as generated by filter_calls.R script in
access_data_analysis (Make sure only High
Confidence calls are included) [required]
-c, --cnv-results DIRECTORY Base path for all results of CNV as
generated by CNV_processing.R script in
access_data_analysis [required]
-f, --facet-repo DIRECTORY Base path for all results of facets on
Clinical MSK-IMPACT samples [default: /juno
/work/ccs/shared/resources/impact/facets/all
/]
-bf, --best-fit If this is set to True then we will attempt
to parse `facets_review.manifest` file to
pick the best fit for a given dmp_sample_id
[default: False]
-l, --tumor-type TEXT Tumor type label for the report [required]
-cfm, --copy-facet-maf If this is set to True then we will copy the
facet maf file in the directory specified in
`copy_facet_dir` [default: False]
-cfd, --copy-facet-dir PATH Directory path where the facet maf file
should be copied.
-d, --template-days If the `--repo` option is specified and if
this is set to True then we will use the
template_days RMarkdown file as the template
[default: False]
-gm, --generate-markdown If given, the create_report.R will be run
with `-md` flag to generate markdown
[default: False]
-ff, --force If this is set to True then we will not stop
if an error is encountered in a given sample
while running create_report.R but keep on
running for the next sample [default:
False]
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to
copy it or customize the installation.
--help Show this message and exit.
Wrapper script to run create_report.R
Arguments:
repo_path
Path, optional - "Base path to where the git repository is located for access_data_analysis".
script_path
Path, optional - "Path to the create_report.R script, fall back if --repo is not given".
template_path
Path, optional - "Path to the template.Rmd or template_days.Rmd to be used with create_report.R when --repo is not given".
manifest
Path, required - "File containing meta information per sample. Requires the following columns in the header: cmo_patient_id, sample_id, dmp_patient_id, collection_date or collection_day, timepoint. If the dmp_sample_id column is given and has information, that will be used to run facets. If dmp_sample_id is not given and dmp_patient_id is given, then it will be used to get the Tumor sample with the lowest number. If dmp_sample_id or dmp_patient_id is not given then it will run without the facet maf file".
variant_path
Path, required - "Base path for all results of small variants as generated by filter_calls.R script in access_data_analysis (Make sure only High Confidence calls are included)".
cnv_path
Path, required - "Base path for all results of CNV as generated by CNV_processing.R script in access_data_analysis".
facet_repo
Path, required - "Base path for all results of facets on Clinical MSK-IMPACT samples".
best_fit
bool, optional - "If this is set to True then we will attempt to parse the facets_review.manifest file to pick the best fit for a given dmp_sample_id".
tumor_type
str, required - "Tumor type label for the report".
copy_facet
bool, optional - "If this is set to True then we will copy the facet maf file to the directory specified in copy_facet_dir".
copy_facet_dir
Path, optional - "Directory path where the facet maf file should be copied."
template_days
bool, optional - "If the --repo option is specified and if this is set to True then we will use the template_days RMarkdown file as the template".
markdown
bool, optional - "If given, the create_report.R will be run with the -md flag to generate markdown".
force
bool, optional - "If this is set to True then we will not stop if an error is encountered in a given sample but keep on running for the next sample".
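The manifest description above implies a three-way choice for locating the facets maf. The following is a minimal sketch of that choice, not the script's actual code: the function name `choose_facet_lookup` and the returned mode labels are hypothetical, the IDs in the demo are made-up placeholders, and it assumes that the "run without the facet maf file" case applies when neither ID is available.

```python
from typing import Optional, Tuple


def choose_facet_lookup(
    dmp_sample_id: Optional[str], dmp_patient_id: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (mode, identifier) describing how the facets maf would be searched.

    Hypothetical helper: the mode labels "sample", "patient" and "none" are
    illustrative and do not come from run_create_report.py.
    """
    # Treat None, empty strings and whitespace-only values as missing.
    sample = (dmp_sample_id or "").strip()
    patient = (dmp_patient_id or "").strip()
    if sample:
        # A specific clinical sample is named: search facets for that sample.
        return ("sample", sample)
    if patient:
        # Only the patient is known: the tumor sample with the lowest number is used.
        return ("patient", patient)
    # Assumption: with neither ID available, the report runs without a facets maf.
    return ("none", None)


# Placeholder IDs, for illustration only.
print(choose_facet_lookup("P-0000001-T02-IM6", "P-0000001"))
print(choose_facet_lookup("", "P-0000001"))
print(choose_facet_lookup(None, None))
```

The sketch expects plain strings or None; values read through pandas may arrive as NaN and would need converting first.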
Usage
Using Generate Markdown, copy facet maf file, use template_days RMarkdown, force flag and best fit for facets
> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -d -cfm -ff -bf
Using Generate Markdown, force flag and default fit for facets
> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -ff
Submodules
check_required_columns
check_required_columns
def check_required_columns(manifest, template_days=None)
Check if all required columns are present in the sample manifest file
Arguments:
manifest
data_frame - meta information file with information for each sample
template_days
bool - True|False if template days RMarkdown will be used
Raises:
typer.Abort
- if "cmo_patient_id" column not provided
typer.Abort
- if "cmo_sample_id/sample_id" column not provided
typer.Abort
- if "dmp_patient_id" column not provided
typer.Abort
- if "collection_date/collection_day" column not provided
typer.Abort
- if "timepoint" column not provided
Returns:
list
- column name for the manifest file
data_frame
- data_frame with unique ids to traverse over
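As an illustration of the column check described above, here is a minimal standard-library sketch. It is not the module's implementation: `missing_required_columns` is a hypothetical name, it works on a plain list of header names instead of a data_frame, and it returns the missing columns instead of raising typer.Abort. It also assumes that `template_days` selects `collection_day` over `collection_date`, which the documentation suggests but does not state outright.

```python
from typing import Iterable, List


def missing_required_columns(
    columns: Iterable[str], template_days: bool = False
) -> List[str]:
    """List the required manifest columns that are absent from `columns`."""
    present = set(columns)
    missing = []
    # Columns that must always be present under exactly this name.
    for name in ("cmo_patient_id", "dmp_patient_id", "timepoint"):
        if name not in present:
            missing.append(name)
    # Either spelling of the sample column is accepted.
    if not ({"cmo_sample_id", "sample_id"} & present):
        missing.append("cmo_sample_id/sample_id")
    # Assumption: the days template needs collection_day, otherwise collection_date.
    date_column = "collection_day" if template_days else "collection_date"
    if date_column not in present:
        missing.append(date_column)
    return missing


header = ["cmo_patient_id", "sample_id", "dmp_patient_id", "collection_date", "timepoint"]
print(missing_required_columns(header))
print(missing_required_columns(header, template_days=True))
```

A caller could abort when the returned list is non-empty and report every missing column at once.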
generate_repo_paths
generate_repo_path
def generate_repo_path(repo_path=None, script_path=None, template_path=None, template_days=None)
Generate path to create_report.R and template RMarkdown file
Arguments:
repo_path
pathlib.Path, optional - Path to clone of git repo access_data_analysis. Defaults to None.
script_path
pathlib.Path, optional - Path to create_report.R. Defaults to None.
template_path
pathlib.Path, optional - Path to template RMarkdown file. Defaults to None.
template_days
bool, optional - True|False to use days template if using repo_path. Defaults to None.
Raises:
typer.Abort
- Abort if both repo_path and script_path are not given
typer.Abort
- Abort if both repo_path and template_path are not given
Returns:
str
- Path to create_report.R and path to template markdown file
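The fallback order described above (use the repository clone when given, otherwise the explicit script and template paths) might look roughly like the sketch below. This is an assumption-laden illustration: `resolve_report_paths` is a hypothetical name, the `reports/` subdirectory inside the clone is an assumed layout, not something this page confirms, and ValueError stands in for typer.Abort so the example needs only the standard library.

```python
from pathlib import Path
from typing import Optional, Tuple, Union

PathLike = Union[str, Path]


def resolve_report_paths(
    repo_path: Optional[PathLike] = None,
    script_path: Optional[PathLike] = None,
    template_path: Optional[PathLike] = None,
    template_days: bool = False,
) -> Tuple[Path, Path]:
    """Return (script, template) paths for create_report.R."""
    if repo_path is not None:
        # Assumed layout: script and templates live under <repo>/reports.
        reports_dir = Path(repo_path) / "reports"
        template_name = "template_days.Rmd" if template_days else "template.Rmd"
        return reports_dir / "create_report.R", reports_dir / template_name
    # Without a repository clone, both explicit paths are needed.
    if script_path is None:
        raise ValueError("either repo_path or script_path must be given")
    if template_path is None:
        raise ValueError("either repo_path or template_path must be given")
    return Path(script_path), Path(template_path)


script, template = resolve_report_paths(repo_path="/repo", template_days=True)
print(script)
print(template)
```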
read_manifest
read_manifest
def read_manifest(manifest)
Read manifest file
Arguments:
manifest
pathlib.Path - description
Returns:
data_frame
- description
get_row
def get_row(tsv_file)
Function to skip rows
Arguments:
tsv_file
file - file to be read
Returns:
list
- lines to be skipped
get_small_variant_csv
get_small_variant_csv
def get_small_variant_csv(patient_id, csv_path)
Get the path to CSV file to be used for a given patient containing all variants
Arguments:
patient_id
str - patient id used to identify the csv file
csv_path
pathlib.Path - base path where the csv file is expected to be present
Raises:
typer.Abort
- if no csv file is returned
typer.Abort
- if more than one csv file is returned
Returns:
str
- path to csv file containing the variants
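The "exactly one match" rule above can be sketched with pathlib. This is only an illustration: `find_single_csv` is a hypothetical name, the glob pattern `*<patient_id>*.csv` and the demo file name are assumptions about how result files are named, and built-in exceptions replace typer.Abort.

```python
import tempfile
from pathlib import Path
from typing import Union


def find_single_csv(patient_id: str, csv_path: Union[str, Path]) -> Path:
    """Return the single CSV in `csv_path` whose name contains `patient_id`."""
    # Assumed naming: the patient id appears somewhere in the file name.
    matches = sorted(Path(csv_path).glob(f"*{patient_id}*.csv"))
    if not matches:
        raise FileNotFoundError(f"no csv file found for {patient_id} in {csv_path}")
    if len(matches) > 1:
        raise ValueError(f"more than one csv file found for {patient_id}: {matches}")
    return matches[0]


# Demo with a temporary directory and a made-up file name.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "C-000001_all_variants.csv").write_text("header\n")
    found = find_single_csv("C-000001", tmp)
    print(found.name)
```

Failing on both zero and multiple matches keeps an ambiguous results directory from silently feeding the wrong file into a report.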
run_cmd
run_cmd
def run_cmd(cmd)
Given a system command run it using subprocess
Arguments:
cmd
str - System command to be run as a string
run_multiple_cmd
def run_multiple_cmd(commands, parallel_process=None)
Given a list of system commands, run them using subprocess
Arguments:
commands
list[str] - list of system commands to be run
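For readers unfamiliar with running command strings through subprocess, here is a minimal sketch of the two ideas above. It is not the module's code: `run_command` and `run_commands` are hypothetical names, the thread pool is one possible way to interpret a `parallel_process` setting, and `shlex.split` assumes POSIX-style quoting.

```python
import shlex
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def run_command(cmd: str) -> int:
    """Run one command string without a shell and return its exit code."""
    # shlex.split turns the string into an argument list (POSIX quoting rules).
    completed = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return completed.returncode


def run_commands(commands: List[str], workers: Optional[int] = None) -> List[int]:
    """Run several commands concurrently; exit codes keep the input order."""
    # Threads are sufficient here because each worker only waits on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_command, commands))


# Demo: run the current Python interpreter twice with a no-op program.
noop = f"{shlex.quote(sys.executable)} -c pass"
exit_codes = run_commands([noop, noop], workers=2)
print(exit_codes)
```

Returning exit codes instead of raising lets the caller decide whether one failed sample should stop the whole batch, which is the behaviour the `--force` option controls in the main script.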
generate_facet_maf_path
generate_facet_maf_path
def generate_facet_maf_path(facet_path, patient_id, sample_id=None)
Get path of maf associated with facet-suite output
Arguments:
facet_path
pathlib.Path|str - path to search for the facet file
patient_id
str - patient id to be used to search, default is set to None
sample_id
str - sample id to be used to search, default is set to None
Returns:
str
- path of the facets maf
get_maf_path
def get_maf_path(maf_path, patient_id, sample_id)
Get the path to the maf file
Arguments:
maf_path
pathlib.Path - Base path of the maf file
patient_id
str - DMP Patient ID for facets
sample_id
str - DMP Sample ID if any for facets
Returns:
str
- Path to the maf file
get_best_fit_folder
def get_best_fit_folder(facet_manifest_path)
Get the best fit folder for the given facet manifest path
Arguments:
facet_manifest_path
str - manifest path to be used for determining best fit
Returns:
pathlib.Path
- path to the folder containing best fit maf files
generate_create_report_cmd
generate_create_report_cmd
def generate_create_report_cmd(script, markdown, template_file, cmo_patient_id, csv_file, manifest, cnv_path, dmp_patient_id, dmp_sample_id, dmp_facet_maf, tumor_type=None)
Create the system command that should be run for create_report.R
Arguments:
script
str - path for create_report.R
markdown
bool - True|False to generate markdown output
template_file
str - path for the template file
cmo_patient_id
str - patient id from CMO
csv_file
str - path to csv file containing variant information
tumor_type
str - tumor type label
manifest
pathlib.Path - path to the manifest containing meta data
cnv_path
pathlib.Path - path to directory having cnv files
dmp_patient_id
str - patient id of the clinical msk-impact sample
dmp_sample_id
str - sample id of the clinical msk-impact sample
dmp_facet_maf
str - path to the clinical msk-impact maf file annotated for facets results
Returns:
cmd
str - system command to run for create_report.R
html_output
pathlib.Path - where the output file should be written