Run create_report.R
This script runs the create_report.R script for multiple patients.
Requirements
access_data_analysis>=0.1.2 # works with this repo tag
typer==0.3.2
typing_extensions==3.10.0.0
pandas==1.2.5
rich==12.1.0
run_create_report
Main Script (run_create_report.py)
Usage: run_create_report.py [OPTIONS]
Options:
-r, --repo PATH Base path to where the git repository is
located for access_data_analysis
-s, --script PATH Path to the create_report.R script, fall
back if `--repo` is not given
-t, --template PATH Path to the template.Rmd or
template_days.Rmd to be used with
create_report.R when `--repo` is not given
-m, --manifest FILE File containing meta information per sample.
Require following columns in the header:
cmo_patient_id, sample_id, dmp_patient_id,
collection_date or collection_day,
timepoint. If dmp_sample_id column is given
and has information that will be used to run
facets. If dmp_sample_id is not given and
dmp_patient_id is given than it will be used
to get the Tumor sample with lowest number.
If dmp_sample_id or dmp_patient_id is not
given then it will run without the facet maf
file [required]
-v, --variant-results DIRECTORY
Base path for all results of small variants
as generated by filter_calls.R script in
access_data_analysis (Make sure only High
Confidence calls are included) [required]
-c, --cnv-results DIRECTORY Base path for all results of CNV as
generated by CNV_processing.R script in
access_data_analysis [required]
-f, --facet-repo DIRECTORY Base path for all results of facets on
Clinical MSK-IMPACT samples [default: /juno
/work/ccs/shared/resources/impact/facets/all
/]
-bf, --best-fit If this is set to True then we will attempt
to parse `facets_review.manifest` file to
pick the best fit for a given dmp_sample_id
[default: False]
-l, --tumor-type TEXT Tumor type label for the report [required]
-cfm, --copy-facet-maf If this is set to True then we will copy the
facet maf file in the directory specified in
`copy_facet_dir` [default: False]
-cfd, --copy-facet-dir PATH Directory path where the facet maf file
should be copied.
-d, --template-days If the `--repo` option is specified and if
this is set to True then we will use the
template_days RMarkdown file as the template
[default: False]
-gm, --generate-markdown If given, the create_report.R will be run
with `-md` flag to generate markdown
[default: False]
-ff, --force If this is set to True then we will not stop
if an error is encountered in a given sample
while running create_report.R but keep on
running for the next sample [default:
False]
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to
copy it or customize the installation.
--help Show this message and exit.
Wrapper script to run create_report.R
Arguments:
repo_path (Path, optional) - Base path to where the git repository for access_data_analysis is located.
script_path (Path, optional) - Path to the create_report.R script; used as a fallback if `--repo` is not given.
template_path (Path, optional) - Path to the template.Rmd or template_days.Rmd file to be used with create_report.R when `--repo` is not given.
manifest (Path, required) - File containing meta information per sample. The header must contain the following columns: cmo_patient_id, sample_id, dmp_patient_id, collection_date or collection_day, and timepoint. If a dmp_sample_id column is present and populated, it is used to run facets. If dmp_sample_id is not given but dmp_patient_id is, dmp_patient_id is used to get the tumor sample with the lowest number. If neither dmp_sample_id nor dmp_patient_id is given, the script runs without the facet maf file.
variant_path (Path, required) - Base path for all small variant results as generated by the filter_calls.R script in access_data_analysis (make sure only high-confidence calls are included).
cnv_path (Path, required) - Base path for all CNV results as generated by the CNV_processing.R script in access_data_analysis.
facet_repo (Path, required) - Base path for all facets results on clinical MSK-IMPACT samples.
best_fit (bool, optional) - If True, attempt to parse the `facets_review.manifest` file to pick the best fit for a given dmp_sample_id.
tumor_type (str, required) - Tumor type label for the report.
copy_facet (bool, optional) - If True, copy the facet maf file to the directory specified in `copy_facet_dir`.
copy_facet_dir (Path, optional) - Directory path where the facet maf file should be copied.
template_days (bool, optional) - If the `--repo` option is specified and this is True, use the template_days RMarkdown file as the template.
markdown (bool, optional) - If given, create_report.R is run with the `-md` flag to generate markdown.
force (bool, optional) - If True, do not stop when an error is encountered for a given sample; continue with the next sample.
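The manifest rules for facets form a three-way decision. The sketch below restates them in Python as an illustration only, not the script's implementation: it assumes each manifest row arrives as a dict keyed by column name and treats empty cells and pandas-style `nan` as "not given". `choose_facet_ids` and `is_given` are hypothetical names.

```python
from typing import Optional, Tuple

# Cell values treated as "not given"; "nan" covers a pandas NaN passed through str().
MISSING = {"", "nan", "na", "none"}


def is_given(value) -> bool:
    """Return True if a manifest cell holds a usable value."""
    return value is not None and str(value).strip().lower() not in MISSING


def choose_facet_ids(row: dict) -> Tuple[str, Optional[str], Optional[str]]:
    """Decide how the facets input is located for one manifest row.

    Returns (mode, dmp_patient_id, dmp_sample_id) where mode is:
      "sample"  - dmp_sample_id is given and used directly
      "patient" - only dmp_patient_id is given; the lowest-numbered
                  tumor sample for that patient is looked up later
      "none"    - neither is given; run without the facet maf file
    """
    patient = row.get("dmp_patient_id")
    sample = row.get("dmp_sample_id")
    patient = str(patient).strip() if is_given(patient) else None
    if is_given(sample):
        return "sample", patient, str(sample).strip()
    if patient is not None:
        return "patient", patient, None
    return "none", None, None
```

Returning a mode string keeps the decision separate from the file-system lookup, so each branch can be checked without a facets directory.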
Usage
Using Generate Markdown, copy facet maf file, use template_days RMarkdown, force flag and best fit for facets
> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -d -cfm -ff -bf
Using Generate Markdown, force flag and default fit for facets
> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -ff
Submodules
check_required_columns
check_required_columns
def check_required_columns(manifest, template_days=None)
Check whether all required columns are present in the sample manifest file.
Arguments:
manifest (data_frame) - meta information file with information for each sample
template_days (bool) - True|False if the template days RMarkdown will be used
Raises:
typer.Abort - if the "cmo_patient_id" column is not provided
typer.Abort - if the "cmo_sample_id/sample_id" column is not provided
typer.Abort - if the "dmp_patient_id" column is not provided
typer.Abort - if the "collection_date/collection_day" column is not provided
typer.Abort - if the "timepoint" column is not provided
Returns:
list - column names for the manifest file
data_frame - data frame with unique IDs to traverse over
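The column check can be sketched without pandas or `typer` so it runs stand-alone. Several details are assumptions rather than documented behavior: `cmo_sample_id` and `sample_id` are treated as interchangeable (as the Raises list suggests), the date column is taken to be `collection_day` when `template_days` is true and `collection_date` otherwise, and `ValueError` stands in for `typer.Abort`. The names `missing_required_columns` and `validate_manifest_columns` are hypothetical.

```python
from typing import Iterable, List

# Each tuple lists acceptable alternative names for one required field.
BASE_REQUIREMENTS = [
    ("cmo_patient_id",),
    ("cmo_sample_id", "sample_id"),
    ("dmp_patient_id",),
    ("timepoint",),
]


def missing_required_columns(columns: Iterable[str], template_days: bool = False) -> List[str]:
    """Return a label for every required field that is absent from `columns`."""
    present = set(columns)
    # Assumption: the days template needs collection_day, the default one collection_date.
    date_field = ("collection_day",) if template_days else ("collection_date",)
    missing = []
    for options in BASE_REQUIREMENTS + [date_field]:
        if not present.intersection(options):
            missing.append("/".join(options))
    return missing


def validate_manifest_columns(columns: Iterable[str], template_days: bool = False) -> None:
    """Raise ValueError (stand-in for typer.Abort) naming all missing columns."""
    missing = missing_required_columns(columns, template_days)
    if missing:
        raise ValueError("Manifest is missing required column(s): " + ", ".join(missing))
```

With a pandas data frame, `missing_required_columns(df.columns, template_days)` works as is, since iterating `df.columns` yields the column names. Collecting all missing columns before raising means a single run reports every problem in the manifest.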
generate_repo_paths
generate_repo_path
def generate_repo_path(repo_path=None, script_path=None, template_path=None, template_days=None)
Generate the paths to create_report.R and the template RMarkdown file.
Arguments:
repo_path (pathlib.Path, optional) - Path to a clone of the git repo access_data_analysis. Defaults to None.
script_path (pathlib.Path, optional) - Path to create_report.R. Defaults to None.
template_path (pathlib.Path, optional) - Path to the template RMarkdown file. Defaults to None.
template_days (bool, optional) - True|False to use the days template when using repo_path. Defaults to None.
Raises:
typer.Abort - if neither repo_path nor script_path is given
typer.Abort - if neither repo_path nor template_path is given
Returns:
str - path to create_report.R and path to the template markdown file
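The path resolution described above (the repository first, explicit `--script` and `--template` as the fallback) can be sketched with `pathlib`. The sub-directory that holds create_report.R and the templates inside the repository is not stated here, so `REPORT_SUBDIR` is a hypothetical placeholder to adjust; `resolve_report_paths` is likewise a hypothetical name, and `ValueError` stands in for `typer.Abort`.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical location of create_report.R and the templates inside the repo.
REPORT_SUBDIR = "reports"


def resolve_report_paths(
    repo_path: Optional[Path] = None,
    script_path: Optional[Path] = None,
    template_path: Optional[Path] = None,
    template_days: bool = False,
) -> Tuple[Path, Path]:
    """Return (script, template), preferring paths derived from repo_path."""
    if repo_path is not None:
        base = Path(repo_path) / REPORT_SUBDIR
        # template_days only matters when the repo layout is used.
        template_name = "template_days.Rmd" if template_days else "template.Rmd"
        return base / "create_report.R", base / template_name
    if script_path is None:
        raise ValueError("Neither --repo nor --script was given")
    if template_path is None:
        raise ValueError("Neither --repo nor --template was given")
    return Path(script_path), Path(template_path)
```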
read_manifest
read_manifest
def read_manifest(manifest)
Read the manifest file.
Arguments:
manifest (pathlib.Path) - path to the manifest file
Returns:
data_frame - contents of the manifest file as a data frame
get_row
def get_row(tsv_file)
Get the rows to skip when reading a file.
Arguments:
tsv_file (file) - file to be read
Returns:
list - lines to be skipped
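`get_row` is described only as returning lines to skip. One plausible reading, assumed here, is that comment lines beginning with `#` sit before or among the tab-separated rows. This sketch (hypothetical names `rows_to_skip` and `read_tsv_skipping`) collects their indices and parses the remaining lines with `csv.DictReader`.

```python
import csv
import io
from typing import Dict, Iterable, List


def rows_to_skip(lines: Iterable[str], prefix: str = "#") -> List[int]:
    """Return 0-based indices of lines starting with `prefix` (assumed comments)."""
    return [i for i, line in enumerate(lines) if line.startswith(prefix)]


def read_tsv_skipping(text: str, prefix: str = "#") -> List[Dict[str, str]]:
    """Parse tab-separated text after dropping the lines found by rows_to_skip."""
    lines = io.StringIO(text).readlines()
    skip = set(rows_to_skip(lines, prefix))
    kept = [line for i, line in enumerate(lines) if i not in skip]
    # The first kept line is treated as the header row.
    return list(csv.DictReader(kept, delimiter="\t"))
```

With pandas, which the script already requires, the same index list can be passed as `pandas.read_csv(path, sep="\t", skiprows=rows)`.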
get_small_variant_csv
get_small_variant_csv
def get_small_variant_csv(patient_id, csv_path)
Get the path to the CSV file containing all variants for a given patient.
Arguments:
patient_id (str) - patient ID used to identify the CSV file
csv_path (pathlib.Path) - base path where the CSV file is expected to be present
Raises:
typer.Abort - if no CSV file is found
typer.Abort - if more than one CSV file is found
Returns:
str - path to the CSV file containing the variants
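The "exactly one match" rule can be sketched with `pathlib.Path.glob`. The file naming convention is not documented here, so the pattern `<patient_id>*.csv` directly under `csv_path` is an assumption; a prefix match can also catch IDs that share a prefix, which the more-than-one check then reports. `find_patient_csv` is a hypothetical name, and the built-in exceptions stand in for `typer.Abort`.

```python
from pathlib import Path
from typing import Union


def find_patient_csv(patient_id: str, csv_path: Union[str, Path]) -> Path:
    """Return the single CSV under csv_path whose name starts with patient_id."""
    # Assumed naming convention: <patient_id>*.csv directly inside csv_path.
    matches = sorted(Path(csv_path).glob(f"{patient_id}*.csv"))
    if not matches:
        raise FileNotFoundError(f"No CSV file found for {patient_id} in {csv_path}")
    if len(matches) > 1:
        names = ", ".join(m.name for m in matches)
        raise ValueError(f"More than one CSV file found for {patient_id}: {names}")
    return matches[0]
```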
run_cmd
run_cmd
def run_cmd(cmd)
Run a given system command using subprocess.
Arguments:
cmd (str) - system command to be run, as a string
run_multiple_cmd
def run_multiple_cmd(commands, parallel_process=None)
Run a list of system commands using subprocess.
Arguments:
commands (list[str]) - list of system commands to be run
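A minimal sketch of the two helpers using `subprocess.run` and `concurrent.futures`. The meaning of `parallel_process` is not documented above; here it is assumed to be the maximum number of commands running at once. Commands go through the shell because they are passed as strings, and `check=False` returns exit codes instead of raising, leaving the caller to decide whether to stop (compare the `--force` option). `run_one` and `run_many` are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def run_one(cmd: str) -> int:
    """Run a shell command string and return its exit code."""
    completed = subprocess.run(cmd, shell=True, check=False)
    return completed.returncode


def run_many(commands: List[str], parallel_process: Optional[int] = None) -> List[int]:
    """Run commands concurrently; exit codes come back in input order."""
    # Threads are enough here: each worker only waits on an external process.
    # max_workers=None lets the executor pick its default pool size.
    with ThreadPoolExecutor(max_workers=parallel_process) as pool:
        return list(pool.map(run_one, commands))
```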
generate_facet_maf_path
generate_facet_maf_path
def generate_facet_maf_path(facet_path, patient_id, sample_id=None)
Get the path of the maf associated with facet-suite output.
Arguments:
facet_path (pathlib.Path|str) - path to search for the facet file
patient_id (str) - patient ID to be used for the search
sample_id (str, optional) - sample ID to be used for the search; defaults to None
Returns:
str - path of the facets maf
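When only `dmp_patient_id` is available, the manifest description says the tumor sample with the lowest number is used. A sketch of that selection follows, assuming DMP sample IDs shaped like `P-0000001-T01-IM6` with the tumor index after `-T`. How the script lists candidate samples under `facet_path` is not shown here, so the function takes them as input. `lowest_tumor_sample` is a hypothetical name.

```python
import re
from typing import Iterable, Optional

# Assumed DMP sample ID shape: P-<patient>-T<nn>-<assay>, e.g. P-0000001-T01-IM6.
TUMOR_INDEX = re.compile(r"-T(\d+)-")


def lowest_tumor_sample(sample_ids: Iterable[str]) -> Optional[str]:
    """Return the tumor sample with the smallest numeric -T index, or None."""
    # Sorting first makes ties (same index, different assay) deterministic.
    tumors = sorted(s for s in sample_ids if TUMOR_INDEX.search(s))
    if not tumors:
        return None
    # Compare the index as an integer so that T10 ranks after T9.
    return min(tumors, key=lambda s: int(TUMOR_INDEX.search(s).group(1)))
```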
get_maf_path
def get_maf_path(maf_path, patient_id, sample_id)
Get the path to the maf file.
Arguments:
maf_path (pathlib.Path) - base path of the maf file
patient_id (str) - DMP patient ID for facets
sample_id (str) - DMP sample ID, if any, for facets
Returns:
str - path to the maf file
get_best_fit_folder
def get_best_fit_folder(facet_manifest_path)
Get the best fit folder for the given facet manifest path.
Arguments:
facet_manifest_path (str) - manifest path to be used for determining the best fit
Returns:
pathlib.Path - path to the folder containing the best fit maf files
generate_create_report_cmd
generate_create_report_cmd
def generate_create_report_cmd(script, markdown, template_file, cmo_patient_id, csv_file, manifest, cnv_path, dmp_patient_id, dmp_sample_id, dmp_facet_maf, tumor_type=None)
Create the system command that should be run for create_report.R.
Arguments:
script (str) - path to create_report.R
markdown (bool) - True|False to generate markdown output
template_file (str) - path to the template file
cmo_patient_id (str) - patient ID from CMO
csv_file (str) - path to the CSV file containing variant information
tumor_type (str) - tumor type label
manifest (pathlib.Path) - path to the manifest containing metadata
cnv_path (pathlib.Path) - path to the directory containing CNV files
dmp_patient_id (str) - patient ID of the clinical MSK-IMPACT sample
dmp_sample_id (str) - sample ID of the clinical MSK-IMPACT sample
dmp_facet_maf (str) - path to the clinical MSK-IMPACT maf file annotated with facets results
Returns:
cmd (str) - system command to run create_report.R
html_output (pathlib.Path) - where the output file should be written