Run create_report.R
This script runs the create_report.R script on multiple patients.
Requirements
run_create_report
Main Script (run_create_report.py)
Wrapper script to run create_report.R
Arguments:
repo_path
Path, optional - Base path to where the git repository for access_data_analysis is located.
script_path
Path, optional - Path to the create_report.R script; used as a fallback if --repo is not given.
template_path
Path, optional - Path to the template.Rmd or template_days.Rmd to be used with create_report.R when --repo is not given.
manifest
Path, required - File containing meta information per sample. The header requires the following columns: cmo_patient_id, sample_id, dmp_patient_id, collection_date or collection_day, timepoint. If the dmp_sample_id column is given and has information, it will be used to run facets. If dmp_sample_id is not given and dmp_patient_id is given, then it will be used to get the Tumor sample with the lowest number. If neither dmp_sample_id nor dmp_patient_id is given, then it will run without the facet maf file.
variant_path
Path, required - Base path for all results of small variants as generated by the filter_calls.R script in access_data_analysis (make sure only High Confidence calls are included).
cnv_path
Path, required - Base path for all results of CNV as generated by the CNV_processing.R script in access_data_analysis.
facet_repo
Path, required - Base path for all results of facets on Clinical MSK-IMPACT samples.
best_fit
bool, optional - If this is set to True, then we will attempt to parse the facets_review.manifest file to pick the best fit for a given dmp_sample_id.
tumor_type
str, required - Tumor type label for the report.
copy_facet
bool, optional - If this is set to True, then we will copy the facet maf file into the directory specified in copy_facet_dir.
copy_facet_dir
Path, optional - Directory path where the facet maf file should be copied.
template_days
bool, optional - If the --repo option is specified and this is set to True, then we will use the template_days RMarkdown file as the template.
markdown
bool, optional - If given, create_report.R will be run with the -md flag to generate markdown.
force
bool, optional - If this is set to True, then we will not stop if an error is encountered in a given sample but keep on running for the next sample.
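The effect of force can be pictured as a per-patient loop that either stops at the first failure or records it and moves on. The sketch below is illustrative only: `run_all` and `process_patient` are hypothetical names that do not come from the script, and the real wrapper may report errors differently.

```python
from typing import Callable, Iterable, List


def run_all(
    patient_ids: Iterable[str],
    process_patient: Callable[[str], None],
    force: bool = False,
) -> List[str]:
    """Process each patient and return the ids that failed.

    With force=False the first error is re-raised and the loop stops.
    With force=True the error is recorded and the next patient is processed.
    """
    failed = []
    for patient_id in patient_ids:
        try:
            process_patient(patient_id)
        except Exception as err:
            if not force:
                raise
            print(f"Skipping {patient_id}: {err}")
            failed.append(patient_id)
    return failed
```

Returning the failed ids, rather than only printing them, makes it possible to summarise which patients need a rerun once the batch has finished.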
Usage
Generate markdown, copy the facet maf file, use the template_days RMarkdown, set the force flag, and use the best fit for facets
Generate markdown, set the force flag, and use the default fit for facets
Submodules
check_required_columns
check_required_columns
Check if all required columns are present in the sample manifest file
Arguments:
manifest
data_frame - meta information file with information for each sample
template_days
bool - True|False if template days RMarkdown will be used
Raises:
typer.Abort
- if "cmo_patient_id" column not provided
typer.Abort
- if "cmo_sample_id/sample_id" column not provided
typer.Abort
- if "dmp_patient_id" column not provided
typer.Abort
- if "collection_date/collection_day" column not provided
typer.Abort
- if "timepoint" column not provided
Returns:
list
- column names for the manifest file
data_frame
- data_frame with unique ids to traverse over
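A minimal sketch of the column check described above, written against a plain list of header names so it needs only the standard library. Two things here are assumptions rather than documented behaviour: `MissingColumnError` stands in for typer.Abort, and template_days is assumed to select collection_day in place of collection_date.

```python
from typing import List, Sequence


class MissingColumnError(ValueError):
    """Stand-in for typer.Abort so the sketch has no third-party dependency."""


def check_required_columns(
    columns: Sequence[str], template_days: bool = False
) -> List[str]:
    """Return the manifest columns to use, or raise if a required one is absent."""
    present = set(columns)
    # Assumption: the days template reads collection_day, otherwise collection_date.
    date_col = "collection_day" if template_days else "collection_date"
    required = [
        ("cmo_patient_id",),
        ("cmo_sample_id", "sample_id"),  # either spelling is accepted
        ("dmp_patient_id",),
        (date_col,),
        ("timepoint",),
    ]
    selected = []
    for alternatives in required:
        found = next((name for name in alternatives if name in present), None)
        if found is None:
            raise MissingColumnError(f"Missing column: {'/'.join(alternatives)}")
        selected.append(found)
    return selected
```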
generate_repo_paths
generate_repo_path
Generate path to create_report.R and template RMarkdown file
Arguments:
repo_path
pathlib.Path, optional - Path to clone of git repo access_data_analysis. Defaults to None.
script_path
pathlib.Path, optional - Path to create_report.R. Defaults to None.
template_path
pathlib.Path, optional - Path to template RMarkdown file. Defaults to None.
template_days
bool, optional - True|False to use days template if using repo_path. Defaults to None.
Raises:
typer.Abort
- Abort if both repo_path and script_path are not given
typer.Abort
- Abort if both repo_path and template_path are not given
Returns:
str
- Path to create_report.R and path to template markdown file
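The resolution order described above (prefer the repository, otherwise fall back to explicit paths) might look like the following sketch. The relative locations inside the repository are hypothetical placeholders, and ValueError is used where the documented function raises typer.Abort.

```python
from pathlib import Path
from typing import Optional, Tuple, Union

PathLike = Union[str, Path]

# Hypothetical locations inside the repository; adjust to the real layout.
SCRIPT_REL = Path("reports") / "create_report.R"
TEMPLATE_REL = Path("reports") / "template.Rmd"
TEMPLATE_DAYS_REL = Path("reports") / "template_days.Rmd"


def generate_repo_paths(
    repo_path: Optional[PathLike] = None,
    script_path: Optional[PathLike] = None,
    template_path: Optional[PathLike] = None,
    template_days: bool = False,
) -> Tuple[str, str]:
    """Return (script, template) paths as strings."""
    if repo_path is not None:
        # The repository wins; template_days only matters in this branch.
        template_rel = TEMPLATE_DAYS_REL if template_days else TEMPLATE_REL
        return str(Path(repo_path) / SCRIPT_REL), str(Path(repo_path) / template_rel)
    if script_path is None or template_path is None:
        raise ValueError("Give repo_path, or both script_path and template_path")
    return str(script_path), str(template_path)
```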
read_manifest
read_manifest
Read manifest file
Arguments:
manifest
pathlib.Path - path to the manifest file
Returns:
data_frame
- contents of the manifest file
get_row
Function to skip rows
Arguments:
tsv_file
file - file to be read
Returns:
list
- lines to be skipped
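The documentation does not say which rows are skipped. As an illustration only, the sketch below assumes they are comment lines beginning with "#" and returns their zero-based indices; the `comment_prefix` parameter is hypothetical.

```python
import io
from typing import List, TextIO


def get_row(tsv_file: TextIO, comment_prefix: str = "#") -> List[int]:
    """Return zero-based indices of lines that start with comment_prefix."""
    return [
        index
        for index, line in enumerate(tsv_file)
        if line.startswith(comment_prefix)
    ]


# Example with an in-memory file: the first and third lines are comments.
example = io.StringIO("#version 1\nsample\tvalue\n#note\nS1\t2\n")
skipped = get_row(example)
```

A list of indices like this can be handed to a table reader that accepts row numbers to skip.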
get_small_variant_csv
get_small_variant_csv
Get the path to CSV file to be used for a given patient containing all variants
Arguments:
patient_id
str - patient id used to identify the csv file
csv_path
pathlib.Path - base path where the csv file is expected to be present
Raises:
typer.Abort
- if no csv file is returned
typer.Abort
- if more than one csv file is returned
Returns:
str
- path to csv file containing the variants
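The "exactly one match" rule above can be sketched with a glob over the base path. The file-name pattern is an assumption (any .csv file whose name contains the patient id), and built-in exceptions are used in place of typer.Abort.

```python
from pathlib import Path
from typing import Union


def get_small_variant_csv(patient_id: str, csv_path: Union[str, Path]) -> str:
    """Return the single CSV under csv_path whose name contains patient_id."""
    # Assumption: the file name contains the patient id and ends in .csv.
    matches = sorted(Path(csv_path).glob(f"*{patient_id}*.csv"))
    if not matches:
        raise FileNotFoundError(f"No csv file found for {patient_id} in {csv_path}")
    if len(matches) > 1:
        names = ", ".join(match.name for match in matches)
        raise ValueError(f"More than one csv file found for {patient_id}: {names}")
    return str(matches[0])
```

Failing on ambiguity instead of taking the first match avoids silently reporting on the wrong file when several result sets share a directory.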
run_cmd
run_cmd
Given a system command run it using subprocess
Arguments:
cmd
str - System command to be run as a string
run_multiple_cmd
Given a list of system commands, run each of them using subprocess
Arguments:
cmd
list[str] - list of system commands to be run
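One way to implement these two helpers with the standard library is shown below. This is a sketch, not the module's actual code: it assumes POSIX-style quoting, avoids a shell by splitting the command string with shlex, and raises on a non-zero exit status.

```python
import shlex
import subprocess
from typing import List


def run_cmd(cmd: str) -> subprocess.CompletedProcess:
    """Run one command without a shell; raise CalledProcessError on failure."""
    return subprocess.run(
        shlex.split(cmd),
        check=True,           # non-zero exit status raises an exception
        capture_output=True,  # keep stdout/stderr for logging
        text=True,            # decode output as str instead of bytes
    )


def run_multiple_cmd(cmds: List[str]) -> List[subprocess.CompletedProcess]:
    """Run the commands one after another, stopping at the first failure."""
    return [run_cmd(cmd) for cmd in cmds]
```

Splitting the string instead of passing `shell=True` keeps file paths with spaces intact as long as they are quoted in the command string.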
generate_facet_maf_path
generate_facet_maf_path
Get path of maf associated with facet-suite output
Arguments:
facet_path
pathlib.Path|str - path to search for the facet file
patient_id
str - patient id to be used to search, default is set to None
sample_id
str - sample id to be used to search, default is set to None
Returns:
str
- path of the facets maf
get_maf_path
Get the path to the maf file
Arguments:
maf_path
pathlib.Path - Base path of the maf file
patient_id
str - DMP Patient ID for facets
sample_id
str - DMP Sample ID if any for facets
Returns:
str
- Path to the maf file
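The manifest description states that a given dmp_sample_id is used directly, that a dmp_patient_id alone selects the Tumor sample with the lowest number, and that with neither no facet maf is used. The sketch below illustrates that choice over a list of candidate sample ids. The helper name `pick_facet_sample` and the assumed id shape (a tumor number encoded as `-T<digits>-`) are illustrative and not taken from the module.

```python
import re
from typing import Optional, Sequence

# Assumed id shape, e.g. P-0000001-T01-IM6; only the -T<digits>- part is used.
_TUMOR = re.compile(r"-T(\d+)-")


def pick_facet_sample(
    candidates: Sequence[str],
    dmp_patient_id: Optional[str] = None,
    dmp_sample_id: Optional[str] = None,
) -> Optional[str]:
    """Choose the sample whose facets maf should be used, or None."""
    if dmp_sample_id:
        return dmp_sample_id  # an explicit sample id always wins
    if dmp_patient_id:
        tumors = []
        for sample in candidates:
            match = _TUMOR.search(sample)
            if sample.startswith(dmp_patient_id + "-") and match:
                tumors.append((int(match.group(1)), sample))
        # Lowest tumor number first; None when the patient has no tumor sample.
        return min(tumors)[1] if tumors else None
    return None  # neither id given: run without a facet maf
```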
get_best_fit_folder
Get the best fit folder for the given facet manifest path
Arguments:
facet_manifest_path
str - manifest path to be used for determining best fit
Returns:
pathlib.Path
- path to the folder containing best fit maf files
generate_create_report_cmd
generate_create_report_cmd
Create the system command that should be run for create_report.R
Arguments:
script
str - path for create_report.R
markdown
bool - True|False to generate markdown output
template_file
str - path for the template file
cmo_patient_id
str - patient id from CMO
csv_file
str - path to csv file containing variant information
tumor_type
str - tumor type label
manifest
pathlib.Path - path to the manifest containing meta data
cnv_path
pathlib.Path - path to directory having cnv files
dmp_patient_id
str - patient id of the clinical msk-impact sample
dmp_sample_id
str - sample id of the clinical msk-impact sample
dmp_facet_maf
str - path to the clinical msk-impact maf file annotated for facets results
Returns:
cmd
str - system command to run for create_report.R
html_output
pathlib.Path - where the output file should be written