Run create_report.R

This script runs create_report.R for multiple patients in a single invocation.

Requirements

access_data_analysis>=0.1.2 # works with this repo tag
typer==0.3.2
typing_extensions==3.10.0.0
pandas==1.2.5
rich==12.1.0

run_create_report

Main Script (run_create_report.py)

Usage: run_create_report.py [OPTIONS]

Options:
  -r, --repo PATH                 Base path to where the git repository is
                                  located for access_data_analysis

  -s, --script PATH               Path to the create_report.R script, fall
                                  back if `--repo` is not given

  -t, --template PATH             Path to the template.Rmd or
                                  template_days.Rmd to be used with
                                  create_report.R when `--repo` is not given

  -m, --manifest FILE             File containing meta information per sample.
                                  Require following columns in the header:
                                  cmo_patient_id, sample_id, dmp_patient_id,
                                  collection_date or collection_day,
                                  timepoint. If dmp_sample_id column is given
                                  and has information that will be used to run
                                  facets. If dmp_sample_id is not given and
                                  dmp_patient_id is given than it will be used
                                  to get the Tumor sample with lowest number.
                                  If dmp_sample_id or dmp_patient_id is not
                                  given then it will run without the facet maf
                                  file  [required]

  -v, --variant-results DIRECTORY
                                  Base path for all results of small variants
                                  as generated by filter_calls.R script in
                                  access_data_analysis (Make sure only High
                                  Confidence calls are included)  [required]

  -c, --cnv-results DIRECTORY     Base path for all results of CNV as
                                  generated by CNV_processing.R script in
                                  access_data_analysis  [required]

  -f, --facet-repo DIRECTORY      Base path for all results of facets on
                                  Clinical MSK-IMPACT samples  [default: /juno
                                  /work/ccs/shared/resources/impact/facets/all
                                  /]

  -bf, --best-fit                 If this is set to True then we will attempt
                                  to parse `facets_review.manifest` file to
                                  pick the best fit for a given dmp_sample_id
                                  [default: False]

  -l, --tumor-type TEXT           Tumor type label for the report  [required]
  -cfm, --copy-facet-maf          If this is set to True then we will copy the
                                  facet maf file in the directory specified in
                                  `copy_facet_dir`  [default: False]

  -cfd, --copy-facet-dir PATH     Directory path where the facet maf file
                                  should be copied.

  -d, --template-days             If the `--repo` option is specified and if
                                  this is set to True then we will use the
                                  template_days RMarkdown file as the template
                                  [default: False]

  -gm, --generate-markdown        If given, the create_report.R will be run
                                  with `-md` flag to generate markdown
                                  [default: False]

  -ff, --force                    If this is set to True then we will not stop
                                  if an error is encountered in a given sample
                                  while running create_report.R but keep on
                                  running for the next sample  [default:
                                  False]

  --install-completion            Install completion for the current shell.
  --show-completion               Show completion for the current shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.

Wrapper script to run create_report.R

Arguments:

repo_path Path, optional - "Base path to where the git repository is located for access_data_analysis".
script_path Path, optional - "Path to the create_report.R script, fall back if --repo is not given".
template_path Path, optional - "Path to the template.Rmd or template_days.Rmd to be used with create_report.R when --repo is not given".
manifest Path, required - "File containing meta information per sample. Requires the following columns in the header: cmo_patient_id, sample_id, dmp_patient_id, collection_date or collection_day, timepoint. If the dmp_sample_id column is given and has information, it will be used to run facets. If dmp_sample_id is not given and dmp_patient_id is given, then it will be used to get the Tumor sample with the lowest number. If neither dmp_sample_id nor dmp_patient_id is given, then it will run without the facet maf file".
variant_path Path, required - "Base path for all results of small variants as generated by filter_calls.R script in access_data_analysis (Make sure only High Confidence calls are included)".
cnv_path Path, required - "Base path for all results of CNV as generated by CNV_processing.R script in access_data_analysis".
facet_repo Path, optional (the `--facet-repo` option has a default) - "Base path for all results of facets on Clinical MSK-IMPACT samples".
best_fit bool, optional - "If this is set to True then we will attempt to parse facets_review.manifest file to pick the best fit for a given dmp_sample_id".
tumor_type str, required - "Tumor type label for the report".
copy_facet bool, optional - "If this is set to True then we will copy the facet maf file in the directory specified in copy_facet_dir".
copy_facet_dir Path, optional - "Directory path where the facet maf file should be copied.".
template_days bool, optional - "If the --repo option is specified and if this is set to True then we will use the template_days RMarkdown file as the template".
markdown bool, optional - "If given, the create_report.R will be run with -md flag to generate markdown".
force bool, optional - "If this is set to True then we will not stop if an error is encountered in a given sample but keep on running for the next sample".
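
The manifest rules described for `manifest` can be illustrated with a short sketch that uses only the Python standard library. The column names come from the help text; every value in the sample manifest, the date format, and the helper name `facet_lookup_mode` are hypothetical and are not part of run_create_report.py.

```python
import csv
import io

# Hypothetical manifest content: column names follow the --manifest help text,
# all values are made up for illustration.
MANIFEST_TSV = (
    "cmo_patient_id\tsample_id\tdmp_patient_id\tdmp_sample_id\tcollection_date\ttimepoint\n"
    "C-000001\tC-000001-L001-d\tP-0000001\tP-0000001-T01-IM6\t2021-01-15\tbaseline\n"
    "C-000002\tC-000002-L001-d\tP-0000002\t\t2021-02-01\tbaseline\n"
    "C-000003\tC-000003-L001-d\t\t\t2021-03-10\tbaseline\n"
)


def facet_lookup_mode(row):
    """Return how the facets maf would be looked up for one manifest row."""
    if row.get("dmp_sample_id"):
        return "by_sample"   # use the given DMP sample id
    if row.get("dmp_patient_id"):
        return "by_patient"  # fall back to the lowest-numbered tumor sample
    return "none"            # run without a facets maf file


rows = list(csv.DictReader(io.StringIO(MANIFEST_TSV), delimiter="\t"))
modes = [facet_lookup_mode(row) for row in rows]
print(modes)
```

Each row falls into exactly one of the three cases, which mirrors the order of precedence stated in the help text.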

Usage

Generate markdown, copy the facet maf file, use the template_days RMarkdown template, set the force flag, and use the best fit for facets:

> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -d -cfm -ff -bf

Generate markdown and set the force flag, using the default fit for facets:

> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -ff

Submodules

check_required_columns

def check_required_columns(manifest, template_days=None)

Check if all required columns are present in the sample manifest file

Arguments:

manifest data_frame - meta information file with information for each sample
template_days bool - True|False if template days RMarkdown will be used

Raises:

typer.Abort - if "cmo_patient_id" column not provided
typer.Abort - if "cmo_sample_id/sample_id" column not provided
typer.Abort - if "dmp_patient_id" column not provided
typer.Abort - if "collection_date/collection_day" column not provided
typer.Abort - if "timepoint" column not provided

Returns:

list - column name for the manifest file
data_frame - data_frame with unique ids to traverse over
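
A minimal sketch of the column check, written against a plain list of header names rather than a data frame. It collects missing columns instead of raising `typer.Abort`, and it assumes that `template_days=True` means `collection_day` is expected while `collection_date` is expected otherwise. The function name `find_missing_columns` is hypothetical.

```python
REQUIRED_ALWAYS = ["cmo_patient_id", "dmp_patient_id", "timepoint"]


def find_missing_columns(header, template_days=False):
    """Return the required columns that are absent from a manifest header."""
    missing = [col for col in REQUIRED_ALWAYS if col not in header]
    # Either cmo_sample_id or sample_id satisfies the sample id requirement.
    if "cmo_sample_id" not in header and "sample_id" not in header:
        missing.append("cmo_sample_id/sample_id")
    # Assumption: the days template needs collection_day, otherwise collection_date.
    date_col = "collection_day" if template_days else "collection_date"
    if date_col not in header:
        missing.append(date_col)
    return missing


header = ["cmo_patient_id", "sample_id", "dmp_patient_id", "collection_date", "timepoint"]
print(find_missing_columns(header))
print(find_missing_columns(header, template_days=True))
```

Returning the full list of missing columns lets a caller report every problem at once before aborting.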

generate_repo_paths

generate_repo_path

def generate_repo_path(repo_path=None, script_path=None, template_path=None, template_days=None)

Generate path to create_report.R and template RMarkdown file

Arguments:

repo_path pathlib.Path, optional - Path to clone of git repo access_data_analysis. Defaults to None.
script_path pathlib.Path, optional - Path to create_report.R. Defaults to None.
template_path pathlib.Path, optional - Path to template RMarkdown file. Defaults to None.
template_days bool, optional - True|False to use days template if using repo_path. Defaults to None.

Raises:

typer.Abort - Abort if neither repo_path nor script_path is given
typer.Abort - Abort if neither repo_path nor template_path is given

Returns:

str - Path to create_report.R and path to template markdown file
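
The fallback between `repo_path` and the explicit `script_path`/`template_path` could look roughly like the sketch below. The relative locations of the script and templates inside the repository are assumptions made for illustration, `resolve_paths` is a hypothetical name, and `ValueError` stands in for `typer.Abort`.

```python
from pathlib import Path

# Assumption: locations of the R script and templates inside the repository clone.
# These relative paths are illustrative; check them against your checkout.
SCRIPT_REL = Path("reports") / "create_report.R"
TEMPLATE_REL = Path("reports") / "template.Rmd"
TEMPLATE_DAYS_REL = Path("reports") / "template_days.Rmd"


def resolve_paths(repo_path=None, script_path=None, template_path=None, template_days=False):
    """Return (script, template) as strings, preferring repo_path when given."""
    if repo_path is not None:
        repo = Path(repo_path)
        template_rel = TEMPLATE_DAYS_REL if template_days else TEMPLATE_REL
        return str(repo / SCRIPT_REL), str(repo / template_rel)
    if script_path is None:
        raise ValueError("either repo_path or script_path must be given")
    if template_path is None:
        raise ValueError("either repo_path or template_path must be given")
    return str(script_path), str(template_path)


script, template = resolve_paths(repo_path="/repo", template_days=True)
print(script, template)
```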

read_manifest

def read_manifest(manifest)

Read manifest file

Arguments:

manifest pathlib.Path - path to the manifest file

Returns:

data_frame - contents of the manifest file

get_row

def get_row(tsv_file)

Function to skip rows

Arguments:

tsv_file file - file to be read

Returns:

list - lines to be skipped
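
The source does not say which rows are skipped. As an illustration only, the sketch below assumes they are comment lines that start with `#` and returns their zero-based indices; `comment_line_indices` is a hypothetical name.

```python
def comment_line_indices(lines, prefix="#"):
    """Return zero-based indices of lines that start with the comment prefix."""
    return [i for i, line in enumerate(lines) if line.startswith(prefix)]


example_lines = ["#version 1\n", "sample_id\ttimepoint\n", "#note\n", "S1\tbaseline\n"]
skipped = comment_line_indices(example_lines)
print(skipped)
```

A list of zero-based line indices like this is the form accepted by the `skiprows` parameter of `pandas.read_csv`.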

get_small_variant_csv

def get_small_variant_csv(patient_id, csv_path)

Get the path to CSV file to be used for a given patient containing all variants

Arguments:

patient_id str - patient id used to identify the csv file
csv_path pathlib.Path - base path where the csv file is expected to be present

Raises:

typer.Abort - if no csv file is returned
typer.Abort - if more than one csv file is returned

Returns:

str - path to csv file containing the variants
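
One way to enforce the "exactly one CSV per patient" rule is sketched below. The glob pattern and the example file names are assumptions, `find_single_csv` is a hypothetical name, and standard exceptions stand in for `typer.Abort`.

```python
import tempfile
from pathlib import Path


def find_single_csv(patient_id, csv_path):
    """Return the one CSV under csv_path whose name contains patient_id."""
    # Assumption: the file name contains the patient id and ends in .csv.
    matches = sorted(Path(csv_path).glob(f"*{patient_id}*.csv"))
    if not matches:
        raise FileNotFoundError(f"no csv file found for {patient_id}")
    if len(matches) > 1:
        raise ValueError(f"more than one csv file found for {patient_id}")
    return str(matches[0])


# Demonstrate with throwaway files whose names are made up for this example.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "C-000001_all_results.csv").write_text("placeholder\n")
    (Path(tmp) / "C-000002_all_results.csv").write_text("placeholder\n")
    found_name = Path(find_single_csv("C-000001", tmp)).name
print(found_name)
```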

run_cmd

def run_cmd(cmd)

Given a system command run it using subprocess

Arguments:

cmd str - System command to be run as a string
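
A minimal sketch of running a command string with `subprocess`. How `run_cmd` itself invokes subprocess is not documented here, so the use of `shlex.split` and `subprocess.run` below is an assumption, and `run_command` is a hypothetical name.

```python
import shlex
import subprocess
import sys


def run_command(cmd):
    """Run a command given as a string and return the completed process."""
    # shlex.split turns the string into an argument list, so no shell is involved.
    return subprocess.run(shlex.split(cmd), capture_output=True, text=True, check=True)


# Use the current Python interpreter so the example does not depend on R being installed.
result = run_command(f"{shlex.quote(sys.executable)} -c \"print('done')\"")
print(result.stdout.strip())
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError`, which gives the caller one place to decide whether to stop or continue.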

run_multiple_cmd

def run_multiple_cmd(commands, parallel_process=None)

Given a list of system commands, run them using subprocess

Arguments:

commands list[str] - list of system commands to be run
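
Running several commands could be sketched with a thread pool, as below. The meaning of `parallel_process` is not documented in this section, so treating it as the maximum number of concurrent commands is an assumption, and `run_commands` is a hypothetical name.

```python
import shlex
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_commands(commands, parallel_process=None):
    """Run each command string and return the exit codes in input order."""
    def _run(cmd):
        return subprocess.run(shlex.split(cmd), capture_output=True).returncode

    # Assumption: parallel_process is the maximum number of commands run at once.
    # None lets ThreadPoolExecutor choose a default worker count.
    with ThreadPoolExecutor(max_workers=parallel_process) as pool:
        return list(pool.map(_run, commands))


python = shlex.quote(sys.executable)
exit_codes = run_commands(
    [
        f"{python} -c \"import sys; sys.exit(0)\"",
        f"{python} -c \"import sys; sys.exit(3)\"",
    ],
    parallel_process=2,
)
print(exit_codes)
```

Returning exit codes rather than raising on the first failure leaves it to the caller to stop or to continue with the next sample.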

generate_facet_maf_path

def generate_facet_maf_path(facet_path, patient_id, sample_id=None)

Get path of maf associated with facet-suite output

Arguments:

facet_path pathlib.Path|str - path to search for the facet file
patient_id str - patient id to be used to search
sample_id str - sample id to be used to search, default is set to None

Returns:

str - path of the facets maf

get_maf_path

def get_maf_path(maf_path, patient_id, sample_id)

Get the path to the maf file

Arguments:

maf_path pathlib.Path - Base path of the maf file
patient_id str - DMP Patient ID for facets
sample_id str - DMP Sample ID if any for facets

Returns:

str - Path to the maf file
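
When only a DMP patient id is available, the manifest help text says the tumor sample with the lowest number is used. The sketch below shows one way to pick it from a list of sample ids. The id format in the regular expression is an assumption, the example ids are made up, and `lowest_tumor_sample` is a hypothetical name.

```python
import re

# Assumption: DMP sample ids look like P-0000001-T02-IM6, where T marks a tumor
# sample and the digits after it are the sample number.
TUMOR_ID = re.compile(r"^(P-\d+)-T(\d+)-")


def lowest_tumor_sample(sample_ids, patient_id):
    """Return the tumor sample id with the lowest number for patient_id, or None."""
    candidates = []
    for sample_id in sample_ids:
        match = TUMOR_ID.match(sample_id)
        if match and match.group(1) == patient_id:
            candidates.append((int(match.group(2)), sample_id))
    return min(candidates)[1] if candidates else None


ids = [
    "P-0000001-T02-IM6",
    "P-0000001-N01-IM6",
    "P-0000001-T01-IM5",
    "P-0000002-T01-IM6",
]
chosen = lowest_tumor_sample(ids, "P-0000001")
print(chosen)
```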

get_best_fit_folder

def get_best_fit_folder(facet_manifest_path)

Get the best fit folder for the given facet manifest path

Arguments:

facet_manifest_path str - manifest path to be used for determining best fit

Returns:

pathlib.Path - path to the folder containing best fit maf files

generate_create_report_cmd

def generate_create_report_cmd(script, markdown, template_file, cmo_patient_id, csv_file, manifest, cnv_path, dmp_patient_id, dmp_sample_id, dmp_facet_maf, tumor_type=None)

Create the system command that should be run for create_report.R

Arguments:

script str - path for create_report.R
markdown bool - True|False to generate markdown output
template_file str - path for the template file
cmo_patient_id str - patient id from CMO
csv_file str - path to csv file containing variant information
tumor_type str - tumor type label
manifest pathlib.Path - path to the manifest containing meta data
cnv_path pathlib.Path - path to directory having cnv files
dmp_patient_id str - patient id of the clinical msk-impact sample
dmp_sample_id str - sample id of the clinical msk-impact sample
dmp_facet_maf str - path to the clinical msk-impact maf file annotated for facets results

Returns:

cmd str - system command to run for create_report.R
html_output pathlib.Path - where the output file should be written
