Run create_report.R

This wrapper runs the create_report.R script for multiple patients.

Requirements

access_data_analysis>=0.1.2 # works with this repo tag
typer==0.3.2
typing_extensions==3.10.0.0
pandas==1.2.5
rich==12.1.0

run_create_report

Main Script (run_create_report.py)

Usage: run_create_report.py [OPTIONS]

Options:
  -r, --repo PATH                 Base path to where the git repository is
                                  located for access_data_analysis

  -s, --script PATH               Path to the create_report.R script, fall
                                  back if `--repo` is not given

  -t, --template PATH             Path to the template.Rmd or
                                  template_days.Rmd to be used with
                                  create_report.R when `--repo` is not given

  -m, --manifest FILE             File containing meta information per sample.
                                  Requires the following columns in the
                                  header: cmo_patient_id, sample_id,
                                  dmp_patient_id, collection_date or
                                  collection_day, timepoint. If the
                                  dmp_sample_id column is given and populated,
                                  it will be used to run facets. If
                                  dmp_sample_id is not given but
                                  dmp_patient_id is, then dmp_patient_id will
                                  be used to get the Tumor sample with the
                                  lowest number. If neither dmp_sample_id nor
                                  dmp_patient_id is given, the report will be
                                  generated without the facet maf file
                                  [required]

  -v, --variant-results DIRECTORY
                                  Base path for all results of small variants
                                  as generated by filter_calls.R script in
                                  access_data_analysis (Make sure only High
                                  Confidence calls are included)  [required]

  -c, --cnv-results DIRECTORY     Base path for all results of CNV as
                                  generated by CNV_processing.R script in
                                  access_data_analysis  [required]

  -f, --facet-repo DIRECTORY      Base path for all results of facets on
                                  Clinical MSK-IMPACT samples  [default:
                                  /juno/work/ccs/shared/resources/impact/facets/all/]

  -bf, --best-fit                 If this is set to True then we will attempt
                                  to parse `facets_review.manifest` file to
                                  pick the best fit for a given dmp_sample_id
                                  [default: False]

  -l, --tumor-type TEXT           Tumor type label for the report  [required]

  -cfm, --copy-facet-maf          If this is set to True then we will copy the
                                  facet maf file in the directory specified in
                                  `copy_facet_dir`  [default: False]

  -cfd, --copy-facet-dir PATH     Directory path where the facet maf file
                                  should be copied.

  -d, --template-days             If the `--repo` option is specified and if
                                  this is set to True then we will use the
                                  template_days RMarkdown file as the template
                                  [default: False]

  -gm, --generate-markdown        If given, the create_report.R will be run
                                  with `-md` flag to generate markdown
                                  [default: False]

  -ff, --force                    If this is set to True then we will not stop
                                  if an error is encountered in a given sample
                                  while running create_report.R but keep on
                                  running for the next sample  [default:
                                  False]

  --install-completion            Install completion for the current shell.
  --show-completion               Show completion for the current shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.

Wrapper script to run create_report.R

Arguments:

  • repo_path Path, optional - "Base path to where the git repository is located for access_data_analysis".

  • script_path Path, optional - "Path to the create_report.R script, fall back if --repo is not given".

  • template_path Path, optional - "Path to the template.Rmd or template_days.Rmd to be used with create_report.R when --repo is not given".

  • manifest Path, required - "File containing meta information per sample. Requires the following columns in the header: cmo_patient_id, sample_id, dmp_patient_id, collection_date or collection_day, timepoint. If the dmp_sample_id column is given and populated, it will be used to run facets. If dmp_sample_id is not given but dmp_patient_id is, then dmp_patient_id will be used to get the Tumor sample with the lowest number. If neither dmp_sample_id nor dmp_patient_id is given, the report will be generated without the facet maf file".

  • variant_path Path, required - "Base path for all results of small variants as generated by filter_calls.R script in access_data_analysis (Make sure only High Confidence calls are included)".

  • cnv_path Path, required - "Base path for all results of CNV as generated by CNV_processing.R script in access_data_analysis".

  • facet_repo Path, required - "Base path for all results of facets on Clinical MSK-IMPACT samples".

  • best_fit bool, optional - "If this is set to True then we will attempt to parse facets_review.manifest file to pick the best fit for a given dmp_sample_id".

  • tumor_type str, required - "Tumor type label for the report".

  • copy_facet bool, optional - "If this is set to True then we will copy the facet maf file in the directory specified in copy_facet_dir".

  • copy_facet_dir Path, optional - "Directory path where the facet maf file should be copied".

  • template_days bool, optional - "If the --repo option is specified and if this is set to True then we will use the template_days RMarkdown file as the template".

  • markdown bool, optional - "If given, the create_report.R will be run with -md flag to generate markdown".

  • force bool, optional - "If this is set to True then we will not stop if an error is encountered in a given sample but keep on running for the next sample".
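The `force` behaviour described above can be illustrated with a minimal sketch. `run_all` is a hypothetical helper, not part of run_create_report.py; it assumes each per-sample create_report.R call is an argument list run with `subprocess.run`, and it collects failures instead of stopping when `force` is true.

```python
import subprocess
import sys


def run_all(commands, force=False):
    """Run each command in order (hypothetical helper).

    Without force, the first failure stops the run by re-raising
    subprocess.CalledProcessError. With force, failures are recorded and
    the loop moves on to the next sample. Returns the failed commands.
    """
    failed = []
    for cmd in commands:
        try:
            # check=True turns a non-zero exit status into an exception
            subprocess.run(cmd, check=True)
        except subprocess.CalledProcessError:
            if not force:
                raise
            failed.append(cmd)
            print(f"Failed, continuing with next sample: {cmd}", file=sys.stderr)
    return failed
```

Returning the failed commands lets the caller summarise which samples need a rerun once the batch finishes.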

Usage

  • Using Generate Markdown, copy facet maf file, use template_days RMarkdown, force flag and best fit for facets

> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -d -cfm -ff -bf

  • Using Generate Markdown, force flag and default fit for facets

> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -ff

Submodules

check_required_columns

check_required_columns

def check_required_columns(manifest, template_days=None)

Check if all required columns are present in the sample manifest file

Arguments:

  • manifest data_frame - meta information file with information for each sample

  • template_days bool - True|False if template days RMarkdown will be used

Raises:

  • typer.Abort - if "cmo_patient_id" column not provided

  • typer.Abort - if "cmo_sample_id/sample_id" column not provided

  • typer.Abort - if "dmp_patient_id" column not provided

  • typer.Abort - if "collection_date/collection_day" column not provided

  • typer.Abort - if "timepoint" column not provided

Returns:

  • list - column names of the manifest file

  • data_frame - data_frame with unique ids to traverse over
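A minimal, stdlib-only sketch of the column check performed by check_required_columns. It works on a header list rather than a pandas data frame, and raises a hypothetical `ManifestError` where the real function aborts with `typer.Abort`; the alternative column names follow the Raises list above.

```python
class ManifestError(Exception):
    """Hypothetical stand-in for typer.Abort in this sketch."""


def check_required_columns_sketch(header):
    """Raise ManifestError if any required column group is missing.

    Each group is satisfied when ANY of its alternatives is present,
    e.g. either cmo_sample_id or sample_id.
    """
    groups = [
        ("cmo_patient_id",),
        ("cmo_sample_id", "sample_id"),
        ("dmp_patient_id",),
        ("collection_date", "collection_day"),
        ("timepoint",),
    ]
    present = set(header)
    missing = ["/".join(g) for g in groups if not present.intersection(g)]
    if missing:
        raise ManifestError("Missing column(s): " + ", ".join(missing))
```

Collecting every missing group before raising reports all problems with the manifest in one pass, instead of one column per run.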

generate_repo_paths

generate_repo_path

def generate_repo_path(repo_path=None, script_path=None, template_path=None, template_days=None)

Generate path to create_report.R and template RMarkdown file

Arguments:

  • repo_path pathlib.Path, optional - Path to clone of git repo access_data_analysis. Defaults to None.

  • script_path pathlib.Path, optional - Path to create_report.R. Defaults to None.

  • template_path pathlib.Path, optional - Path to template RMarkdown file. Defaults to None.

  • template_days bool, optional - True|False to use days template if using repo_path. Defaults to None.

Raises:

  • typer.Abort - Abort if neither repo_path nor script_path is given

  • typer.Abort - Abort if neither repo_path nor template_path is given

Returns:

  • str - Path to create_report.R and path to template markdown file
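A sketch of how generate_repo_path could resolve its two paths. The relative locations `SCRIPT_REL`, `TEMPLATE_REL` and `TEMPLATE_DAYS_REL` are hypothetical placeholders, not the actual layout of access_data_analysis, and `ValueError` stands in for `typer.Abort`.

```python
from pathlib import Path

# HYPOTHETICAL locations inside an access_data_analysis clone; check the
# repository tag you are using for the real layout.
SCRIPT_REL = Path("reports") / "create_report.R"
TEMPLATE_REL = Path("reports") / "template.Rmd"
TEMPLATE_DAYS_REL = Path("reports") / "template_days.Rmd"


def resolve_report_paths(repo_path=None, script_path=None,
                         template_path=None, template_days=False):
    """Return (script, template) as strings, preferring repo_path."""
    if repo_path is not None:
        repo = Path(repo_path)
        template = TEMPLATE_DAYS_REL if template_days else TEMPLATE_REL
        return str(repo / SCRIPT_REL), str(repo / template)
    # Without a repo clone, both explicit paths are needed
    if script_path is None:
        raise ValueError("Give either repo_path or script_path")
    if template_path is None:
        raise ValueError("Give either repo_path or template_path")
    return str(Path(script_path)), str(Path(template_path))
```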

read_manifest

read_manifest

def read_manifest(manifest)

Read manifest file

Arguments:

  • manifest pathlib.Path - path to the manifest file

Returns:

  • data_frame - contents of the manifest file

get_row

def get_row(tsv_file)

Determine which rows to skip when reading the file

Arguments:

  • tsv_file file - file to be read

Returns:

  • list - lines to be skipped
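The pairing of read_manifest and get_row can be sketched with the standard library. The assumption here is that the rows to skip are comment lines starting with `#`; the real get_row may use a different rule. With pandas, the same list of row indices can be passed to the `skiprows` argument of `pandas.read_csv`.

```python
import csv


def get_row_sketch(lines):
    """Return indices of lines to skip; ASSUMES skipped rows start with '#'."""
    return [i for i, line in enumerate(lines) if line.startswith("#")]


def read_manifest_sketch(text):
    """Parse tab-separated manifest text into a list of row dicts."""
    lines = text.splitlines()
    skip = set(get_row_sketch(lines))
    kept = [line for i, line in enumerate(lines) if i not in skip]
    # csv.DictReader accepts any iterable of lines; the first kept line is the header
    return list(csv.DictReader(kept, delimiter="\t"))
```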

get_small_variant_csv

get_small_variant_csv

def get_small_variant_csv(patient_id, csv_path)

Get the path to CSV file to be used for a given patient containing all variants

Arguments:

  • patient_id str - patient id used to identify the csv file

  • csv_path pathlib.Path - base path where the csv file is expected to be present

Raises:

  • typer.Abort - if no csv file is returned

  • typer.Abort - if more than one csv file is returned

Returns:

  • str - path to csv file containing the variants
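A sketch of the lookup in get_small_variant_csv, assuming the per-patient CSV file name starts with the patient id (the actual naming produced by filter_calls.R may differ). `VariantCsvError` is a hypothetical stand-in for `typer.Abort`.

```python
from pathlib import Path


class VariantCsvError(Exception):
    """Hypothetical stand-in for typer.Abort in this sketch."""


def get_small_variant_csv_sketch(patient_id, csv_path):
    # ASSUMPTION: the file name starts with the patient id and ends in .csv
    matches = sorted(Path(csv_path).glob(f"{patient_id}*.csv"))
    if not matches:
        raise VariantCsvError(f"No CSV found for {patient_id} in {csv_path}")
    if len(matches) > 1:
        raise VariantCsvError(f"More than one CSV found for {patient_id}: {matches}")
    return str(matches[0])
```

Requiring exactly one match keeps the report from silently picking an arbitrary file when results from several runs sit in the same directory.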

run_cmd

run_cmd

def run_cmd(cmd)

Given a system command, run it using subprocess

Arguments:

  • cmd str - System command to be run as a string

run_multiple_cmd

def run_multiple_cmd(commands, parallel_process=None)

Given a list of system commands, run them using subprocess

Arguments:

  • commands list[str] - list of system commands to be run
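A sketch of run_cmd and run_multiple_cmd built on `subprocess.run` and `concurrent.futures.ThreadPoolExecutor`. Treating `parallel_process` as the maximum number of concurrent commands is an assumption; the source does not document that parameter.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_cmd_sketch(cmd):
    """Run one shell command string and return its exit code."""
    # shell=True because the command is passed as a single string
    return subprocess.run(cmd, shell=True).returncode


def run_multiple_cmd_sketch(commands, parallel_process=None):
    """Run commands concurrently and return exit codes in input order.

    ASSUMPTION: parallel_process is the maximum number of simultaneous
    commands; None lets ThreadPoolExecutor choose its default.
    """
    with ThreadPoolExecutor(max_workers=parallel_process) as pool:
        return list(pool.map(run_cmd_sketch, commands))
```

Threads are sufficient here because each worker only waits on an external process, so the GIL is not a bottleneck.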

generate_facet_maf_path

generate_facet_maf_path

def generate_facet_maf_path(facet_path, patient_id, sample_id=None)

Get path of maf associated with facet-suite output

Arguments:

  • facet_path pathlib.Path|str - path to search for the facet file

  • patient_id str - patient id to be used to search

  • sample_id str - sample id to be used to search, default is set to None

Returns:

  • str - path of the facets maf

get_maf_path

def get_maf_path(maf_path, patient_id, sample_id)

Get the path to the maf file

Arguments:

  • maf_path pathlib.Path - Base path of the maf file

  • patient_id str - DMP Patient ID for facets

  • sample_id str - DMP Sample ID if any for facets

Returns:

  • str - Path to the maf file
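The manifest option notes that, without a dmp_sample_id, the Tumor sample with the lowest number is chosen for the dmp_patient_id. A sketch of that selection, assuming DMP sample ids of the form `P-0012345-T01-IM6` where the digits after `T` order a patient's tumor samples; `lowest_tumor_sample` is a hypothetical helper.

```python
import re

# ASSUMPTION: DMP sample ids look like P-0012345-T01-IM6; group 1 is the
# patient id and group 2 the tumor sample number.
TUMOR_RE = re.compile(r"^(P-\d{7})-T(\d+)-")


def lowest_tumor_sample(patient_id, sample_ids):
    """Return the patient's tumor sample with the lowest T number, or None."""
    tumors = []
    for sid in sample_ids:
        m = TUMOR_RE.match(sid)
        # Normal samples (N01, ...) and other patients' samples are ignored
        if m and m.group(1) == patient_id:
            tumors.append((int(m.group(2)), sid))
    return min(tumors)[1] if tumors else None
```

Comparing the integer value rather than the raw string avoids ordering surprises if a patient ever has ten or more tumor samples.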

get_best_fit_folder

def get_best_fit_folder(facet_manifest_path)

Get the best fit folder for the given facet manifest path

Arguments:

  • facet_manifest_path str - manifest path to be used for determining best fit

Returns:

  • pathlib.Path - path to the folder containing best fit maf files

generate_create_report_cmd

generate_create_report_cmd

def generate_create_report_cmd(script, markdown, template_file, cmo_patient_id, csv_file, manifest, cnv_path, dmp_patient_id, dmp_sample_id, dmp_facet_maf, tumor_type=None)

Create the system command that should be run for create_report.R

Arguments:

  • script str - path for create_report.R

  • markdown bool - True|False to generate markdown output

  • template_file str - path for the template file

  • cmo_patient_id str - patient id from CMO

  • csv_file str - path to csv file containing variant information

  • tumor_type str - tumor type label

  • manifest pathlib.Path - path to the manifest containing meta data

  • cnv_path pathlib.Path - path to directory having cnv files

  • dmp_patient_id str - patient id of the clinical msk-impact sample

  • dmp_sample_id str - sample id of the clinical msk-impact sample

  • dmp_facet_maf str - path to the clinical msk-impact maf file annotated for facets results

Returns:

  • cmd str - system command to run for create_report.R

  • html_output pathlib.Path - where the output file should be written
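A sketch of assembling the create_report.R command string with `shlex.quote`, so that paths containing spaces survive the shell. Invoking the script through `Rscript` is an assumption, and the flag names are left to the caller because the source does not list the options create_report.R accepts; only `-md` comes from the documentation above.

```python
import shlex
from pathlib import Path


def build_report_cmd(script, options, markdown=False):
    """Join Rscript, the script and its options into one shell-safe string.

    ``options`` maps flag -> value and is supplied by the caller; the flag
    names create_report.R actually expects are NOT assumed here.
    """
    parts = ["Rscript", str(script)]  # ASSUMPTION: run via Rscript
    for flag, value in options.items():
        if value is None:  # skip optional inputs that were not provided
            continue
        parts += [flag, str(value)]
    if markdown:
        parts.append("-md")  # flag documented for --generate-markdown
    return " ".join(shlex.quote(p) for p in parts)
```

Quoting each part separately keeps the result safe to hand to run_cmd, which takes the command as a single string.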
