Run create_report.R

This script runs create_report.R on multiple patients.

  • Requirements

  • run_create_report

Requirements

run_create_report

Main Script (run_create_report.py)

Wrapper script to run create_report.R

Arguments:

  • repo_path Path, optional - "Base path to where the git repository is located for access_data_analysis".

  • script_path Path, optional - "Path to the create_report.R script, fall back if --repo is not given".

Usage

  • Generate markdown, copy the facet maf file, use the template_days RMarkdown, and set the force and best fit flags

  • Generate markdown with the force flag and the default fit for facets

Submodules

check_required_columns

check_required_columns

Check if all required columns are present in the sample manifest file

Arguments:

  • manifest data_frame - meta information file with information for each sample

  • template_days bool - True|False if template days RMarkdown will be used

Raises:

  • typer.Abort - if "cmo_patient_id" column not provided

  • typer.Abort - if "cmo_sample_id/sample_id" column not provided

  • typer.Abort

Returns:

  • list - column name for the manifest file

  • data_frame - data_frame with unique ids to traverse over
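
The column check described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: it works on a plain list of header names instead of a data_frame, ManifestError is a hypothetical stand-in for typer.Abort, and the rule that template_days selects collection_day over collection_date is an assumption.

```python
class ManifestError(Exception):
    """Hypothetical stand-in for the typer.Abort raised by the real script."""


def check_required_columns(columns, template_days=False):
    """Return the required column names found in the manifest header.

    `columns` is the manifest header as a list of strings.
    """
    present = set(columns)
    if "cmo_patient_id" not in present:
        raise ManifestError('"cmo_patient_id" column not provided')

    # Either spelling of the sample id column is accepted.
    sample_col = next(
        (c for c in ("cmo_sample_id", "sample_id") if c in present), None
    )
    if sample_col is None:
        raise ManifestError('"cmo_sample_id/sample_id" column not provided')

    if "dmp_patient_id" not in present:
        raise ManifestError('"dmp_patient_id" column not provided')

    # Assumption: the days template reads collection_day, otherwise collection_date.
    date_col = "collection_day" if template_days else "collection_date"
    if date_col not in present:
        raise ManifestError(f'"{date_col}" column not provided')

    if "timepoint" not in present:
        raise ManifestError('"timepoint" column not provided')

    return ["cmo_patient_id", sample_col, "dmp_patient_id", date_col, "timepoint"]
```

Raising a dedicated error for each missing column keeps the message specific to the column that needs to be added to the manifest.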

generate_repo_paths

generate_repo_path

Generate path to create_report.R and template RMarkdown file

Arguments:

  • repo_path pathlib.Path, optional - Path to clone of git repo access_data_analysis. Defaults to None.

  • script_path pathlib.Path, optional - Path to create_report.R. Defaults to None.

Raises:

  • typer.Abort - Abort if neither repo_path nor script_path is given

  • typer.Abort - Abort if neither repo_path nor template_path is given

Returns:

  • str - Path to create_report.R and path to template markdown file
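
The fallback logic above might look like the sketch below. The locations of create_report.R and the templates inside the repository (the reports/ folder here) are hypothetical, and ValueError stands in for typer.Abort; only the repo-first, explicit-paths-second order comes from the description.

```python
from pathlib import Path

# Assumption: hypothetical locations inside the access_data_analysis clone.
SCRIPT_REL = Path("reports") / "create_report.R"
TEMPLATE_REL = Path("reports") / "template.Rmd"
TEMPLATE_DAYS_REL = Path("reports") / "template_days.Rmd"


def generate_repo_path(repo_path=None, script_path=None,
                       template_path=None, template_days=None):
    """Return (script, template) paths as strings."""
    if repo_path is not None:
        # The repository clone wins; template_days picks the days template.
        repo = Path(repo_path)
        template = TEMPLATE_DAYS_REL if template_days else TEMPLATE_REL
        return str(repo / SCRIPT_REL), str(repo / template)

    # No repository given: both explicit paths are needed.
    if script_path is None:
        raise ValueError("give either repo_path or script_path")
    if template_path is None:
        raise ValueError("give either repo_path or template_path")
    return str(script_path), str(template_path)
```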

read_manifest

read_manifest

Read manifest file

Arguments:

  • manifest pathlib.Path - path to the manifest file

Returns:

  • data_frame - contents of the manifest file

get_row

Function to skip rows

Arguments:

  • tsv_file file - file to be read

Returns:

  • list - lines to be skipped
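
As a rough illustration of how read_manifest and get_row could fit together, the sketch below skips some lines of a tab-separated manifest before parsing it. Which rows the real get_row skips is not stated here; treating lines that start with "#" as the ones to skip is an assumption, and the sketch uses the standard csv module on text instead of a data_frame read from a path.

```python
import csv


def get_row(lines):
    """Return the indices of lines to skip (assumption: '#' comment lines)."""
    return [i for i, line in enumerate(lines) if line.startswith("#")]


def read_manifest(text):
    """Parse tab-separated manifest text into a list of row dictionaries."""
    lines = text.splitlines()
    skip = set(get_row(lines))
    kept = [line for i, line in enumerate(lines) if i not in skip]
    # The first kept line is taken as the header.
    return list(csv.DictReader(kept, delimiter="\t"))
```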

get_small_variant_csv

get_small_variant_csv

Get the path to CSV file to be used for a given patient containing all variants

Arguments:

  • patient_id str - patient id used to identify the csv file

  • csv_path pathlib.Path - base path where the csv file is expected to be present

Raises:

  • typer.Abort - if no csv file is returned

  • typer.Abort - if more than one csv file is returned

Returns:

  • str - path to csv file containing the variants
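
The "exactly one match" rule above can be sketched with pathlib. The file-name pattern (any .csv whose name contains the patient id) is an assumption, and the built-in exceptions stand in for typer.Abort.

```python
from pathlib import Path


def get_small_variant_csv(patient_id, csv_path):
    """Return the single CSV under csv_path whose name contains patient_id."""
    # Assumption: the variant CSV file name contains the patient id.
    matches = sorted(Path(csv_path).glob(f"*{patient_id}*.csv"))
    if not matches:
        raise FileNotFoundError(f"no csv file found for {patient_id}")
    if len(matches) > 1:
        raise ValueError(f"more than one csv file found for {patient_id}")
    return str(matches[0])
```

Failing on several matches, instead of picking the first one, avoids silently reporting on the wrong file.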

run_cmd

run_cmd

Given a system command, run it using subprocess

Arguments:

  • cmd str - System command to be run as a string

run_multiple_cmd

Given a list of system commands, run them using subprocess

Arguments:

  • commands list[str] - list of system commands to be run
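
A minimal sketch of the two helpers, assuming they wrap subprocess.run. How the real script handles output, errors, and parallelism is not described here; the thread pool and the returned CompletedProcess objects are illustrative choices, with parallel_process taken to be the maximum number of commands running at once.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_cmd(cmd):
    """Run one command string and return the finished process."""
    # shlex.split turns the string into an argument list, so no shell is needed.
    return subprocess.run(
        shlex.split(cmd), capture_output=True, text=True, check=False
    )


def run_multiple_cmd(commands, parallel_process=None):
    """Run several command strings, at most parallel_process at a time."""
    with ThreadPoolExecutor(max_workers=parallel_process) as pool:
        # pool.map keeps the results in the same order as the commands.
        return list(pool.map(run_cmd, commands))
```

Each result carries returncode, stdout, and stderr, so the caller can decide per command whether to stop or continue.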

generate_facet_maf_path

generate_facet_maf_path

Get path of maf associated with facet-suite output

Arguments:

  • facet_path pathlib.Path|str - path to search for the facet file

  • patient_id str - patient id to be used to search, default is set to None

Returns:

  • str - path of the facets maf

get_maf_path

Get the path to the maf file

Arguments:

  • maf_path pathlib.Path - Base path of the maf file

  • patient_id str - DMP Patient ID for facets

  • sample_id

Returns:

  • str - Path to the maf file

get_best_fit_folder

Get the best fit folder for the given facet manifest path

Arguments:

  • facet_manifest_path str - manifest path to be used for determining best fit

Returns:

  • pathlib.Path - path to the folder containing best fit maf files

generate_create_report_cmd

generate_create_report_cmd

Create the system command that should be run for create_report.R

Arguments:

  • script str - path for create_report.R

  • markdown bool - True|False to generate markdown output

  • template_file

Returns:

  • cmd str - system command to run for create_report.R

  • html_output pathlib.Path - where the output file should be written

Main Script (run_create_report.py), remaining arguments:

  • template_path Path, optional - "Path to the template.Rmd or template_days.Rmd to be used with create_report.R when --repo is not given".

  • manifest Path, required - "File containing meta information per sample. Require following columns in the header: cmo_patient_id, sample_id, dmp_patient_id, collection_date or collection_day, timepoint. If dmp_sample_id column is given and has information that will be used to run facets. If dmp_sample_id is not given and dmp_patient_id is given then it will be used to get the Tumor sample with lowest number. If dmp_sample_id or dmp_patient_id is not given then it will run without the facet maf file".

  • variant_path Path, required - "Base path for all results of small variants as generated by filter_calls.R script in access_data_analysis (Make sure only High Confidence calls are included)".

  • cnv_path Path, required - "Base path for all results of CNV as generated by CNV_processing.R script in access_data_analysis".

  • facet_repo Path, required - "Base path for all results of facets on Clinical MSK-IMPACT samples".

  • best_fit bool, optional - "If this is set to True then we will attempt to parse facets_review.manifest file to pick the best fit for a given dmp_sample_id".

  • tumor_type str, required - "Tumor type label for the report".

  • copy_facet bool, optional - "If this is set to True then we will copy the facet maf file in the directory specified in copy_facet_dir".

  • copy_facet_dir Path, optional - "Directory path where the facet maf file should be copied.".

  • template_days bool, optional - "If the --repo option is specified and if this is set to True then we will use the template_days RMarkdown file as the template".

  • markdown bool, optional - "If given, the create_report.R will be run with -md flag to generate markdown".

  • force bool, optional - "If this is set to True then we will not stop if an error is encountered in a given sample but keep on running for the next sample".
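
The effect of the force option can be pictured with the loop below. It is a sketch only: run_for_all_patients and run_one are hypothetical names, and the real script reports failures in its own way.

```python
def run_for_all_patients(patient_ids, run_one, force=False):
    """Call run_one(patient_id) for every patient and return the ids that failed.

    With force=False the first error stops the run; with force=True the
    error is recorded and the loop moves on to the next patient.
    """
    failed = []
    for patient_id in patient_ids:
        try:
            run_one(patient_id)
        except Exception:
            if not force:
                raise
            failed.append(patient_id)
    return failed
```

Collecting the failed ids lets a forced run finish for the whole manifest while still showing which patients need a second look.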

check_required_columns, remaining Raises entries:

  • typer.Abort - if "dmp_patient_id" column not provided

  • typer.Abort - if "collection_date/collection_day" column not provided

  • typer.Abort - if "timepoint" column not provided

generate_repo_path, remaining arguments:

  • template_path pathlib.Path, optional - Path to template RMarkdown file. Defaults to None.

  • template_days bool, optional - True|False to use days template if using repo_path. Defaults to None.

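generate_facet_maf_path, remaining argument:

  • sample_id str - sample id to be used to search, default is set to None

get_maf_path, description of the sample_id argument:

  • sample_id str - DMP Sample ID if any for facets

generate_create_report_cmd, remaining arguments:

  • template_file str - path for the template file
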
  • cmo_patient_id str - patient id from CMO

  • csv_file str - path to csv file containing variant information

  • tumor_type str - tumor type label

  • manifest pathlib.Path - path to the manifest containing meta data

  • cnv_path pathlib.Path - path to directory having cnv files

  • dmp_patient_id str - patient id of the clinical msk-impact sample

  • dmp_sample_id str - sample id of the clinical msk-impact sample

  • dmp_facet_maf str - path to the clinical msk-impact maf file annotated for facets results

    Requirements:
    access_data_analysis>=0.1.2 # works with this repo tag
    typer==0.3.2
    typing_extensions==3.10.0.0
    pandas==1.2.5
    rich==12.1.0

    Command-line help for run_create_report.py:
    Usage: run_create_report.py [OPTIONS]
    
    Options:
      -r, --repo PATH                 Base path to where the git repository is
                                      located for access_data_analysis
    
      -s, --script PATH               Path to the create_report.R script, fall
                                      back if `--repo` is not given
    
      -t, --template PATH             Path to the template.Rmd or
                                      template_days.Rmd to be used with
                                      create_report.R when `--repo` is not given
    
      -m, --manifest FILE             File containing meta information per sample.
                                      Require following columns in the header:
                                      cmo_patient_id, sample_id, dmp_patient_id,
                                      collection_date or collection_day,
                                      timepoint. If dmp_sample_id column is given
                                      and has information that will be used to run
                                      facets. If dmp_sample_id is not given and
                                      dmp_patient_id is given than it will be used
                                      to get the Tumor sample with lowest number.
                                      If dmp_sample_id or dmp_patient_id is not
                                      given then it will run without the facet maf
                                      file  [required]
    
      -v, --variant-results DIRECTORY
                                      Base path for all results of small variants
                                      as generated by filter_calls.R script in
                                      access_data_analysis (Make sure only High
                                      Confidence calls are included)  [required]
    
      -c, --cnv-results DIRECTORY     Base path for all results of CNV as
                                      generated by CNV_processing.R script in
                                      access_data_analysis  [required]
    
      -f, --facet-repo DIRECTORY      Base path for all results of facets on
                                      Clinical MSK-IMPACT samples  [default: /juno
                                      /work/ccs/shared/resources/impact/facets/all
                                      /]
    
      -bf, --best-fit                 If this is set to True then we will attempt
                                      to parse `facets_review.manifest` file to
                                      pick the best fit for a given dmp_sample_id
                                      [default: False]
    
      -l, --tumor-type TEXT           Tumor type label for the report  [required]
      -cfm, --copy-facet-maf          If this is set to True then we will copy the
                                      facet maf file in the directory specified in
                                      `copy_facet_dir`  [default: False]
    
      -cfd, --copy-facet-dir PATH     Directory path where the facet maf file
                                      should be copied.
    
      -d, --template-days             If the `--repo` option is specified and if
                                      this is set to True then we will use the
                                      template_days RMarkdown file as the template
                                      [default: False]
    
      -gm, --generate-markdown        If given, the create_report.R will be run
                                      with `-md` flag to generate markdown
                                      [default: False]
    
      -ff, --force                    If this is set to True then we will not stop
                                      if an error is encountered in a given sample
                                      while running create_report.R but keep on
                                      running for the next sample  [default:
                                      False]
    
      --install-completion            Install completion for the current shell.
      --show-completion               Show completion for the current shell, to
                                      copy it or customize the installation.
    
      --help                          Show this message and exit.
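
    Usage example with generate markdown, copy facet maf file, template_days RMarkdown, force flag, and best fit for facets:
    > python python/run_create_report/run_create_report.py \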
    -m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
    -r /home/shahr2/github/access_data_analysis \
    -v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
    -c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
    -l "Melanoma" -gm -d -cfm -ff -bf

    Usage example with generate markdown, force flag, and default fit for facets:
    > python python/run_create_report/run_create_report.py \
    -m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
    -r /home/shahr2/github/access_data_analysis \
    -v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
    -c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
    -l "Melanoma" -gm -ff

    Function signatures for the submodules:
    def check_required_columns(manifest, template_days=None)
    def generate_repo_path(repo_path=None, script_path=None, template_path=None, template_days=None)
    def read_manifest(manifest)
    def get_row(tsv_file)
    def get_small_variant_csv(patient_id, csv_path)
    def run_cmd(cmd)
    def run_multiple_cmd(commands, parallel_process=None)
    def generate_facet_maf_path(facet_path, patient_id, sample_id=None)
    def get_maf_path(maf_path, patient_id, sample_id)
    def get_best_fit_folder(facet_manifest_path)
    def generate_create_report_cmd(script, markdown, template_file, cmo_patient_id, csv_file, manifest, cnv_path, dmp_patient_id, dmp_sample_id, dmp_facet_maf, tumor_type=None)