# Run create\_report.R

* [Requirements](#requirements)
* [run\_create\_report](#run_create_report)
  * [Main Script (run\_create\_report.py)](#main-script-run_create_report.py)
* [Submodules](#submodules)

## Requirements

```bash
access_data_analysis>=0.1.2 # works with this repo tag
typer==0.3.2
typing_extensions==3.10.0.0
pandas==1.2.5
rich==12.1.0
```

## run\_create\_report

### Main Script (run\_create\_report.py)

```bash
Usage: run_create_report.py [OPTIONS]

Options:
  -r, --repo PATH                 Base path to where the git repository is
                                  located for access_data_analysis

  -s, --script PATH               Path to the create_report.R script, fall
                                  back if `--repo` is not given

  -t, --template PATH             Path to the template.Rmd or
                                  template_days.Rmd to be used with
                                  create_report.R when `--repo` is not given

  -m, --manifest FILE             File containing meta information per sample.
                                  Require following columns in the header:
                                  cmo_patient_id, sample_id, dmp_patient_id,
                                  collection_date or collection_day,
                                  timepoint. If dmp_sample_id column is given
                                  and has information that will be used to run
                                  facets. If dmp_sample_id is not given and
                                  dmp_patient_id is given than it will be used
                                  to get the Tumor sample with lowest number.
                                  If dmp_sample_id or dmp_patient_id is not
                                  given then it will run without the facet maf
                                  file  [required]

  -v, --variant-results DIRECTORY
                                  Base path for all results of small variants
                                  as generated by filter_calls.R script in
                                  access_data_analysis (Make sure only High
                                  Confidence calls are included)  [required]

  -c, --cnv-results DIRECTORY     Base path for all results of CNV as
                                  generated by CNV_processing.R script in
                                  access_data_analysis  [required]

  -f, --facet-repo DIRECTORY      Base path for all results of facets on
                                  Clinical MSK-IMPACT samples  [default: /juno
                                  /work/ccs/shared/resources/impact/facets/all
                                  /]

  -bf, --best-fit                 If this is set to True then we will attempt
                                  to parse `facets_review.manifest` file to
                                  pick the best fit for a given dmp_sample_id
                                  [default: False]

  -l, --tumor-type TEXT           Tumor type label for the report  [required]
  -cfm, --copy-facet-maf          If this is set to True then we will copy the
                                  facet maf file in the directory specified in
                                  `copy_facet_dir`  [default: False]

  -cfd, --copy-facet-dir PATH     Directory path where the facet maf file
                                  should be copied.

  -d, --template-days             If the `--repo` option is specified and if
                                  this is set to True then we will use the
                                  template_days RMarkdown file as the template
                                  [default: False]

  -gm, --generate-markdown        If given, the create_report.R will be run
                                  with `-md` flag to generate markdown
                                  [default: False]

  -ff, --force                    If this is set to True then we will not stop
                                  if an error is encountered in a given sample
                                  while running create_report.R but keep on
                                  running for the next sample  [default:
                                  False]

  --install-completion            Install completion for the current shell.
  --show-completion               Show completion for the current shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.
```

Wrapper script to run create\_report.R

**Arguments**:

* `repo_path` *Path, optional* - "Base path to where the git repository is located for access\_data\_analysis".
* `script_path` *Path, optional* - "Path to the create\_report.R script, fall back if `--repo` is not given".
* `template_path` *Path, optional* - "Path to the template.Rmd or template\_days.Rmd to be used with create\_report.R when `--repo` is not given".
* `manifest` *Path, required* - "File containing meta information per sample. Requires the following columns in the header: `cmo_patient_id`, `sample_id`, `dmp_patient_id`, `collection_date` or `collection_day`, `timepoint`. If the `dmp_sample_id` column is given and has information, it will be used to run facets. If `dmp_sample_id` is not given and `dmp_patient_id` is given, then it will be used to get the Tumor sample with the lowest number. If neither `dmp_sample_id` nor `dmp_patient_id` is given, then it will run without the facet maf file".
* `variant_path` *Path, required* - "Base path for all results of small variants as generated by filter\_calls.R script in access\_data\_analysis (Make sure only High Confidence calls are included)".
* `cnv_path` *Path, required* - "Base path for all results of CNV as generated by CNV\_processing.R script in access\_data\_analysis".
* `facet_repo` *Path, optional* - "Base path for all results of facets on Clinical MSK-IMPACT samples". Defaults to `/juno/work/ccs/shared/resources/impact/facets/all/`.
* `best_fit` *bool, optional* - "If this is set to True then we will attempt to parse `facets_review.manifest` file to pick the best fit for a given dmp\_sample\_id".
* `tumor_type` *str, required* - "Tumor type label for the report".
* `copy_facet` *bool, optional* - "If this is set to True then we will copy the facet maf file in the directory specified in `copy_facet_dir`".
* `copy_facet_dir` *Path, optional* - "Directory path where the facet maf file should be copied".
* `template_days` *bool, optional* - "If the `--repo` option is specified and if this is set to True then we will use the template\_days RMarkdown file as the template".
* `markdown` *bool, optional* - "If given, the create\_report.R will be run with `-md` flag to generate markdown".
* `force` *bool, optional* - "If this is set to True then we will not stop if an error is encountered in a given sample but keep on running for the next sample".
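
The fallback order described for `manifest` (use `dmp_sample_id` if present, otherwise the lowest-numbered Tumor sample for `dmp_patient_id`, otherwise no facet maf) can be sketched as below. This is only an illustration: the function name `pick_facets_sample`, the `available_samples` argument, and the `<patient>-T<number>-<suffix>` sample ID pattern are assumptions, not part of the script.

```python
import re
from typing import Optional, Sequence


def pick_facets_sample(
    dmp_sample_id: Optional[str],
    dmp_patient_id: Optional[str],
    available_samples: Sequence[str],
) -> Optional[str]:
    """Return the DMP sample ID to use for facets, or None to run without a facet maf.

    Hypothetical helper; sample IDs are assumed to look like
    "<dmp_patient_id>-T<number>-<suffix>".
    """
    # 1. An explicit dmp_sample_id always wins.
    if dmp_sample_id:
        return dmp_sample_id
    # 2. Without a patient ID there is nothing to search for.
    if not dmp_patient_id:
        return None
    # 3. Collect the tumor samples of this patient and keep the lowest number.
    pattern = re.compile(re.escape(dmp_patient_id) + r"-T(\d+)-\w+")
    tumors = []
    for sample in available_samples:
        match = pattern.fullmatch(sample)
        if match:
            tumors.append((int(match.group(1)), sample))
    return min(tumors)[1] if tumors else None
```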

#### **Usage**

* Using Generate Markdown, copy facet maf file, use template\_days RMarkdown, force flag and best fit for facets

```bash
python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -d -cfm -ff -bf
```

* Using Generate Markdown, force flag and default fit for facets

```bash
python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -ff
```
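
Both examples pass `-ff`, so a failure in one sample does not stop the run. A minimal sketch of that behaviour follows, assuming a hypothetical per-patient callable `run_one`; the names `run_for_patients` and `run_one` are illustrative and do not come from the script.

```python
from typing import Callable, Iterable, List, Tuple


def run_for_patients(
    patient_ids: Iterable[str],
    run_one: Callable[[str], None],
    force: bool = False,
) -> List[Tuple[str, str]]:
    """Run `run_one` per patient; with force=True, record failures and continue."""
    failures = []
    for patient_id in patient_ids:
        try:
            run_one(patient_id)
        except Exception as error:
            if not force:
                # Without the force flag the first error stops the whole run.
                raise
            failures.append((patient_id, str(error)))
    return failures
```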

## Submodules

### check\_required\_columns

#### check\_required\_columns

```python
def check_required_columns(manifest, template_days=None)
```

Check if all required columns are present in the sample manifest file

**Arguments**:

* `manifest` *data\_frame* - meta information file with information for each sample
* `template_days` *bool* - True|False if template days RMarkdown will be used

**Raises**:

* `typer.Abort` - if "cmo\_patient\_id" column not provided
* `typer.Abort` - if "cmo\_sample\_id/sample\_id" column not provided
* `typer.Abort` - if "dmp\_patient\_id" column not provided
* `typer.Abort` - if "collection\_date/collection\_day" column not provided
* `typer.Abort` - if "timepoint" column not provided

**Returns**:

* `list` - column names for the manifest file
* `data_frame` - data\_frame with unique ids to traverse over
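
The column check can be approximated with the standard library alone. The sketch below is not the module's implementation: it returns the missing names instead of raising `typer.Abort`, the name `missing_manifest_columns` is hypothetical, and it assumes that `template_days=True` requires `collection_day` while the default requires `collection_date`.

```python
from typing import Iterable, List


def missing_manifest_columns(columns: Iterable[str], template_days: bool = False) -> List[str]:
    """Return the required manifest columns that are absent from `columns`."""
    present = set(columns)
    missing = [
        name
        for name in ("cmo_patient_id", "dmp_patient_id", "timepoint")
        if name not in present
    ]
    # Either spelling of the sample column is accepted.
    if not present & {"cmo_sample_id", "sample_id"}:
        missing.append("cmo_sample_id/sample_id")
    # Assumption: the days template needs collection_day, otherwise collection_date.
    date_column = "collection_day" if template_days else "collection_date"
    if date_column not in present:
        missing.append(date_column)
    return missing
```

A caller could abort when the returned list is non-empty, which mirrors the `typer.Abort` cases listed above.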

### generate\_repo\_paths

#### generate\_repo\_path

```python
def generate_repo_path(repo_path=None, script_path=None, template_path=None, template_days=None)
```

Generate path to create\_report.R and template RMarkdown file

**Arguments**:

* `repo_path` *pathlib.Path, optional* - Path to clone of git repo access\_data\_analysis. Defaults to None.
* `script_path` *pathlib.Path, optional* - Path to create\_report.R. Defaults to None.
* `template_path` *pathlib.Path, optional* - Path to template RMarkdown file. Defaults to None.
* `template_days` *bool, optional* - True|False to use days template if using repo\_path. Defaults to None.

**Raises**:

* `typer.Abort` - Abort if both repo\_path and script\_path are not given
* `typer.Abort` - Abort if both repo\_path and template\_path are not given

**Returns**:

* `str` - Path to create\_report.R and path to template markdown file
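
A rough sketch of the resolution order: prefer `repo_path`, otherwise fall back to the explicit `script_path` and `template_path`. The relative locations inside the repository (`reports/...`) are assumptions and should be checked against the actual clone; the function name `resolve_paths` is hypothetical, and `ValueError` stands in for `typer.Abort`.

```python
from pathlib import Path
from typing import Optional, Tuple, Union

PathLike = Union[str, Path]

# Assumed layout inside the access_data_analysis clone; adjust as needed.
SCRIPT_REL = Path("reports/create_report.R")
TEMPLATE_REL = Path("reports/template.Rmd")
TEMPLATE_DAYS_REL = Path("reports/template_days.Rmd")


def resolve_paths(
    repo_path: Optional[PathLike] = None,
    script_path: Optional[PathLike] = None,
    template_path: Optional[PathLike] = None,
    template_days: bool = False,
) -> Tuple[str, str]:
    """Return (script, template) paths as strings."""
    if repo_path is not None:
        # With a repository clone, both paths are derived from it.
        template_rel = TEMPLATE_DAYS_REL if template_days else TEMPLATE_REL
        return str(Path(repo_path) / SCRIPT_REL), str(Path(repo_path) / template_rel)
    if script_path is None:
        raise ValueError("either repo_path or script_path must be given")
    if template_path is None:
        raise ValueError("either repo_path or template_path must be given")
    return str(script_path), str(template_path)
```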

### read\_manifest

#### read\_manifest

```python
def read_manifest(manifest)
```

Read manifest file

**Arguments**:

* `manifest` *pathlib.Path* - path to the sample manifest file

**Returns**:

* `data_frame` - contents of the manifest file

#### get\_row

```python
def get_row(tsv_file)
```

Function to skip rows

**Arguments**:

* `tsv_file` *file* - file to be read

**Returns**:

* `list` - lines to be skipped
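
How `get_row` and `read_manifest` might fit together is sketched below with the standard library. It assumes that the rows to skip are comment lines starting with `#` and that the manifest is tab-separated; neither is confirmed by this page. The helper names are hypothetical, and the sketch returns a list of dictionaries rather than a data frame.

```python
import csv
from typing import Dict, List


def rows_to_skip(lines: List[str], comment_prefix: str = "#") -> List[int]:
    """Return the indices of lines that should be skipped (assumed: comment lines)."""
    return [index for index, line in enumerate(lines) if line.startswith(comment_prefix)]


def read_manifest_rows(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated manifest text, ignoring the skipped lines."""
    lines = text.splitlines()
    skip = set(rows_to_skip(lines))
    kept = [line for index, line in enumerate(lines) if index not in skip]
    # The first kept line is treated as the header.
    return [dict(row) for row in csv.DictReader(kept, delimiter="\t")]
```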

### get\_small\_variant\_csv

#### get\_small\_variant\_csv

```python
def get_small_variant_csv(patient_id, csv_path)
```

Get the path to CSV file to be used for a given patient containing all variants

**Arguments**:

* `patient_id` *str* - patient id used to identify the csv file
* `csv_path` *pathlib.Path* - base path where the csv file is expected to be present

**Raises**:

* `typer.Abort` - if no csv file is returned
* `typer.Abort` - if more than one csv file is returned

**Returns**:

* `str` - path to csv file containing the variants
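
The "exactly one match" rule can be illustrated as follows. The glob pattern `*<patient_id>*.csv` is an assumption about how the files are named, `find_patient_csv` is a hypothetical name, and built-in exceptions replace `typer.Abort` to keep the sketch dependency-free.

```python
from pathlib import Path
from typing import Union


def find_patient_csv(patient_id: str, csv_path: Union[str, Path]) -> str:
    """Return the single CSV under `csv_path` whose name contains `patient_id`."""
    # Assumed naming convention: the patient ID appears somewhere in the file name.
    matches = sorted(Path(csv_path).glob(f"*{patient_id}*.csv"))
    if not matches:
        raise FileNotFoundError(f"no csv file found for {patient_id} in {csv_path}")
    if len(matches) > 1:
        raise ValueError(f"more than one csv file found for {patient_id}: {matches}")
    return str(matches[0])
```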

### run\_cmd

#### run\_cmd

```python
def run_cmd(cmd)
```

Given a system command run it using subprocess

**Arguments**:

* `cmd` *str* - System command to be run as a string

#### run\_multiple\_cmd

```python
def run_multiple_cmd(commands, parallel_process=None)
```

Given a list of system commands, run them using subprocess

**Arguments**:

* `commands` *list\[str]* - list of system commands to be run
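
A minimal sketch of both helpers using `subprocess` is shown here. It is not the module's code: the names `run_command` and `run_commands` are hypothetical, and it assumes that `parallel_process` is the number of commands allowed to run at the same time.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def run_command(cmd: str) -> subprocess.CompletedProcess:
    """Run one command string without a shell and capture its output."""
    # shlex.split turns the string into an argument list (POSIX quoting rules).
    return subprocess.run(shlex.split(cmd), capture_output=True, text=True, check=False)


def run_commands(
    commands: List[str], parallel_process: Optional[int] = None
) -> List[subprocess.CompletedProcess]:
    """Run several commands, at most `parallel_process` at a time (assumed meaning)."""
    with ThreadPoolExecutor(max_workers=parallel_process) as pool:
        # pool.map keeps the results in the same order as `commands`.
        return list(pool.map(run_command, commands))
```

Threads are sufficient in this sketch because each worker only waits on an external process.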

### generate\_facet\_maf\_path

#### generate\_facet\_maf\_path

```python
def generate_facet_maf_path(facet_path, patient_id, sample_id=None)
```

Get path of maf associated with facet-suite output

**Arguments**:

* `facet_path` *pathlib.Path|str* - path to search for the facet file
* `patient_id` *str* - patient id to be used to search
* `sample_id` *str* - sample id to be used to search, default is set to None

**Returns**:

* `str` - path of the facets maf

#### get\_maf\_path

```python
def get_maf_path(maf_path, patient_id, sample_id)
```

Get the path to the maf file

**Arguments**:

* `maf_path` *pathlib.Path* - Base path of the maf file
* `patient_id` *str* - DMP Patient ID for facets
* `sample_id` *str* - DMP Sample ID if any for facets

**Returns**:

* `str` - Path to the maf file

#### get\_best\_fit\_folder

```python
def get_best_fit_folder(facet_manifest_path)
```

Get the best fit folder for the given facet manifest path

**Arguments**:

* `facet_manifest_path` *str* - manifest path to be used for determining best fit

**Returns**:

* `pathlib.Path` - path to the folder containing best fit maf files

### generate\_create\_report\_cmd

#### generate\_create\_report\_cmd

```python
def generate_create_report_cmd(script, markdown, template_file, cmo_patient_id, csv_file, manifest, cnv_path, dmp_patient_id, dmp_sample_id, dmp_facet_maf, tumor_type=None)
```

Create the system command that should be run for create\_report.R

**Arguments**:

* `script` *str* - path for create\_report.R
* `markdown` *bool* - True|False to generate markdown output
* `template_file` *str* - path for the template file
* `cmo_patient_id` *str* - patient id from CMO
* `csv_file` *str* - path to csv file containing variant information
* `tumor_type` *str* - tumor type label
* `manifest` *pathlib.Path* - path to the manifest containing meta data
* `cnv_path` *pathlib.Path* - path to directory having cnv files
* `dmp_patient_id` *str* - patient id of the clinical msk-impact sample
* `dmp_sample_id` *str* - sample id of the clinical msk-impact sample
* `dmp_facet_maf` *str* - path to the clinical msk-impact maf file annotated for facets results

**Returns**:

* `cmd` *str* - system command to run for create\_report.R
* `html_output` *pathlib.Path* - where the output file should be written

