# Run create\_report.R

* [Requirements](#requirements)
* [run\_create\_report](#run_create_report)
  * [Main Script (run\_create\_report.py)](#main-script-run_create_report.py)
* [Submodules](#submodules)

## Requirements

```bash
access_data_analysis>=0.1.2 # works with this repo tag
typer==0.3.2
typing_extensions==3.10.0.0
pandas==1.2.5
rich==12.1.0
```

## run\_create\_report

### Main Script (run\_create\_report.py)

```bash
Usage: run_create_report.py [OPTIONS]

Options:
  -r, --repo PATH                 Base path to where the git repository is
                                  located for access_data_analysis

  -s, --script PATH               Path to the create_report.R script, fall
                                  back if `--repo` is not given

  -t, --template PATH             Path to the template.Rmd or
                                  template_days.Rmd to be used with
                                  create_report.R when `--repo` is not given

  -m, --manifest FILE             File containing meta information per sample.
                                  Require following columns in the header:
                                  cmo_patient_id, sample_id, dmp_patient_id,
                                  collection_date or collection_day,
                                  timepoint. If dmp_sample_id column is given
                                  and has information that will be used to run
                                  facets. If dmp_sample_id is not given and
                                  dmp_patient_id is given than it will be used
                                  to get the Tumor sample with lowest number.
                                  If dmp_sample_id or dmp_patient_id is not
                                  given then it will run without the facet maf
                                  file  [required]

  -v, --variant-results DIRECTORY
                                  Base path for all results of small variants
                                  as generated by filter_calls.R script in
                                  access_data_analysis (Make sure only High
                                  Confidence calls are included)  [required]

  -c, --cnv-results DIRECTORY     Base path for all results of CNV as
                                  generated by CNV_processing.R script in
                                  access_data_analysis  [required]

  -f, --facet-repo DIRECTORY      Base path for all results of facets on
                                  Clinical MSK-IMPACT samples  [default: /juno
                                  /work/ccs/shared/resources/impact/facets/all
                                  /]

  -bf, --best-fit                 If this is set to True then we will attempt
                                  to parse `facets_review.manifest` file to
                                  pick the best fit for a given dmp_sample_id
                                  [default: False]

  -l, --tumor-type TEXT           Tumor type label for the report  [required]
  -cfm, --copy-facet-maf          If this is set to True then we will copy the
                                  facet maf file in the directory specified in
                                  `copy_facet_dir`  [default: False]

  -cfd, --copy-facet-dir PATH     Directory path where the facet maf file
                                  should be copied.

  -d, --template-days             If the `--repo` option is specified and if
                                  this is set to True then we will use the
                                  template_days RMarkdown file as the template
                                  [default: False]

  -gm, --generate-markdown        If given, the create_report.R will be run
                                  with `-md` flag to generate markdown
                                  [default: False]

  -ff, --force                    If this is set to True then we will not stop
                                  if an error is encountered in a given sample
                                  while running create_report.R but keep on
                                  running for the next sample  [default:
                                  False]

  --install-completion            Install completion for the current shell.
  --show-completion               Show completion for the current shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.
```

Wrapper script to run create\_report.R

**Arguments**:

* `repo_path` *Path, optional* - "Base path to where the git repository is located for access\_data\_analysis".
* `script_path` *Path, optional* - "Path to the create\_report.R script, fall back if `--repo` is not given".
* `template_path` *Path, optional* - "Path to the template.Rmd or template\_days.Rmd to be used with create\_report.R when `--repo` is not given".
* `manifest` *Path, required* - "File containing meta information per sample. Requires the following columns in the header: `cmo_patient_id`, `sample_id`, `dmp_patient_id`, `collection_date` or `collection_day`, `timepoint`. If the dmp\_sample\_id column is given and populated, it is used to run facets. If dmp\_sample\_id is not given and dmp\_patient\_id is given, then it is used to get the Tumor sample with the lowest number. If neither dmp\_sample\_id nor dmp\_patient\_id is given, then the script runs without the facet maf file".
* `variant_path` *Path, required* - "Base path for all results of small variants as generated by filter\_calls.R script in access\_data\_analysis (Make sure only High Confidence calls are included)".
* `cnv_path` *Path, required* - "Base path for all results of CNV as generated by CNV\_processing.R script in access\_data\_analysis".
* `facet_repo` *Path, optional* - "Base path for all results of facets on Clinical MSK-IMPACT samples". A default is set, as shown in the usage output.
* `best_fit` *bool, optional* - "If this is set to True then we will attempt to parse `facets_review.manifest` file to pick the best fit for a given dmp\_sample\_id".
* `tumor_type` *str, required* - "Tumor type label for the report".
* `copy_facet` *bool, optional* - "If this is set to True then we will copy the facet maf file in the directory specified in `copy_facet_dir`".
* `copy_facet_dir` *Path, optional* - "Directory path where the facet maf file should be copied".
* `template_days` *bool, optional* - "If the `--repo` option is specified and if this is set to True then we will use the template\_days RMarkdown file as the template".
* `markdown` *bool, optional* - "If given, the create\_report.R will be run with `-md` flag to generate markdown".
* `force` *bool, optional* - "If this is set to True then we will not stop if an error is encountered in a given sample but keep on running for the next sample".
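
The manifest rules for locating the facets maf amount to a three-way decision per sample. The following is a minimal sketch of that decision, not the script's own code: `choose_facet_lookup` and its return values are hypothetical names, and it assumes each manifest row is a dict whose values are strings or `None`.

```python
from typing import Optional, Tuple


def choose_facet_lookup(row: dict) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: decide how to look up the facets maf for one manifest row."""
    # Treat a missing key, None, or a blank string as "not given".
    sample_id = (row.get("dmp_sample_id") or "").strip()
    patient_id = (row.get("dmp_patient_id") or "").strip()
    if sample_id:
        # An explicit DMP sample id takes priority.
        return ("sample", sample_id)
    if patient_id:
        # Otherwise fall back to the patient id; the real script then
        # picks the Tumor sample with the lowest number.
        return ("patient", patient_id)
    # Neither id is available: run without a facet maf file.
    return ("none", None)
```

The ids used to exercise such a helper would be illustrative placeholders; the point is only the order of the fallbacks.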

#### **Usage**

* Generate markdown, copy the facet maf file, use the template\_days RMarkdown, and enable the force flag and best fit for facets

```bash
> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -d -cfm -ff -bf
```

* Generate markdown with the force flag and the default fit for facets

```bash
> python python/run_create_report/run_create_report.py \
-m /home/shahr2/bergerlab/Project_10619_D/small_variants/manifest_noDate_days.tsv \
-r /home/shahr2/github/access_data_analysis \
-v /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/results_stringent/ \
-c /home/shahr2/bergerlab/Project_10619_D/small_variants/results_20Jan2023/CNA_final_call_set \
-l "Melanoma" -gm -ff
```

## Submodules

### check\_required\_columns

#### check\_required\_columns

```python
def check_required_columns(manifest, template_days=None)
```

Check if all required columns are present in the sample manifest file

**Arguments**:

* `manifest` *data\_frame* - meta information file with information for each sample
* `template_days` *bool* - True|False if template days RMarkdown will be used

**Raises**:

* `typer.Abort` - if "cmo\_patient\_id" column not provided
* `typer.Abort` - if "cmo\_sample\_id/sample\_id" column not provided
* `typer.Abort` - if "dmp\_patient\_id" column not provided
* `typer.Abort` - if "collection\_date/collection\_day" column not provided
* `typer.Abort` - if "timepoint" column not provided

**Returns**:

* `list` - column names of the manifest file
* `data_frame` - data\_frame with unique ids to traverse over
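
A rough sketch of the column check is shown here on a plain list of header names. It is an illustration only: `missing_columns` is a hypothetical name, it returns the missing entries instead of raising `typer.Abort`, and it assumes either spelling in a pair such as `collection_date`/`collection_day` is acceptable regardless of `template_days`.

```python
# Columns that must be present exactly as named.
REQUIRED = ["cmo_patient_id", "dmp_patient_id", "timepoint"]
# Groups where any one of the listed names satisfies the requirement.
ALTERNATIVES = [
    ("cmo_sample_id", "sample_id"),
    ("collection_date", "collection_day"),
]


def missing_columns(columns):
    """Hypothetical helper: list required manifest columns that are absent."""
    present = set(columns)
    missing = [name for name in REQUIRED if name not in present]
    for group in ALTERNATIVES:
        if not any(name in present for name in group):
            # Report the whole group so the message shows every accepted name.
            missing.append("/".join(group))
    return missing
```

A caller could abort when the returned list is non-empty and print its contents as the error message.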

### generate\_repo\_paths

#### generate\_repo\_path

```python
def generate_repo_path(repo_path=None, script_path=None, template_path=None, template_days=None)
```

Generate path to create\_report.R and template RMarkdown file

**Arguments**:

* `repo_path` *pathlib.Path, optional* - Path to clone of git repo access\_data\_analysis. Defaults to None.
* `script_path` *pathlib.Path, optional* - Path to create\_report.R. Defaults to None.
* `template_path` *pathlib.Path, optional* - Path to template RMarkdown file. Defaults to None.
* `template_days` *bool, optional* - True|False to use days template if using repo\_path. Defaults to None.

**Raises**:

* `typer.Abort` - Abort if neither repo\_path nor script\_path is given
* `typer.Abort` - Abort if neither repo\_path nor template\_path is given

**Returns**:

* `str` - Path to create\_report.R and path to template markdown file
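
The fallback between `repo_path` and the explicit script and template paths might look like the sketch below. Everything here is an assumption made for illustration: the function name `resolve_report_paths` is hypothetical, the `reports/` sub-directory and the file names inside it are an assumed repository layout, and `ValueError` stands in for `typer.Abort`.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_report_paths(
    repo_path: Optional[str] = None,
    script_path: Optional[str] = None,
    template_path: Optional[str] = None,
    template_days: bool = False,
) -> Tuple[Path, Path]:
    """Hypothetical helper: return (script, template) paths."""
    if repo_path is not None:
        # ASSUMPTION: script and templates live under <repo>/reports/.
        reports = Path(repo_path) / "reports"
        template = "template_days.Rmd" if template_days else "template.Rmd"
        return reports / "create_report.R", reports / template
    # Without a repository, both explicit paths are needed.
    if script_path is None:
        raise ValueError("give either repo_path or script_path")
    if template_path is None:
        raise ValueError("give either repo_path or template_path")
    return Path(script_path), Path(template_path)
```

Note that in this sketch `template_days` only matters when `repo_path` is given, which matches the description of the `--template-days` option.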

### read\_manifest

#### read\_manifest

```python
def read_manifest(manifest)
```

Read manifest file

**Arguments**:

* `manifest` *pathlib.Path* - path to the manifest file to be read

**Returns**:

* `data_frame` - contents of the manifest file

#### get\_row

```python
def get_row(tsv_file)
```

Function to skip rows

**Arguments**:

* `tsv_file` *file* - file to be read

**Returns**:

* `list` - lines to be skipped
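
One way to combine the two functions is to compute the rows to skip first and then parse the remaining lines. The sketch below uses only the standard library and is not the module's code: `rows_to_skip` and `read_tsv` are hypothetical names, and it assumes the rows to skip are those starting with `#`, which the documentation above does not state.

```python
import csv
import io


def rows_to_skip(handle):
    """Return zero-based indexes of lines to skip (ASSUMPTION: '#' comment lines)."""
    return [index for index, line in enumerate(handle) if line.startswith("#")]


def read_tsv(text):
    """Parse tab-separated text into a list of dicts, ignoring skipped rows."""
    skip = set(rows_to_skip(io.StringIO(text)))
    kept = [line for index, line in enumerate(io.StringIO(text)) if index not in skip]
    # The first kept line is treated as the header.
    return list(csv.DictReader(kept, delimiter="\t"))
```

With pandas, a list like the one returned by `rows_to_skip` can be passed as the `skiprows` argument of `pandas.read_csv`.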

### get\_small\_variant\_csv

#### get\_small\_variant\_csv

```python
def get_small_variant_csv(patient_id, csv_path)
```

Get the path to CSV file to be used for a given patient containing all variants

**Arguments**:

* `patient_id` *str* - patient id used to identify the csv file
* `csv_path` *pathlib.Path* - base path where the csv file is expected to be present

**Raises**:

* `typer.Abort` - if no csv file is returned
* `typer.Abort` - if more than one csv file is returned

**Returns**:

* `str` - path to csv file containing the variants
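
The "exactly one match" rule can be sketched with `pathlib` globbing. This is an illustration under stated assumptions: `find_patient_csv` is a hypothetical name, the file is assumed to carry the patient id somewhere in its name and to end in `.csv`, and built-in exceptions stand in for `typer.Abort`.

```python
from pathlib import Path


def find_patient_csv(patient_id, csv_path):
    """Hypothetical helper: return the single csv under csv_path matching patient_id."""
    # ASSUMPTION: the patient id appears in the file name.
    matches = sorted(Path(csv_path).glob(f"*{patient_id}*.csv"))
    if not matches:
        raise FileNotFoundError(f"no csv found for {patient_id} in {csv_path}")
    if len(matches) > 1:
        names = ", ".join(match.name for match in matches)
        raise ValueError(f"more than one csv found for {patient_id}: {names}")
    return str(matches[0])
```

Failing on multiple matches, rather than picking the first, avoids silently reporting on the wrong result file.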

### run\_cmd

#### run\_cmd

```python
def run_cmd(cmd)
```

Given a system command run it using subprocess

**Arguments**:

* `cmd` *str* - System command to be run as a string

#### run\_multiple\_cmd

```python
def run_multiple_cmd(commands, parallel_process=None)
```

Given a list of system commands, run them using subprocess

**Arguments**:

* `commands` *list\[str]* - list of system commands to be run
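
Running a command string through `subprocess` could be sketched as follows. The names `run_command` and `run_commands` are hypothetical, the commands are run one after another, and the `parallel_process` argument of `run_multiple_cmd` is not modelled here because its behavior is not described above.

```python
import shlex
import subprocess


def run_command(cmd: str) -> int:
    """Run one command string without a shell and return its exit code."""
    # shlex.split turns the string into an argument list (POSIX quoting rules).
    completed = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return completed.returncode


def run_commands(commands):
    """Run each command in order and collect the exit codes."""
    return [run_command(cmd) for cmd in commands]
```

Returning exit codes instead of raising leaves it to the caller to decide whether to stop at the first failure or continue with the next sample, as the `--force` option does.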

### generate\_facet\_maf\_path

#### generate\_facet\_maf\_path

```python
def generate_facet_maf_path(facet_path, patient_id, sample_id=None)
```

Get path of maf associated with facet-suite output

**Arguments**:

* `facet_path` *pathlib.Path|str* - path to search for the facet file
* `patient_id` *str* - patient id to be used to search
* `sample_id` *str* - sample id to be used to search, default is set to None

**Returns**:

* `str` - path of the facets maf

#### get\_maf\_path

```python
def get_maf_path(maf_path, patient_id, sample_id)
```

Get the path to the maf file

**Arguments**:

* `maf_path` *pathlib.Path* - Base path of the maf file
* `patient_id` *str* - DMP Patient ID for facets
* `sample_id` *str* - DMP Sample ID if any for facets

**Returns**:

* `str` - Path to the maf file

#### get\_best\_fit\_folder

```python
def get_best_fit_folder(facet_manifest_path)
```

Get the best fit folder for the given facet manifest path

**Arguments**:

* `facet_manifest_path` *str* - manifest path to be used for determining best fit

**Returns**:

* `pathlib.Path` - path to the folder containing best fit maf files

### generate\_create\_report\_cmd

#### generate\_create\_report\_cmd

```python
def generate_create_report_cmd(script, markdown, template_file, cmo_patient_id, csv_file, manifest, cnv_path, dmp_patient_id, dmp_sample_id, dmp_facet_maf, tumor_type=None)
```

Create the system command that should be run for create\_report.R

**Arguments**:

* `script` *str* - path for create\_report.R
* `markdown` *bool* - True|False to generate markdown output
* `template_file` *str* - path for the template file
* `cmo_patient_id` *str* - patient id from CMO
* `csv_file` *str* - path to csv file containing variant information
* `tumor_type` *str* - tumor type label
* `manifest` *pathlib.Path* - path to the manifest containing meta data
* `cnv_path` *pathlib.Path* - path to directory having cnv files
* `dmp_patient_id` *str* - patient id of the clinical msk-impact sample
* `dmp_sample_id` *str* - sample id of the clinical msk-impact sample
* `dmp_facet_maf` *str* - path to the clinical msk-impact maf file annotated for facets results

**Returns**:

* `cmd` *str* - system command to run for create\_report.R
* `html_output` *pathlib.Path* - where the output file should be written
