# Get cBioPortal Variants

## Table of Contents

* [get\_cbioportal\_variants](#get_cbioportal_variants)
  * [subset\_cpt](#subset_cpt)
  * [subset\_cst](#subset_cst)
  * [subset\_cna](#subset_cna)
  * [subset\_sv](#subset_sv)
  * [subset\_maf](#subset_maf)
* [Sub-modules](#sub-modules)

## get\_cbioportal\_variants

Requirements:

* pandas
* typing
* typer
* bed\_lookup (<https://github.com/msk-access/python_bed_lookup>)

#### Example commands

```bash
python get_cbioportal_variants.py subset-maf --sid "Test1" --sid "Test2" --sid "Test3"
```

```bash
python get_cbioportal_variants.py subset-maf --ids /path/to/ids.txt
```

```bash
Usage: get_cbioportal_variants.py [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.

  --help                Show this message and exit.

Commands:
  subset-cna  Subset data_CNA.txt file for given set of sample ids.
  subset-cpt  Subset data_clinical_patient.txt file for given set of
              patient...

  subset-cst  Subset data_clinical_samples.txt file for given set of sample...
  subset-maf  Subset MAF/TSV file and mark if an alteration is covered by...
  subset-sv   Subset data_sv.txt file for given set of sample ids.
```

### **subset\_cpt**

```bash
Usage: get_cbioportal_variants.py subset-cpt [OPTIONS]

  Subset data_clinical_patient.txt file for given set of patient ids.

  Tool to do the following operations: A. Get subset of clinical information
  for samples based on PATIENT_ID in data_clinical_patient.txt file

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -p, --cpt FILE    Clinical Patient file generated by cBioportal repo
                    [default: /work/access/production/resources/cbioportal/cur
                    rent/msk_solid_heme/data_clinical_patient.txt]

  -i, --ids PATH    List of ids to search for in the 'PATIENT_ID' column.
                    Header of this file is 'sample_id'  [default: ]

  --sid TEXT        Identifiers to search for in the 'PATIENT_ID' column. Can
                    be given multiple times  [default: ]

  -n, --name TEXT   Name of the output file  [default:
                    output_clinical_patient.txt]

  -c, --cname TEXT  Name of the column header to be used for sub-setting
                    [default: PATIENT_ID]

  --help            Show this message and exit.
```

### **subset\_cst**

```bash
Usage: get_cbioportal_variants.py subset-cst [OPTIONS]

  Subset data_clinical_samples.txt file for given set of sample ids.

  Tool to do the following operations: A. Get subset of clinical information
  for samples based on SAMPLE_ID in data_clinical_sample.txt file

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -s, --cst FILE    Clinical Sample file generated by cBioportal repo
                    [default: /work/access/production/resources/cbioportal/cur
                    rent/msk_solid_heme/data_clinical_sample.txt]

  -i, --ids PATH    List of ids to search for in the 'SAMPLE_ID' column.
                    Header of this file is 'sample_id'  [default: ]

  --sid TEXT        Identifiers to search for in the 'SAMPLE_ID' column. Can
                    be given multiple times  [default: ]

  -n, --name TEXT   Name of the output file  [default:
                    output_clinical_samples.txt]

  -c, --cname TEXT  Name of the column header to be used for sub-setting
                    [default: SAMPLE_ID]

  --help            Show this message and exit.
```

### **subset\_cna**

```bash
Usage: get_cbioportal_variants.py subset-cna [OPTIONS]

  Subset data_CNA.txt file for given set of sample ids.

  Tool to do the following operations: A. Get subset of samples based on
  column header in data_CNA.txt file

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -c, --cna FILE   Copy Number Variant file generated by cBioportal repo
                   [default: /work/access/production/resources/cbioportal/curr
                   ent/msk_solid_heme/data_CNA.txt]

  -i, --ids PATH   List of ids to search for in the 'header' of the file.
                   Header of this file is 'sample_id'  [default: ]

  --sid TEXT       Identifiers to search for in the 'header' of the file. Can
                   be given multiple times  [default: ]

  -n, --name TEXT  Name of the output file  [default: output_CNA.txt]
  --help           Show this message and exit.
```

### **subset\_sv**

```bash
Usage: get_cbioportal_variants.py subset-sv [OPTIONS]

  Subset data_sv.txt file for given set of sample ids.

  Tool to do the following operations: A. Get subset of structural variants
  based on Sample_ID in data_sv.txt file

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -s, --sv FILE     Structural Variant file generated by cBioportal repo
                    [default: /work/access/production/resources/cbioportal/cur
                    rent/msk_solid_heme/data_sv.txt]

  -i, --ids PATH    List of ids to search for in the 'Sample_ID' column.
                    Header of this file is 'sample_id'  [default: ]

  --sid TEXT        Identifiers to search for in the 'Sample_ID' column. Can
                    be given multiple times  [default: ]

  -n, --name TEXT   Name of the output file  [default: output_sv.txt]
  -c, --cname TEXT  Name of the column header to be used for sub-setting
                    [default: Sample_ID]

  --help            Show this message and exit.
```

### **subset\_maf**

```bash
Usage: get_cbioportal_variants.py subset-maf [OPTIONS]

  Subset MAF/TSV file and mark if an alteration is covered by BED file or
  not

  Tool to do the following operations: A. Get subset of variants based on
  Tumor_Sample_Barcode in data_mutations_extended.txt file B. Mark the
  variants as overlapping with BED file as covered [yes/no], by appending
  "covered" column to the subset MAF

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -m, --maf FILE    MAF file generated by cBioportal repo  [default: /work/acc
                    ess/production/resources/cbioportal/current/msk_solid_heme
                    /data_mutations_extended.txt]

  -i, --ids PATH    List of ids to search for in the 'Tumor_Sample_Barcode'
                    column. Header of this file is 'sample_id'  [default: ]

  --sid TEXT        Identifiers to search for in the 'Tumor_Sample_Barcode'
                    column. Can be given multiple times  [default: ]

  -b, --bed FILE    BED file to find overlapping variants  [default:
                    /work/access/production/resources/msk-
                    access/current/regions_of_interest/current/MSK-
                    ACCESS-v1_0-probe-A.sorted.bed]

  -n, --name TEXT   Name of the output file  [default: output.maf]
  -c, --cname TEXT  Name of the column header to be used for sub-setting
                    [default: Tumor_Sample_Barcode]

  --help            Show this message and exit.
```

### Sub-modules

#### **read\_tsv**

```python
def read_tsv(tsv)
```

Read a TSV file.

**Arguments**:

* `tsv` *File* - Input file in MAF/TSV-like format

**Returns**:

* `data_frame` - Data frame containing the contents of the MAF/TSV file
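As a rough illustration of what reading such a file can look like, here is a minimal sketch using `pandas.read_csv`. The name `read_tsv_sketch` and the `comment="#"` setting are assumptions made for this example, not the module's confirmed behaviour.

```python
import io

import pandas as pd


def read_tsv_sketch(tsv):
    """Read a tab-separated file (path or file-like object) into a DataFrame.

    Hypothetical stand-in for read_tsv; the real function may differ.
    """
    # Assumption: lines starting with '#' are metadata and can be ignored.
    # Note that pandas also truncates a data line at any later '#'.
    return pd.read_csv(tsv, sep="\t", comment="#", low_memory=False)


# Small in-memory MAF-like example (made-up rows)
example = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tStart_Position\n"
    "TP53\tTest1\t7577120\n"
    "KRAS\tTest2\t25398284\n"
)
maf_df = read_tsv_sketch(example)
print(maf_df)
```
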

#### **read\_ids**

```python
def read_ids(sid, ids)
```

Make a list of IDs.

**Arguments**:

* `sid` *tuple* - Multiple ids as tuple
* `ids` *File* - File containing multiple ids

**Returns**:

* `list` - List containing all ids
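The sketch below shows one way IDs given with `--sid` could be combined with IDs from an `--ids` file whose header is `sample_id`, as the option help describes. The function name `read_ids_sketch` and the de-duplication step are assumptions for illustration only.

```python
import csv
import io


def read_ids_sketch(sid, ids=None):
    """Combine IDs passed individually with IDs listed in a file-like object.

    Hypothetical stand-in for read_ids; the real function may differ.
    """
    collected = list(sid)
    if ids is not None:
        # The ids file is expected to have a 'sample_id' header line
        reader = csv.DictReader(ids, delimiter="\t")
        collected.extend(
            row["sample_id"].strip() for row in reader if row["sample_id"]
        )
    # Assumption: duplicates are dropped; dict.fromkeys keeps first-seen order
    return list(dict.fromkeys(collected))


ids_file = io.StringIO("sample_id\nTest2\nTest3\n")
all_ids = read_ids_sketch(("Test1", "Test2"), ids_file)
print(all_ids)
```
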

#### **filter\_by\_columns**

```python
def filter_by_columns(sid, tsv_df)
```

Filter data by columns

**Arguments**:

* `sid` *list* - list of columns to subset over
* `tsv_df` *data\_frame* - data\_frame to subset from

**Returns**:

* `data_frame` - A copy of the subset of the data\_frame
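Column-wise filtering fits files such as `data_CNA.txt`, where sample IDs appear in the header. The following is a minimal sketch of that idea with pandas. `filter_by_columns_sketch` is a hypothetical name, and this page does not state whether the real function also retains a gene identifier column, so the example passes `Hugo_Symbol` explicitly.

```python
import pandas as pd


def filter_by_columns_sketch(sid, tsv_df):
    """Return a copy of tsv_df restricted to the columns named in sid.

    Hypothetical stand-in for filter_by_columns; the real function may differ.
    """
    wanted = set(sid)
    # Keep only requested columns that exist, in the data frame's own order
    keep = [col for col in tsv_df.columns if col in wanted]
    return tsv_df[keep].copy()


# Made-up CNA-like table: one row per gene, one column per sample
cna_df = pd.DataFrame(
    {
        "Hugo_Symbol": ["TP53", "KRAS"],
        "Test1": [0, 2],
        "Test2": [-2, 0],
        "Test3": [0, 0],
    }
)
subset = filter_by_columns_sketch(["Hugo_Symbol", "Test1", "Test3", "Missing"], cna_df)
print(subset)
```
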

#### **filter\_by\_rows**

```python
def filter_by_rows(sid, tsv_df, col_name)
```

Filter the data by rows

**Arguments**:

* `sid` *list* - list of row names to subset over
* `tsv_df` *data\_frame* - data\_frame to subset from
* `col_name` *string* - name of the column whose values are matched against the names in `sid`

**Returns**:

* `data_frame` - A copy of the subset of the data\_frame
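Row-wise filtering of this kind is commonly done in pandas with a boolean mask. The sketch below assumes that approach; `filter_by_rows_sketch` is a hypothetical name and the real implementation may differ.

```python
import pandas as pd


def filter_by_rows_sketch(sid, tsv_df, col_name):
    """Return a copy of the rows whose col_name value is listed in sid.

    Hypothetical stand-in for filter_by_rows; the real function may differ.
    """
    mask = tsv_df[col_name].isin(sid)
    return tsv_df[mask].copy()


# Made-up MAF-like table
maf_df = pd.DataFrame(
    {
        "Hugo_Symbol": ["TP53", "KRAS", "EGFR"],
        "Tumor_Sample_Barcode": ["Test1", "Test2", "Other"],
    }
)
subset = filter_by_rows_sketch(["Test1", "Test2"], maf_df, "Tumor_Sample_Barcode")
print(subset)
```
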

#### **read\_bed**

```python
def read_bed(bed)
```

Read a BED file using bed\_lookup.

**Arguments**:

* `bed` *file* - File in BED format to read

**Returns**:

* `object` - BED file object to use for filtering

#### **check\_if\_covered**

```python
def check_if_covered(bedObj, mafObj)
```

Check whether each variant is covered by a given BED file.

**Arguments**:

* `bedObj` *object* - BED file object to check coverage against
* `mafObj` *data\_frame* - data frame of variants to check, using the 'Chromosome' column for the chromosome and the 'Start\_Position' column for the position

**Returns**:

* `data_frame` - the input data frame with a "covered" column (yes/no) appended
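To make the coverage check concrete, here is a pure-Python sketch of the underlying idea: look up each variant's chromosome and start position against a set of BED intervals and record "yes" or "no". It deliberately does not use bed\_lookup, and the helper names `load_intervals` and `mark_covered` are hypothetical. The coordinate handling (1-based MAF positions against 0-based, half-open BED intervals) is an assumption and may not match how the actual tool treats boundaries.

```python
def load_intervals(bed_lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    intervals = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    return intervals


def mark_covered(intervals, variants):
    """Return copies of the variant dicts with a 'covered' key ('yes'/'no')."""
    marked = []
    for variant in variants:
        # Assumption: Start_Position is 1-based, so shift to 0-based for BED
        pos0 = int(variant["Start_Position"]) - 1
        regions = intervals.get(str(variant["Chromosome"]), [])
        hit = any(start <= pos0 < end for start, end in regions)
        marked.append({**variant, "covered": "yes" if hit else "no"})
    return marked


# Made-up BED intervals and variants
bed = ["17\t7577000\t7577200\tregion_a\n", "12\t25398200\t25398300\tregion_b\n"]
variants = [
    {"Chromosome": "17", "Start_Position": 7577120},
    {"Chromosome": "12", "Start_Position": 25398400},
    {"Chromosome": "17", "Start_Position": 7577201},
]
marked = mark_covered(load_intervals(bed), variants)
print([v["covered"] for v in marked])
```

A linear scan per variant is enough for a small panel; a dedicated lookup structure such as the one bed\_lookup provides would be the natural choice for larger BED files.
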

#### **get\_row**

```python
def get_row(tsv_file)
```

Determine which rows of a file should be skipped when reading it.

**Arguments**:

* `tsv_file` *file* - file to be read

**Returns**:

* `list` - lines to be skipped
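One plausible reading of this helper is that it finds leading metadata lines so they can be skipped when the table is loaded. The sketch below assumes such lines start with `#`; both that prefix and the name `get_row_sketch` are assumptions, not confirmed details of the module.

```python
import io


def get_row_sketch(handle, comment_prefix="#"):
    """Return 0-based indices of the leading lines that start with comment_prefix.

    Hypothetical stand-in for get_row; the real function may differ.
    """
    skipped = []
    for index, line in enumerate(handle):
        if line.startswith(comment_prefix):
            skipped.append(index)
        else:
            # Stop at the first non-comment line (the column header)
            break
    return skipped


# Made-up clinical-style file with metadata lines before the header
clinical = io.StringIO(
    "#Patient Identifier\tSex\n"
    "#Identifier for the patient\tSex of the patient\n"
    "#STRING\tSTRING\n"
    "#1\t1\n"
    "PATIENT_ID\tSEX\n"
    "P-0000001\tFemale\n"
)
rows_to_skip = get_row_sketch(clinical)
print(rows_to_skip)
```

A list of indices like this can be passed to `pandas.read_csv` through its `skiprows` parameter.
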

