# Get cBioPortal Variants

## Table of Contents

* [get\_cbioportal\_variants](#get_cbioportal_variants)
  * [subset\_cpt](#subset_cpt)
  * [subset\_cst](#subset_cst)
  * [subset\_cna](#subset_cna)
  * [subset\_sv](#subset_sv)
  * [subset\_maf](#subset_maf)
* [Sub-modules](#sub-modules)

## get\_cbioportal\_variants

Requirements:

* pandas
* typing
* typer
* bed\_lookup (<https://github.com/msk-access/python_bed_lookup>)

#### Example commands

```bash
python get_cbioportal_variants.py  subset-maf --sid "Test1" --sid "Test2" --sid "Test3"
```

```bash
python get_cbioportal_variants.py  subset-maf --ids /path/to/ids.txt
```
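
The `--ids` option expects a file whose header is `sample_id`. A minimal sketch for writing such a file follows; the helper name `write_ids_file` is hypothetical, and the one-ID-per-line layout is an assumption based on the header description in the help text.

```python
import os
import tempfile


def write_ids_file(path, sample_ids):
    """Write a plain-text IDs file: a 'sample_id' header, then one ID per line (assumed layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("sample_id\n")
        for sample_id in sample_ids:
            handle.write(f"{sample_id}\n")
    return path


# Write the file into a temporary directory so the example leaves nothing behind in the repo
ids_path = write_ids_file(
    os.path.join(tempfile.mkdtemp(), "ids.txt"),
    ["Test1", "Test2", "Test3"],
)
```

The resulting path can then be passed to any sub-command with `--ids`.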

```bash
Usage: get_cbioportal_variants.py [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.

  --help                Show this message and exit.

Commands:
  subset-cna  Subset data_CNA.txt file for given set of sample ids.
  subset-cpt  Subset data_clinical_patient.txt file for given set of
              patient...

  subset-cst  Subset data_clinical_samples.txt file for given set of sample...
  subset-maf  Subset MAF/TSV file and mark if an alteration is covered by...
  subset-sv   Subset data_sv.txt file for given set of sample ids.
```

### **subset\_cpt**

```bash
Usage: get_cbioportal_variants.py subset-cpt [OPTIONS]

  Subset data_clinical_patient.txt file for given set of patient ids.

  Tool to do the following operations: A. Get subset of clinical information
  for samples based on PATIENT_ID in data_clinical_patient.txt file

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -p, --cpt FILE    Clinical Patient file generated by cBioportal repo
                    [default: /work/access/production/resources/cbioportal/cur
                    rent/msk_solid_heme/data_clinical_patient.txt]

  -i, --ids PATH    List of ids to search for in the 'PATIENT_ID' column.
                    Header of this file is 'sample_id'  [default: ]

  --sid TEXT        Identifiers to search for in the 'PATIENT_ID' column. Can
                    be given multiple times  [default: ]

  -n, --name TEXT   Name of the output file  [default:
                    output_clinical_patient.txt]

  -c, --cname TEXT  Name of the column header to be used for sub-setting
                    [default: PATIENT_ID]

  --help            Show this message and exit.
```

### **subset\_cst**

```bash
Usage: get_cbioportal_variants.py subset-cst [OPTIONS]

  Subset data_clinical_samples.txt file for given set of sample ids.

  Tool to do the following operations: A. Get subset of clinical information
  for samples based on SAMPLE_ID in data_clinical_sample.txt file

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -s, --cst FILE    Clinical Sample file generated by cBioportal repo
                    [default: /work/access/production/resources/cbioportal/cur
                    rent/msk_solid_heme/data_clinical_sample.txt]

  -i, --ids PATH    List of ids to search for in the 'SAMPLE_ID' column.
                    Header of this file is 'sample_id'  [default: ]

  --sid TEXT        Identifiers to search for in the 'SAMPLE_ID' column. Can
                    be given multiple times  [default: ]

  -n, --name TEXT   Name of the output file  [default:
                    output_clinical_samples.txt]

  -c, --cname TEXT  Name of the column header to be used for sub-setting
                    [default: SAMPLE_ID]

  --help            Show this message and exit.
```

### **subset\_cna**

```bash
Usage: get_cbioportal_variants.py subset-cna [OPTIONS]

  Subset data_CNA.txt file for given set of sample ids.

  Tool to do the following operations: A. Get subset of samples based on
  column header in data_CNA.txt file

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -c, --cna FILE   Copy Number Variant file generated by cBioportal repo
                   [default: /work/access/production/resources/cbioportal/curr
                   ent/msk_solid_heme/data_CNA.txt]

  -i, --ids PATH   List of ids to search for in the 'header' of the file.
                   Header of this file is 'sample_id'  [default: ]

  --sid TEXT       Identifiers to search for in the 'header' of the file. Can
                   be given multiple times  [default: ]

  -n, --name TEXT  Name of the output file  [default: output_CNA.txt]
  --help           Show this message and exit.
```

### **subset\_sv**

```bash
Usage: get_cbioportal_variants.py subset-sv [OPTIONS]

  Subset data_sv.txt file for given set of sample ids.

  Tool to do the following operations: A. Get subset of structural variants
  based on Sample_ID in data_sv.txt file

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -s, --sv FILE     Structural Variant file generated by cBioportal repo
                    [default: /work/access/production/resources/cbioportal/cur
                    rent/msk_solid_heme/data_sv.txt]

  -i, --ids PATH    List of ids to search for in the 'Sample_ID' column.
                    Header of this file is 'sample_id'  [default: ]

  --sid TEXT        Identifiers to search for in the 'Sample_ID' column. Can
                    be given multiple times  [default: ]

  -n, --name TEXT   Name of the output file  [default: output_sv.txt]
  -c, --cname TEXT  Name of the column header to be used for sub-setting
                    [default: Sample_ID]

  --help            Show this message and exit.
```

### **subset\_maf**

```bash
Usage: get_cbioportal_variants.py subset-maf [OPTIONS]

  Subset MAF/TSV file and mark if an alteration is covered by BED file or
  not

  Tool to do the following operations: A. Get subset of variants based on
  Tumor_Sample_Barcode in data_mutations_extended.txt file B. Mark the
  variants as overlapping with BED file as covered [yes/no], by appending
  "covered" column to the subset MAF

  Requirement: pandas; typing; typer; bed_lookup(https://github.com/msk-
  access/python_bed_lookup)

Options:
  -m, --maf FILE    MAF file generated by cBioportal repo  [default: /work/acc
                    ess/production/resources/cbioportal/current/msk_solid_heme
                    /data_mutations_extended.txt]

  -i, --ids PATH    List of ids to search for in the 'Tumor_Sample_Barcode'
                    column. Header of this file is 'sample_id'  [default: ]

  --sid TEXT        Identifiers to search for in the 'Tumor_Sample_Barcode'
                    column. Can be given multiple times  [default: ]

  -b, --bed FILE    BED file to find overlapping variants  [default:
                    /work/access/production/resources/msk-
                    access/current/regions_of_interest/current/MSK-
                    ACCESS-v1_0-probe-A.sorted.bed]

  -n, --name TEXT   Name of the output file  [default: output.maf]
  -c, --cname TEXT  Name of the column header to be used for sub-setting
                    [default: Tumor_Sample_Barcode]

  --help            Show this message and exit.
```

### Sub-modules

#### **read\_tsv**

```python
def read_tsv(tsv)
```

Read a TSV file

**Arguments**:

* `tsv` *File* - Input file in MAF/TSV-like format

**Returns**:

* `data_frame` - A data frame containing the contents of the MAF/TSV file
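
As an illustration, reading a tab-separated file into a data frame with pandas could look like the sketch below. The name `read_tsv_sketch` and the `skip_rows` parameter are hypothetical and are not taken from the module's source.

```python
import io

import pandas as pd


def read_tsv_sketch(tsv, skip_rows=None):
    """Load a tab-separated file (path or file-like object) into a DataFrame."""
    # low_memory=False makes pandas infer each column's dtype from the whole file
    return pd.read_csv(tsv, sep="\t", skiprows=skip_rows, low_memory=False)


# In-memory stand-in for a small MAF-like file
example = io.StringIO(
    "Hugo_Symbol\tTumor_Sample_Barcode\n"
    "TP53\tTest1\n"
    "KRAS\tTest2\n"
)
df = read_tsv_sketch(example)
```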

#### **read\_ids**

```python
def read_ids(sid, ids)
```

Make a single list of IDs from the `--sid` values and the IDs file

**Arguments**:

* `sid` *tuple* - Multiple IDs as a tuple
* `ids` *File* - File containing multiple IDs

**Returns**:

* `list` - List containing all IDs
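
A minimal sketch of how the two ID sources might be combined is shown here. `read_ids_sketch` is a hypothetical name, the file is assumed to hold one ID per line under a `sample_id` header, and the de-duplication step is an assumption rather than documented behavior.

```python
import csv
import io


def read_ids_sketch(sid, ids_handle=None):
    """Merge IDs given on the command line with IDs read from an optional file."""
    collected = list(sid)
    if ids_handle is not None:
        # The IDs file is assumed to have a single 'sample_id' column
        reader = csv.DictReader(ids_handle, delimiter="\t")
        collected.extend(row["sample_id"] for row in reader)
    # dict.fromkeys drops duplicates while keeping first-seen order (assumed behavior)
    return list(dict.fromkeys(collected))


ids_file = io.StringIO("sample_id\nTest2\nTest3\n")
all_ids = read_ids_sketch(("Test1", "Test2"), ids_handle=ids_file)
```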

#### **filter\_by\_columns**

```python
def filter_by_columns(sid, tsv_df)
```

Filter data by columns

**Arguments**:

* `sid` *list* - list of columns to subset over
* `tsv_df` *data\_frame* - data\_frame to subset from

**Returns**:

* `data_frame` - A copy of the subset of the data\_frame
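
Column-wise filtering suits files such as `data_CNA.txt`, where sample IDs appear in the header. The sketch below is one possible approach; keeping the `Hugo_Symbol` column and silently ignoring IDs that are absent from the header are both assumptions, and `filter_by_columns_sketch` is a hypothetical name.

```python
import pandas as pd


def filter_by_columns_sketch(sid, tsv_df, keep=("Hugo_Symbol",)):
    """Return a copy of tsv_df restricted to the annotation columns plus the requested samples."""
    requested = set(sid)
    # Preserve the original column order and ignore IDs missing from the header
    wanted = [col for col in tsv_df.columns if col in keep or col in requested]
    return tsv_df.loc[:, wanted].copy()


cna = pd.DataFrame(
    {
        "Hugo_Symbol": ["TP53", "KRAS"],
        "Test1": [0, 2],
        "Test2": [-2, 0],
        "Test3": [0, 0],
    }
)
subset_cna = filter_by_columns_sketch(["Test1", "Test3", "Missing"], cna)
```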

#### **filter\_by\_rows**

```python
def filter_by_rows(sid, tsv_df, col_name)
```

Filter the data by rows

**Arguments**:

* `sid` *list* - list of row names to subset over
* `tsv_df` *data\_frame* - data\_frame to subset from
* `col_name` *string* - name of the column to filter using names in the sid

**Returns**:

* `data_frame` - A copy of the subset of the data\_frame
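
Row-wise filtering can be expressed with a boolean mask in pandas. The following is a sketch under the assumption that matching is exact; `filter_by_rows_sketch` is a hypothetical name.

```python
import pandas as pd


def filter_by_rows_sketch(sid, tsv_df, col_name):
    """Return a copy of the rows whose value in col_name is one of the requested IDs."""
    mask = tsv_df[col_name].isin(sid)
    return tsv_df.loc[mask].copy()


maf = pd.DataFrame(
    {
        "Hugo_Symbol": ["TP53", "KRAS", "EGFR"],
        "Tumor_Sample_Barcode": ["Test1", "Test2", "Other"],
    }
)
subset_maf = filter_by_rows_sketch(["Test1", "Test2"], maf, "Tumor_Sample_Barcode")
```

Returning a copy means later edits to the subset, such as adding a column, do not touch the original data frame.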

#### **read\_bed**

```python
def read_bed(bed)
```

Read BED file using bed\_lookup

**Arguments**:

* `bed` *file* - File in BED format to read

**Returns**:

* `object` - BED file object to use for filtering

#### **check\_if\_covered**

```python
def check_if_covered(bedObj, mafObj)
```

Function to check if a variant is covered in a given bed file

**Arguments**:

* `bedObj` *object* - BED file object to check coverage
* `mafObj` *data\_frame* - data frame whose variants are checked for coverage, using the 'Chromosome' column for the chromosome and the 'Start\_Position' column for the position

**Returns**:

* `data_frame` - The input data frame with a "covered" column (yes/no) appended
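
To illustrate the coverage check without depending on `bed_lookup`, the sketch below substitutes a small in-memory interval lookup. `SimpleBedLookup`, its `covers` method, and `check_if_covered_sketch` are hypothetical stand-ins and do not reflect the `bed_lookup` API; the coordinate handling (0-based half-open BED intervals, 1-based MAF positions) is also an assumption.

```python
import pandas as pd


class SimpleBedLookup:
    """Hypothetical stand-in for a BED lookup object; holds intervals in memory."""

    def __init__(self, intervals):
        # intervals: iterable of (chrom, start, end), 0-based half-open as in BED
        self.by_chrom = {}
        for chrom, start, end in intervals:
            self.by_chrom.setdefault(str(chrom), []).append((int(start), int(end)))

    def covers(self, chrom, pos):
        # A 1-based position p corresponds to the 0-based coordinate p - 1
        return any(
            start <= pos - 1 < end for start, end in self.by_chrom.get(str(chrom), [])
        )


def check_if_covered_sketch(bed_obj, maf_df):
    """Return a copy of maf_df with a 'covered' column set to 'yes' or 'no'."""
    out = maf_df.copy()
    out["covered"] = [
        "yes" if bed_obj.covers(chrom, int(pos)) else "no"
        for chrom, pos in zip(out["Chromosome"], out["Start_Position"])
    ]
    return out


bed = SimpleBedLookup([("17", 7577000, 7578000)])
variants = pd.DataFrame(
    {
        "Chromosome": ["17", "17", "12"],
        "Start_Position": [7577500, 7579000, 25398284],
    }
)
annotated = check_if_covered_sketch(bed, variants)
```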

#### **get\_row**

```python
def get_row(tsv_file)
```

Function to skip rows

**Arguments**:

* `tsv_file` *file* - file to be read

**Returns**:

* `list` - lines to be skipped
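
cBioPortal clinical files begin with metadata lines prefixed by `#`. One plausible way to build a list of rows to skip is sketched below; that the real function works this way is an assumption, and `get_row_sketch` is a hypothetical name.

```python
import io


def get_row_sketch(handle):
    """Return the 0-based indices of the leading lines that start with '#'."""
    skipped = []
    for index, line in enumerate(handle):
        if line.startswith("#"):
            skipped.append(index)
        else:
            # Stop at the first non-comment line, which is assumed to be the header row
            break
    return skipped


clinical = io.StringIO(
    "#Patient Identifier\tSex\n"
    "#STRING\tSTRING\n"
    "PATIENT_ID\tSEX\n"
    "P-0000001\tFemale\n"
)
rows_to_skip = get_row_sketch(clinical)
```

A list like this can be passed to the `skiprows` argument of `pandas.read_csv` so that the real header row is parsed as column names.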
