Compile Reads

Step 1 -- intra-patient genotyping

There are two variations of this script:

- compile_reads.R: works with Research ACCESS and Clinical IMPACT
- compile_reads_all.R: works with Research ACCESS, Clinical ACCESS, and Clinical IMPACT

The first step of the pipeline is to genotype all variants of interest in every included sample (plasma, buffy coat, DMP tumor, DMP normal, and donor samples). Once read counts have been obtained at every locus in every sample, the next step generates a table of VAFs and call status for each variant across all samples within a patient.
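As a rough illustration of how read counts turn into the VAFs mentioned above, the Python sketch below computes a variant allele fraction as alternate-allele reads divided by total depth. The function name, the dictionary layout, and the counts are hypothetical, and returning 0 for an uncovered locus is an assumption; the pipeline's own R code and its call-status rules are not reproduced here.

```python
def vaf(alt_count: int, total_depth: int) -> float:
    """Variant allele fraction: alt-supporting reads / all reads at the locus."""
    if total_depth == 0:
        # Assumption: report 0.0 when the locus has no coverage.
        return 0.0
    return alt_count / total_depth


# Hypothetical read counts for one variant across one patient's samples.
read_counts = {
    "plasma": {"alt": 12, "depth": 1500},
    "buffy_coat": {"alt": 0, "depth": 900},
    "DMP_Tumor": {"alt": 150, "depth": 600},
}

# One VAF per sample for this variant.
vaf_table = {
    sample: vaf(counts["alt"], counts["depth"])
    for sample, counts in read_counts.items()
}
```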

Usage compile_reads.R

Rscript R/compile_reads.R -h                                        
usage: R/compile_reads.R [-h] [-m MASTERREF] [-o RESULTSDIR]
                         [-pb POOLEDBAMDIR] [-fa FASTAPATH]
                         [-gt GENOTYPERPATH] [-dmp DMPDIR] [-mb MIRRORBAMDIR]
                         [-dmpk DMPKEYPATH]

optional arguments:
  -h, --help            show this help message and exit
  -m MASTERREF, --masterref MASTERREF
                        File path to master reference file
  -o RESULTSDIR, --resultsdir RESULTSDIR
                        Output directory
  -pb POOLEDBAMDIR, --pooledbamdir POOLEDBAMDIR
                        Directory for all pooled bams [default]
  -fa FASTAPATH, --fastapath FASTAPATH
                        Reference fasta path [default]
  -gt GENOTYPERPATH, --genotyperpath GENOTYPERPATH
                        Genotyper executable path [default]
  -dmp DMPDIR, --dmpdir DMPDIR
                        Directory of clinical DMP IMPACT repository [default]
  -mb MIRRORBAMDIR, --mirrorbamdir MIRRORBAMDIR
                        Mirror BAM file directory [default]
  -dmpk DMPKEYPATH, --dmpkeypath DMPKEYPATH
                        DMP mirror BAM key file [default]

Usage compile_reads_all.R

Rscript R/compile_reads_all.R -h
usage: R/compile_reads_all.R [-h] [-m MASTERREF] [-o RESULTSDIR]
                             [-pid PROJECTID] [-pb POOLEDBAMDIR]
                             [-fa FASTAPATH] [-gt GENOTYPERPATH] [-dmp DMPDIR]
                             [-mb MIRRORBAMDIR] [-mab MIRRORACCESSBAMDIR]
                             [-dmpk DMPKEYPATH] [-dmpak DMPACCESSKEYPATH]

optional arguments:
  -h, --help            show this help message and exit
  -m MASTERREF, --masterref MASTERREF
                        File path to master reference file
  -o RESULTSDIR, --resultsdir RESULTSDIR
                        Output directory
  -pid PROJECTID, --projectid PROJECTID
                        Project ID for submitted jobs involved in this run
  -pb POOLEDBAMDIR, --pooledbamdir POOLEDBAMDIR
                        Directory for all pooled bams [default]
  -fa FASTAPATH, --fastapath FASTAPATH
                        Reference fasta path [default]
  -gt GENOTYPERPATH, --genotyperpath GENOTYPERPATH
                        Genotyper executable path [default]
  -dmp DMPDIR, --dmpdir DMPDIR
                        Directory of clinical DMP repository [default]
  -mb MIRRORBAMDIR, --mirrorbamdir MIRRORBAMDIR
                        Mirror BAM file directory [default]
  -mab MIRRORACCESSBAMDIR, --mirroraccessbamdir MIRRORACCESSBAMDIR
                        Mirror BAM file directory for MSK-ACCESS [default]
  -dmpk DMPKEYPATH, --dmpkeypath DMPKEYPATH
                        DMP mirror BAM key file [default]
  -dmpak DMPACCESSKEYPATH, --dmpaccesskeypath DMPACCESSKEYPATH
                        DMP mirror BAM key file for MSK-ACCESS [default]

Default

Default options can be found here

What `compile_reads` does

For each patient

Create a sample sheet -- similar to the one for genotype-variants

| Sample_Barcode | duplex_bams | simplex_bams | standard_bam | Sample_Type | dmp_patient_id |
| --- | --- | --- | --- | --- | --- |
| plasma sample id | /duplex/bam | /simplex/bam | | duplex | P-xxxxxxx |
| buffy coat id | | | /unfiltered/bam | unfilterednormal | P-xxxxxxx |
| DMP Tumor ID | | | /DMP/bam | DMP_Tumor | P-xxxxxxx |
| DMP Normal ID | | | /DMP/bam | DMP_Normal | P-xxxxxxx |
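A sample sheet with the columns above could be assembled along the following lines. This is a minimal Python sketch rather than the pipeline's R implementation: the helper names `make_row` and `write_sample_sheet` are hypothetical, the tab-delimited output is an assumption, and the barcodes and paths are the same placeholders used in the table.

```python
import csv
import io

COLUMNS = ["Sample_Barcode", "duplex_bams", "simplex_bams",
           "standard_bam", "Sample_Type", "dmp_patient_id"]


def make_row(barcode, sample_type, patient_id,
             duplex="", simplex="", standard=""):
    """Build one sample-sheet row; BAM columns that do not apply stay empty."""
    values = [barcode, duplex, simplex, standard, sample_type, patient_id]
    return dict(zip(COLUMNS, values))


def write_sample_sheet(rows, handle):
    """Write the rows as tab-delimited text (the delimiter is an assumption)."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder values mirroring the table above.
rows = [
    make_row("plasma_sample_id", "duplex", "P-xxxxxxx",
             duplex="/duplex/bam", simplex="/simplex/bam"),
    make_row("buffy_coat_id", "unfilterednormal", "P-xxxxxxx",
             standard="/unfiltered/bam"),
    make_row("DMP_Tumor_ID", "DMP_Tumor", "P-xxxxxxx", standard="/DMP/bam"),
    make_row("DMP_Normal_ID", "DMP_Normal", "P-xxxxxxx", standard="/DMP/bam"),
]

buffer = io.StringIO()
write_sample_sheet(rows, buffer)
sheet_text = buffer.getvalue()
```

Only the plasma sample fills the duplex and simplex columns; the buffy coat and DMP samples each carry a single standard BAM.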

Generate all variants of interest:
- DMP calls from cbio repo
- ACCESS calls from SNV pipeline

Generate a unique variants list

Tag hotspots on unique variants

Genotype with genotype-variants
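The merge, de-duplicate, and tag steps for one patient can be pictured with the Python sketch below. It assumes a variant is identified by chromosome, position, reference allele, and alternate allele; the `Variant` type, both function names, and every coordinate are hypothetical placeholders, and the real hotspot source used by the pipeline is not shown here.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    # Assumption: these four fields are enough to identify a variant.
    chrom: str
    pos: int
    ref: str
    alt: str


def unique_variants(dmp_calls, access_calls):
    """Merge both call sets, dropping duplicates and keeping first-seen order."""
    return list(dict.fromkeys([*dmp_calls, *access_calls]))


def tag_hotspots(variants, hotspots):
    """Pair each variant with True/False for membership in the hotspot set."""
    return [(variant, variant in hotspots) for variant in variants]


# Placeholder calls; the coordinates are illustrative, not real variants.
dmp_calls = [Variant("1", 100, "A", "T"), Variant("2", 200, "G", "C")]
access_calls = [Variant("2", 200, "G", "C"), Variant("3", 300, "C", "T")]
hotspots = {Variant("2", 200, "G", "C")}

variants = unique_variants(dmp_calls, access_calls)
tagged = tag_hotspots(variants, hotspots)
```

The resulting list is what would then be handed to genotype-variants together with the patient's sample sheet.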

Afterwards, for donor bams

Obtain all variants genotyped in any patient and generate a list of all unique variants
Genotype with genotype-variants
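Pooling variants across patients for the donor BAMs amounts to a set union. The Python sketch below shows one way to do it; the function name, the tuple layout (chromosome, position, ref, alt), and the patient IDs are hypothetical, and sorting the result is only a convenience for a stable order.

```python
def pooled_variants(variants_by_patient):
    """Union of every patient's genotyped variants, sorted for a stable order."""
    pooled = set()
    for variants in variants_by_patient.values():
        pooled.update(variants)
    return sorted(pooled)


# Placeholder per-patient variant lists: (chrom, pos, ref, alt).
variants_by_patient = {
    "P-0000001": [("1", 100, "A", "T"), ("2", 200, "G", "C")],
    "P-0000002": [("2", 200, "G", "C"), ("3", 300, "C", "T")],
}

donor_variants = pooled_variants(variants_by_patient)
```

A variant seen in several patients appears once in `donor_variants`, so each donor BAM is genotyped at every site exactly once.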
