MarkDuplicates v4.1.8.1

Version of tools in docker image

Tool

Version

Location

gatk

4.1.8.1

broadinstitute/gatk:4.1.8.1

CWL

CWL specification 1.0
Use example_inputs.yaml to see the inputs to the cwl
Example Command using toil:

    > toil-cwl-runner picard_mark_duplicates_4.1.8.1.cwl example_inputs.yaml

Usage

usage: picard_mark_duplicates_4.1.8.1.cwl [-h]

positional arguments:
  job_order             Job input json file

optional arguments:
  -h, --help            show this help message and exit
  --memory_per_job MEMORY_PER_JOB
                        Memory per job in megabytes
  --memory_overhead MEMORY_OVERHEAD
                        Memory overhead per job in megabytes
  --number_of_threads NUMBER_OF_THREADS
  --input INPUT         Input file (bam or sam). Required.
  --output_file_name OUTPUT_FILE_NAME
                        Output file (bam or sam).
  --duplication_metrics DUPLICATION_METRICS
                        File to write duplication metrics to Required.
  --assume_sort_order ASSUME_SORT_ORDER
                        Optional sort order to output in. If not supplied
                        OUTPUT is in the same order as INPUT.Default value:
                        null. Possible values: {unsorted, queryname,
                        coordinate}
  --tmp_dir TMP_DIR     This option may be specified 0 or more times
  --validation_stringency VALIDATION_STRINGENCY
                        Validation stringency for all SAM files read by this
                        program. Setting stringency to SILENT can improve
                        performance when processing a BAM file in which
                        variable-length data (read, qualities, tags) do not
                        otherwise need to be decoded. Default value: STRICT.
                        This option can be set to 'null' to clear the default
                        value. Possible values: {STRICT,LENIENT, SILENT}
  --bam_compression_level BAM_COMPRESSION_LEVEL
                        Compression level for all compressed files created
                        (e.g. BAM and GELI). Default value:5. This option can
                        be set to 'null' to clear the default value.
  --create_bam_index    Whether to create a BAM index when writing a
                        coordinate-sorted BAM file. Default value:false. This
                        option can be set to 'null' to clear the default
                        value. Possible values:{true, false}
  --read_name_regex READ_NAME_REGEX
                        MarkDuplicates can use the tile and cluster positions
                        to estimate the rate of optical duplication in
                        addition to the dominant source of duplication, PCR,
                        to provide a more accurate estimation of library size.
                        By default (with no READ_NAME_REGEX specified),
                        MarkDuplicates will attempt to extract coordinates
                        using a split on ':' (see Note below). Set
                        READ_NAME_REGEX to 'null' to disable optical duplicate
                        detection. Note that without optical duplicate counts,
                        library size estimation will be less accurate. If the
                        read name does not follow a standard Illumina colon-
                        separation convention, but does contain tile and x,y
                        coordinates, a regular expression can be specified to
                        extract three variables: tile/region, x coordinate and
                        y coordinate from a read name. The regular expression
                        must contain three capture groups for the three
                        variables, in order. It must match the entire read
                        name. e.g. if field names were separated by semi-colon
                        (';') this example regex could be specified
                        (?:.*;)?([0-9]+)[^;]*;([0-9]+)[^;]*;([0-9]+)[^;]*$
                        Note that if no READ_NAME_REGEX is specified, the read
                        name is split on ':'. For 5 element names, the 3rd,
                        4th and 5th elements are assumed to be tile, x and y
                        values. For 7 element names (CASAVA 1.8), the 5th,
                        6th, and 7th elements are assumed to be tile, x and y
                        values.
  --sorting_collection_size_ratio SORTING_COLLECTION_SIZE_RATIO
                        This number, plus the maximum RAM available to the
                        JVM, determine the memory footprint used by some of
                        the sorting collections. If you are running out of
                        memory, try reducing this number.
  --use_jdk_deflater    Use the JDK Deflater instead of the Intel Deflater for
                        writing compressed output
  --use_jdk_inflater    Whether to create a BAM index when writing a
                        coordinate-sorted BAM file. Default value:false. This
                        option can be set to 'null' to clear the default
                        value. Possible values:{true, false}
  --duplicate_scoring_strategy DUPLICATE_SCORING_STRATEGY
                        The scoring strategy for choosing the non-duplicate
                        among candidates. Default value:SUM_OF_BASE_QUALITIES.
                        This option can be set to 'null' to clear the default
                        value.Possible values: {SUM_OF_BASE_QUALITIES,
                        TOTAL_MAPPED_REFERENCE_LENGTH, RANDOM}
  --optical_duplicate_pixel_distance OPTICAL_DUPLICATE_PIXEL_DISTANCE
                        The maximum offset between two duplicate clusters in
                        order to consider them optical duplicates. The default
                        is appropriate for unpatterned versions of the
                        Illumina platform. For the patterned flowcell models,
                        2500 is moreappropriate. For other platforms and
                        models, users should experiment to find what works
                        best. Default value: 100. This option can be set to
                        'null' to clear the default value.

PreviousMarkDuplicates v2.21.2 NextPostprocessing variant calls

Last updated 3 years ago