The Genome Analysis Toolkit (GATK)
Copyright (c) 2009 The Broad Institute
Overview
--------
The Genome Analysis Toolkit (GATK) is a structured programming
framework designed to enable rapid development of efficient and robust
analysis tools for next-generation DNA sequencers. The GATK solves
the data management challenge by separating data access patterns from
analysis algorithms, using the functional programming philosophy of
Map/Reduce. Consequently, the GATK is structured into data traversals
and data walkers that interact through a programming contract in which
the traversal provides a series of units of data to the walker, and
the walker consumes each datum and generates an output for it.
Because many tools that analyze next-generation sequencing data access
the data in a very similar way, the GATK can provide a small but
nearly comprehensive set of traversal types that satisfy the data
access needs of the majority of analysis tools. For example,
traversals "by each sequencer read" and "by every read covering
each locus in a genome" are common to many tools, such as those for
counting reads, building base quality histograms, reporting average
coverage of the genome, and calling SNPs. The small number of these
traversals, shared among many tools, enables the core GATK development
team to optimize them for correctness, stability, CPU performance,
and memory footprint, and in many cases even to parallelize
calculations automatically. Moreover, since the traversal engine
encapsulates the complexity of efficiently accessing the
next-generation sequencing data, researchers and developers are free
to focus on their specific analysis algorithms. This not only vastly
improves the productivity of developers, who can quickly write new
analyses, but also results in tools that are efficient and robust and
that benefit from improvements to a common data management engine.
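
The traversal/walker contract described above can be illustrated with a
small, self-contained sketch. Every name in it (WalkerSketch, Walker,
traverse, CountReads) is hypothetical and chosen only for illustration;
it is not the GATK API, and reads are stood in for by plain strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout: an illustration of the contract,
// not the GATK's own classes.
final class WalkerSketch {

    /** A walker maps each datum to a value and folds the values together. */
    interface Walker<D, M, R> {
        M map(D datum);                    // per-datum analysis
        R reduceInit();                    // starting value of the accumulator
        R reduce(M value, R accumulator);  // combine one mapped value
    }

    /** The traversal owns data access and feeds the walker one datum at a time. */
    static <D, M, R> R traverse(Iterable<D> data, Walker<D, M, R> walker) {
        R accumulator = walker.reduceInit();
        for (D datum : data) {
            accumulator = walker.reduce(walker.map(datum), accumulator);
        }
        return accumulator;
    }

    /** Example walker: counts the reads it is handed. */
    static final class CountReads implements Walker<String, Integer, Long> {
        @Override public Integer map(String read) { return 1; }
        @Override public Long reduceInit() { return 0L; }
        @Override public Long reduce(Integer value, Long accumulator) {
            return accumulator + value;
        }
    }

    public static void main(String[] args) {
        // Toy stand-ins for sequencer reads.
        List<String> reads = Arrays.asList("ACGT", "TTGA", "GGCC");
        System.out.println(traverse(reads, new CountReads())); // prints 3
    }
}
```

Because the walker never touches the data source directly, the same
CountReads logic would work unchanged if the traversal were replaced by
one that reads from disk or splits the work across threads.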
Capabilities
------------
The GenomeAnalysisTK development environment is currently provided as
a platform-independent Java programming language library. The core
system works with the nascent standard Sequence Alignment/Map (SAM)
format to represent reads using a production-quality SAM library
developed at the Broad. The system can access a variety of metadata
files such as dbSNP, HapMap, and RefSeq, and can work with genotype and
SNP files in GLF, Geli, and other common formats. The core system
handles read data from Illumina/Solexa, SOLiD, and Roche/454. The
current GATK engine can process all of the 1000 Genomes Project data,
representing ~5Tb of data from these three technologies, produced by
multiple sequencing centers and aligned to the human reference genome
with multiple aligners. The GATK currently provides traversals by
each read (ByRead traversal), by all reads covering each locus in the
genome (ByLoci traversal), and by all reads within pre-specified
intervals on the genome (ByWindow traversal).
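
The difference between the first two traversal types can be sketched on
toy data. The sketch below is an assumption-laden illustration, not GATK
code: TraversalSketch, Read, byRead, and byLocus are hypothetical names,
and each read is reduced to a name, a 1-based start position, and an
ungapped aligned length. The by-window case is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical names throughout: a toy model of two traversal styles.
final class TraversalSketch {

    /** A toy aligned read: 1-based start and ungapped aligned length. */
    static final class Read {
        final String name;
        final int start;
        final int length;

        Read(String name, int start, int length) {
            this.name = name;
            this.start = start;
            this.length = length;
        }
    }

    /** By-read: each read is one unit of data, in input order. */
    static List<String> byRead(List<Read> reads) {
        List<String> units = new ArrayList<String>();
        for (Read read : reads) {
            units.add(read.name);
        }
        return units;
    }

    /** By-locus: each covered position is one unit, holding the reads over it. */
    static Map<Integer, List<String>> byLocus(List<Read> reads) {
        Map<Integer, List<String>> pileups = new TreeMap<Integer, List<String>>();
        for (Read read : reads) {
            for (int pos = read.start; pos < read.start + read.length; pos++) {
                List<String> pileup = pileups.get(pos);
                if (pileup == null) {
                    pileup = new ArrayList<String>();
                    pileups.put(pos, pileup);
                }
                pileup.add(read.name);
            }
        }
        return pileups;
    }

    public static void main(String[] args) {
        List<Read> reads = Arrays.asList(new Read("r1", 1, 3), new Read("r2", 2, 3));
        System.out.println(byRead(reads));   // [r1, r2]
        System.out.println(byLocus(reads));  // {1=[r1], 2=[r1, r2], 3=[r1, r2], 4=[r2]}
    }
}
```

A read-counting tool fits the by-read view, while coverage and SNP
calling fit the by-locus view, since they need every read overlapping a
position at once.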
Dependencies
------------
The GATK relies on a Java 6-compatible JRE. At the time of this writing,
the GATK team tests with Sun JRE version 1.6.0_12-b04. Additionally, the
GATK requires as inputs a sorted, indexed BAM file containing aligned reads
and a FASTA-format reference with an associated dictionary file (.dict) and
index (.fasta.fai).
Instructions for preparing input files are available here:
http://www.broadinstitute.org/gatk/guide/article?id=1204
The bundled 'resources' directory contains an example BAM and fasta.
Getting Started
---------------
The GATK is distributed with a few standard analyses, including PrintReads,
Pileup, and DepthOfCoverage. More information on the included walkers is
available here:
http://www.broadinstitute.org/gatk/gatkdocs
To print the reads of the included sample data, untar the package into
the GenomeAnalysisTK directory and run the following command:
    java -jar GenomeAnalysisTK/GenomeAnalysisTK.jar \
        -T PrintReads \
        -R GenomeAnalysisTK/resources/exampleFASTA.fasta \
        -I GenomeAnalysisTK/resources/exampleBAM.bam
Support
-------
Documentation for the GATK is available at http://www.broadinstitute.org/gatk/guide.
For help using the GATK, developing analyses with the GATK, bug reports,
or feature requests, please visit our support forum at http://gatkforums.broadinstitute.org/