gatk-3.8/public
Mark DePristo cc7680e601 NA12878 knowledge base backed by MongoDB
-- Idea is simply to create a persistent database of all TP/FP sites on chr20 in NA12878.  Individual callsets can be imported, and a consensus algorithm is run over all callsets in the database to create a consensus collection, which can be used to assess NA12878 callsets for GATK and methods development
-- Framework for representing simple VariantContexts and Genotypes in MongoDB, querying for records, and iterating over them in the GATK
-- Not hooked up to Tribble, but could be done reasonably easily now (future TODO)
-- Tools to import callsets, create consensus callsets, import and export reviews
-- Scripts to reset the knowledge base and repopulate it with the standard data files (Eric will expand)
-- Actually scales to all of chr20, includes AssessNA12878 that reads a VCF and itemizes it against the truth data set
-- ImportCallset can load OMNI, HM3, CEU best practices, mills/devine sites and genotypes, properly marking sites as poly/mono/unk as well as TP/FP/UNK based on command line parameters
-- Added shell scripts that start up a local mongo db, that connect to a local or BI hosted mongo for NA12878.db for debugging, and a setupNA12878db script that can load OMNI, HM3, CEU best practices, Mills/Devine into the db and then update the consensus.
-- Reviewed sites can be exported to a VCF, and imported again, as a mechanism to safely store the only non-recoverable data from the Mongo DB.
-- Created a NA12878DBWalker that manages the outer DB interaction, and that all MongoDB interacting walkers inherit from.  Added a NA12878DBArgumentCollection.java consolating all of the common command line arguments (though strictly not necessary as all of this occurs in the root walker)

UnitTests
-- Can connect to a test knowledge base for development and unit testing
-- PolymorphicStatus, TruthStatus, SiteIterator
-- NA12878KBUnitTestBase provides simple utilities for connecting to the test mongo db, getting calls, etc
-- MongoVariantContext tests creation, matching, and encoding -> writing -> read -> decoding from the mongodb

AssessNA12878
-- Generic tool for comparing a NA12878 callset against the knowledge base.  See http://gatkforums.broadinstitute.org/discussion/1848/using-the-na12878-knowledge-base for detailed documentation
-- Performs trivial filtering on FS, MQ, QD for SNPs and non-SNPs to separate out variants likely to be filtered from those that are honest-to-goodness FPs

Misc
-- Ability to provide Description for Simplified GATK report
2012-11-20 18:50:52 -05:00
..
R Fix a nasty bug in reading GATK reports with a single line 2012-09-10 20:14:13 -04:00
c
chainFiles
doc
java NA12878 knowledge base backed by MongoDB 2012-11-20 18:50:52 -05:00
keys Public-key authorization scheme to restrict use of NO_ET 2012-03-06 00:09:43 -05:00
packages Removed 'Walker' suffix from packages/GATKEngine.xml that were breaking the packaged release. 2012-07-23 16:32:31 -04:00
perl Split out contig names from Reference .fai file on white space (to support the GATK resource bundle's file human_g1k_v37.fasta.fai.gz, which does not use tab delimiters) 2012-06-07 16:56:32 -04:00
scala Rename hg19 files in bundle to b37 since that's what they are 2012-11-14 11:47:09 -05:00
testdata Reverting move of BQSR tests to public, as per DR's email 2012-07-19 10:02:05 -04:00