gatk-3.8

History

Mark DePristo bf42be44fc Fast DeBruijnGraph creation using the kmer counter	2013-04-10 17:10:59 -04:00
-- The previous creation algorithm was:
     for each kmer1 -> kmer2 in each read
       add kmers 1 and 2 to the graph
       add edge kmer1 -> kmer2 in the graph, if it's not present (does check)
       update edge count by 1 if kmer1 -> kmer2 already existed in the graph
-- This algorithm cost O(reads * kmers / read * (getEdge cost + addEdge cost)), which is expensive because getting and adding edges are both expensive in jgrapht.
-- The new approach is:
     for each kmer1 -> kmer2 in each read
       add kmers 1 and 2 to a kmer counter, which counts kmer1+kmer2 in a fast hashmap
     for each kmer pair 1 and 2 in the hash counter
       add edge kmer1 -> kmer2 in the graph, if it's not present (does check), with the multiplicity count from the map
       update edge count by the count from the map if kmer1 -> kmer2 already existed in the graph
-- This algorithm ensures that we add far fewer edges.
-- Additionally, created a fast kmer class that lets us create kmers from larger byte[]s of bases without cutting up the byte[] itself.
-- Overall runtimes are greatly reduced using this algorithm.
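
The two-pass approach described in the commit message above can be sketched roughly as follows. This is an illustrative sketch only, not the GATK implementation: the class name `KmerGraphSketch`, both method names, the use of `String` kmers instead of views into a `byte[]`, and the nested-map stand-in for the jgrapht graph are all assumptions made to keep the example self-contained. Keying the counter by the (k+1)-base span that covers kmer1 and kmer2 is also an assumption about what "counts kmer1+kmer2" means.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count kmer pairs first, then touch the graph once per distinct pair.
final class KmerGraphSketch {

    /** Pass 1: count every adjacent kmer pair (kmer1 -> kmer2) across all reads in a hash map. */
    static Map<String, Integer> countKmerPairs(List<String> reads, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String read : reads) {
            // A read of length n has n - k adjacent kmer pairs, each spanning k + 1 bases.
            for (int i = 0; i + k + 1 <= read.length(); i++) {
                counts.merge(read.substring(i, i + k + 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Pass 2: add each distinct pair to the graph once, carrying its multiplicity. */
    static Map<String, Map<String, Integer>> buildGraph(Map<String, Integer> pairCounts, int k) {
        // Adjacency map standing in for the real graph: source kmer -> (target kmer -> edge count).
        Map<String, Map<String, Integer>> graph = new HashMap<>();
        for (Map.Entry<String, Integer> entry : pairCounts.entrySet()) {
            String span = entry.getKey();
            String kmer1 = span.substring(0, k);
            String kmer2 = span.substring(1, k + 1);
            // Create the edge with the counted multiplicity, or add to it if it already exists.
            graph.computeIfAbsent(kmer1, key -> new HashMap<>())
                 .merge(kmer2, entry.getValue(), Integer::sum);
            // Make sure the target kmer is present as a vertex even if it has no outgoing edges.
            graph.computeIfAbsent(kmer2, key -> new HashMap<>());
        }
        return graph;
    }

    public static void main(String[] args) {
        int k = 3;
        Map<String, Integer> pairs = countKmerPairs(List.of("ACGTA", "ACGTC"), k);
        Map<String, Map<String, Integer>> graph = buildGraph(pairs, k);
        // Print every edge with its multiplicity (iteration order is unspecified).
        graph.forEach((from, edges) ->
            edges.forEach((to, count) -> System.out.println(from + " -> " + to + " x" + count)));
    }
}
```

In this sketch the graph sees one update per distinct kmer pair rather than one per occurrence in the reads, which is the saving the commit message describes when edge lookups and insertions are the expensive operations.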
annotator	Updated all JAVA file licenses accordingly	2013-01-10 17:06:41 -05:00
beagle	Updated all JAVA file licenses accordingly	2013-01-10 17:06:41 -05:00
bqsr	Replace uses of NestedHashMap with NestedIntegerArray.	2013-02-27 14:03:39 -05:00
compression/reducereads	Updated AssessReducedQuals and applied it systematically to all ReduceReads integration tests.	2013-03-31 00:27:14 -04:00
diagnostics	walker to calculate per base coverage distribution	2013-02-07 16:33:05 -05:00
diffengine	Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs)	2013-03-12 10:57:14 -04:00
fasta	Updated all JAVA file licenses accordingly	2013-01-10 17:06:41 -05:00
filters	Don't allow users to specify keys and IDs that contain angle brackets or equals signs (not allowed in VCF spec).	2013-04-05 00:52:32 -04:00
genotyper	Fix caching indices in the PairHMM	2013-04-08 11:05:12 -04:00
haplotypecaller	Fast DeBruijnGraph creation using the kmer counter	2013-04-10 17:10:59 -04:00
indels	Fixed IndelRealigner reference length bug (GSA-774)	2013-02-19 16:00:36 -05:00
phasing	Updated all JAVA file licenses accordingly	2013-01-10 17:06:41 -05:00
validation	MathUtils.randomSubset() now uses Collections.shuffle() (indirectly, through the other methods	2013-03-29 14:52:10 -04:00
varianteval	Move some VCF/VariantContext methods back to the GATK based on feedback	2013-01-29 16:56:55 -05:00
variantrecalibration	Updated all JAVA file licenses accordingly	2013-01-10 17:06:41 -05:00
variantutils	Using --keepOriginalAC in SelectVariants was causing it to emit bad VCFs	2013-04-05 00:53:28 -04:00