gatk-3.8/protected/java/test/org/broadinstitute/sting/gatk/walkers
Mark DePristo bf42be44fc Fast DeBruijnGraph creation using the kmer counter
-- The previous creation code used the following algorithm:

for each kmer1 -> kmer2 in each read
  add kmers 1 and 2 to the graph
  add edge kmer1 -> kmer2 in the graph, if it's not present (does check)
  update edge count by 1 if kmer1 -> kmer2 already existed in the graph

-- This algorithm cost O(reads * kmers per read * (getEdge cost + addEdge cost)).  This is quite expensive because both getting and adding edges are slow in jgrapht.
-- The new approach uses the following algorithm:

for each kmer1 -> kmer2 in each read
  add kmers 1 and 2 to a kmer counter that counts kmer1+kmer2 in a fast hashmap

for each kmer pair 1 and 2 in the hash counter
  add edge kmer1 -> kmer2 in the graph, if it's not present (does check) with multiplicity count from map
  update edge count by count from map if kmer1 -> kmer2 already existed in the graph
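
The two passes above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from this commit: the class and method names are hypothetical, kmers are modelled as Strings, and an ordinary map of "kmer1->kmer2" keys stands in for the jgrapht graph.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: count adjacent kmer pairs first, then update the graph once per distinct pair. */
class KmerPairCountingSketch {

    /** Pass 1: count every adjacent kmer pair (kmer1 -> kmer2) across all reads in a hash map. */
    static Map<String, Integer> countKmerPairs(List<String> reads, int kmerSize) {
        Map<String, Integer> counts = new HashMap<>();
        for (String read : reads) {
            // Two adjacent kmers overlap by kmerSize - 1 bases, so the pair spans kmerSize + 1 bases.
            for (int i = 0; i + kmerSize + 1 <= read.length(); i++) {
                counts.merge(read.substring(i, i + kmerSize + 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Pass 2: one edge update per distinct pair; a plain map stands in for the real graph (assumption). */
    static Map<String, Integer> buildEdges(Map<String, Integer> pairCounts, int kmerSize) {
        Map<String, Integer> edges = new HashMap<>();
        for (Map.Entry<String, Integer> entry : pairCounts.entrySet()) {
            String kmer1 = entry.getKey().substring(0, kmerSize);
            String kmer2 = entry.getKey().substring(1);
            // Adds the edge with the counted multiplicity, or bumps an existing edge by that count.
            edges.merge(kmer1 + "->" + kmer2, entry.getValue(), Integer::sum);
        }
        return edges;
    }
}
```

In this sketch the expensive edge operation runs once per distinct kmer pair instead of once per occurrence in the reads.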

-- This algorithm performs far fewer edge operations, since the graph is updated once per distinct kmer pair rather than once per occurrence
-- Additionally, created a fast kmer class that lets us create kmers from larger byte[]s of bases without cutting up the byte[] itself.
-- Overall runtimes are greatly reduced using this algorithm
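
A kmer that avoids cutting up the byte[] could look roughly like the following. This is a minimal sketch, assuming a view defined by an offset and a length into a shared array; the class name and its methods are hypothetical and are not the class added by this commit.

```java
import java.util.Arrays;

/** Hypothetical kmer that views a window of a shared byte[] instead of copying it. */
final class KmerViewSketch {
    private final byte[] bases; // shared backing array, never copied on construction
    private final int start;
    private final int length;
    private final int hash;     // computed once so hash map lookups stay cheap

    KmerViewSketch(byte[] bases, int start, int length) {
        if (start < 0 || length < 0 || start + length > bases.length) {
            throw new IllegalArgumentException("kmer window is outside the backing array");
        }
        this.bases = bases;
        this.start = start;
        this.length = length;
        int h = 1;
        for (int i = 0; i < length; i++) {
            h = 31 * h + bases[start + i];
        }
        this.hash = h;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof KmerViewSketch)) return false;
        KmerViewSketch that = (KmerViewSketch) other;
        if (length != that.length || hash != that.hash) return false;
        // Compare the two windows base by base without allocating.
        for (int i = 0; i < length; i++) {
            if (bases[start + i] != that.bases[that.start + i]) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    /** Copies the window out only when a standalone array is really needed. */
    byte[] copyBases() {
        return Arrays.copyOfRange(bases, start, start + length);
    }
}
```

Two views with the same bases compare equal even when they point at different offsets, so they can serve as keys in the hash counter. The sketch assumes callers do not mutate the backing array while views are in use.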
2013-04-10 17:10:59 -04:00
..
annotator Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
beagle Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
bqsr Replace uses of NestedHashMap with NestedIntegerArray. 2013-02-27 14:03:39 -05:00
compression/reducereads Updated AssessReducedQuals and applied it systematically to all ReduceReads integration tests. 2013-03-31 00:27:14 -04:00
diagnostics walker to calculate per base coverage distribution 2013-02-07 16:33:05 -05:00
diffengine Fixed issues raised by Appistry QA (mostly small fixes, corrections & clarifications to GATKDocs) 2013-03-12 10:57:14 -04:00
fasta Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
filters Don't allow users to specify keys and IDs that contain angle brackets or equals signs (not allowed in VCF spec). 2013-04-05 00:52:32 -04:00
genotyper Fix caching indices in the PairHMM 2013-04-08 11:05:12 -04:00
haplotypecaller Fast DeBruijnGraph creation using the kmer counter 2013-04-10 17:10:59 -04:00
indels Fixed IndelRealigner reference length bug (GSA-774) 2013-02-19 16:00:36 -05:00
phasing Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
validation MathUtils.randomSubset() now uses Collections.shuffle() (indirectly, through the other methods 2013-03-29 14:52:10 -04:00
varianteval Move some VCF/VariantContext methods back to the GATK based on feedback 2013-01-29 16:56:55 -05:00
variantrecalibration Updated all JAVA file licenses accordingly 2013-01-10 17:06:41 -05:00
variantutils Using --keepOriginalAC in SelectVariants was causing it to emit bad VCFs 2013-04-05 00:53:28 -04:00