gatk-3.8

History

Mark DePristo bf42be44fc Fast DeBruijnGraph creation using the kmer counter -- The previous creation algorithm used the following algorithm: for each kmer1 -> kmer2 in each read add kmers 1 and 2 to the graph add edge kmer1 -> kmer2 in the graph, if it's not present (does check) update edge count by 1 if kmer1 -> kmer2 already existed in the graph -- This algorithm had O(reads * kmers / read * (getEdge cost + addEdge cost)). This is actually pretty expensive because get and add edges is expensive in jgrapht. -- The new approach uses the following algorithm: for each kmer1 -> kmer2 in each read add kmers 1 and 2 to a kmer counter, that counts kmer1+kmer2 in a fast hashmap for each kmer pair 1 and 2 in the hash counter add edge kmer1 -> kmer2 in the graph, if it's not present (does check) with multiplicity count from map update edge count by count from map if kmer1 -> kmer2 already existed in the graph -- This algorithm ensures that we add very much fewer edges -- Additionally, created a fast kmer class that lets us create kmers from larger byte[]s of bases without cutting up the byte[] itself. -- Overall runtimes are greatly reduced using this algorith	2013-04-10 17:10:59 -04:00
..
java	Fast DeBruijnGraph creation using the kmer counter	2013-04-10 17:10:59 -04:00
scala/test/org/broadinstitute/sting/queue/pipeline/examples	Fix improper dependencies in QScripts used by pipeline tests, and attempt to fix the flawed MisencodedBaseQualityUnitTest	2013-02-27 04:45:53 -05:00

Mark DePristo bf42be44fc Fast DeBruijnGraph creation using the kmer counter

-- The previous creation algorithm used the following algorithm:

for each kmer1 -> kmer2 in each read
  add kmers 1 and 2 to the graph
  add edge kmer1 -> kmer2 in the graph, if it's not present (does check)
  update edge count by 1 if kmer1 -> kmer2 already existed in the graph

-- This algorithm had O(reads * kmers / read * (getEdge cost + addEdge cost)).  This is actually pretty expensive because get and add edges is expensive in jgrapht.
-- The new approach uses the following algorithm:

for each kmer1 -> kmer2 in each read
  add kmers 1 and 2 to a kmer counter, that counts kmer1+kmer2 in a fast hashmap

for each kmer pair 1 and 2 in the hash counter
  add edge kmer1 -> kmer2 in the graph, if it's not present (does check) with multiplicity count from map
  update edge count by count from map if kmer1 -> kmer2 already existed in the graph

-- This algorithm ensures that we add very much fewer edges
-- Additionally, created a fast kmer class that lets us create kmers from larger byte[]s of bases without cutting up the byte[] itself.
-- Overall runtimes are greatly reduced using this algorith

2013-04-10 17:10:59 -04:00

java

Fast DeBruijnGraph creation using the kmer counter

2013-04-10 17:10:59 -04:00

scala/test/org/broadinstitute/sting/queue/pipeline/examples

Fix improper dependencies in QScripts used by pipeline tests, and attempt to fix the flawed MisencodedBaseQualityUnitTest

2013-02-27 04:45:53 -05:00