Added a simple utility method Utils.optimumHashSize() to calculate the optimum

initial size for a Java hash table (HashMap, HashSet, etc.) given an expected
maximum number of elements. The optimum size is the smallest size that's
guaranteed not to result in any rehash / table-resize operations.

Example Usage:
Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements));

I think we're paying way too heavy a price in unnecessary rehash operations across
the GATK. If you don't specify an initial size, you get a table of size 16 that gets
completely rehashed and doubles in size every time it becomes 75% full. This means you
do at least twice as much work as you need to in order to populate your table:

(n + n/2 + n/4 + ... 16 ~= (1 + 1/2 + 1/4...) * n ~= 2 * n
This commit is contained in:
David Roazen 2011-08-02 21:59:06 -04:00
parent df37716857
commit d3437e62da
1 changed files with 15 additions and 0 deletions

View File

@ -42,6 +42,21 @@ public class Utils {
/** our log, which we want to capture anything from this class */
private static Logger logger = Logger.getLogger(Utils.class);
public static final float JAVA_DEFAULT_HASH_LOAD_FACTOR = 0.75f;
/**
* Calculates the optimum initial size for a hash table given the maximum number
* of elements it will need to hold. The optimum size is the smallest size that
* is guaranteed not to result in any rehash/table-resize operations.
*
* @param maxElements The maximum number of elements you expect the hash table
* will need to hold
* @return The optimum initial size for the table, given maxElements
*/
public static int optimumHashSize ( int maxElements ) {
return (int)(maxElements / JAVA_DEFAULT_HASH_LOAD_FACTOR) + 2;
}
public static String getClassName(Class c) {
String FQClassName = c.getName();
int firstChar;