-- For the pooled caller we were writing diploid no-calls even when other samples were haploid. Changed maxPloidy function to return a defaultPloidy, rather than 0, in the case where all samples are missing.
-- VCF/BCF Writers now create missing genotypes with the ploidy of other samples, or 2 if none are available at all.
-- Updating integration tests for general ploidy, as previously we wrote ./. even when other calls were 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1/1/1/1/1, but now we write ./././././././././././././././././././././././. (ugly but correct)
-- Previous code was looking for a -1 result from maxPloidy() but the result as actually 0, so instead of writing a diploid no call we were actually writing "unavailable" genotypes, and failing the BCF == VCF test in integration tests. Fixed.
-- Turns out this was consuming 30% of the UG runtime, and causing problems elsewhere.
-- Removed addMissingSamples from VariantcontextUtils, and calls to it
-- Updated VCF / BCF writers to automatically write out a diploid no call for missing samples
-- Added unit tests for this behavior in VariantContextWritersUnitTest
1) SelectVariants could throw a ReviewedStingException (one of the nasty "Bug:") ones if the user requested a sample that wasn't present in the VCF. The walker now
checks for this in the initialize() phase, and throws a more informative error if the situation is detected. If the user simply wants to subset the VCF to
all the samples requested that are actually present in the VCF, the --ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES flag changes this UserException to a Warning,
and does the appropriate subsetting. Added integration tests for this.
2) GenotypeLikelihoods has an unsafe method getLog10GQ(GenotypeType), which is completely broken for multi-allelic sites. I marked that method
as deprecated, and added methods that use the context of the allele ordering (either directly specified or as a VC) to retrieve the appropriate GQ, and
added a unit test to cover this case. VariantsToBinaryPed needs to dynamically calculate the GQ field sometimes (because I have some VCFs with PLs but no GQ).
-- Now prints out a single combined NanoScheduler runtime profile report across all nano schedulers in use. So now if you run with -nt 4 you'll get one combined NanoScheduler profiler across all 4 instances of the NanoScheduler within TraverseXNano.
-- Basically you cannot safely use instance specific ThreadLocal variables, as these cannot be safely cleaned up. The old implementation kept pointers to old writers, with huge tribble block indexes, and eventually we crashed out of integration tests
-- See http://weblogs.java.net/blog/jjviana/archive/2010/06/10/threadlocal-thread-pool-bad-idea-or-dealing-apparent-glassfish-memor for more information
-- New implementation uses a borrow/return schedule with a list of N TraversalEngines managed by the MicroScheduler directly.
-- Can now say -nt 4 and -nct 4 to get 16 threads running for you!
-- TraversalEngines are now ThreadLocal variables in the MicroScheduler.
-- Misc. code cleanup, final variables, some contracts.
-- TraversalProgressMeter now completely generalized, named ProgressMeter in utils.progressmeter. Now just takes "nRecordsProcessed" as an argument to print reads. Completely removes dependence on complex data structures from TraversalProgressMeter. Can be used to measure progress on any task with processing units in genomic locations.
-- a fairly simple, class with no dependency on GATK engine or other features.
-- Currently only used by the TraversalEngine / MicroScheduler but could be used for any purpose now, really.