Too many users (with RNASeq reads) are hitting these exceptions that were never supposed to happen. Let's give them (and us) a better and clearer error message.
-- The new code includes a new mode to write out a BAM containing reads realigned to the called haplotypes from the HC, which can be easily visualized in IGV.
-- Previous functionality maintained, with bug fixes
-- Haplotype BAM writing code now lives in utils
-- Created a base class that includes most of the functionality of writing reads realigned to haplotypes onto haplotypes.
-- Created two subclasses, one that writes all haplotypes (previous functionality) and a CalledHaplotypeBAMWriter that will only write reads aligned to the actually called haplotypes
-- Extended PerReadAlleleLikelihoodMap.getMostLikelyAllele to optionally restrict set of alleles to consider best
-- Massive increase in unit tests in AlignmentUtils, along with several new powerful functions for manipulating cigars
-- Fix bug in SWPairwiseAlignment that produces cigar elements with 0 size, and are now fixed with consolidateCigar in AlignmentUtils
-- HaplotypeCaller now tracks the called haplotypes in the GenotypingEngine, and returns this information to the HC for use in visualization.
-- Added extensive docs to HaplotypeCaller on how to use this capability
-- BUGFIX -- don't modify the read bases in GATKSAMRecord in LikelihoodCalculationEngine in the HC
-- Cleaned up SWPairwiseAlignment. Refactored out the big main and supplementary static methods. Added a unit test with a bug TODO to fix what seems to be an edge case bug in SW
-- Integration test to make sure we can actually write a BAM for each mode. This test only ensures that the code runs and doesn't exception out. It doesn't actually enforce any MD5s
-- HaplotypeBAMWriter also left aligns indels in the reads, as SW can return a random placement of a read against the haplotype. Calls leftAlign to make the alignments more clear, with unit test of real read to cover this case
-- Writes out haplotypes for both all haplotype and called haplotype mode
-- Haplotype writers now get the active region call, regardless of whether an actual call was made. Only emitting called haplotypes is moved down to CalledHaplotypeBAMWriter
Hoping that the higher class of storage will get us down from the current
~40 minutes for a parallel run of the integration tests to the goal of
~20 minutes.
-accept global timeout as a command-line argument
-kill outstanding jobs when timeout reached
-print job output files to stdout so that they get recorded in bamboo's logs
-periodically print number of jobs outstanding during run
-documentation / comments
This is to facilitate the current experiment with class-level test
suite parallelism. It's our hope that with these changes, we can get
the runtime of the integration test suite down to 20 minutes or so.
-UnifiedGenotyper tests: these divided nicely into logical categories
that also happened to distribute the runtime fairly evenly
-UnifiedGenotyperPloidy: these had to be divided arbitrarily into two
classes in order to halve the runtime
-HaplotypeCaller: turns out that the tests for complex and symbolic
variants make up half the runtime here, so merely moving these into
a separate class was sufficient
-BiasedDownsampling: most of these tests use excessively large intervals
that likely can't be reduced without defeating the goals of the tests. I'm
disabling these tests for now until they can either be redesigned to use smaller
intervals around the variants of interest, or refactored into unit tests
(creating a JIRA for Yossi for this task)
* Fixed GenomeLocSortedSet.add() to ensure that overlapping intervals are detected and an exception is thrown.
* Fixed GenomeLocSortedSet.addRegion() by merging it with the add() method; it now produces sorted inputs in all cases.
* Cleaned up duplicated code throughout the engine to create a list of intervals over all contigs.
* Added more unit tests for add functionality of GLSS.
* Resolves GSA-775.
-script to dispatch one farm job per test class and monitor jobs until completion
-new ant target to run tests without doing ANY compilation or extra steps at all
allows multiple instances of the test suite to share the same working directory
-- updateConsensus now don't call remove when it's updating the entire db from scratch. This radically improves performance when you are simply dropping the entire consensus and rebuilding from scratch, as the server does upon start up
-- Refactor initialization routine into BadSitesWriter. This now adds the GQ and DP genotype header lines which are necessarily if the input VCF doesn't have proper headers
-- GATKVariantContextUtils subset to biallelics now tolerates samples with bad GL values for multi-allelics, where it just removes the PLs and issues a warning.
* Split the cases into reads that don't have a RG at all vs. those with a RG that's not defined in the header.
* Added integration tests to make sure that the correct error is thrown.
* Resolved GSA-407.
* Removed from codebase NestedHashMap since it is unused and untested.
* Integration tests change because the BQSR CSV is now sorted automatically.
* Resolves GSA-732
-- Compares HC and UG performance on single deep genomes and 140 WEx bams over small intervals, as well as deep WGS reduced data and 1000G 4x data
-- Added mode to GATKPerformanceOverTime to include lots of versions, so we can make beautiful graphs of the cost of tools over many versions as well
-- Marginally better plots for multiple iterations in GATKPeformanceOverTime.R
-Some QScripts used by public pipeline tests unnecessarily used the (now protected) UnifiedGenotyper.
Changed them to use PrintReads instead.
-Moved ExampleUnifiedGenotyperPipelineTest to protected
-Attempt to fix the flawed and sporadically failing MisencodedBaseQualityUnitTest:
After looking at this class a bit, I think the problem was the use of global arrays for the quals
shared across all reads in all tests (BAMRecord class definitely does not make a separate copy for
each read!). One test (testFixBadQuals) modifies the bad quals array, and if this happens to run
before the testBadQualsThrowsError test the bad quals array will have been "fixed" and no exception
will be thrown.
-replace unnecessary uses of the UnifiedGenotyper by public integration tests
with PrintReads
-move NanoSchedulerIntegrationTest to protected, since it's completely dependent
on the UnifiedGenotyper