Commit Graph

12 Commits (982192e2e48bcea94c5fbbd342c6a06bc6483e4f)

Author SHA1 Message Date
Mark DePristo 982192e2e4 MD5DB for integrationtest management now writes out a md5mismatches files for clean analysis
-- This file is in integrationtests/md5mismatches.txt, and looks like:

expected        observed        test
7fd0d0c2d1af3b16378339c181e40611        2339d841d3c3c7233ebba9a6ace895fd        test BeagleOutputToVCF
43865f3f0d975ee2c5912b31393842f8        1b9c4734274edd3142a05033e520beac        testBeagleChangesSitesToRef
daead9bfab1a5df72c5e3a239366118e        27be14f9fc951c4e714b4540b045c2df        testDiffObjects:master=/local/dev/depristo/itest/public/testdata/diffTestMaster.vcf,test=/local/dev/depristo/itest/public/testdata/diffTestTest.vcf,md5=daead9bfab1a5df72c5e3a239366118e

-- Associated cleanup with making md5db an instantiated object, rather than a bunch of static methods
2012-06-14 16:42:27 -04:00
Mark DePristo 17fbd103d0 Smarter infrastructure to decode genotypes in BCF
-- Eliminated the large intermediate map from field name to list of list<Integer> values needed to create genotypes without the GenotypeBuilder.  The new code is cleaner and simply fills in an array of GenotypeBuilders as it moves through the column layout in BCF2
-- Now we create once decoders specialized for each GT field (GT, AD, etc) that can be optimized for putting data into the GenotypeBuilder.  In a subsequent commit these will actually use lower level BCF2 decoders to create the low-level ints and int[], avoiding the intermediate List<Integer> form
-- Reduced the amount of data further to be computed in the DiffEngine.  The DiffEngine algorithm needs to be rethought to be efficient...
2012-06-14 16:42:25 -04:00
Mark DePristo 454c8e63e6 Made GQ an int, not a float. Updated VC code and lots of corresponding MD5s
-- VCFWriter / codec now passes the same rigorous UnitTest as the BCF2 writer / codec.  As part of this we now can only test doubles for equivalence in VCFs to 1e-2 (not exactly impressive)
2012-05-28 20:20:05 -04:00
Mark DePristo 86e5a066fc Even more conservative limit on number of differences to summarize at 1000 2012-05-27 11:17:13 -04:00
Mark DePristo 31f4e5b52e Stop unlimited runtimes in DiffEngine when you have lots of differences
-- Added a new parameter to control the maximum number of pairwise differences to generate, which previously could expand to a very large number when there were lots of differences among genotypes, resulting in a n^2 algorithm running with n > 1,000,000
2012-05-27 11:17:13 -04:00
Mark DePristo e9c22b9aad Final updates to integration tests for BCF2
-- Fully working version
-- Use -generateShadowBCF to write out foo.bcf as well as foo.vcf anywhere you use -o foo.vcf
-- Moved MedianUnitTest to its proper home in Utils
-- Added reportng to ivy and testng, so build/report/X/html/ is a nicely formatted output for Unit and Integration tests.  From this website it's easy to see md5 diffs, etc.  This is a vastly better way to manage unit and integration test output
2012-05-24 10:58:59 -04:00
Mark DePristo 404ef741f1 Merged bug fix from Stable into Unstable 2011-10-13 18:02:06 -04:00
Mark DePristo 2ebdff074c Update MD5s for SOLiD recalibration
-- MD5 db had spelling error; fixed
-- Bug in AlignmentUtils resulted in some bases not being color space corrected.  The integration test caught the change, and it's clear that the new version is correct, as the prev. version was not considering the last the N qualities for reads with a ND operation.
2011-10-13 18:01:51 -04:00
Mark DePristo 463eab7604 All MD5 mismatches for test are shown
-- Now for tests like DoC, with 20 output md5s, you see all of the differences before failing.
2011-10-04 15:53:52 -07:00
Mark DePristo f6a5e0e36a Go for global integrationtest path first, if possible. 2011-07-26 17:35:30 -04:00
Mark DePristo 3afcb3415d Max of 1000 records will be loaded and compared to avoid heap size problem. 2011-07-25 14:58:31 -04:00
Mark DePristo d6e2e89f99 Walker test system refactoring. All MD5DB related functions are now in MD5DB.java.
System has the concept of a local and a global MD5 db.  The local one is like it operated previously.  The global one lives in /humgen/gsa-hpprojects/GATK/data/integrationtests.  If the system can find this directory then MD5s will also be read / written to this location.  This means that gsabamboo will print differences as appropriate.  And all users will in effect have access to a complete history of MD5 file results.
A few minor code reshuffles changed VariantRecalibration and VCFHeader test files.
2011-07-18 10:46:01 -04:00