124 Commits (f4b6afb42cfee1d845010c6e8b4eaa08a9c857db)

Author SHA1 Message Date
aaron 98e3a0bf1a VCF can now be emitted from SSG. The basics are there (the genotype, read depth, our error estimate), but more fields need to be added for each record as necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1797 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 19:50:04 +00:00
ebanks df8ea8f437 UG integration test. This was the old SSG test with MD5s updated.
I'll need to add some multi-sample tests in a bit...


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1791 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 17:43:58 +00:00
ebanks 008455915a One way of making the integration test stop failing is to remove it...
[waiting for Matt to cringe...]


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1789 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 17:08:41 +00:00
ebanks 04fe50cadd *** We no longer have a separate model for the single-sample case. ***
For now, a single sample input will be special-cased in the EM model - but that will change when the EM model degenerates to the single sample output with a single sample as input.  For now, the EM code for multi-samples isn't finished; I'm planning on checking that in soon.

The SingleSampleIntegrationTest now uses the UnifiedCaller instead of SSG, and so should all of you.  More on that in a separate email.
Other minor cleanups added too.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1785 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 14:08:57 +00:00
aaron f9a0eefe4b GELI_BINARY is now functional, and can be used as a variant type in SSG (-vf=GELI_BINARY). Also fixed the max mapping quality column in both GELI output formats, which we haven't been outputting correctly up until now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1774 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:20:34 +00:00
aaron 7fc4472e6d A big fix for MergingSamRecordIterator, where we weren't correctly handling the comparisons of SAMRecords (we weren't applying the new reference index first, so sometimes the MT contig would be ID 23, sometimes 24 in different records).
Also a fix to the GLF tests, and a correction to PrintReadsWalker to remove the close() on the output source; the source handles that itself (and you get a double close).

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1758 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 19:35:35 +00:00
ebanks 7249fade05 updated
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1756 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-02 18:10:34 +00:00
aaron 2e4949c4d6 Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:37:59 +00:00
hanna 70e1aef550 Better integrate the @ArgumentCollection into the command-line argument parser. Walkers can now specify their own @ArgumentCollections. Also cleaned up a bit of the CommandLineProgram template method pattern to minimize duplicate code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1746 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 22:23:19 +00:00
aaron d262cbd41c changes to add VCF to the rod system, fix VCF output in VariantsToVCF, and some other minor changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1715 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:16:11 +00:00
ebanks b0fa19a0b2 Fixed recal integration test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1689 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:22:32 +00:00
ebanks 6780476fb5 updated to deal with new dbSNP rod
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1687 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 19:46:32 +00:00
aaron 7bfb5fad27 fixing the dbSNP test. Also removing unnecessary comments from the GenomeLocParser, added some tests, and commented out the performance test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1676 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 23:32:24 +00:00
asivache a6bd509593 Changing the carpet under your feet!! New incremental update to the ROD system has arrived.
all the updated classes now make use of new SeekableRodIterator instead of RODIterator. RODIterator class deleted. This batch makes only trivial updates to tests dictated by the change in the ROD system interface. Few less trivial updates to follow. This is a partial commit; a few walkers also still need to be updated, hold on...

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1667 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 16:55:22 +00:00
aaron 7b39aa4966 Adding the VCF ROD. Also changed the VCF objects to be much more user friendly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 20:19:34 +00:00
hanna 01a9b1c63b Fix for problem where err stream remapped to output stream in certain cases, (hopefully) completing Matt's hat trick of fail. Thanks, unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1634 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 08:33:56 +00:00
aaron eedf55e94d temp fix for a broken test, we'll fix the test tomorrow. We promise, we're engineers, we love our tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1633 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 04:36:42 +00:00
aaron b401929e41 incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 04:48:42 +00:00
ebanks 6783fda42a Updated unit test to reflect changes to vcf output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1623 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 01:56:08 +00:00
aaron e03fccb223 Changes to switch Variant Eval over to the new Variation system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 05:34:33 +00:00
aaron 5b41ef5f70 rod DBSNP had a bug where the reference wasn't calculated correctly under certain conditions. Fixed getRefBasesFWD and getRefSnpFWD so that they were more in line with getAltBasesFWD and getAltSnpFWD. Also updated Variant Eval tests to reflect this change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1609 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 23:48:58 +00:00
depristo 6e13a36059 Framework for ROD walkers -- totally experimental and not working right now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:13:15 +00:00
ebanks e24c8d00d5 So, the VCF spec allows for an optional meta field in the header representing the date. However, using this field means that integration tests run on the vcf file will fail the MD5 test (which is what happened to the VariantFiltration test this morning after working just fine yesterday).
After consulting our resident expert (Aaron), we're going to (temporarily) remove the date from the vcf output until we can come up with a better solution.  However, this shouldn't cause any short-term problems because the date truly is optional.
VF test's MD5s are updated.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1580 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 14:28:43 +00:00
aaron 5a64a80ab5 changes to the variation class, updates to SSG, updated tests based on changes to the SSGenotypeCall, and added the ability to run a single integration test from the build script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1577 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 04:31:33 +00:00
ebanks 1362a56227 Added fasta tests and small fix to cleaner test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1575 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:13:11 +00:00
ebanks 8ca89279aa Added a test for VariantFiltration and the VECs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1574 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 02:21:14 +00:00
ebanks bed646e4f6 Adding cleaner test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1561 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 16:05:56 +00:00
depristo d9588e6083 bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:36:12 +00:00
aaron 0df6a9da5c -Separating out normal (unit) tests and integration tests. From now on, if your tests are more like integration tests (i.e. you're testing a walker and all the subunits it relies on), please name the test "______IntegrationTest.java" instead of "______Test.java".
-Bamboo will now run the integration tests once a day, and the normal unit tests on each check-in.

-Also added a bunch of unit tests for VariantEval walker

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1555 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:01:40 +00:00
depristo eeb9b6eb13 GenotypeLikelihoods now support a cache per subclass, avoiding genotyping clashes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1554 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 10:39:14 +00:00
ebanks 0cc219c0df -Added unit test for walkers dealing with intervals for cleaning
-I also uncovered a corner case in the cleaner that for some reason was commented out but shouldn't have been.  Hooray for unit tests!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1553 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 02:35:17 +00:00
depristo ec0f6f23c7 LocusIterationByState is now the system default. Fixed Aaron's build problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:28:05 +00:00
depristo 1c3d67f0f3 Improvements to CountCovariates and TableRecalibration, as well as regression tests for SLX and 454 data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
depristo 2b0d1c52b2 General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1538 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 19:13:37 +00:00
aaron 0cc634ed5d -Renamed rodVariants to RodGeliText
-Remove KGenomesSNPROD
-Remove rodFLT
-Renamed rodGFF to RodGenotypeChipAsGFF
-Fixed a problem in SSGenotypeCall
-Added basic SSGenotype Test class
-Make VCFHeader constructors public

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 18:40:43 +00:00
depristo a08c68362e Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls *AND* compares the geli MD5 sum to the expected one!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 12:39:06 +00:00
depristo 49a7babb2c Better organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1481 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 19:16:30 +00:00
depristo 8e129d76fd Support for original quality scores OQ flag. pQ flag in TableRecalibration to preserve quality scores below a threshold (defaulting to 5)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1474 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 14:14:21 +00:00
depristo 37a9b84276 corresponding test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1470 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 00:17:42 +00:00
hanna ccdb4a0313 General-purpose management of output streams.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-23 00:56:02 +00:00
aaron d101c20b30 added the ability to pass in a csv file of ROD triplets (one triplet per line) to the -B option
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1412 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 22:10:20 +00:00
aaron d69ae60b69 fixed two tests affected by my previous commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1408 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 17:57:50 +00:00
hanna dd228880ed Partially implemented NewHotnessGenotypeLikelihoodsTest caused the tests to fail.
Ouch!  So hot it burned me.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1403 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:45:44 +00:00
depristo a864c2f025 Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1390 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:00:06 +00:00
depristo bbd7bec5db Continuing cleanup of SSG. GenotypeLikelihoods now have extensive testing routines. DiploidGenotype supports het, homref, etc calculations. SSG has been cleaned up to remove old garbage functionality. Also now supports output to standard output by simply omitting varout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1387 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 22:25:30 +00:00
hanna 48713e154c Windowed access to the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1383 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:29:15 +00:00
hanna 21d1eba502 Cleaned division of responsibilities between arguments to map function. Reference has been changed
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:01:37 +00:00
aaron bca894ebce Adding the initial changes for the new Genotyping interface. The bullet points are:
- SSG is much simpler now
- GeliText has been added as a GenotypeWriter
- AlleleFrequencyWalker will be deleted when I untangle the AlleleMetric's dependence on it
- GenotypeLikelihoods now implements GenotypeGenerator, but could still use cleanup

There is still a lot more work to do, but this is a good initial check-in.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1335 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 19:43:59 +00:00
hanna 7a13647c35 Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
hanna 2db86b7829 Move the cleaned read injector test from playground to core. Remove CovariateCounterTest's dependency on the CleanedReadInjector. Start doing a bit of cleanup on the CLP's FieldParsers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1312 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 19:44:04 +00:00