Commit Graph

587 Commits (ff26f2bf688048bbd6e2b9ffcf31cedce4fa99dd)

Author SHA1 Message Date
Mark DePristo 8b1adb8c95 Removed getVariantContext() code 2011-08-01 13:41:09 -04:00
Mark DePristo f69bff5dd6 Commented out, because these fail the now removed dbSNP conversion. 2011-08-01 13:34:25 -04:00
Mark DePristo 7b07c4e04e RefMetaDataTracker now has get() methods accepting RodBindings
RodBinding no longer duplicates the get() methods in RMDT.  This is just an object now that connects the command line system to the RMDT.
Updated programs to use new style
Added UnitTests for the RodBinding accessors.
2011-07-30 15:34:11 -04:00
Mark DePristo 3b799db61a RefMetaDataTracker cleanup and unit tests
You know have to provide an explicit list of RODRecordLists upfront to the constructor.  RefMetaDataTracker is now immutable.  Changes in engine to incorporate these differences
Extensive UnitTests for RefMetaDataTracker now.
2011-07-29 13:23:17 -04:00
Mark DePristo 39b4e76fde Continuing refactoring of RefMetaDataTracker.
On the path towards converging getVariantContext() and getValues() in tracker so that we can have a single approach to get values from RODs with the new RodBinding() types
2011-07-28 17:48:28 -04:00
Mark DePristo 7c5c656b46 Uncovered fundamental accounting bug in VariantEval. Will be fixed by dev. team
Problem is that Novelty sees multiple records at a site (SNP, INDEL) to calculate whether a site is novel, but VariantEvalWalker makes an arbitrary decision which to use for analysis and CompOverlap may not see a comp record of the same type as eval.  So you get lines where the stratification is known but there are 10 novel sites!
2011-07-28 14:19:27 -04:00
Eric Banks 33b32c4211 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-28 13:57:22 -04:00
Eric Banks 7a2a65155f Merged bug fix from Stable into Unstable 2011-07-28 13:56:43 -04:00
Eric Banks 1afc49a297 There are some really 'interesting' (but apparently valid) records in the Mus musculus dbSNP file. Generalized the handling of complex cases in the dbSNP adaptor to handle it all. I just grabbed the actual Mus musculus dbSNP file as a test, ran it whole genome, and confirmed that we finally produce a valid VCF on it. Should be the last commit needed on this adaptor. 2011-07-28 13:55:58 -04:00
Mark DePristo c83f9432eb Cleaned up RefMetaDataTracker
Renamed many functions to more clearly state what they are actually doing
Removed unnecessary / unused functionality, reducing interface complexity
Updated all uses of this code in GATK
Added generic, type-safe accessors to RefMetaDataTracker such as public <T> List<T> getValues(final String name, Class<T> clazz)
Added standard refMetaDataTracker accessors to RodBinding, so you can do everything you can for generic rods with the tracker directly with with the RodBinding
2011-07-27 23:25:52 -04:00
Eric Banks 1865211b6d Merged bug fix from Stable into Unstable 2011-07-27 22:52:06 -04:00
Eric Banks 6230315ff2 Along with my half-written commit message from earlier, I also forgot to commit the integration test updates. This is what happens when you try to do things 30 seconds before you leave for the day. To finish up from before: complex events weren't being padded with the reference base as per the VCF spec. They are now. 2011-07-27 22:51:21 -04:00
Eric Banks ff31fa7990 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-27 16:15:23 -04:00
Mark DePristo 15be383d5b Merge branch 'master' into rodRefactor 2011-07-27 15:36:49 -04:00
Mark DePristo 38a2518668 Merge branch 'master' into rodRefactor 2011-07-27 15:34:54 -04:00
Kiran V Garimella 405e521d44 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-26 17:56:48 -04:00
Kiran V Garimella 92a11ed8dc Updated MD5 for PhaseByTransmissionIntegrationTest 2011-07-26 17:52:25 -04:00
Mark DePristo f6a5e0e36a Go for global integrationtest path first, if possible. 2011-07-26 17:35:30 -04:00
Eric Banks a53aeb75ab Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-25 15:10:35 -04:00
Eric Banks a29554e565 Removing the Genomic Annotator and its supporting classes 2011-07-25 15:10:25 -04:00
Mark DePristo 3afcb3415d Max of 1000 records will be loaded and compared to avoid heap size problem. 2011-07-25 14:58:31 -04:00
Mark DePristo 2a51543693 Actually should have been gone... 2011-07-25 13:27:42 -04:00
Mark DePristo ebfd8df06c Restoring accidentially deleted unit test 2011-07-25 13:25:30 -04:00
Mark DePristo f3049fba63 refdata directory cleanup
Removing unused files RODRecordIterator, ReferenceOrderedData, QueryableTrack, RMDTrackCreationException, GATKFeatureIterator, ReferenceOrderedDataUnitTest
Refactored dbSNP and refseq utilities to be closer to the other files implementing these features
2011-07-25 13:21:52 -04:00
Kiran V Garimella bbb8473f03 Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-25 10:59:00 -04:00
Mark DePristo 1d3bcce2c4 Merge branch 'master' into NoDistributedGATK 2011-07-23 20:04:50 -04:00
Kiran V Garimella 0b36b6540f Merge branch 'laptop' 2011-07-23 01:44:54 -04:00
Kiran V Garimella e23cb27451 Modified MD5 to account for the triple hets that shouldn't be phased 2011-07-23 01:44:44 -04:00
Kiran V Garimella f366124778 Merge branch 'laptop' 2011-07-23 01:25:36 -04:00
Kiran V Garimella 45f2ca8d99 Changed MD5 to reflect latest changes to PhaseByTransmission. 2011-07-23 01:21:07 -04:00
Kiran V Garimella b5deff48e6 Merge branch 'laptop' 2011-07-23 00:56:50 -04:00
Kiran V Garimella 5638017137 Removed the nofilters argument specification in the integrationtest 2011-07-23 00:56:23 -04:00
Kiran V Garimella ffa361f57f Merge branch 'laptop' 2011-07-23 00:50:38 -04:00
Kiran V Garimella 9417ba8c2c Modified to accept multi-sample VCFs, removed the application of filters, and changed transmission probability field to be a genotype field rather than an INFO field. 2011-07-23 00:48:26 -04:00
Matt Hanna f50145b872 Reinitialize random seed in the bwa bindings from the fixed seed stored in the
BWA support files every time the support files are loaded.
2011-07-22 13:41:53 -04:00
Mark DePristo 172b35372b Moved all of the distributed GATK code to archive. 2011-07-22 09:20:32 -04:00
Khalid Shakir 8b8f121cfb Merge branch 'master' of ssh://gsa3.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-21 23:01:11 -04:00
Khalid Shakir 59eb1f4663 Memory limits changed from Int to Double.
Updated LSF calls to read memory units from config along with tweaks to select hosts.
Moved some common code from GridEngine and LSF to super classes.
2011-07-21 22:57:18 -04:00
Matt Hanna 7054c5342f When using the BWA bindings, you have to explicitly call close() to get the
bindings to release memory.
It may or may not be possible to implicitly close triggered by the GC; I'll add a JIRA.
2011-07-21 12:13:29 -04:00
Christopher Hartl 15610ce0c3 Per Matt's request, disabling BWA-based integration tests so he can assess bamboo memory usage. 2011-07-21 11:04:22 -04:00
Mark DePristo d31b176e15 Removed GATK use of distributed parallelism framework.
Moved distributed GATK prototype code into distributedutils, separating from threading package
2011-07-20 16:26:09 -04:00
Christopher Hartl 5d706c9e92 Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
Removing PSP and CSM

Conflicts:

	public/java/src/org/broadinstitute/sting/gatk/walkers/sequenom/CreateSequenomMask.java
	public/java/src/org/broadinstitute/sting/gatk/walkers/sequenom/PickSequenomProbes.java
2011-07-19 20:25:33 -04:00
Christopher Hartl 92c7cfa1c8 BWA bindings and tests moved to public (was required for ValidationAmplicons)
Integration tests for ValidationAmplicons. New argument to disable BWA, lowercase letters only for repetitiveness instead.
2011-07-19 20:11:31 -04:00
David Roazen baae381acb Revert "Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable"
This reverts commit 039a6bb01f345322ce2be50ae3634308bb24e77e, reversing
changes made to b9c9973d1c638dfc9f8c19b5eb845e99844f9d29.
2011-07-19 18:38:53 -04:00
Mark DePristo 8f0badc52b Updating md5s, as the diffobjects walker now emits the summary in reverse order. 2011-07-18 15:44:21 -04:00
Eric Banks 83ba2c066a Making it deterministic 2011-07-18 13:59:02 -04:00
Eric Banks 80b5c5261a CombineVariants no longer combines records of different types. So now when combining SNP and indel callsets, overlapping calls get their own records. Useful for Khalid in the pipeline. For those interested, it turns out the previous behavior was doing the wrong thing occasionally (and this was even captured in the integration tests). 2011-07-18 13:42:45 -04:00
Mark DePristo 51b0dd01c3 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 10:47:29 -04:00
Mark DePristo d6e2e89f99 Walker test system refactoring. All MD5DB related functions are now in MD5DB.java.
System has the concept of a local and a global MD5 db.  The local one is like it operated previously.  The global one lives in /humgen/gsa-hpprojects/GATK/data/integrationtests.  If the system can find this directory then MD5s will also be read / written to this location.  This means that gsabamboo will print differences as appropriate.  And all users will in effect have access to a complete history of MD5 file results.
A few minor code reshuffles changed VariantRecalibration and VCFHeader test files.
2011-07-18 10:46:01 -04:00
Mark DePristo 6f26c07b85 Removed the SpecificDifference class. Now Difference classes always have the option to remember specific master and test values. This means that all summarized differences carry with them specific examples of their differences. Consequently, now even summarized differences give at least one example of the specific difference, even when the count of the difference is > 1. Unit tests updated. Added DiffObjects integrationtest. VCFDiffableReader now specifically reads the first line of the VCF file to capture the version number. 2011-07-18 10:42:35 -04:00
Kiran V Garimella ac9c66138d Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-18 00:20:33 -04:00
Kiran V Garimella 824100e57f Corrected typo in MergeAndMatchHaplotypes integration test 2011-07-17 22:50:54 -04:00
Kiran V Garimella 8167aba601 Moved (poorly named) MergeAndMatchHaplotypes to public. Added integration test 2011-07-17 22:47:32 -04:00
Kiran V Garimella afb506e128 Added MD5s for PhaseByTransmission integration tests 2011-07-17 21:55:33 -04:00
Kiran V Garimella 558e197989 Integration test for PhaseByTransmission 2011-07-17 21:25:08 -04:00
Mark DePristo 9ca9cf52ac Uncommenting a stray commented test. 2011-07-17 15:38:33 -04:00
Mark DePristo 4db2b13e9e Rev tribble.
Just added more documentation for diffEngine and pointer to new wiki:

http://www.broadinstitute.org/gsa/wiki/index.php/DiffEngine
2011-07-17 13:05:04 -04:00
Mark DePristo eacf205f40 Tests needed to be updated to reflect the code reorg of tribble. 2011-07-16 09:22:34 -04:00
Mark DePristo c0bbeb23ba Now providing more information when the index on the fly isn't equal to the one created by reading the file from disk. 2011-07-14 15:12:28 -04:00
Eric Banks 9540df6998 Oops, forgot to update unit test 2011-07-14 14:00:19 -04:00
Eric Banks bb0e3a26fc Added integration test for VCF writing. Also, bug fix for writing the GT-free records. 2011-07-13 14:57:21 -04:00
Eric Banks 6a431da554 Don't output source and ref header lines anymore. Short-term motivation for this is that I'd like this tool when run on a VCF to emit the exact same VCF. Long-term motivation is that these tags should be output by the VCF writer itself for all tools. 2011-07-13 14:40:01 -04:00
Eric Banks 969227c657 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-13 10:01:28 -04:00
Eric Banks 797c50e689 Fixing integration tests I broke yesterday; removing batch merging test since we don't support that anymore. 2011-07-13 10:01:23 -04:00
Ryan Poplin 837fb8f689 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-12 15:39:26 -04:00
Ryan Poplin 5077c94d85 Adding MappingQualityUnavailableReadFilter to the SNP and indel CountCovariates 2011-07-12 15:39:07 -04:00
Mark DePristo 01fd6a6949 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-07-12 15:20:44 -04:00
Mark DePristo ccedd6ff4c Difference is now the general form -- used to be SummarizedDifference. The old Difference class is now a subclass of Difference that includes pointers to specific the master and test DiffElements.
Added a size() function that calculates the number of elements tree from a DiffElement.
2011-07-12 15:20:28 -04:00
Eric Banks a2597e7f00 This commit incorporates several different changes that each pretty much break all the VCF-based integration tests, so I bunched them all together. We now officially emit VCF4.1 files (woo hoo), which means that the VCF headers are now all different (header version is 4.1 plus counts for some of the annotations are 'A' or 'G'). Also, I've added a Read Filter for reads with MQ=255 ('unavailable' in the SAM spec) and have applied this to the UG and the RMS MQ annotation. 2011-07-12 14:11:53 -04:00
Mark DePristo 05212aea62 reader now takes an argument for the maximum number of elements to read from the file. 2011-07-12 08:53:19 -04:00
Mark DePristo f313e14e4e Now deletes the dump directory on ant clean
Moving diffengine tests from private to public
2011-07-12 08:50:58 -04:00
Mark DePristo 5e593793af DiffEngine utility function simpleDiffFiles
printSummaryReport now uses GATKReport for nice formating
Moved print formatting arguments into inner class provided to printing functions themselves, not the class
BAMDiffableReader only reads 1000 entries to avoid performance issue.  Work around for BAM files with non-unique names
Uncommented all of the incorrectly commented out CombineVariants integrationtests
BaseTest now uses DiffEngine to provide inline differences to VCF and BAM files
2011-07-11 23:10:27 -04:00
Mark DePristo ccf34f7e45 (1) Added very useful helper class TestDataProvider to BaseTest that making creating data providers for TestNG far easier
(2) DiffEngine now officially working with with summaries.  Extensive UnitTests all around!
2011-07-06 21:57:22 -04:00
Ryan Poplin 17ff5bb094 Variant records coming out of the VQSR are now annotated with which input annotation was most divergent from the Gaussian mixture model. This gives a general sense for why each variant was removed from the callset. 2011-07-02 09:55:35 -04:00
Khalid Shakir b6bc64a0c8 Cleanup of the utils.broad package.
Using Picard IoUtils on sample names.
2011-07-01 20:47:03 -04:00
David Roazen d647ea4fdc Long-delayed change to CachingIndexedFastaSequenceFile. Made the cache
non-static to avoid problems when multiple references are used within the same
thread (eg., during integration tests). This should kill the intermittent
IndelRealignerIntegrationTest failures.
2011-07-01 16:04:30 -04:00
Eric Banks 761347b8d5 The VariantContext utility method used by SelectVariants wasn't checking the filter status (unfiltered vs. passing filters) and always returned a VC that was passing filters. This is fixed and the md5 from the VCF Streaming test has been re-updated. 2011-06-30 15:26:09 -04:00
Mauricio Carneiro 867056af51 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-30 15:03:18 -04:00
Mark A. DePristo defa3cfe85 Moved around private walkers into appropriate directories in private gatk.walkers. Moved a few public walkers into private qc package, and some private qc walkers into the public directory. Removed several obviously broken and/or unused walkers. 2011-06-30 14:59:58 -04:00
Mauricio Carneiro 2cb1376ed0 VCFStreaming was failing integration tests because now select variants outputs the samples in alphabetical order, instead of random as before. Fixed the MD5. 2011-06-30 14:55:39 -04:00
Eric Banks 352c38fc0b Updated to reflect dbsnp conversion fix 2011-06-30 11:55:56 -04:00
David Roazen f18fffd625 Fixing broken paths to the testdata directory throughout the codebase. 2011-06-29 17:36:47 -04:00
Eric Banks 33c67a139c Wrong package; this should have been moved when VC got moved in from Tribble 2011-06-29 14:56:02 -04:00
Guillermo del Angel dee10140dd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2011-06-29 13:58:04 -04:00
Eric Banks 8586c86bc4 My commit from last week to fix the old dbsnp rod conversion only worked for locus traversals. Updated now to work for all traversals. 2011-06-29 13:56:37 -04:00
Guillermo del Angel f736a1d61b Updated md5's from previous checkin 2011-06-29 13:37:15 -04:00
David Roazen 3c9497788e Reorganized the codebase beneath top-level public and private directories,
removing the playground and oneoffprojects directories in the process. Updated
build.xml accordingly.
2011-06-28 06:55:19 -04:00