980 Commits (7d092c859fc16a633d401f0ad59e4e55b975c511)

Author SHA1 Message Date
David Roazen 24b72334b3 UnifiedGenotyper now correctly initializes the VariantAnnotator engine.
This allows the annotation classes to perform any necessary initialization/validation.
For example, it allows the SnpEff annotator to (among other things) validate its rod binding.
This will prevent a NullPointerException when SnpEff annotation is requested but no rod binding
is present.

Added an integration test to cover this case so that it doesn't break again.
2011-10-11 12:02:05 -04:00
Guillermo del Angel 0429b38021 Merged bug fix from Stable into Unstable 2011-10-11 11:19:38 -04:00
Guillermo del Angel 1c485d8b5e Forgot that no matter how trivial a change it's a good idea to compile first 2011-10-11 11:18:41 -04:00
Guillermo del Angel 6418f4d69b Merged bug fix from Stable into Unstable 2011-10-11 11:13:18 -04:00
Guillermo del Angel 1975de1b32 Second try: hide --do_indel_quality in AnalyzeCovariates 2011-10-11 11:11:29 -04:00
Guillermo del Angel 6506ea83e8 Revert "Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users"... a hidden passenger change made it through.
This reverts commit 70e10ccb1be90dcff8f4485ae6ee036db2d1ac86.
2011-10-11 11:03:12 -04:00
Guillermo del Angel 4c1d8c8d44 Hide --do_indel_quality argument in AnalyzeCovariates. This shouldn't be documented nor used by external users 2011-10-11 11:01:06 -04:00
Eric Banks 77c983c5b5 No one claimed this walker and it doesn't have integration tests or GATKdocs so it doesn't belong in public. 2011-10-10 15:17:54 -04:00
Mark DePristo fb72bcf732 DiffObjects no longer prints out the file name in the status so MD5 are stable 2011-10-10 15:10:57 -04:00
Mark DePristo e3ff4f4266 Failing MD5 because output now contains absolute path 2011-10-10 11:05:02 -04:00
Mark DePristo 3e6c16d961 CombineVariants preserves allele order 2011-10-10 11:04:38 -04:00
Mark DePristo a4bb842958 RankSum tests have slightly different MD5 results based on allele order
-- UG GENOTYPE_GIVEN_ALLELES now uses the order of alleles in the VCF, so this changes the MD5
2011-10-10 11:04:07 -04:00
Mark DePristo 46e7370128 this.allele, getAlleles(), and getAltAlleles() now return List not set
-- Changes associated code throughout the codebase
-- Updated necessary (but minimal) UnitTests to reflect new behavior
-- Much better makealleles() function in VC.java that enforces a lot of key constraints in VC
2011-10-09 11:45:55 -07:00
Mark DePristo 822654b119 UnitTests for allele getting functions in VC in prep for move from set to list 2011-10-09 10:36:14 -07:00
Mark DePristo c67f6c076b simpleMerge now preserves allele order
-- UnitTests for dangerous PL merging cases in the multi-allelic case.  The new behavior is correct
2011-10-08 17:39:53 -07:00
Mark DePristo e94e6ba101 A UnitTest to ensure that the order of alleles is maintained
-> A, C, T and A, T, C are different and must be maintained.  The constructors were doing this appropriately, so nothing needed to be changed
2011-10-08 08:47:58 -07:00
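Note on the allele-order commits above (46e7370128, e94e6ba101): the sketch below illustrates, in plain Java, why moving from a Set to a List matters. It is not code from the repository. The class and method names are hypothetical, and alleles are modeled as simple strings rather than the real Allele type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class AlleleOrderSketch {
    // Lists compare element by element, so order is part of equality.
    static boolean sameAsLists(List<String> a, List<String> b) {
        return a.equals(b);
    }

    // Sets compare membership only, so order is lost.
    static boolean sameAsSets(List<String> a, List<String> b) {
        return new HashSet<>(a).equals(new HashSet<>(b));
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("A", "C", "T");
        List<String> second = Arrays.asList("A", "T", "C");
        System.out.println("equal as lists: " + sameAsLists(first, second));
        System.out.println("equal as sets:  " + sameAsSets(first, second));
    }
}
```

With a Set, A, C, T and A, T, C are indistinguishable; with a List they are different values, which is the behavior the unit test in e94e6ba101 is described as protecting.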
Mark DePristo ec14a4a606 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-07 08:38:50 -07:00
Matt Hanna 6fbd41724a Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-07 11:20:00 -04:00
Matt Hanna 4514bc350f More reliable way of finding the Tribble jar. 2011-10-07 11:19:29 -04:00
Eric Banks 181c76750e Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 22:38:55 -04:00
Eric Banks ca9cd9b688 Minor fix for merging intervals which hadn't been necessary when only merging from the left to right. Added integration tests to cover the parallelization of RTC. 2011-10-06 22:38:44 -04:00
Khalid Shakir f91b015e0e Made the BaseTest.testDir absolute 2011-10-06 22:33:21 -04:00
Mark DePristo c7864c7256 Filter application order is now deterministic, in the order defined by the walker
-- For no apparent reason we were using a HashSet to store the ReadFilters, so the order of operations was really arbitrarily applied.  The order now is

(1) the order of the walker intrinsic filters
(2) read group black list (if provided)
(3) command line filters (if provided)
2011-10-06 18:51:40 -07:00
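Note on commit c7864c7256 above: a minimal sketch of the ordering it describes, assuming filters can be represented by their names. The class, the method, and the filter names are hypothetical, and a LinkedHashSet is only one way to get a deterministic order; the commit message does not say which data structure replaced the HashSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FilterOrderSketch {
    // Builds the filter list in the order given in the commit message:
    // (1) walker intrinsic filters, (2) read group black list if provided,
    // (3) command line filters if provided.
    static List<String> orderedFilters(List<String> walkerFilters,
                                       String readGroupBlackList,
                                       List<String> commandLineFilters) {
        // LinkedHashSet iterates in insertion order; HashSet gives no such guarantee.
        Set<String> ordered = new LinkedHashSet<>(walkerFilters);
        if (readGroupBlackList != null) {
            ordered.add(readGroupBlackList);
        }
        // Re-adding a name that is already present keeps its original position.
        ordered.addAll(commandLineFilters);
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        List<String> filters = orderedFilters(
                Arrays.asList("WalkerFilterA", "WalkerFilterB"),
                "ReadGroupBlackList",
                Collections.singletonList("CommandLineFilterC"));
        System.out.println(filters);
    }
}
```

Because the iteration order is fixed by insertion, repeated runs apply the filters in the same sequence.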
Mark DePristo 0b88af4af9 Counts of records failing filters are displayed sorted
-- Stops random ordering of the output, as the counts are returned sorted by string name of the class
-- Deleted now unused sh*tty assessors in Utils
2011-10-06 18:42:26 -07:00
Mark DePristo d1e70d6ec2 Removed Nx counting of reads in metrics with -nt > 1 2011-10-06 18:29:26 -07:00
Eric Banks c61804a450 Rename the long version of the argument name to more accurately reflect its purpose. 2011-10-06 16:14:04 -04:00
Eric Banks 61a3dfae24 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 15:58:04 -04:00
Eric Banks 6eb87bf58a RTC now caches all intervals as GenomeLocs (which is expected to take < 1Gb whole genome based on back of the envelope calculations with Matt) so that 1) we don't have to worry about emitting outside of the leaves in the hierarchical reductions and 2) we can emit the intervals in sorted order which is a big performance plus for the realigner. Integration tests change only because intervals whose start=stop are now printed as chr:start instead of chr:start-stop. 2011-10-06 15:57:49 -04:00
Mark DePristo 6d9c210460 Updating MD5s for updated BAM with read groups 2011-10-06 12:15:48 -07:00
Mark DePristo ab357ef900 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 10:50:02 -07:00
Eric Banks 1b0735f0a3 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 13:41:45 -04:00
Eric Banks c4dfc1fb8b Temporary commit of parallelization support for RealignerTargetCreator. Tim begged us for this and I got assurances from Khalid/Matt that this would also be extremely helpful for the whole genome calling pipeline, so I spent a while working on this. Needs to be fixed up though because apparently only the leaves in the hierarchical reduce get their output aggregated. Worked out a better solution with Matt. 2011-10-06 13:41:36 -04:00
Matt Hanna 3961733590 Merged bug fix from Stable into Unstable 2011-10-06 12:54:52 -04:00
Matt Hanna 4fa5045e84 Abandoning classfileset/rootfileset approach due to difficulty managing
classloading of bcel*.jar/ant-apache-bcel*.jar.  Switching instead to manually
specifying a minimal set of packages/classes to include in the vcf.jar via
build.xml, and adding a unit test which creates a limited classloader
only aware of vcf.jar and tribble.jar and tries to use it to load the core
classes in the vcf jar.

Hopefully third time's the charm.
2011-10-06 12:49:51 -04:00
Mark DePristo 73f9d1f217 GATK read group requirement iron hand
-- The GATK will now throw a user exception if it opens a SAM/BAM file that doesn't have at least one RG defined
-- LIBS again throws an error if the complete list of samples isn't provided
-- Updating ExmpleCountLociPipeline test to use the well-formatted versions of the exampleBAM and exampleFASTA files in testdata, instead of the old broken ones in validation_data.
-- Convenience constructors for UserExceptions.MalformedBAM
2011-10-06 08:40:35 -07:00
Mark DePristo 23845ac798 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 08:17:08 -07:00
Mark DePristo 4b5b9155a9 Fixed bad expected value in PedReaderUnitTest 2011-10-06 08:16:47 -07:00
Mark DePristo daa5999489 Fixed typo in argument description 2011-10-06 08:16:25 -07:00
Guillermo del Angel 8a474e38ff Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-06 10:08:39 -04:00
Guillermo del Angel 93f7e632bd Minor fix/enhancement for VariantEval: if a vcf has symbolic alleles, program would crash ungracefully - now we'll just skip record without processing. This is a big issue since we can't process 1000G integration files with code as is. 2011-10-06 10:07:46 -04:00
Mark DePristo 190be4d0d1 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-10-05 21:27:11 -07:00
Mark DePristo 8e6845806a Allowing empty samples list in LIBS
-- Right now we cannot process BAM files without read groups because we enforce the samples list to not be empty when there's a SAM record.  Now if there are reads and there are no samples we add the "null" sample so that LIBS walks the reads properly
2011-10-05 21:26:21 -07:00
Matt Hanna 180c8f286f Merged bug fix from Stable into Unstable 2011-10-05 20:37:43 -04:00
Matt Hanna 55b9f06527 Ensure that IndelRealigner n-way out option supports MD5 generation. 2011-10-05 20:36:28 -04:00
Mark DePristo be2d29ce69 Final PED documentation 2011-10-05 15:17:41 -07:00
Mark DePristo 3226d5dc0d Merge branch 'master' into ped 2011-10-05 15:03:09 -07:00
Mark DePristo 6a573437af Details documentation arguments for -ped 2011-10-05 15:00:58 -07:00
Mark DePristo e7c80f7c45 Renaming quantitative trait to OtherPhenotype which is now a String not a double
-- we can now use PED file to represent population data or other arbitrary phenotype data, not just doubles
2011-10-05 12:26:33 -07:00
Mark DePristo 51ecc20867 getFamily() and associated methods implemented and tested
-- Sample no longer serializable
-- Sample now implements Comparable
2011-10-05 09:55:05 -07:00
Mark DePristo f4bac58f14 Merged bug fix from Stable into Unstable 2011-10-04 21:00:34 -07:00
Mark DePristo d1d39943d0 Updating MD5 for BAMs that I added a read group to, part 2 2011-10-04 21:00:15 -07:00
Mark DePristo 9bd3ba4c7e Missed one MD5 2011-10-04 16:04:52 -07:00
Mark DePristo ffdfdcde3f Updating MD5s
-- Interval test now uses RG containing BAM
-- DoC sample name ordering has changed.
2011-10-04 15:54:45 -07:00
Mark DePristo a45d985818 TODO method stubs 2011-10-04 15:54:09 -07:00
Mark DePristo 463eab7604 All MD5 mismatches for test are shown
-- Now for tests like DoC, with 20 output md5s, you see all of the differences before failing.
2011-10-04 15:53:52 -07:00
Mark DePristo c642a080d4 Merged bug fix from Stable into Unstable 2011-10-04 14:08:41 -07:00
Mark DePristo 941317167e Updating MD5 for BAMs that I added a read group to 2011-10-04 14:08:00 -07:00
Mark DePristo e1d6c7a50a Updating MD5 that have changed due to sample ordering differences 2011-10-04 09:33:23 -07:00
Mark DePristo 343a7b6b2f Updating UG integration tests for arbitrary impact of sample order changes on downsampling 2011-10-04 08:14:00 -07:00
Mark DePristo fee89e47ff Only throws an error when there are no samples but there are reads
-- Handles the case when you are running a ROD traversal and yet the LIBS is still used to return null everywhere.
2011-10-04 06:50:54 -07:00
Mark DePristo f552aede42 Only provide the sample names in the BAM file for efficiency 2011-10-04 06:50:12 -07:00
Mark DePristo a27641e1fc Cleaned up imports 2011-10-04 06:28:36 -07:00
Mark DePristo b20689ff55 No longer supports extraProperties
-- the underlying data structure is still present, but until I decide what to do for the extensible system I've completely disabled the subsystem
-- Added code to merge Samples, so that a mostly full record can be merged with a consistent empty record.  If the two records are inconsistent, an error is thrown
-- addSample() in Sample.class now invokes mergeSample() when appropriate
-- Validation types are now only STRICT or SILENT
-- Validation code implemented in SampleDBBuilder
-- Extensive unit tests for SampleDBBuilder
2011-10-03 19:20:33 -07:00
Mark DePristo 867a7476c1 Systematic unit tests for the sample object 2011-10-03 19:09:02 -07:00
Mauricio Carneiro 3837aa45b4 Fixing conflicts
Conflicts:
	public/java/test/org/broadinstitute/sting/utils/clipreads/ReadClipperUnitTest.java
2011-10-03 19:07:59 -07:00
Mark DePristo 2e3dc52088 Minor function renaming 2011-10-03 14:41:13 -07:00
Mark DePristo dd71884b0c On path to SampleDB engine integration
-- PedReader tag parser
-- Separation of SampleDBBuilder from SampleDB (now immutable)
-- Removed old sample engine arguments
2011-10-03 12:08:07 -07:00
Eric Banks c3eff7451a Found a small inefficiency while profiling: we were still using String.split instead of ParsingUtils.split to break up array values in the INFO field. There was a noticeable (albeit not big) difference in the change when reading sites only files. 2011-10-03 14:20:39 -04:00
Mark DePristo 8ee0f91904 Remove residual processing tracker arguments 2011-10-03 09:50:01 -07:00
Mark DePristo 89ac50e86e SampleDataSource -> SampleDB 2011-10-03 09:33:30 -07:00
Mark DePristo 93fba06cb5 Support for whitespace only lines 2011-10-03 09:30:10 -07:00
Mark DePristo 0604ce55d1 PedReader support for ; separated lines, not only newline 2011-10-03 09:19:58 -07:00
Mark DePristo 52f670c8b8 100% version of PedReader
-- Passes all unit tests
-- Added unit tests for missing fields
2011-10-03 06:12:58 -07:00
Roger Zurawicki bf6a3a6532 Added framework to do batch CigarClip Testing
*NOTE: This commit has not been compiled!
2011-10-02 22:33:46 -04:00
Mark DePristo dd75ad9f49 95% PedReader
-- Passes significant unit tests
-- Implicit sample creation for mom / dad when you create single samples
-- Continuing cleanup of Sample and SampleDataSource
2011-09-30 18:03:34 -04:00
Andrey Sivachenko c7898a9be7 inconsequential change in string constants printed into the vcf which no one uses anyway... 2011-09-30 16:40:21 -04:00
Mark DePristo 010899f886 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-30 15:51:09 -04:00
Mark DePristo 84160bd83f Reorganization of Sample
-- Moved Gender and Afflication to separate public enums
-- PedReader 90% implemented
-- Improve interface cleanup to XReadLines and UserException
2011-09-30 15:50:54 -04:00
Mauricio Carneiro 05fba6f23a Clipping ends inside deletion and before insertion
fixed.
2011-09-30 15:44:43 -04:00
Mark DePristo c1cf6bc45a PEDReader should be in samples 2011-09-30 14:22:19 -04:00
Mark DePristo 56f10b40a8 Fixing test bugs for WindowMaker that required empty sample list 2011-09-30 14:18:27 -04:00
Ryan Poplin af6c053435 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-30 13:33:31 -04:00
Mark DePristo 810e8ad011 Removed getXByReaders() function from the engine
-- These could be simplified in their downstream uses
-- Or they could be replaced with a generic getSAMFileHeaders() function and then apply the getSamples(header) as desired downstream
2011-09-30 10:43:51 -04:00
Mark DePristo 178ba24c27 Move getSamplesForSamFile to SampleUtils
-- A nearly identical piece of code already lived in SampleUtils.  Now there are two functions, one taking a regular header and another grabbing the merged header from the GATK engine itself.  Much cleaner
2011-09-30 10:28:18 -04:00
Mark DePristo 30d23942b1 Renamed ReadBackedPileup getXSampleName() functions to getXSample
-- now that we don't have Sample objects floating around we don't have to have all of the Name extensions on our functions
2011-09-30 10:02:57 -04:00
Mark DePristo 3289a325fc Removed final use of Sample in RBP 2011-09-30 09:57:39 -04:00
Mark DePristo a69a4dda2f SamplesDB no longer has null sample
-- Updated getSamples().size() == 2 test in CallableLociWalker that really ensured there was one sample in the system
2011-09-30 09:56:23 -04:00
Mark DePristo e055a78f6e LIBS now requires at least one sample be present
-- UnitTest provides a "null" sample for matching the reads without read groups
2011-09-30 09:49:35 -04:00
Mark DePristo 9860a2c989 Merge branch 'master' into ped 2011-09-30 09:28:18 -04:00
Mark DePristo d901fed617 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-09-30 08:41:44 -04:00
Mauricio Carneiro cabacf028d Intermediate commit to fix interval skipping
may need additional testing.
2011-09-29 18:45:12 -04:00
Mark DePristo b71b51751e Bug fix for UnitTest
-- Provide the null sample to the LIBS, as this seems to be required for correctly passing this unit test
-- Will be fixed in a future update
2011-09-29 17:30:01 -04:00
Mark DePristo 1765fbeb6b Merge branch 'master' into ped 2011-09-29 17:18:51 -04:00
Mark DePristo 98ecaf8aa0 Support for ReducedReads with reduced counts and average quals
-- ReadUtils and UnitTest updated to support new byte[] style
-- Removed unnecessary read transformer in PairHMM
2011-09-29 17:18:39 -04:00
Mauricio Carneiro 9508220157 fixed hard clipping both ends inside deletion
If both ends of the interval fall within a deletion in the read, then hardClipBothEnds would cut the right tail first, including the entire deletion, and then fail to cut the left tail because there would no longer be any bases there. Fixed.
2011-09-29 15:36:49 -04:00
Mark DePristo 9458f01409 Test cleanup of Sample object 2011-09-29 15:13:05 -04:00
Mark DePristo 625ffb6a07 LocusIteratorByState and ReadBackedPileups no longer use Sample 2011-09-29 14:52:11 -04:00
Mark DePristo b3a2371925 Merge branch 'master' into ped 2011-09-29 14:32:17 -04:00
Mark DePristo 68761a6e28 Removed sample from header 2011-09-29 14:13:05 -04:00
Mauricio Carneiro a5e75cd14c Outputting both consensus base qualities and counts
The base qualities of a consensus read are now the average quality of the bases forming the consensus base (most common base), and the consensus quality tag now carries an array with the counts of each base in the consensus. This should increase file size but improve calling sensitivity/specificity.
2011-09-29 12:54:41 -04:00