561 Commits (e6ce80c8e3da4b9ab5ff5a1ea96e117c82b95ca2)

Author SHA1 Message Date
hanna a5154d99a3 Haven't heard any complaints, so I'm deleting the original implementation of TraverseByLociByReference. All TbyLbyR's will now go through the new sharding code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@637 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 13:37:00 +00:00
aaron bae4256574 Started the process to make the GATK engine into a runnable object so we can call it from other processes. Step 1: make a configuration object that can serialize to and from an XML file. This way we can store the information everyone uses shell scripts for. Also we can now pull the list of params out of the GenomeAnalysisTK.java. More to come...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@636 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 01:25:26 +00:00
hanna 226edbdef6 Hyphen-style XML output. Much sexier.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@635 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 01:04:40 +00:00
hanna 4c269b8496 Cleanup LinearMicroScheduler in preparation for TraverseByLoci inclusion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@634 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 00:58:37 +00:00
aaron 21536df308 Change the sample XML marshalling code over to Simple XML, and take out the Castor lines in the ivy.xml
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@633 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 00:08:25 +00:00
hanna 7f8850a8a2 Argument validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@631 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 20:28:56 +00:00
hanna a3d8febbf2 Error message cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@630 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 19:31:32 +00:00
hanna c241d386a7 Beefed up command-line usage string.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@629 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 19:08:19 +00:00
depristo 5a6892900e fixing oddities in duplicates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@628 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:55:45 +00:00
depristo 4a26f35caa new default syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@627 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:16:53 +00:00
ebanks 283a4d1b54 Fix some special-case cleaner issues.
We now do the same as brute force in all examples to date.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@626 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:16:35 +00:00
depristo 93211c1cd8 template for windowmaker utility -- totally non-functional
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@625 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:13:03 +00:00
depristo 2204be43eb System for traversing duplicate reads, along with a walker to compute quality scores among duplicates and a smarter method to combine quality scores across duplicates -- v1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@624 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:06:02 +00:00
depristo 71e8f47a6c boundQual function for capping qual values
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@623 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:04:18 +00:00
depristo e848f34896 countOccurances of char in string and max of a list of bytes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@622 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:03:49 +00:00
depristo 5a4bb76cc3 More capabilities for the pileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@621 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:03:13 +00:00
depristo 89a26a7078 Utilities for handling duplicates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@620 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:02:24 +00:00
hanna 4f85062004 Cleanup parsing method to make it less generic.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@619 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 16:21:17 +00:00
hanna d725c6cf1c Added unit tests for parsing failures that I encountered during integration testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@618 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 14:01:54 +00:00
hanna 2f3ab53888 Oops. Arguments didn't load into applications with non-plugins (basically everything except the GATK).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@617 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 13:37:19 +00:00
hanna 4177560543 Mutually exclusive options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@616 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 13:27:48 +00:00
hanna 752928df94 Switch to better mechanism for supplying a default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@615 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 01:22:01 +00:00
hanna dc944ec69b First stage of ROD plumbing for MicroScheduler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@614 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 23:26:21 +00:00
aaron 5136724884 Added code to the schedulers, one step closer to turning on the new reads traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@613 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 22:36:25 +00:00
hanna 9c0b81e946 Default flags to 'not required'.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@612 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 22:09:49 +00:00
asivache 072808858e added COUNT_CUTOFF arg: it is now possible to tell the code to try to realign all read piles over trains of nearby indels with at least one indel observed in COUNT_CUTOFF or more different alignments (set the arg to 1 to realign around all indels); also, some diagnostic printouts added to the output (time spent on loading the reference, time spent on scrolling through the input bam file, counts of discarded reads)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@611 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:59:33 +00:00
hanna 1fe8155111 Some critical fixes for cases where argument values directly abut argument names
and for arguments with missing short names.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@610 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:47:34 +00:00
aaron 0aba688e6f Added an interface that all our SAMRecord iterators should try to code to. This is in the effort to keep our code generic.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@609 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:40:41 +00:00
hanna 62e7e46754 Miscellaneous cleanup. Better display of help output. Better exception subtyping. More thought-out access routines.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@608 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:16:01 +00:00
ebanks 5be75e0ae6 First version of indel cleaner walker that works on intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@607 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 20:20:48 +00:00
hanna 98716138e9 Cleanup: add support for non-public fields. Track matches as state of parsing engine as well as definitions.
Made fields of command-line argument system non-public by default.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@606 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 19:38:05 +00:00
aaron f5eae98af2 Fixed a bug where we could ask for a read when there were none in the pool (that's a bad thing).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@605 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:40:55 +00:00
hanna ef211f96b1 Remove old Apache CLI-based arg system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@604 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:37:51 +00:00
hanna 521aa40baa Bring new command-line argument parsing system live.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@603 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:16:11 +00:00
aaron 98f4920739 Added BCEL and some basic instrumentation code to the test library.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@602 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 17:18:23 +00:00
hanna bfd6dfe36c Added real-world tests and tests for conditional validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@601 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 13:38:46 +00:00
hanna 4ac9e72739 Migrate default and GATK arguments over to new attribute system in preparation for conversion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@600 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 23:57:48 +00:00
hanna 2ee9374975 Check for proper error output in case of boolean args with parameter specified.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@599 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 23:08:48 +00:00
hanna b0cdba8bb3 Acting on Kiran's suggestion to make the doc tag in the @Argument annotation required.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@598 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:43:40 +00:00
hanna ec0261275b Lots of command line argument validation. Catches all common validation problems, including missing required arguments, invalid arguments, and several types of misplaced argument value errors.
Still pending:
- Help system.
- Mutually exclusive arguments.
- Design includes too many classes per file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@597 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:08:00 +00:00
aaron 70afda12c4 Cut the test time down
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@596 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:05:51 +00:00
aaron f5880109a7 Added TraverseReads test, some bug fixes discovered in the traversal test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@594 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 20:36:00 +00:00
aaron daa2163ee8 Made the MergingSamIterator2 peekable. This iterator is becoming a duct-taped-together Swiss Army knife; the iterators could use a redo soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@593 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 19:15:07 +00:00
aaron 09b0b6b57d Fixes to try and speed up unmapped read traversals. Still not nearly as fast as they should be, but the next step would be to modify samtools code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@592 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 18:17:07 +00:00
hanna 6550fe6f97 Another pass of command-line arguments. Revised parser supports all types
of arguments that the existing parser supports, but does a poor job with
validation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@591 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 22:41:23 +00:00
depristo 8925df2e1e More information from the duplicate combiner quality metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@590 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 21:51:01 +00:00
kcibul 2b6466ea00 coverage calculator based on Gabor's Pilot 3 Coverage Metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@589 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 14:18:16 +00:00
hanna 4f2ccda56a Interface skeleton for a new command line argument parser. Nowhere near the point of being a drop-in replacement for apache cli yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@588 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 00:11:42 +00:00
hanna 6e38966349 Rename some key classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@587 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 22:01:04 +00:00
hanna 5bdf653919 Cleanup: prepare for better output handling.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@586 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 21:40:46 +00:00
depristo fd496159a8 Added convenience functions for RefHanger
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@585 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 21:14:40 +00:00
depristo 7ed496b859 JUnit test for RefHanger
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@584 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 20:11:14 +00:00
hanna 9f5f6f9bc7 N-way parallelism. Works for small test cases. Untested for large test cases.
-Needs more comprehensive unit testing.
-Needs some basic refactoring.
-Needs rethink of interface boundaries.
-Needs to play more nicely in the /tmp sandbox.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@583 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 19:34:09 +00:00
kiran df88c4d6b0 Added some code to determine the on-genotype and off-genotype secondary base distributions (which, at the moment, is commented out).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@582 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:48:19 +00:00
kiran e7534b292f Optionally applies secondary base distribution priors to normal single-sample genotyper posteriors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@581 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:36:32 +00:00
kiran 58c80d8d87 For on and off-genotype primary bases, optionally compute the concordance of the secondary bases to their expected distributions. Each genotype has slightly different profiles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@580 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:33:48 +00:00
kiran 16467ae7cf A better (less overflow-y) implementation of multinomialProbability().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@579 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:28:16 +00:00
kiran 4f818f5c1c Choose a random base to stick in the pileup if the 2nd-best base matches the best base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@578 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:27:37 +00:00
kiran 9800d09608 A more thorough test for multinomialProbability.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@577 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:27:05 +00:00
depristo 84dae06d5a Initial version of ByDuplicates traversal, as well as a duplicate quality score estimator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@576 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:16:21 +00:00
depristo ff420f5f6f Enabled iterator() function
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@575 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:15:14 +00:00
depristo 12d6edfe7c Only prints about first contig info setting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@574 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:14:26 +00:00
depristo 1cc5e74435 More ways to access quality utils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@573 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:12:07 +00:00
aaron 63403d32cd Changes to the interface to the simple data source rippled out to a bunch of files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@572 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 20:35:56 +00:00
hanna 19e4e97f21 Add tag to ignore node class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@571 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 20:27:34 +00:00
hanna 7f173af2ea Encapsulate output tracking a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@570 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 15:12:13 +00:00
aaron 3bf3c21ddd Changed the assert code in the genome loc to throw exceptions, and deleted a function no one seems to be using.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@569 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 13:54:51 +00:00
andrewk b630f2f2f1 More tables output by CovariateCounterWalker AND made CovariateCounterWalker and LogisticRecalibration aware of positive and negative strandedness of data which changes the regression output significantly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@568 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 01:22:50 +00:00
aaron f7a877bfeb Changed Sting exception from a base exception to a runtime exception. This makes it so you can throw it without the consumer having to check it, and hopefully people will be more inclined to use it.
Please use this instead of throwing a plain runtime exception.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@567 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 22:09:41 +00:00
hanna ba9a0b5da8 Break some of the weird inner classes out of the HierachicalMicroScheduler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@566 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 21:07:07 +00:00
hanna 95d10ba314 Sketch of hierarchical reduce process, with unit tests for some core classes. Requires breakout of inner classes, testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@565 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 20:26:16 +00:00
kiran 0a707a887b Added ability to evaluate best + random base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@564 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 20:05:36 +00:00
kcibul 334f158e5a added parameters for mapping quality and duplicate filters
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@563 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 18:05:34 +00:00
ebanks 7de5da7065 Start getting the cleaner working in Walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@561 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 14:59:53 +00:00
hanna 4c5f640eb7 Tweak the arguments passed to the command-line arguments parser so that it fails less often for invalid arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@560 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 14:36:27 +00:00
kcibul f557da0a78 Calculate interval-based statistics for Hybrid Selection
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@558 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 04:01:24 +00:00
hanna 6ecc43f385 Provide a default logger, some config settings, and some doc updates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@557 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 02:06:05 +00:00
aaron b836761104 removed the test cases from the bottom of this file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@556 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 21:50:22 +00:00
aaron 6b02248298 moved the test cases out of the GenomeAnalysisTK code and into a JUnit test case
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@555 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 21:49:17 +00:00
aaron d4de68e260 added changes for the readsTraversal to accommodate design changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@553 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 19:49:58 +00:00
aaron b6874f30cb Added changes to bounded read iterator; it now explicitly takes a MSRI2 instead of the interface ClosableIterator<SAMRecord>. It would be good to fix this in the future with an interface that lets you get the (possibly merged) header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@552 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 17:57:54 +00:00
aaron 395aaf48b0 Added the new by reads traversal, still needs to be sewn into the micromanager code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@551 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 17:55:08 +00:00
andrewk 58b2578c44 Several changes to CovariateCounter walker to print more tables (called vs. observed Q scores), bug fixes to LogisticRecalibrationWalker and LogisticRegressor, and print string functionality added to Pair.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@550 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 00:37:48 +00:00
ebanks a0a581171b print out the last interval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@549 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 20:43:06 +00:00
aaron a343f3eab7 Fixed bug where we weren't setting the reads group correctly. Also added code to set the printMetrics field of the singleSampleGenotyper from the Pool caller; without that set, it was failing with a null pointer exception for me.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@548 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:17:20 +00:00
kiran 1daf8e0987 A utility to compare the results of the SingleSampleGenotyper in 1-base and 4-base mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@547 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:10:08 +00:00
kiran 444bc18183 Removed binomialProb() method. Set better values for qHom, qHet, and qHomNonRef and allowed those to be set from the command-line.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@546 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:09:02 +00:00
kiran b9c9dbb1d7 Added multinomialProbability method.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@545 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:03:50 +00:00
kiran eeb0b78cce Added another assert to testBinomialProbability() and added a test method for testMultinomialProbability().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@544 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 14:59:11 +00:00
hanna 9a8902571c Placeholder for parallel MicroManager.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@542 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 23:08:12 +00:00
hanna 1daa011387 Interval-based traversals were bleeding file handles. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@541 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 18:35:54 +00:00
hanna 1e2e78265d Inadvertently removed interval file support in new TbLbR. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@540 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 18:15:42 +00:00
hanna c9e9731495 More cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@539 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 17:46:52 +00:00
hanna 4036f24909 Documentation and cleanup work in preparation for parallelism.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@538 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 17:42:00 +00:00
ebanks 0c76a70313 Renamed traversal by "interval" to "locusWindow"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@537 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 02:26:08 +00:00
depristo 9a299c11d3 Oops, typo and build problems. FYI, fixing typos is better than packing...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@536 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-25 01:37:17 +00:00
depristo ce470702fc consistency with java naming conventions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@535 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 21:44:48 +00:00
depristo bfce0c93ab removing bad file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@534 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 21:40:04 +00:00
depristo 05c6679321 Enabled ReduceByInterval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@533 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 21:39:44 +00:00
hanna ee2f022c71 Make new TraverseByLociByReference the default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@532 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:50:11 +00:00
hanna e50ae97fe1 Introduce new index-based fasta reader. Clean up MicroManager code, pushing necessary code back into TraversalEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@531 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:40:21 +00:00
depristo 40a2b3eeb3 Basic logistic regression support for calibrating qualities; mostly for Andrew to experiment with
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@529 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:09:50 +00:00
andrewk 061f4328b1 Covariate counter now outputs files used by R to do logistic regression.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@527 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 17:11:57 +00:00
jmaguire 4e4fd33584 First draft of actual pooled EM caller.
Produces sane looking output on region of 1kG pilot1:

   CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
   CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
   CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
   CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
   CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084

Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@526 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:43:41 +00:00
jmaguire dd408a2a9a First draft of actual pooled EM caller.
Produces sane looking output on region of 1kG pilot1:

    CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
    CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
    CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
    CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
    CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084

Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@525 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:42:15 +00:00
ebanks 13d4692d2e 1. Added a by-interval traversal.
2. Added a shell for the indel cleaner walker (it's currently being used to test the interval traversal).
3. Fixed small bug in downsampling (make sure to downsample the offsets too)
4. GenomeAnalysisTK.execute => anyone object to my change to "instanceof" instead of trying to catch a ClassCastException (yuck)?



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@524 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 04:33:35 +00:00
kiran 1984bb2d13 Made num_loci_total public because I'm lazy. I'll change it back later.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@523 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:57:23 +00:00
kiran 7ce11e152b Simplified. Added option to perform four-base retest of a putative variant.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@522 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:56:15 +00:00
kiran 135d3eabeb Now only distributes 80% of the residual probability to the secondary base, 10% each to the other two bases. Nicer labelling for stringified probability distribution output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@521 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:34:43 +00:00
kiran 3cda85f2e3 New implementation of binomial probability that accurately computes values down to around 1e-237.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@520 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:32:04 +00:00
kiran 305584b69e Test class for MathUtils with a test for binomialProbability().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@519 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 03:31:02 +00:00
aaron bd4cacb832 Added code to make a read group and sample name for BAM files that don't annotate them on reads. The defaults for both are now the filename, but this may be shortened in the future.
The sample name for a read can be retrieved with the command:

read.getAttribute(SAMTag.RG.toString());




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@518 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 00:31:00 +00:00
hanna 45d962e491 I understood the contig index incorrectly when I initially wrote this code. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@517 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 22:31:43 +00:00
aaron 635bfd8604 Added a little bit of a hack to get the header back to the walker by initialization time, which was before sharding in the last version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@516 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 21:07:11 +00:00
aaron 0208d201c7 Forgot this in the last commit...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@515 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:47:22 +00:00
aaron 3dc2afd7ab Added the ability to get a merged header in a LociByReference traversal
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@514 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:34:52 +00:00
hanna 282f1d88b8 Make the operation 'read from the iterator and place on the queue' atomic with respect to hasNext(), next().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@513 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:16:26 +00:00
aaron 998763950c Oops, contig index is a zero-based, not one-based, value
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@512 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 19:08:16 +00:00
aaron 8c13940c5a A lot of changes to support by-read sharding and some from debugging of the by loci traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@511 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 19:03:14 +00:00
andrewk 32715a6c47 First check-in of walker that produces tables showing covariation of read cycle and dinucleotide with quality score, in a format usable for R analysis and for doing logistic regression.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@510 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 18:58:25 +00:00
aaron 0720d248ce Adding the test case for by reads sharding of BAM data sources
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@509 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 18:01:22 +00:00
ebanks cae54ec52d Walker for creating intervals to be used in the indel cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@508 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:58:19 +00:00
kiran 96db1477d4 I meant for default lod threshold to be 5.0, not 0.0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@507 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:46:08 +00:00
kiran ca66cccd2f Privatized constructor to prevent instantiation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@506 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:45:39 +00:00
kiran 77e1e9e2f1 Added a static class to house useful math methods. All this has at the moment are methods for comparing doubles and floats, but I suggest that the bulk of our little math methods should be added here to avoid filling up Utils.java with so much random stuff.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@505 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:45:19 +00:00
hanna 3d7575bbb8 Oops...omitted walker.initialize().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@504 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:35:28 +00:00
kiran 11e85f1969 Four-base mode now estimates the genotype using the one-base method and retests the site if the one-base method suggests the site is a het.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@503 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:23:24 +00:00
kiran bd719f9c06 When checking that values are not infinite, also prints out the position so that I know which site was giving the error and I can just go there and debug it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@502 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:21:58 +00:00
kiran efba30f1a1 Added a constructor in which the lod threshold can be set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@501 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:20:48 +00:00
jmaguire 8c1905c7d9 Simple walker to print all of the sample names present in a merged bam file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@500 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 12:26:56 +00:00
kiran a3a1c9dae8 Suppressed emission of duplicate paths through a four-base pileup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@498 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 21:08:45 +00:00
jmaguire 6cef8bd76c added k-best quality path enumeration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@497 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 20:26:51 +00:00
ebanks d99d67d51c Refactored to clean it up a bit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@495 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 19:18:46 +00:00
hanna 1bf4d040d8 Increase default shard size from 5 to 100000.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@494 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:29:44 +00:00
hanna 3af66a462e Make PrintLocusContextWalker less verbose.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@493 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:28:02 +00:00
kiran ffcd672c1c Intermediate commit while working on getting four-base probs to work in the single sample genotyper. Has infrastructure for the new combinatorial approach and just choosing the best base more intelligently given a probability distribution over bases and the reference base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@492 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:06:50 +00:00
hanna 4cafb95be8 TraverseByLoci / TraverseByLociByReference suffered from the same sam-triggered off-by-one (?) bug as TraverseByReference; it was just less obvious here because these versions don't shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@491 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 15:48:20 +00:00
kcibul cb2f621d01 reverting accidental commit of change to shard size
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@490 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 00:33:28 +00:00
kcibul b820130dce * added ability to load multiple BAM files from command line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@489 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 00:28:08 +00:00
kiran 5b8502745a Added an epsilon (1e-4) to the tertiary and quaternary base hypotheses.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@488 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 00:01:37 +00:00
kiran 2ac240d78b Removed an extraneous print statement.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@487 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 23:36:36 +00:00
kiran 0149c887ff Fixed a bug wherein the residual probability was not being distributed properly when a file had secondary probs and the best and next-best base agreed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@486 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 23:36:09 +00:00
kiran 5abfc7d079 Added an argument ('extended' or 'ext') that outputs the four-base probs in a long format.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@485 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 22:27:26 +00:00
kiran dac76f041b Added some methods to retrieve the probability distributions of individual bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@484 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 22:26:25 +00:00
kiran 5b2a7c9c23 Added some methods to complement a single simple base ([AaCcGgTt]) and reverse-complement a byte-array of bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@483 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 22:25:33 +00:00
asivache 521e202a10 updated interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@482 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:07:20 +00:00
asivache 55ca272919 reimplemented; now implements Genotype interface instead of AllelicVariant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@481 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:06:42 +00:00
asivache 5f37ba8f26 now can be asked to log at INFO level all concordant or discordant sites, or both
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@480 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:03:44 +00:00
asivache 1f84b9647d auxiliary data structure for mendelian concordance reporting; it's nice to have the latest version checked in in order for the code to compile...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@479 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:02:40 +00:00
asivache ece3e9969e one trivial walker to filter reads; bam in -> filter -> bam out
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@478 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 20:39:29 +00:00