Commit Graph

631 Commits (8f1cabd33dff321af4da2e691a2d7bcf40aa496c)

Author SHA1 Message Date
kiran 3761c0900b Added Bustard vs. Four-prob percent bases consistent output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@710 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 18:01:41 +00:00
ebanks 7a1f85ff86 option to print out the indels found by the cleaner to a file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@709 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:50:08 +00:00
ebanks b62bddee42 The header was never being set.
Added this hack for now and will alert the authorities ASAP... 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@708 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:18:51 +00:00
kiran 959cf09d4b Removed some debugging print statements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@707 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:12:42 +00:00
kiran 2f42a643a8 A new, much simpler (and now, complete) driver program for four-base probs. Serves as a model for anyone who wants to write their own driver program that trains and calls with data from a different source than the raw Illumina data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@706 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:58:22 +00:00
kiran 5824dea0c1 Trains and calls a read at a time rather than a base at a time (which, given its name, it should have done in the first place)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@705 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:57:00 +00:00
kiran e4770885fd The four-probs for all bases in a single read. Some utility functions for generating the primary and secondary base strings, as well as generating the SQ tag byte array in a manner that's consistent with the Bustard base calls (meaning the primary Bustard call and the secondary Four-Prob call are not permitted to be the same).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@704 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:55:49 +00:00
kiran fdd123fe16 A parser for the raw Illumina data. Allows one to arbitrarily jump from one tile to another.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@703 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:53:07 +00:00
aaron 7aa90757ac Moved the iterators over to the StingSAMIterator interface. This will help us ensure that iterators that need to be closed get closed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@702 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:52:18 +00:00
kiran 6d98234555 Holds raw intensities, sequence, and quality scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@701 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:52:03 +00:00
kiran 241de0b235 A class that implements multiple training strategies and presents the training data in a common form.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@700 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:51:29 +00:00
kiran 64c65c7751 New methods to generate compressed SQ quality elements in line with the SAM spec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@699 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:50:31 +00:00
aaron c3b2c66911 The GATK doesn't need the rest
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@698 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:20:45 +00:00
aaron 0215905bb6 Added an adapter class that will adapt plain iterators and closeable iterators of SAMRecords into StingSAMIterators. Also unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@697 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 15:17:32 +00:00
ebanks 5dda448ae0 1. Add printouts for the cleaner
2. First pass at the entropy interval walker (still needs work)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@696 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 13:59:48 +00:00
hanna 80c13f7127 Added a getter for command-line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@695 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 13:55:52 +00:00
hanna 307c6e4ecf Oops. Forgot to add new file to svn.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@694 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 00:52:30 +00:00
hanna d14cab0be7 Added IterableLocusContextQueue and test. Cleaned up tests, adding BaseTest where it didn't exist. Enhanced test runner to run only classes ending in ...Test.java, so that utility classes can sit alongside the tests but won't be run by JUnit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@693 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 21:32:05 +00:00
asivache 7b59f63f12 and don't forget to close sam writer after we are done...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@692 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 20:46:36 +00:00
asivache de0cce87ea new optional arg added that allows specifying a separate bam file to which all piles that fail to realign are sent; plus minor fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@691 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 20:24:23 +00:00
hanna 12ae3a22b6 Break locus context data access providers into modular components in preparation for traverse by loci.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@689 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 18:51:16 +00:00
jmaguire 7084ecdeb6 a few changes; checked in to allow debugging.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@688 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 15:50:48 +00:00
depristo 5b47c5ab6c fixing kiran's busted build
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@686 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 21:29:04 +00:00
kiran 4f2c8bf0a3 Fixed an import statement that broke when all the files were moved to this directory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@685 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:43:16 +00:00
kiran cedc4c9ccb Refactored into oblivion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@684 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:33:07 +00:00
kiran 4e4767e5de Moved to org.broadinstitute.sting.secondarybase
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@682 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:26:43 +00:00
kiran 219eb60716 Added newly-required documentation to arguments so that build can complete successfully.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@681 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:26:10 +00:00
kiran 688358190c Moved secondary base stuff out of playground for the purpose of making it a core utility. Modified package names and imports such that things would build properly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@680 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 20:24:18 +00:00
kcibul 8079acb1d3 basic step0 implementation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@679 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:49:39 +00:00
kiran 57ecb7fbf1 Nicer reporting functions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@678 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:48:30 +00:00
hanna ee99320c83 Removed at Mark's request.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@677 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:48:21 +00:00
kiran f1de3d6366 Minor tweaks to how probs are supplied.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@676 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:47:41 +00:00
kiran 095dacd154 Experimental refactoring.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@675 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:46:50 +00:00
kiran 758f8aa89b Experimental refactoring.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@674 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:46:34 +00:00
andrewk 1518f8f9bf Update training data creation in CovariateCounterWalker to output much smaller files by counting the number of occurrences of each data point combination rather than outputting a line for each data point (i.e. each base). Also fixed bug in LogisticRecalibrationWalker where a null SAMHeader was being pulled from a function that is now marked deprecated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@673 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:23:14 +00:00
ebanks 4c12df372c Dumb, dumb bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@672 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 19:21:33 +00:00
aaron 6e69193e3c Deprecated calls to getSamReader on both the GenomeAnalysisEngine and the TraversalEngine. This call fails in the new style traversals, but it won't disappear until the cut-over to the new traversals is complete.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@671 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 18:52:42 +00:00
ebanks 630066cc0a 1. Merge LocusWindows whose reads overlap.
2. Fix bug (we weren't clearing the "to emit" list)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@670 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 17:33:23 +00:00
aaron 9f942fdfa0 Added code to correct the violation of the parsing interface. Now the analysis type resides in the command line arg, but is stored into the argument collection before it's passed to the genomeAnalysisEngine.
Also fixed a bug where we'd exception-out if we didn't provide an interval region.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@669 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:33:55 +00:00
jmaguire c4d89997ca put in a dummy sample_name so it'll compile
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@668 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:12:42 +00:00
jmaguire c8d7223789 do pooled calling properly for 1kg
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@667 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:12:13 +00:00
jmaguire 313a6d0fb5 lots of changes to facilitate calling indels and 1kG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@666 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:11:42 +00:00
jmaguire add7b6cf65 add sample_name to constructor, misc bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@665 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:10:17 +00:00
jmaguire 0267ccae7f add code for computing indel genotype likelihoods
make reference lods negative


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@664 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:09:29 +00:00
jmaguire 11723fbcc2 added method indelPileup. Generates a pileup of indel alleles given reads and offsets (as from a locus walker).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@663 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:08:24 +00:00
hanna ee9077fc69 LocusIterator iterated through LocusContexts, which was fine until now when we need something
that iterates through loci (GenomeLocs).  Rename LocusIterator to LocusContextIterator.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@662 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:54:57 +00:00
hanna 608948210c Check for a reference before extraction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@661 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:29:44 +00:00
hanna 32696b13f5 Fixed method override issue with old-style traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@660 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:22:18 +00:00
hanna 862b8a6787 intervals_file + genome_loc => intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@659 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:04:18 +00:00
hanna 0bca588629 Botched some boolean logic.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@658 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:53:52 +00:00
hanna 23e9e29964 Changed reads traversals from providing a LocusContext from which the reference sequence
could be extracted to a char[] containing the reference bases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@657 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:45:11 +00:00
hanna 052819bed5 Switched dependencies of GenomeAnalysisTK to depend on GenomeAnalysisEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@656 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:33:00 +00:00
aaron ff1b92acc4 Switch over to the GenomeAnalysisEngine/CommandLineGATK system from the GenomeAnalysisTK code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@655 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:05:58 +00:00
ebanks 009e71fcd9 We need to sort cleaned reads ourselves (instead of letting SAMFileWriter
do it) because the SAM headers are often screwed up and claim to be
"unsorted".  While here, I broke off the module from the SortSamIterator
in case someone else wants to use it.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@654 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 15:43:42 +00:00
aaron c735e1f627 small javadoc cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@653 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 03:44:21 +00:00
aaron e8b8ab5985 Added code to extend Matt's getReferenceBases out to the read walkers, so they can see the corresponding reference for each read.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@652 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 03:42:38 +00:00
aaron 4ce3feba4d my move ended up being a copy, so this is to delete duplicate files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@651 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 02:10:26 +00:00
aaron 898f65547e Added code to split GenomeAnalysisTK.java into an object concerned with loading command line args, and one that runs the engines. This will allow us to run the GATK from other tools (like Matlab). Also some cleanup to separate out the legacy traversals and the new style traversals. This is not live yet, and any modifications you need should be made to GenomeAnalysisTK.java for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@650 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 02:07:20 +00:00
aaron 8d43ec3d7e a fix for a situation where a chromosome on the reference file contains no reads, and doesn't align to the bam file. This came up using reference 18, which has chromosomes like chr1_random that aren't in all BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@649 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 01:39:25 +00:00
aaron ee02b61068 added support for the argument collections code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@648 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-09 07:07:33 +00:00
aaron 742840017b added the argument collection annotation for situations where fields in command line args have embedded fields that should be checked for command line args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@647 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-09 06:59:17 +00:00
hanna 55c1b688bd Fix mediocre javadoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@646 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 22:31:16 +00:00
hanna 522f8b58be Added second method for getting large sequences of the reference for use in reads traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@645 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 22:18:04 +00:00
aaron 517f27f331 Added sharding strat. code that picks the right kind of shard, based on the traversal engine
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@644 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:55:10 +00:00
hanna 6e394490cb Cleanup in preparation for ByLoci traversal. Also did some work minimizing unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@643 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:27:54 +00:00
hanna ee777c89de Change the default mechanism for adding ROD bindings to the new system. TODO: create a new object type for these triplets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@642 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 18:43:00 +00:00
ebanks 3aabc144c6 Added functionality to allow for a contract between LocusWindowTraversalEngine and LocusWindowWalker which allows the Walker to act upon reads outside of the provided intervals.
(Really, all we want to do is spit out all reads, but this allows the Walker to do other things with the reads if it wants)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@641 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 17:28:16 +00:00
hanna de1c282e62 Reference-ordered data relies on bugs in the old command-line argument system to work. Update the ROD system from -B track1 type1 file1 track2 type2 file2 to -B track1,type1,file1 -B track2,type2,file2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@640 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 15:28:19 +00:00
hanna 483a58627b More cleanup -- pushing shared functions down into the traversal engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@639 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 14:12:45 +00:00
hanna 7a9cfe1f75 Push reduceInit down a level so that the walker can call into it without weird casts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@638 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 13:46:28 +00:00
hanna a5154d99a3 Haven't heard any complaints, so I'm deleting the original implementation of TraverseByLociByReference. All TbyLbyR's will now go through the new sharding code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@637 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 13:37:00 +00:00
aaron bae4256574 Started the process to make the GATK engine into a runnable object so we can call it from other processes. Step 1: make a configuration object that can serialize to and from an XML file. This way we can store the information everyone uses shell scripts for. Also we can now pull the list of params out of the GenomeAnalysisTK.java. More to come...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@636 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 01:25:26 +00:00
hanna 226edbdef6 Hyphen-style xml output. Much sexier.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@635 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 01:04:40 +00:00
hanna 4c269b8496 Cleanup LinearMicroScheduler in preparation for TraverseByLoci inclusion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@634 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 00:58:37 +00:00
aaron 21536df308 Change the sample XML marshalling code over to simple XML, and take out the castor lines in the ivy.xml
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@633 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 00:08:25 +00:00
hanna 7f8850a8a2 Argument validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@631 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 20:28:56 +00:00
hanna a3d8febbf2 Error message cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@630 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 19:31:32 +00:00
hanna c241d386a7 Beefed up command-line usage string.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@629 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 19:08:19 +00:00
depristo 5a6892900e fixing oddities in duplicates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@628 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:55:45 +00:00
depristo 4a26f35caa new default syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@627 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:16:53 +00:00
ebanks 283a4d1b54 Fix some special-case cleaner issues.
We now do the same as brute force in all examples to date.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@626 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:16:35 +00:00
depristo 93211c1cd8 template for windowmaker utility -- totally non-functional
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@625 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:13:03 +00:00
depristo 2204be43eb System for traversing duplicate reads, along with a walker to compute quality scores among duplicates and a smarter method to combine quality scores across duplicates -- v1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@624 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:06:02 +00:00
depristo 71e8f47a6c boundQual function for capping qual values
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@623 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:04:18 +00:00
depristo e848f34896 countOccurances of char in string and max of a list of bytes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@622 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:03:49 +00:00
depristo 5a4bb76cc3 More capabilities for the pileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@621 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:03:13 +00:00
depristo 89a26a7078 Utilities for handling duplicates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@620 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:02:24 +00:00
hanna 4f85062004 Cleanup parsing method to make it less generic.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@619 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 16:21:17 +00:00
hanna d725c6cf1c Added unit tests for parsing failures that I encountered during integration testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@618 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 14:01:54 +00:00
hanna 2f3ab53888 Oops. Arguments didn't load into applications with non-plugins (basically everything except the GATK).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@617 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 13:37:19 +00:00
hanna 4177560543 Mutually exclusive options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@616 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 13:27:48 +00:00
hanna 752928df94 Switch to better mechanism for supplying a default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@615 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 01:22:01 +00:00
hanna dc944ec69b First stage of ROD plumbing for MicroScheduler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@614 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 23:26:21 +00:00
aaron 5136724884 Added code to the schedulers, one step closer to turning on the new reads traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@613 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 22:36:25 +00:00
hanna 9c0b81e946 Default flags to 'not required'.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@612 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 22:09:49 +00:00
asivache 072808858e added COUNT_CUTOFF arg: it is now possible to tell the code to try to realign all read piles over trains of nearby indels with at least one indel observed in COUNT_CUTOFF or more different alignments (set the arg to 1 to realign around all indels); also, some diagnostic printouts added to the output (time spent on loading the reference, time spent on scrolling through the input bam file, counts of discarded reads)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@611 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:59:33 +00:00
hanna 1fe8155111 Some critical fixes for cases where argument values directly abut argument names
and for arguments with missing short names.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@610 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:47:34 +00:00
aaron 0aba688e6f Added an interface that all our SAMRecord iterators should try to code to. This is in an effort to keep our code generic
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@609 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:40:41 +00:00
hanna 62e7e46754 Miscellaneous cleanup. Better display of help output. Better exception subtyping. More thought-out access routines.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@608 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:16:01 +00:00
ebanks 5be75e0ae6 First version of indel cleaner walker that works on intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@607 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 20:20:48 +00:00
hanna 98716138e9 Cleanup: add support for non-public fields. Track matches as state of parsing engine as well as definitions.
Made fields of command-line argument system non-public by default.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@606 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 19:38:05 +00:00
aaron f5eae98af2 Fixed a bug where we could ask for a read when there were none in the pool (that's a bad thing).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@605 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:40:55 +00:00
hanna ef211f96b1 Remove old Apache CLI-based arg system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@604 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:37:51 +00:00
hanna 521aa40baa Bring new command-line argument parsing system live.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@603 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:16:11 +00:00
aaron 98f4920739 Added BCEL and some basic instrumentation code to the test library.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@602 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 17:18:23 +00:00
hanna bfd6dfe36c Added real-world tests and tests for conditional validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@601 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 13:38:46 +00:00
hanna 4ac9e72739 Migrate default and GATK arguments over to new attribute system in preparation for conversion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@600 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 23:57:48 +00:00
hanna 2ee9374975 Check for proper error output in case of boolean args with parameter specified.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@599 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 23:08:48 +00:00
hanna b0cdba8bb3 Acting on Kiran's suggestion to make the doc tag in the @Argument annotation required.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@598 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:43:40 +00:00
hanna ec0261275b Lots of command line argument validation. Catches all common validation problems, including missing required arguments, invalid arguments, and several types of misplaced argument value errors.
Still pending:
- Help system.
- Mutually exclusive arguments.
- Design includes too many classes per file.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@597 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:08:00 +00:00
aaron 70afda12c4 Cut the test time down
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@596 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:05:51 +00:00
aaron f5880109a7 Added TraverseReads test, some bug fixes discovered in the traversal test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@594 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 20:36:00 +00:00
aaron daa2163ee8 Made the MergingSamIterator2 peekable. This iterator is becoming a duct-taped-together Swiss army knife; the iterators could use a redo soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@593 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 19:15:07 +00:00
aaron 09b0b6b57d Fixes to try and speed up unmapped read traversals. Still not nearly as fast as they should be, but the next step would be to modify samtools code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@592 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 18:17:07 +00:00
hanna 6550fe6f97 Another pass of command-line arguments. Revised parser supports all types
of arguments that the existing parser supports, but does a poor job with
validation.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@591 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 22:41:23 +00:00
depristo 8925df2e1e More information from the duplicate combiner quality metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@590 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 21:51:01 +00:00
kcibul 2b6466ea00 coverage calculator based on Gabor's Pilot 3 Coverage Metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@589 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 14:18:16 +00:00
hanna 4f2ccda56a Interface skeleton for a new command line argument parser. Nowhere near the point of being a drop-in replacement for apache cli yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@588 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-04 00:11:42 +00:00
hanna 6e38966349 Rename some key classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@587 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 22:01:04 +00:00
hanna 5bdf653919 Cleanup: prepare for better output handling.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@586 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 21:40:46 +00:00
depristo fd496159a8 Added convenience functions for RefHanger
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@585 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 21:14:40 +00:00
depristo 7ed496b859 JUnit test for RefHanger
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@584 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 20:11:14 +00:00
hanna 9f5f6f9bc7 N-way parallelism. Works for small test cases. Untested for large test cases.
-Needs more comprehensive unit testing.
-Needs some basic refactoring.
-Needs rethink of interface boundaries.
-Needs to play more nicely in the /tmp sandbox.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@583 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 19:34:09 +00:00
kiran df88c4d6b0 Added some code to determine the on-genotype and off-genotype secondary base distributions (which, at the moment, is commented out).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@582 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:48:19 +00:00
kiran e7534b292f Optionally applies secondary base distribution priors to normal single-sample genotyper posteriors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@581 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:36:32 +00:00
kiran 58c80d8d87 For on- and off-genotype primary bases, optionally compute the concordance of the secondary bases to their expected distributions. Each genotype has a slightly different profile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@580 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:33:48 +00:00
kiran 16467ae7cf A better (less overflow-y) implementation of multinomialProbability().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@579 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:28:16 +00:00
kiran 4f818f5c1c Choose a random base to stick in the pileup if the 2nd-best base matches the best base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@578 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:27:37 +00:00
kiran 9800d09608 A more thorough test for multinomialProbability.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@577 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 06:27:05 +00:00
depristo 84dae06d5a Initial version of ByDuplicates traversal, as well as a duplicate quality score estimator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@576 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:16:21 +00:00
depristo ff420f5f6f Enabled iterator() function
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@575 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:15:14 +00:00
depristo 12d6edfe7c Only prints a message about the first contig info setting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@574 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:14:26 +00:00
depristo 1cc5e74435 More ways to access quality utils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@573 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:12:07 +00:00
aaron 63403d32cd Changes to the interface to the simple data source rippled out to a bunch of files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@572 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 20:35:56 +00:00
hanna 19e4e97f21 Add tag to ignore node class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@571 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 20:27:34 +00:00
hanna 7f173af2ea Encapsulate output tracking a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@570 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 15:12:13 +00:00
aaron 3bf3c21ddd Changed the assert code in the genome loc to throw exceptions, and deleted a function no one seems to be using.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@569 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 13:54:51 +00:00
andrewk b630f2f2f1 More tables output by CovariateCounterWalker. Also made CovariateCounterWalker and LogisticRecalibration aware of positive and negative strandedness of data, which changes the regression output significantly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@568 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 01:22:50 +00:00
aaron f7a877bfeb Changed Sting exception from a base exception to a runtime exception. This makes it so you can throw it without the consumer having to check it, and hopefully people will be more inclined to use it.
Please use this instead of throwing a plain runtime exception.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@567 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 22:09:41 +00:00
hanna ba9a0b5da8 Break some of the weird inner classes out of the HierachicalMicroScheduler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@566 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 21:07:07 +00:00
hanna 95d10ba314 Sketch of hierarchical reduce process, with unit tests for some core classes. Requires breakout of inner classes, testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@565 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 20:26:16 +00:00
kiran 0a707a887b Added ability to evaluate best + random base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@564 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 20:05:36 +00:00
kcibul 334f158e5a added parameters for mapping quality and duplicate filters
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@563 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 18:05:34 +00:00
ebanks 7de5da7065 Start getting the cleaner working in Walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@561 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 14:59:53 +00:00
hanna 4c5f640eb7 Tweak the arguments passed to the command-line arguments parser so that it fails less often for invalid arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@560 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 14:36:27 +00:00
kcibul f557da0a78 Calculate interval-based statistics for Hybrid Selection
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@558 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 04:01:24 +00:00
hanna 6ecc43f385 Provide a default logger, some config settings, and some doc updates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@557 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 02:06:05 +00:00
aaron b836761104 removed the test cases from the bottom of this file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@556 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 21:50:22 +00:00
aaron 6b02248298 moved the test cases out of the GenomeAnalysisTK code and into a JUnit test case
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@555 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 21:49:17 +00:00
aaron d4de68e260 added changes for the readsTraversal to accommodate design changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@553 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 19:49:58 +00:00