gatk-3.8

Author	SHA1	Message	Date
hanna	6955b5bf53	Cleanup of the doc system, and introduce Kiran's concept of a detailed summary below the specific command-line arguments for the walker. Also introduced @help.summary to override summary descriptions if required. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2337 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-12 04:04:37 +00:00
hanna	cdfe204d19	Incorporated feedback from Kiran. Use the Javadoc first sentence extraction capability to just show the first sentence from each line of Javadoc. @help.description can still be used to produce exceptionally verbose descriptions. Also increased the line width as much as I could tolerate (100 characters -> 120 characters). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2336 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 21:59:55 +00:00
aaron	09811b9f34	Now that we always output the VCF header, make sure that we correctly handle the situation where there are no records in the file. Added unit tests as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2333 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 19:51:05 +00:00
depristo	8f7554d44f	A few improvements to pooled concordance calculations. Now will show you FN with the -V option. BasicGenotype now prints out a reasonable representation with toString git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2320 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 23:09:10 +00:00
ebanks	2869270c11	Fixed deletion depth calculation plus mis-spelling in ReadBackedPileup method. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2315 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 21:11:42 +00:00
hanna	5eac510b2f	Refactor the code I gave Eric yesterday to output command line arguments. Convert it from a completely wonky solution to a slightly less wonky solution that will work in more cases. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2310 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 18:57:54 +00:00
ebanks	a45adadf1f	VCFGenotypeRecord already defines all the methods needed to be SampleBacked, so let's annotate it as being SampleBacked. This way, when used as a generic Genotype, sample data can be retrieved. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2305 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 04:16:21 +00:00
ebanks	4e54b91ce4	UG now outputs the FORMAT header fields when there's genotype data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2294 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 16:31:07 +00:00
ebanks	7a76e13459	Better explanation in the exception being thrown. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2291 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 03:59:36 +00:00
ebanks	717eb1de96	- Depth annotation now includes MQ0 reads - Removed MQ0 annotation - Updated RMS MQ annotation to use new pileup - UG now outputs all of its arguments as key/value pairs in the header (for VCF) - Cleaned up VCFGenotypeWriterAdapter interface a bit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2288 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 02:53:00 +00:00
ebanks	e8822a3fb4	Stage 3 of Variation refactoring: We are now VCF3.3 compliant. (Only a few more stages left. Sigh.) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2287 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-08 21:43:28 +00:00
hanna	9e2f831206	A bit of cleanup in preparation for Picard patch. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2286 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-08 16:09:04 +00:00
hanna	d3b78338da	Get rid of characters in the docs that aren't universally compatible with character sets used throughout the group. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2285 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 21:41:07 +00:00
hanna	d75d3a361a	Clean up some of the walker help output based on additional experience and feedback received. Also, add a flag to build.xml to disable generation of docs on demand (use ant -Ddisable.doc=true to disable docs). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2284 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 21:33:11 +00:00
hanna	a3e88c0b1c	Cleanup results of bad merge. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2281 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 19:30:49 +00:00
hanna	10be5a5de9	Move some files around to reflect our growing help infrastructure. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2280 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 19:23:12 +00:00
rpoplin	1d5b9883db	Added --solid_recal_mode argument to experiment with different ways of dealing with solid reference bias. Currently the default option is DO_NOTHING which means use the same behavior as the old recalibrator. Eventually the new methods in RecalDataManager will be moved over to a SolidUtils class. Added transition and transversion methods to BaseUtils that work like simpleComplement, used with the color space in my solid methods. Also, initial check-in of HomopolymerCovariate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2276 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 14:26:27 +00:00
hanna	8089aa3c50	Adding support to override the help text. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2273 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 00:16:26 +00:00
ebanks	c0528cd88e	Updated the CallsetConcordance classes to use new VCF Variation code... and uncovered a whole bunch of VCF bugs in the process. I'm not convinced that I got them all, so I'll unit test like crazy when the refactoring is done. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2272 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 11:43:40 +00:00
ebanks	b6f8e33f4c	Stage 2 of Variation refactoring: VCFRecord now implements Variation, VCFGenotypeRecord now implements Genotype. Because of this change, RodVCF is now just a wrapper around the VCFRecord and does nothing else. Also, one can call toVariation on the VCFGenotypeRecord and it returns the VCFRecord. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2271 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 06:48:03 +00:00
hanna	3b440e0dbc	Add a taglet to allow users to override the display name in command-line help. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2270 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 04:12:10 +00:00
ebanks	08f2214f14	Stage 1 of massive Variation/Genotype refactoring. This stage consists only of the code originating in the Genotyper and flowing through to the genotype writers. I haven't finished refactoring the writers and haven't even touched the readers at all. The major changes here are that 1. Variations which are BackedByGenotypes are now correctly associated with those Genotypes 2. Genotypes which have an associated Variation can actually be associated with it (and then return it when toVariation() is called). The only integration tests which need to be updated are MSG-related (because the refactoring now made it easy for me to prevent MSG from emitting tri-allelic sites). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2269 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 03:12:41 +00:00
hanna	b04de77952	First pass at a reorganized walker info display. Groups walkers by package and displays walker data extracted from the JavaDoc. Needs a bit of help, both in content and flexibility of package naming. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2267 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 23:24:29 +00:00
depristo	07b88621c5	Improved RankSum calculations and RankSum annotation. Much more meaningful git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2266 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 22:16:40 +00:00
hanna	4c147329a9	Turn javadoc comments for packages and classes into key/value pairs in a properties file. Embed the properties file in GenomeAnalysisTK.jar. Still no support for actually displaying the archived javadoc. Also change the approach to providing package javadocs: retired the deprecated package.html file in favor of Java1.5-style package-info.java. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2263 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 20:08:41 +00:00
ebanks	b05e73a914	Finished implementation of the Wilcoxon Rank Sum Test thanks to Tim Fennell (calculating the normal approximation) and Nick Patterson (dithering to break tie bands). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2255 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 04:04:39 +00:00
ebanks	9da5cc25ad	More archiving (with permission from Andrey) plus a move to core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2242 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 15:40:27 +00:00
aaron	b3bdcd0e60	make sure we close the error log stream in CommandLineProgram if it's opened; unit tests and clean-up for BasicVariation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2241 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 06:59:27 +00:00
ebanks	2c83f2f2bc	Move MSG - plus now obsolete classes which it depends on -- to oneoffprojects (with permission from Jared). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2224 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 20:04:22 +00:00
ebanks	2838629724	-VCF writer now checks whether the allele frequency has been set before trying to write it out. -Renamed methods to be more consistent. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2214 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 16:25:32 +00:00
depristo	6231637615	fixes for VariantAnnotations and second bases. Misc. removal of failing (and unstable) integration tests that require rereview git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2213 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 15:41:35 +00:00
jmaguire	adf8f1f8b3	Add an InputStream constructor, which is immensely useful for various reasons. Also a minor performance optimization. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2201 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 17:25:00 +00:00
ebanks	084337087e	Removing deprecated code and walkers for which I had the green light from repository. Moved piecemealannotator and secondarybases to archive. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2195 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 05:58:20 +00:00
ebanks	7c6c490652	An unfinished implementation of the Wilcoxon rank sum test and a variant annotation that uses it. I need to merge and update this code with Tim's implementation somehow - but that won't happen until later this week, so I'm committing this before I accidentally blow it away. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2193 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 04:56:17 +00:00
ebanks	00f15ea909	Improved performance of deletion-free pileup and added mapping-quality-zero-free pileup convenience method. Finished converting genotyper and annotator code to new ReadBackedPileup system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2192 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 04:50:47 +00:00
depristo	e793e62fc9	minor code cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2189 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 20:57:20 +00:00
ebanks	add2fa7ab4	more use of new ReadBackedPileup optimizations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2187 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 20:04:01 +00:00
ebanks	a184d28ce9	Completing the optimization started by Matt: we now wrap SAMRecords and SAMReadGroupRecords with our own versions which cache oft-used variables (e.g. platform, readString, strand flag). All walkers automagically get this speedup since the wrapping occurs in the engine. I note that all integration/unit tests pass except for BaseTransitionTableCalculatorJava, which is already broken. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2182 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 17:39:29 +00:00
depristo	75b61a3663	Updated, optimized ReadBackedPileup. Updated test that was breaking the build -- it created a pileup from reads without bases... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2169 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 23:30:39 +00:00
depristo	db40e28e54	ReadBackedPileup in all its glory. Documented, aligned with the output of LocusIteratorByState, and caching common outputs for performance git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2165 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 20:54:44 +00:00
depristo	03342c1fdd	Restructuring and interface change to ReadBackedPileup. We no longer support the Pileup interface, the BasicPileup static methods, and the ReadBackedPileup class. Now everything is a ReadBackedPileup and all methods to manipulate pileups are off of it. Also provides the recommended iterable() interface of pileup elements so you can use the syntax for (PileupElement p : pileup) and access directly from p.getBase() and p.getQual() and p.getSecondBase(). Only a few straggler walkers use the old style interface -- but those walkers will be retired soon. Documentation coming in the AM. Please everyone use the new syntax, it's safer, and will be more efficient as soon as the LocusIteratorByState directly emits the ReadBackedPileup for the Alignment context, as opposed to the current interface. In the process of the change over, discovered several bugs in the second-best base code due to things getting out of sync, but these changes were resolved manually. All other integrationtests passed without modification. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2154 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 03:51:41 +00:00
ebanks	3484f652e7	1. Variation is now passed to VariantAnnotator along with the List of Genotypes so non-genotype calls has access to all relevant info. 2. Killed OnOffGenoype 3. SpanningDeletions is now SpanningDeletionFraction git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2151 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 21:47:20 +00:00
ebanks	e05cb346f3	GenotypeLocusData now extends Variation. Also, Variations should be INSERTIONs or DELETIONs (and not just INDELs). Technically, VCF records can be indels now. More changes coming git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2150 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 21:07:55 +00:00
aaron	8fbc0c8473	fix for bug GSA-234: fasta index files couldn't handle anything but letters, numbers, or spaces in the contig name git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2147 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 19:19:47 +00:00
ebanks	b3f561710f	Optimizations: 1. Only do calculations in UG for alternate allele with highest sum of quality scores (note that this also constitutes a bug fix for a precision problem we were having). 2. Avoid using Strings in DiploidGenotype when we can (it was taking 1.5% of my compute according to JProfiler) UG now runs in half the time for JOINT_ESTIMATE model. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2141 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 16:27:39 +00:00
ebanks	cb6d6f2686	Very minor performance improvements git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2137 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 05:21:07 +00:00
ebanks	c90bea39a1	read.getReadString().charAt(offset) --> read.getReadBases()[offset] [As a courtesy I fixed all instances once I was updating GenotypeLikelihoods] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2136 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 04:25:19 +00:00
ebanks	be6a549e7b	Added the capability to allow expressions in an integration test command (i.e. -filter 'foo') by escaping them in the command. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2132 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 02:34:48 +00:00
ebanks	dfe7d69471	1. VCF: don't print slod if it's never set 2. UG: don't print slod if lods are infinite (todo: figure out a good guess instead) 3. UG: if probF=0 for 2 alt alleles are both 0 (because of precision), use log values to discriminate git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2116 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 02:55:43 +00:00
ebanks	04d6ac940c	Always print out VCF header - not just when there is genotype data present. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2114 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 01:44:10 +00:00

466 Commits (2748eb60e14503a2f7e9cea739c35b718d01ebf9)