gatk-3.8

Commit Graph

Author	SHA1	Message	Date
depristo	2cbc85cc7a	min mapping quality and min base quality arguments for UG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2354 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-15 03:57:27 +00:00
depristo	faa638532a	Correct location git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2353 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-15 02:42:21 +00:00
depristo	1da97ebb85	Walker for calculating non-independent base errors, v1. Will be moved to somewhere not in core git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2352 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-15 02:40:15 +00:00
rpoplin	8e44bfd2ef	CycleCovariate and PrimerRoundCovariate now correctly handle negative strand 454 and SOLID reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2349 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-14 21:52:30 +00:00
ebanks	97618663ef	Refactored and generalized the VCF header info code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2346 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-13 21:02:45 +00:00
depristo	05b8782d5f	Documentation updates. Moved CountX.java walkers to QC git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2345 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-13 18:40:22 +00:00
depristo	92307361a4	In preparation for move git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2344 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-13 18:28:06 +00:00
ebanks	45199136f0	Completed my documentation responsibilities - based on Mark's reasonable assignment and not the one Matt made up while on Meth. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2342 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-13 04:13:30 +00:00
ebanks	bd2a46ab4c	I want to move over to hpprojects tonight, so I'm checking in various changes all in one go: 1. Initial code for annotating calls with the base mismatch rate within a reference window (still needs analysis). 2. Move error checking code from rodVCF to VCFRecord. 3. More improvements to SNP Genotype callset concordance. 4. Fixed some comments in Variation/Genotype git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2341 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-13 02:52:18 +00:00
kiran	2748eb60e1	Added short documentation for each class so that it appears in the walker command-line documentation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2340 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-12 21:41:07 +00:00
rpoplin	78e94b5a84	TableRecalibration now puts the full list of walker arguments into the PG tag of the bam file it creates. Thanks Matt and Eric. Also, the default nback for the HomopolymerCovariate is 8, down from 10. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2339 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-12 17:29:41 +00:00
rpoplin	014013630f	Added hierarchy to the covariate classes: Required, Standard, and Experimental. Required covariates (rg and reported quality) are added for the user whether or not they are specified in the -cov list. There is now a -standard option in CountCovariates which will add in all of the standard covariates so the user doesn't have to type them all out or even know which ones are the standard. There is logger output to say which covariates are being used of course. The list of covariates used is also added to the PG tag in the bam file produced by TableRecalibration. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2338 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-12 16:34:05 +00:00
hanna	6955b5bf53	Cleanup of the doc system, and introduce Kiran's concept of a detailed summary below the specific command-line arguments for the walker. Also introduced @help.summary to override summary descriptions if required. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2337 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-12 04:04:37 +00:00
hanna	cdfe204d19	Incorporated feedback from Kiran. Use the Javadoc first sentence extraction capability to just show the first sentence from each line of Javadoc. @help.description can still be used to produce exceptionally verbose descriptions. Also increased the line width as much as I could tolerate (100 characters -> 120 characters). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2336 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 21:59:55 +00:00
kiran	38d9f7b903	Renamed ReferenceContext's getSimpleBase() method to getBaseIndex() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2334 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 20:14:39 +00:00
rpoplin	60c3eb4b60	Added help.description to the recalibration walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2331 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 19:02:29 +00:00
ebanks	2ea7632b76	The SNP genotype concordance module is now more comprehensive. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2330 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 18:34:33 +00:00
hanna	590aeee7d2	Documentation for more basic walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2329 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 18:15:40 +00:00
hanna	d1815f3559	More documentation for walkers that I'm familiar with in the collection of core walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2328 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 18:02:33 +00:00
hanna	956c36a2c8	Help for the qc package. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2327 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 17:32:47 +00:00
hanna	450ea233a5	Docs for the basic walkers: CountLoci, CountReads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2326 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 17:17:34 +00:00
aaron	86dc98bfb5	update the documentation for CombineDuplicates for the new help system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2324 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 17:01:42 +00:00
aaron	420725441a	documentation updates for the new help system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2323 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 16:15:44 +00:00
ebanks	2de7e1a178	Move VariantAnnotator over to use a StratifiedAlignmentContext split by sample. The only major difference is that we are now able to get accurate allele balance ratios. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2321 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-11 05:28:28 +00:00
aaron	f64a4c66ac	some tweaks for the GATK paper genotyper to better work with shared memory parallelization, added documentation changes for Matt's new help system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2319 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 22:33:51 +00:00
ebanks	2869270c11	Fixed deletion depth calculation plus mis-spelling in ReadBackedPileup method. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2315 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 21:11:42 +00:00
ebanks	31b1d60d28	Generalized the StratifiedAlignmentContext code so that it's easy to add new ways to stratify. Then added an MQ0-free stratification so we don't need to be carrying around 2 different alignment contexts (full vs. mq0-free) anymore. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2314 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 19:50:06 +00:00
hanna	0c396f04a2	Fix obvious cut/paste error in output stream management code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2313 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 19:23:13 +00:00
ebanks	11ac7885b0	Pull out StratifiedAlignmentContext code so other walkers can use it. This is basically a wrapper class around AlignmentContext which allows you to stratify a context by e.g. reads on forward vs. reverse strands. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2312 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 19:21:16 +00:00
hanna	adb2fdbee7	Before, we were only checking that the reference was present if @Requires required that a reference was present. Now we always check that a reference is present, so that we get an intelligent error message. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2311 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 19:15:48 +00:00
hanna	5eac510b2f	Refactor the code I gave Eric yesterday to output command line arguments. Convert it from a completely wonky solution to a slightly less wonky solution that will work in more cases. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2310 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 18:57:54 +00:00
hanna	74b8055b6a	Only show extra walker help if the user didn't specify a walker or specified an invalid walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2309 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-10 16:43:06 +00:00
ebanks	f7c44ad019	- Read in arguments for the header based on reflection - Hook up Variation and Genotype in SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2300 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 21:35:33 +00:00
hanna	408f6f3dee	Refactoring of prior commit: better handling of unnamed package within the help system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2297 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 20:12:35 +00:00
hanna	1d2151adcf	Better handling of nulls output by git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2296 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 19:34:56 +00:00
ebanks	40c2d7a4bc	Fix all-bases-mode and genotype-mode in the UG and add integration tests for them. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2295 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 17:41:30 +00:00
ebanks	4e54b91ce4	UG now outputs the FORMAT header fields when there's genotype data. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2294 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 16:31:07 +00:00
rpoplin	12c49ea485	Added DuplicateReadFilter to filter out reads that are marked as duplicates. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2293 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 15:42:53 +00:00
ebanks	fb900b12e1	VariantFiltration now details the filters it has used in the header of the VCF it produces. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2292 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 15:36:15 +00:00
ebanks	8d67d9ade3	-Minor fix in UG for all-bases mode -Make minConfidenceScore in VariantEval a double so non-integer values can be used (requested by Steve H). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2290 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 03:49:10 +00:00
ebanks	717eb1de96	- Depth annotation now includes MQ0 reads - Removed MQ0 annotation - Updated RMS MQ annotation to use new pileup - UG now outputs all of its arguments as key/value pairs in the header (for VCF) - Cleaned up VCFGenotypeWriterAdapter interface a bit git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2288 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-09 02:53:00 +00:00
ebanks	e8822a3fb4	Stage 3 of Variation refactoring: We are now VCF3.3 compliant. (Only a few more stages left. Sigh.) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2287 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-08 21:43:28 +00:00
hanna	d75d3a361a	Clean up some of the walker help output based on additional experience and feedback received. Also, add a flag to build.xml to disable generation of docs on demand (use ant -Ddisable.doc=true to disable docs). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2284 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 21:33:11 +00:00
hanna	10be5a5de9	Move some files around to reflect our growing help infrastructure. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2280 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 19:23:12 +00:00
rpoplin	1d5b9883db	Added --solid_recal_mode argument to experiment with different ways of dealing with solid reference bias. Currently the default option is DO_NOTHING which means use the same behavior as the old recalibrator. Eventually the new methods in RecalDataManager will be moved over to a SolidUtils class. Added transition and transversion methods to BaseUtils that work like simpleComplement, used with the color space in my solid methods. Also, initial check-in of HomopolymerCovariate. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2276 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-07 14:26:27 +00:00
ebanks	c0528cd88e	Updated the CallsetConcordance classes to use new VCF Variation code... and uncovered a whole bunch of VCF bugs in the process. I'm not convinced that I got them all, so I'll unit test like crazy when the refactoring is done. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2272 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 11:43:40 +00:00
ebanks	b6f8e33f4c	Stage 2 of Variation refactoring: VCFRecord now implements Variation, VCFGenotypeRecord now implements Genotype. Because of this change, RodVCF is now just a wrapper around the VCFRecord and does nothing else. Also, one can call toVariation on the VCFGenotypeRecord and it returns the VCFRecord. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2271 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 06:48:03 +00:00
hanna	3b440e0dbc	Add a taglet to allow users to override the display name in command-line help. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2270 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 04:12:10 +00:00
ebanks	08f2214f14	Stage 1 of massive Variation/Genotype refactoring. This stage consists only of the code originating in the Genotyper and flowing through to the genotype writers. I haven't finished refactoring the writers and haven't even touched the readers at all. The major changes here are that 1. Variations which are BackedByGenotypes are now correctly associated with those Genotypes 2. Genotypes which have an associated Variation can actually be associated with it (and then return it when toVariation() is called). The only integration tests which need to be updated are MSG-related (because the refactoring now made it easy for me to prevent MSG from emitting tri-allelic sites). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2269 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-06 03:12:41 +00:00
hanna	b04de77952	First pass at a reorganized walker info display. Groups walkers by package and displays walker data extracted from the JavaDoc. Needs a bit of help, both in content and flexibility of package naming. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2267 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 23:24:29 +00:00
depristo	07b88621c5	Improved RankSum calculations and RankSum annotation. Much more meaningful git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2266 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 22:16:40 +00:00
hanna	4c147329a9	Turn javadoc comments for packages and classes into key/value pairs in a properties file. Embed the properties file in GenomeAnalysisTK.jar. Still no support for actually displaying the archived javadoc. Also change the approach to providing package javadocs: retired the deprecated package.html file in favor of Java1.5-style package-info.java. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2263 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 20:08:41 +00:00
ebanks	1e8dcc30da	-dbSNP rod should not implement VariantBackedByGenotype since dbsnp records have no genotype data -added code to cache the allele list so it didn't need to get recomputed each time it was requested. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2260 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 14:56:48 +00:00
ebanks	58937bf9ba	You can now use the -exp flag to tell the Genotyper to include experimental annotations when it calls out to VariantAnnotator. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2256 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 04:45:05 +00:00
ebanks	b05e73a914	Finished implementation of the Wilcoxon Rank Sum Test thanks to Tim Fennell (calculating the normal approximation) and Nick Patterson (dithering to break tie bands). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2255 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 04:04:39 +00:00
ebanks	861221d046	- Moved various header line printing into a single method - Fixed output for coverage above min depth git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2254 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-04 02:15:43 +00:00
ebanks	aef4be5610	Moved CoarseCoverageWalker to core and packaged both coverage walkers in coverage/ git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2249 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 17:53:36 +00:00
ebanks	c2017cc91b	PrintCoverageWalker functionality moved to DepthOfCoverageWalker. Added integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2247 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 17:23:59 +00:00
ebanks	01cf5cc741	1. Merged CoverageHistogram into DepthOfCoverageWalker 2. Fixed bug in histogram calculation for small intervals 3. Better output in DoCWalker 4. Comments added to code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2245 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 17:01:53 +00:00
ebanks	44b9f60735	PercentOfBasesCovered functionality moved to DepthOfCoverageWalker. Added integration tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2244 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 16:11:09 +00:00
ebanks	126d1eca35	Move to core (qc/) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2243 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 15:45:58 +00:00
ebanks	9da5cc25ad	More archiving (with permission from Andrey) plus a move to core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2242 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 15:40:27 +00:00
ebanks	a88202c3f6	Refactored DoCWalker to output in a more helpful and usable style. It now outputs in tabular format with 2 different sections: per locus and then per interval. I am now at a point where I can merge the functionality from other coverage walkers into this one. Thanks to Andrew for input. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2239 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 05:28:21 +00:00
ebanks	d7e4cd4c82	Moving some useful and stable walkers to core: - ClipReads - PrintRODs (generalized to print all RODs that are Variations) - FixBAMSortOrderTag (added documentation to walker so that people know what it does and why) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2238 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 03:00:45 +00:00
rpoplin	46f3d3e39b	Added comments to AnalyzeCovariates and R scripts. R script prevents residuals from going off the edge of the plot. Added skeleton code to the recalibration walkers showing how we plan to handle SOLID reference inserting behavior. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2233 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 23:15:52 +00:00
depristo	dec0a781c2	Un-reinventing the wheel. --sleep argument removed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2227 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 20:19:28 +00:00
chartl	6a9e7bea05	Removing experimental annotations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2220 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 19:03:55 +00:00
ebanks	0a2304eff8	- Rename minConfidenceScore in VariantEval to minPhredConfidenceScore - Moved validation walkers to new qc dir - Killed unused test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2218 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 17:59:19 +00:00
ebanks	a5dfc9107d	- Cleaned up annotation code some more - Use QualityUtils when phred-scaling now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2217 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 17:45:29 +00:00
ebanks	7055a3ea2d	- All annotations are now required to return their VCF INFO keys and descriptions - Renamed keys to fit with the standard naming - FisherStrand is no longer standard - Integration tests no longer test experimental annotations since they're not stable git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2216 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 17:24:06 +00:00
rpoplin	67179e2412	Initial checkin of AnalyzeCovariates.java which replaces analyzeRecalQuals_1KG.py and is updated to use the new Covariates system. It creates similar plots of residual error for each covariate that was used in the calculation. There is also an option to filter out base qualities below a given threshold. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2215 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 16:47:35 +00:00
ebanks	2838629724	-VCF writer now checks whether the allele frequency has been set before trying to write it out. -Renamed methods to be more consistent. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2214 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 16:25:32 +00:00
depristo	6231637615	fixes for VariantAnnotations and second bases. Misc. removal of failing (and unstable) integration tests that require rereview git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2213 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 15:41:35 +00:00
ebanks	b979bd2ced	- Optimized implementation of -byReadGroup in DoCWalker - Added implementation of -bySample in DoCWalker - Removed CoverageBySample and added a watered down version to the examples directory git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2209 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 03:39:24 +00:00
ebanks	7c73496e72	Moved DoC walker over to new pileup system so it no longer moves like it's stuck in molasses. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2208 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 02:46:39 +00:00
ebanks	05923f7fba	Started transition to oneoffprojects. Moved/killed a few other walkers (with permission). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2204 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 21:19:02 +00:00
ebanks	c36069355e	Trivial change to verbose git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2203 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 20:48:10 +00:00
rpoplin	3180fffd43	Eliminated unnecessary boxing of longs in RecalDatum. Changes to RecalDatum in preparation for new AnalyzeCovariates script. Updated TableRecalibrationWalker to make use of these changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2199 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 16:49:05 +00:00
chartl	21a9a717e4	Some minor changes and test: - DepthOfCoverage is now by reference (so locus-by-locus output correctly reports zero-coverage bases) - VariantsToVCF now lets you bind variants with any string except intervals and dbsnp (not just NA######) - A PileupWalker integration test on a particularly nasty FHS site - Two second-base annotation related integration tests on that same site + outputs were all hand-validated in matlab; within a certain tolerance for the annotations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2197 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 15:15:54 +00:00
ebanks	7c6c490652	An unfinished implementation of the Wilcoxon rank sum test and a variant annotation that uses it. I need to merge and update this code with Tim's implementation somehow - but that won't happen until later this week, so I'm committing this before I accidentally blow it away. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2193 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 04:56:17 +00:00
ebanks	00f15ea909	Improved performance of deletion-free pileup and added mapping-quality-zero-free pileup convenience method. Finished converting genotyper and annotator code to new ReadBackedPileup system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2192 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 04:50:47 +00:00
rpoplin	6bb864da2a	More misc cleanup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2191 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 22:29:07 +00:00
rpoplin	b89b9adb2c	misc code cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2190 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 21:16:00 +00:00
rpoplin	4969cb1957	CountCovariates uses new optimized ReadBackedPileup. It is also smarter about re-doing calculations for the dbsnp variation rate sanity check. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2188 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 20:35:40 +00:00
ebanks	add2fa7ab4	more use of new ReadBackedPileup optimizations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2187 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 20:04:01 +00:00
rpoplin	817e2cb8c5	Recalibrator makes use of the new GATKSAMRecord wrapper and now no longer has to hash the SAMRecord. Covariate's getValue method signature has changed to take the SAMRecord instead of the ReadHashDatum. ReadHashDatum removed completely. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2185 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 19:59:17 +00:00
ebanks	e9a8156cfb	Use new optimized ReadBackedPileup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2184 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 18:17:18 +00:00
rpoplin	d8146ab23d	Changed the format of the recalibration csv file slightly so that it is easier to load the file into something like R and look at the values of the covariates. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2183 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 17:55:23 +00:00
ebanks	a184d28ce9	Completing the optimization started by Matt: we now wrap SAMRecords and SAMReadGroupRecords with our own versions which cache oft-used variables (e.g. platform, readString, strand flag). All walkers automagically get this speedup since the wrapping occurs in the engine. I note that all integration/unit tests pass except for BaseTransitionTableCalculatorJava, which is already broken. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2182 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-30 17:39:29 +00:00
hanna	3300ca906a	An iterator for Eric to use when injecting his new wrapping reads -- a stopgap solution for getting additional caching functionality into a SAMRecord. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2173 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-27 22:25:52 +00:00
rpoplin	26db15be5c	Added SingleReadGroupFilter to only use reads from a specific read group, filtering out all others. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2172 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-27 20:33:59 +00:00
rpoplin	91f5672a32	misc cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2171 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-27 19:56:20 +00:00
rpoplin	d1298dda13	Encapsulated the sections of code that were shared by the two Recalibration walkers. This includes both the shared command line arguments and the section of code in the map methods which pull out data from the SAMRecord and stuff it into the ReadHashDatum. Command line arguments are now passed to the Covariates using a new initialize method that all Covariates must implement. Updated the dbsnp sanity check warning message to be less cryptic. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2170 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-27 19:54:10 +00:00
depristo	75b61a3663	Updated, optimized ReadBackedPileup. Updated test that was breaking the build -- it created a pileup from reads without bases... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2169 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 23:30:39 +00:00
alecw	ac1b289d55	Add tile to ReadHashDatum, and implement TileCovariate git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2166 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 21:41:42 +00:00
depristo	db40e28e54	ReadBackedPileup in all its glory. Documented, aligned with the output of LocusIteratorByState, and caching common outputs for performance git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2165 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 20:54:44 +00:00
rpoplin	b44363d20a	Removed silly casts from Integer to int. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2164 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 19:59:21 +00:00
ebanks	d0f673f0c0	Use Math.abs so we don't get (inconsistent) -0's git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2160 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 19:08:34 +00:00
rpoplin	6ff8526592	Added arguments to the recalibration walkers so the user can specify the default read group id and platform to use when a read has no read group. There are also options to force every read group and every platform to be the specified values. Added integration tests that use a bam file with no read groups. Added comments to all the covariates to explain what each of the methods in the Covariate interface are used for. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2157 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 15:41:12 +00:00
ebanks	e1e5b35b19	Don't have the spanning deletions argument be a hard cutoff, but instead be a percentage of the reads in the pileup. Default is now 5% of reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2155 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 04:54:44 +00:00
depristo	03342c1fdd	Restructuring and interface change to ReadBackedPileup. We no longer support the Pileup interface, the BasicPileup static methods, and the ReadBackedPileup class. Now everything is a ReadBackedPileup and all methods to manipulate pileups are off of it. Also provides the recommended iterable() interface of pileup elements so you can use the syntax for (PileupElement p : pileup) and access directly from p.getBase() and p.getQual() and p.getSecondBase(). Only a few straggler walkers use the old style interface -- but those walkers will be retired soon. Documentation coming in the AM. Please everyone use the new syntax, it's safer, and will be more efficient as soon as the LocusIteratorByState directly emits the ReadBackedPileup for the Alignment context, as opposed to the current interface. In the process of the change over, discovered several bugs in the second-best base code due to things getting out of sync, but these changes were resolved manually. All other integration tests passed without modification. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2154 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-25 03:51:41 +00:00
ebanks	2cb3e53b0b	Verbose mode shouldn't be printing out 'NaN's and 'Infinity's git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2153 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 22:01:00 +00:00
rpoplin	c9ff5f209c	Added a CountCovariates integration test that uses a vcf file as the list of variant sites to skip over instead of the usual dbSNP rod. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2152 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 21:51:38 +00:00
ebanks	3484f652e7	1. Variation is now passed to VariantAnnotator along with the List of Genotypes so non-genotype calls have access to all relevant info. 2. Killed OnOffGenotype 3. SpanningDeletions is now SpanningDeletionFraction git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2151 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 21:47:20 +00:00
ebanks	e05cb346f3	GenotypeLocusData now extends Variation. Also, Variations should be INSERTIONs or DELETIONs (and not just INDELs). Technically, VCF records can be indels now. More changes coming git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2150 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 21:07:55 +00:00
rpoplin	8b30279edc	style update git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2149 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 20:56:31 +00:00
rpoplin	dffa46b380	BAM files created by TableRecalibration now have the version number and list of covariates used appended to their header with a new 'PG' tag. Eventually the entire list of command line args will be put in there as well. Big thanks to Matt and Aaron. The integration test uses the --no_pg_tag so that the md5 doesn't change every time the version number changes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2148 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 20:53:57 +00:00
rpoplin	277e6d6b32	Further optimizations of TableRecalibration. This completes my goal of having the only math done in the map function be addition, subtraction and rounding the quality score to an integer. Everything else has been moved to the initialize method and only done once. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2145 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 18:21:57 +00:00
ebanks	87c1860398	I'm not sure I believe it, but JProfiler claims that calling FourBaseProbs.isVerbose() was taking 5% of my runtime... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2142 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 17:00:32 +00:00
ebanks	b3f561710f	Optimizations: 1. Only do calculations in UG for alternate allele with highest sum of quality scores (note that this also constitutes a bug fix for a precision problem we were having). 2. Avoid using Strings in DiploidGenotype when we can (it was taking 1.5% of my compute according to JProfiler) UG now runs in half the time for JOINT_ESTIMATE model. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2141 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 16:27:39 +00:00
rpoplin	a59e5b5e1a	Added dbSNP sanity check to CountCovariates. If the mismatch rate is too low at dbSNP sites it warns the user that the dbSNP file is suspicious. Added option in CountCovariates and TableRecalibration to ignore read group id's and collapse them together. Also, if the read group is null the walkers no longer crash with NullPointerException but instead warn the user that the read group and platform are defaulting to some values. Default window size in MinimumNQSCovariate is 5 (two bases in either direction) based on rereading of Chris's analysis. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2140 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 16:16:44 +00:00
alecw	e5e6d515c3	Fix misunderstanding of GenomeLoc interval git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2138 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 15:12:49 +00:00
ebanks	cb6d6f2686	Very minor performance improvements git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2137 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 05:21:07 +00:00
ebanks	c90bea39a1	read.getReadString().charAt(offset) --> read.getReadBases()[offset] [As a courtesy I fixed all instances once I was updating GenotypeLikelihoods] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2136 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 04:25:19 +00:00
ebanks	ec321abd7b	Added ability to filter on the QUAL field git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2135 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 04:08:22 +00:00
ebanks	36d493e645	All standard annotations now inherit from StandardVariantAnnotation. Users can specify whether they want all annotations, just the standard annotations, or specific annotations. When calling in from another walker, the default is just the standard ones. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2134 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 03:55:12 +00:00
ebanks	ee5093d2c6	-Added VariantFiltration integration tests -Added integration test for GLFs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2133 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-24 02:36:27 +00:00
rpoplin	9e4eadc37c	CountCovariates v2.0.2: Added a --process_nth_locus <int> argument to only use every Nth covered locus when creating the recalibration table. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2129 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 22:07:38 +00:00
ebanks	ed4cf3de57	Check that we're biallelic before calling isSNP() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2127 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 20:20:48 +00:00
rpoplin	5744a1d968	The covariates don't care about SAMRecord's anymore - Cleaning up the import statements. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2126 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 20:10:12 +00:00
chartl	23983b2fd8	New annotation: ResidualQuality Computes a metric for how much error is left that isn't explained by ref or snp bases. This is the sum of Q scores, weighted by the proportion of non-ref non-snp bases to non-snp bases. Reported in Log space. Update to the integration test so bamboo doesn't look as though someone murdered it with a spork git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2124 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 20:04:01 +00:00
ebanks	70059a0fc9	Refactored joint estimation model to allow subclasses to overload PofD calculation over all frequencies. Pooled model now takes only 20% of time that it used to. Added integration test for pooled model and updated other joint estimation tests to be more comprehensive now that they are faster. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2123 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 20:03:38 +00:00
rpoplin	7f947f6b60	Updated recalibrator integration tests to use all three platforms as well as a bam with multi-platform reads intermingled. CountCovariates v2.0.1: Once again uses a read filter to filter out zero mapping quality reads. Added --sorted_output option to output the table recalibration file in sorted order git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2122 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 19:51:36 +00:00
ebanks	14bf6ce83c	1. Newest version of the joint estimation model. Faster than previous version and now qscores can get to be > 39.8 for hets. 2. More sanity checks in annotations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2119 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 17:05:50 +00:00
rpoplin	1d46de6d34	The old recalibrator is replaced with the refactored recalibrator. Added a version message to the logger output. These walkers start at version 2.0.0 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2117 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 14:58:33 +00:00
ebanks	dfe7d69471	1. VCF: don't print slod if it's never set 2. UG: don't print slod if lods are infinite (todo: figure out a good guess instead) 3. UG: if probF for 2 alt alleles are both 0 (because of precision), use log values to discriminate git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2116 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 02:55:43 +00:00
ebanks	753cb100a3	Add checks for weird situations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2115 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 02:14:25 +00:00
ebanks	bf935a6ab1	1. Fixed bug in PrimaryBaseSecondaryBaseSymmetry code (not checking for null before trying to access object's methods) which was causing Integration Tests to fail. 2. Retired allele frequency range from UG, which wasn't very useful. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2113 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-23 01:31:48 +00:00
depristo	9c206abb97	removing unnecessary printing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2110 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-22 12:41:48 +00:00
chartl	59416ae06a	This is an annotation adapted from one that Mark Daly suggested some time ago. Right now it calculates: - For all reference bases, the proportion of their second best bases that support the SNP - the proportion of non-reference bases that support the SNP and reports the difference between the two. Initially I was taking depth into account as well, but that did not appear to work as nicely as I'd like (even at 20,000x depth, if 95% of the non-reference bases are C, and 98% of the reference second-best-bases are C, then we would want to be suspicious of it; but perhaps slightly less so than if the depth were only 20...) Anyway it's now available. I'm not sure how useful it will be, but I spawned the FHS annotation jobs again, so we'll see. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2109 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-22 00:47:49 +00:00
depristo	27122f7f97	Performance improvements for pooled caller. Now possible to actually run on real data in a finite amount of time. Minor changes to GL interface (making strandIndex public) to support cached calculations in pooled caller. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2107 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-21 15:07:40 +00:00
ebanks	797bb83209	New VariantFiltration. Wiki docs are updated. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2105 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-20 19:50:26 +00:00
ebanks	d84444200b	The Unified Genotyper now sorts the sample names in the vcf that it outputs. [There was no reason to enforce that every VCF being output from the GATK should have the samples sorted, since someone might want them ordered non-alphabetically] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2102 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-20 16:13:18 +00:00
ebanks	2a5349d886	VariantAnnotator now adds dbsnp id if a dbsnp rod is supplied and it's not already set for a record git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2100 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-20 03:26:09 +00:00
depristo	82fd824c4d	Continuing improvements to unified genotyper git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2098 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-20 01:39:29 +00:00
rpoplin	22aaf8c5e0	Added the old recalibrator integration tests to the refactored recalibrator sitting in playground. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2096 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-19 22:43:28 +00:00
aaron	6ba1f3321d	Fixed the sample mix-up bug Kiran discovered, and added a unit test in the VCF reader class (Thanks for the good example files Kiran). Also renamed the toStringRepresentation function to toStringEncoding, and added a matching method in VCFGenotypeRecord. Updated the integration tests that were failing due to different ordering of genotyping entries in VCF; I'll check in the VCF diff tool I wrote when I get a cycle or two. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2092 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-19 18:17:47 +00:00
alecw	7623b39927	Add rodPicardDbSNP git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2088 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-19 17:27:46 +00:00
ebanks	7b957d3e2e	Make the whining from Khalid's office stop already git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2079 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-19 03:04:48 +00:00
hanna	85bc9d3e91	(Hopefully) temporary hack: load contig information by contig name rather than contig id to avoid off-by-one errors. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2078 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 23:33:27 +00:00
ebanks	f667bed7fc	-Don't annotate allele balance or on-off genotype if there's no genotype data -If qscore is infinity (because of precision) make a best guess instead git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2076 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 22:01:32 +00:00
ebanks	087e01a439	minor changes for --noSLOD git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2074 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 18:48:01 +00:00
ebanks	a70cf2b763	A bunch of changes needed to make outputting pooled calls possible git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2073 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 18:42:57 +00:00
ebanks	0a35c8e0ba	1. The joint estimation model now constrains genotypes to be AA, AB, or BB only (i.e. to use a single alternate allele). Note that this doesn't work for the old models (point estimate or SSG) because calculations aren't divided by alternate allele. 2. Allele frequency spectrum is not emitted for single samples (since it doesn't make sense). 3. If in pooled mode, throw an exception if pool size isn't set appropriately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2072 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 17:43:15 +00:00
depristo	6fe1c337ff	Pileup cleanup; pooled caller v1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2070 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 17:03:48 +00:00
chartl	b68d6e06b7	Rollback of the previous "fix" and implementation of the real fix. We totally do want to annotate the call if called by another walker. Totally boneheaded misinterpretation of what the code was doing -- Eric, please forgive me for being an idiot. Instead, change the StingException to what it really should be -- an IllegalStateException, which is not coincidentally already handled by the calling function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2067 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 06:09:24 +00:00
chartl	95f1be94c0	Fix for the broken build: do not attempt to annotate if UnifiedGenotyper is called from another walker! Why this didn't break the build earlier I have no idea. Ultimately, there should be a better way of interfacing UG with another walker -- what if some other walker wants the annotations from UG? But since we're calling map directly -- and the annotations don't get returned directly from map -- this needs to be handled differently, while the map function should ultimately return the LOD score or quality under the GCM alone. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2066 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 05:56:31 +00:00
ebanks	9fb50e9bd9	Further refactoring so that pooled calling will work. Okay, Mark, you should be all set. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2065 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 00:18:13 +00:00
chartl	539f6f15e5	Added -- Second base skew annotations and integration tests. Nothing need be given except -A SecondBaseSkew; the statistic it annotates calls with is a chi-square statistic given by the deviation of the observed proportion of reference second-best-bases from the expected 1/3. Future additions may be to ask that the deviation be instead from a given transition table. A big note for all users: All IllegalStateExceptions from the variation ROD (e.g. the RodGeliText) are dealt with SILENTLY. I understand this isn't optimal, but I'd rather simply not annotate a non-bi-allelic site than fail completely (there are quite a few such sites even on the regions over which the integration test has been written). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2064 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-18 00:11:13 +00:00
depristo	42a0bbaf46	Minor reformatting for pooled calling git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2063 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 22:06:11 +00:00
ebanks	4d9c826766	Integration tests actually run on real data now. <tries to hide sheepish grin> git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2061 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 21:04:14 +00:00
ebanks	a048f5cdf1	-Refactored JointEstimation code so that pooled calling will work -Use phred-scale for fisher strand test -Use only 2N allele frequency estimation points git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2059 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-17 20:21:15 +00:00
asivache	21729d9311	Do not print debug message when debug mode is not requested!! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2056 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-16 20:28:41 +00:00
rpoplin	967215066d	The old CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2055 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-16 19:16:46 +00:00
ebanks	4558375575	Stage 1 of the VariantFiltration refactoring is now complete. There now exists a parallel tool called VariantAnnotator which simply takes variant calls and annotates them with the same type of data that we used to use for filtering (e.g. DoC, allele balance). The output is a VCF with the INFO field appropriately annotated. VariantAnnotator can be called as a standalone walker or by another walker, as it is by the UnifiedGenotyper. UG now no longer computes any of this meta data - it relegates the task completely to the annotator (assuming the output format accepts it). This is a fairly all-encompassing check in. It involves changes to all of the UG code, bug fixes to much of the VCF code as things popped up, and other changes throughout. All integration tests pass and I've tediously confirmed that the annotation values are correct, but this framework could use some more rigorous testing. Stage 2 of the process will happen later this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2053 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-16 02:41:20 +00:00
kiran	103763fc84	An accessor for the VCF header git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2051 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-15 09:28:25 +00:00
ebanks	bf451873ff	1. Bug fix: check that AF=0 doesn't contain more probability than 1-fraction 2. Fix for Kiran: allow UG to call SNPs at deletion sites; we'll add an annotation to the VariantAnnotator for deletions at the locus (next week). 3. Added integration tests for joint estimation model git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2038 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 18:02:18 +00:00
asivache	1be36ca959	Bug fix: when cleanedReadIterator is initialized, it gets immediately set to the contig of the first cleaned read; when the first uncleaned read coming in is on a lower contig, this would trigger 'readNextContig' with that lower contig as an argument. As a result, the whole cleaned reads file would be read through to the end and no cleaned reads would ever be seen by the code afterwards. Now we do not call readNextContig if the (uncleaned) read's contig is lower than the current contig already loaded into cleanedReadIterator. The 'readNextContig' method now also throws an exception if the requested contig is lower than the currently loaded one git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2037 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 15:41:26 +00:00
depristo	cff31f2d06	comments for eric git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2035 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 14:19:31 +00:00
aaron	234bb71747	changed the toVariation() method to take a reference base, instead of using the reference base loaded from the underlying data source (if it was reference aware). Also changed some isVariant() methods which weren't using the passed in ref base. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2034 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 06:54:38 +00:00
ebanks	902cf84448	Bug fix: if the most likely allele frequency is 0, don't make a variant call (even if the Qscore for AF=1/n > threshold) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2033 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 04:10:32 +00:00
ebanks	555fb975de	1. Print out allele frequency range (from joint estimation model only). 2. Don't print verbose output from SLOD calculation (it's just a repeat of previous output). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2032 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-13 03:59:13 +00:00
hanna	8145ed4672	Take 2, updating picard with bug fix for bam files containing no reads. Just stomped on the existing md5s because that's what Eric told me to do. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2029 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-12 22:52:08 +00:00
ebanks	61b5fb82ce	2 major changes: 1. Add dbsnp RS ID to VCF output from genotyper; to do this I needed to fix the dbsnp rod which did not correctly return this value. 2. Remove AlleleBalanceBacked and instead generalize the arbitrary info fields backing VCFs (and potentially others) in preparation for refactoring VariantFiltration next week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2028 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-12 22:51:49 +00:00
aaron	c3c001e02e	cleanup of the traversal output code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2026 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-12 06:18:10 +00:00
ebanks	0922400ca9	Don't try to calculate ratios when DoC is zero (which happens when calls are made by an LD-aware genotyper) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2025 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-12 02:51:44 +00:00
hanna	2ea85fb62b	Fix some problematic command-line argument naming and descriptions. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2023 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-12 02:12:26 +00:00
depristo	6c9f86bb4d	Removed unnecessary output and added debugging print() routine git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2020 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-11 18:37:36 +00:00
hanna	8406325247	New Picard is breaking one of the integration tests. Revert until we find out whether the cause is legit. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2017 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-11 03:59:32 +00:00
hanna	499e7d1d75	Push forward some more delicate merging routines. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2016 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-11 03:07:34 +00:00
hanna	bae4d3f7ea	Updated Picard with fix for Doug Voet. Thanks Alec. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2015 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-11 02:01:08 +00:00
hanna	2e4782f202	Command-line arguments for SamReadFilters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2014 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-10 23:36:17 +00:00
hanna	2cf9670d1e	Allow users to directly specify filters from the command-line, applicable to any walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2012 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-10 18:40:16 +00:00
ebanks	6a37090529	Output changes for VCF and UG: 1. Don't cap q-scores at 99 2. Scale SLOD to allow more resolution in the output 3. UG outputs weighted allele balance (AB) and on-off genotype (OO) info fields for het genotype calls (works for joint estimation model and SSG) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2011 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-10 16:31:31 +00:00
depristo	d316cbad4c	VariantFiltration now accepts a VCF rod in addition to an input geli. It will then annotate this VCF file with filtering information in the INFO field too. --OnlyAnnotate will not write in filtering output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2008 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-10 13:24:58 +00:00
aaron	f9819d5f13	a little clean-up git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2007 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-10 06:18:34 +00:00
aaron	2ed423ed56	print the current location in read walkers (in addition to the number of reads processed), along with some refactoring to support the change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2006 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-10 05:57:01 +00:00
ebanks	2fa2ae43ec	Enough people have found this useful, so... Moving Callset Concordance tool to core and adding integration test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2003 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 20:59:18 +00:00
ebanks	3793519bd4	-Added convenience method to VCF record to tell if it's a no call and have rodVCF use it before querying for info fields -Don't restrict info fields to 2-letter keys [about to move these to core] git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2002 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 20:52:51 +00:00
ebanks	74751a8ed3	-Some minor fixes to get accurate vcf record merging done -Improvement to snp genotype concordance test And with that, it looks like I get revision #2000. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2000 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 06:40:55 +00:00
ebanks	7ce0df76f8	Added accessors to the rod data sources so that walkers can access the name/file/type triplets for input rods. This is necessary if e.g. you want to create a vcf writer based on all of the samples being input. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1994 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:25:39 +00:00
ebanks	d07f3bb6f6	Added methods to get strand bias and to test if record has allele freq or bias fields set. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1993 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-09 04:20:35 +00:00
kiran	3313b0ddb4	Fixed a minor bug where the lodThreshold wasn't being printed in the header. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1992 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:51:36 +00:00
kiran	567f5758d2	Optionally lists read depths by read group. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1990 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:39:19 +00:00
hanna	21c5f543fa	Fix sharding bug -- loci to which >100,000 (= 1 shard) reads are assigned an alignment start will confuse the sharding system and cause it to return duplicate reads. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1987 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 14:27:26 +00:00
ebanks	d549347f25	Refactored GenotypeLikelihoods to use an underlying 4-base model. It needs to be modified a bit and then hooked up to a pooled model, but that is now possible. At this point, there is no difference to the Unified Genotyper. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1978 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-05 21:59:25 +00:00
aaron	aacd72854f	a fix for a bug Andrey discovered: in read-based interval traversals we're duplicating reads in rare cases. The problem was that to accommodate a bug in SAM JDK indexing, we were forced to add one to the stop of our QueryOverlapping() calls to ensure we always got all of the overlapping reads. Added a PlusOneFixIterator that wraps other iterators, and eliminates reads that start outside of our intended interval (interval stop - 1). Updated and checked BamToFastqIntegrationTest MD5 sums. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1976 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-05 05:26:33 +00:00
ebanks	a545859c62	Joint Estimation model now emits a reasonable slod git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1969 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 21:12:42 +00:00
ebanks	11d950abe0	No longer allow the lod_threshold argument - use confidence instead. Have UG output qscores in all cases. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1968 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 16:18:51 +00:00
asivache	2fb45dbd73	Make window size a command line argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1967 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 16:13:35 +00:00
asivache	55f61b1f88	Bug fix in adjustment of the shift position. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1966 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-03 16:08:11 +00:00
ebanks	3a33401822	2nd stage of the genotyper output refactoring is complete. Now, all output is generalized and all of the intelligence lies where it is supposed to. Next stage is syncing up old and new models and making sure we're outputting exactly what we should. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1960 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 22:43:08 +00:00
aaron	ba67c7f02b	added a warning for those using bed files; we properly convert bed to the internal representation but the user needs to be aware that any output will be one-based closed intervals git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1959 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 21:09:18 +00:00
aaron	b71b66bd88	the underlying parameter is a float so we need to use Float.valueOf() instead; Noticed by external user Hou Huabin git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1958 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 20:22:25 +00:00
ebanks	af6d0003f8	-Generalized the GenotypeConcordance module to deal with any number of individuals (although it will default to its old behavior if the -samples argument is left out). -Make rods return the appropriate type of Genotype calls from getGenotype(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1954 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-01 05:35:47 +00:00
asivache	4b0796ba58	After fixing a few glitches and bugs, this version finally works as intended git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1952 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-31 04:59:58 +00:00
asivache	ea8d5c7077	Some internal refactoring. Now "safely" ignores duplicate records (NOT duplicate reads but rather malformed bam files!) resulting from the bug/feature in CleanedReadInjector. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1949 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 17:50:51 +00:00
ebanks	4ee1d6f733	-Have the calculation models determine whether a call passes the lod/confidence thresholds (as opposed to returning everything and letting the UG decide); this way, walkers which call map() will get only the good calls. -Do the right thing in all models for all-base-mode (for Kiran). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1940 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-30 02:35:51 +00:00
asivache	e3b4d4cbed	Genotyper reimplemented. Does the same thing, at least for now, but internal data structures redesign enables collecting various statistics for indel-containing/reference-matching reads. The statistics are not yet used by the caller itself to make a better judgement w.r.t. the validity of the calls it makes, but they are now printed into the output stream (--verbose). The statistics (for both normal and tumor) include: indel observation count/total coverage, av. number of mismatches per indel-containing and per ref-matching read, av. mapping quality, av. mismatch rate and av. base quality within an NQS window around the indel, numbers of indel and ref observations per strand. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1936 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 19:09:16 +00:00
ebanks	5cdbdd9e5b	now that the design is stable, pull the setReference and setLocation methods back out of Genotype and stick them into constructors of implementing classes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1931 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 13:27:37 +00:00
ebanks	3091443dc7	Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron. Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 03:46:41 +00:00
depristo	86573177d1	Reverting rod walkers to use underlying refwalker implementation while we work on ROD2 and reenable the system. Added some serious sparse file parsing to variant eval tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1929 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 01:04:37 +00:00
aaron	5a3bd50537	adding error log reporting to the GATK, and a stream based output method for the argument collection git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1926 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 19:56:05 +00:00
aaron	04e9a494e9	removed the GenotypesBacked interface, which is currently unused. Also cleaned up some documentation lines git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1924 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-28 18:08:14 +00:00
depristo	186a8dd698	Trivial protection for null value git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1918 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 21:52:52 +00:00
depristo	726378be8b	Almost ready to stop doing eager decoding; waiting on Eric git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1914 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-27 15:28:05 +00:00
aaron	3fb3773098	a fix for traverse duplicates bug: GSA-202. Also removed some debugging output from FastaAltRef walker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1912 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-26 20:18:55 +00:00
hanna	a1e8a532ad	Support for initialize() and onTraversalDone() output from parallelized walkers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1911 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-26 20:18:31 +00:00
ebanks	75ad6bbef7	Check that map isn't being called passing in null arguments. (This seems wrong; see JIRA entry GSA-211) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1907 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-25 02:30:36 +00:00
hanna	65b98470f3	Temporary fix: have RodLocusView manage and close its RODs. Really the relationship between these two classes needs to be rethought; see JIRA GSA-207. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1904 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-23 16:00:12 +00:00
aaron	ad1fc511b1	intermediate commit for some changes in the Variation system, so Eric can go ahead with his changes. Everything is pretty set, but the Variation interface could use a convenience method that joins all the alternate alleles. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1903 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-23 06:31:15 +00:00
ebanks	6c338eccb8	Joint Estimation model now emits calls in all formats. The whole GenotypeCall framework needs to be changed, but this will work for the time being. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1902 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-23 03:07:28 +00:00
ebanks	54c61c663c	-Cleanup of the Joint Estimation code -Don't print verbose/debugging output to logger, but instead specify a file in the argument collection (and then we only need to print conditionally) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1899 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-22 15:25:29 +00:00
asivache	2cab4c68d4	Added method: isCodingExon(). Returns true if position is simultaneously within an exon AND within coding interval of any single transcript from the list. The old method of detecting coding positions as isExon() && isCoding() is buggy, as the position could be in the UTR part of one transcript (isExon() is true), and within coding region bounds (but not in the exon) of another transcript (isCoding() is true). As a result UTR positions would be erroneously annotated as coding. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1898 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-22 14:55:07 +00:00
ebanks	55fa1cfa06	-Renamed new calculation model and worked out some significant changes with Mark -Allow walkers calling the UG to pass in their own argument collections git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1896 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 20:49:36 +00:00
ebanks	9b9744109c	Mark's new unified calculation model is now officially implemented. Because it doesn't actually use EM, it's no longer a subclass of the EM model. Note that you can't use it just yet because it doesn't actually emit calls (just prints to logger). I need to deal with general UG output tomorrow. Hold off until then, Mark, and then you can go wild. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1891 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-21 02:39:23 +00:00
depristo	caa3187af8	Enabling correct high-performance ROD walker and moved VariantEval over to it. Performance improvements in variantEval in general. See wiki for full description git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1890 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 23:31:13 +00:00
depristo	449a6ba75a	Deleting lots of code as part of my cleanup. More classes tagged for removal. Many more walkers have their days numbered. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1885 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 12:23:36 +00:00
ebanks	b8ab77c91c	Don't filter out reads without proper read groups. Instead, allow the user (or another walker calling UG) to specify an assumed sample to use (but then we assume single-sample mode). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1883 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 01:30:53 +00:00
ebanks	c29924e7cf	Reverting previous change. Aaron, it's all yours... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1881 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 00:55:24 +00:00
aaron	d21b582b18	memory leak, where the Resource Pool was releasing based on the value and not the key, resulting in the resourceAssignments map growing with each additional shard git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1880 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 00:39:42 +00:00
ebanks	761a730758	assertBiAllelic -> assertMultiAllelic. Chris, if this breaks an integration test, you get it. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1879 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 00:09:46 +00:00
aaron	cfa86d52c2	ensure that in the indel case we don't allow identification as both an insertion and deletion at the same location in the VCF ROD git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1875 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-19 18:21:00 +00:00
ebanks	51f9ec0a5c	subtract largest posterior value from all values; this hopefully solves any precision issues git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1870 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-18 05:20:15 +00:00
ebanks	b9e8867287	-push allele frequency and genotype likelihood variable definitions down into the subclasses so that they can use different data structures -use slightly more stringent stability metric -better integration test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1869 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-18 04:22:17 +00:00
chartl	ad777a9c14	@BasicPileup - made the counts public so they can be used @PoolUtils - split reads by indel/simple base @BaseTransitionTable - complete refactoring, nicer now @UnifiedArgumentCollection - added PoolSize as an argument @UnifiedGenotyper - checks to ensure pooled sequencing uses the appropriate model @GenotypeCalculationModel - instantiates with the new PoolSize argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1867 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 21:56:56 +00:00
andrewk	d1a4cd2f73	Added ValidationData analysis type to VariantEvalWalker; this eval takes a GFF file with validated truth data positions (bound to "validation") and calculates the accuracy of the genotype calls bound to "eval". git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1862 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 15:39:08 +00:00
ebanks	418e007ca6	A cleaner interface: now everyone can use UG's initialize method git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1860 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 14:09:16 +00:00
aaron	96972c3a5c	a fix for a bug Eric found: if your first call contains fewer samples than calls at other loci, your VCFHeader got setup incorrectly. Also moved a bunch of Lists over to Sets for consistency. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1859 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 04:57:50 +00:00
aaron	a69ea9b57c	Cleaning up the VCF code, adding lots of tests for a variety of edge cases. Two issues are still outstanding: updating the no call string with the standard 1000g decided on today, and fixing Eric's issue where not all the VCF sample names are present initially. also: their, I hope your happy Eric, from now on I'll try not to flout my awesomest grammer in the future accept when I need to illicit a strong response :-) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1858 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 04:11:34 +00:00
ebanks	993c567bd8	I had to remove some of my more aggressive optimizations, as they were causing us to get slightly different results from MSG. Results in only small cost to running time. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1856 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 00:59:32 +00:00
asivache	7d7ff09f54	throw an exception if read has no associated read group git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1855 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-15 18:11:32 +00:00
depristo	0c2016c19a	Improved error messages -- now easier to read, points to the GATK Error Messages wiki, and avoids double printing of stack traces git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1850 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-15 12:07:44 +00:00
ebanks	a32470cea1	Deal with the fact that walkers can call UG's init/map functions directly. We need to filter contexts in that case since the calling walkers don't get UG's traversal-level filters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1848 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-15 02:31:45 +00:00
ebanks	e740e7a7ce	Because walkers call UG's map function, we need to move the actual writing out to UG's reduce function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1845 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 20:49:26 +00:00
ebanks	52d2e0ca07	All walkers now use read.getReadGroup() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1839 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 19:27:40 +00:00
aaron	eb90e5c4d7	changes to VCF output, and updated MD5's in the integration tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1836 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 18:42:48 +00:00
ebanks	89771fef05	-Use read.getReadGroup() -Add another filter for read groups for Chris git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1835 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 18:08:32 +00:00
ebanks	311ab8da5a	A helper class to create the masks for the sequenom design maker. This project is now officially done. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1834 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 17:28:51 +00:00
ebanks	0c95d6906f	Merge both versions of the Sequenom assay design maker: use Jared's base code and add in indels. [Jared, this still emits the same output for SNPs as your original version] Remove all sequenom stuff from the FastaAlternateReferenceMaker so it can just concentrate on making alternate references... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1831 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 17:11:45 +00:00
ebanks	f2886d88e0	We now emit genotype calls git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1828 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 02:49:56 +00:00
ebanks	96b8499a31	Remodeled version of the UnifiedGenotyper. We currently get identical lods and slods as MultiSampleCaller (except slods for ref calls, as I discussed with Jared) and are a bit faster in my few test cases. Single-sample mode still emulates SSG. The remaining to do items: 1. more testing still needed 2. we currently only output lods/slods, but I need to emit actual calls 3. stubs are in place for Mark's proposed version of the EM calculation and now I need to add the actual code. More check-ins coming soon... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1821 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-13 20:27:01 +00:00
aaron	77499e35ac	fixes for GSA-199: Need easier way to write binary outputs to standard output. GLF and VCF now have stream constructors, and can get dumped to standard out. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1818 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-13 15:50:20 +00:00
ebanks	caf689821f	added method to get normalized posteriors git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1809 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-12 02:33:22 +00:00
ebanks	cf7a26759d	-use the getReadGroup() function that was added to picard for us -clean up some include lines git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1808 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-12 01:39:32 +00:00
hanna	d844d1c496	SAMFileWriters specified as command-line arguments were sometimes incorrectly altering the default short name. Make sure short name is not specified if shortName is not specified but fullName is. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1807 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-09 19:16:46 +00:00
hanna	da084357db	Fixed minor typo in output message. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1806 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-09 18:56:54 +00:00
ebanks	a9f3d46fa8	Your time has come, SSG. Fare thee well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1799 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 20:27:56 +00:00
aaron	98e3a0bf1a	VCF can now be emitted from SSG. The basics are there (the genotype, read depth, our error estimate), but more fields need to be added for each record as necessary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1797 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 19:50:04 +00:00
kiran	29ad6cd876	Made redundant by BCMMarkDupes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1795 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 18:47:20 +00:00

1054 Commits (79c4cc1db7ac3da56db8e039a76c75b6836d61be)