gatk-3.8


Author	SHA1	Message	Date
hanna	497bcbcbb7	Recent changes to the build system make it complain loudly about pieces of core that depend on playground. Most of these have been eliminated by (temporarily) promoting Aaron's report system to core in this checkin. I'll follow up with the other changes separately. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4350 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-24 22:09:12 +00:00
aaron	782e0018e4	removal of most of the old GATK ROD system; also a fix for -Dsingle so we can again run just a single unit or integration test (single tests in tribble can be run with the -DsingleTest option now). More to come. Three integration tests had to change: * RecalibrationWalkersIntegrationTest: one of the tests was using the interval as the snp track and wasn't supplying a DbSNP track (for CountCovariates). * SequenomValidationConverterIntegrationTest: relies on the Plink ROD, which we've removed. * PileupWalkerIntegrationTest: we no longer have implicit interval tracks, so there isn't a rod name over the specified region; otherwise the same result. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4292 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-15 22:54:49 +00:00
depristo	7880863eb7	Final step in error refactoring. GATK exception is now ReviewedStingException, indicating that this exception is really what one wants. Only use this exception when you have thought about StingException vs. UserException and made a real decision. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4267 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 15:07:38 +00:00
depristo	7ad8fbdd5a	Moved GATKException to exceptions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4266 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:47:19 +00:00
depristo	595907e98e	Moving StingException git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4262 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:34:15 +00:00
depristo	40e6179911	Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but also inconsistent headers (b36 ref with b37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-12 14:02:43 +00:00
depristo	8f1a32acae	All exceptions thrown by the GATK have been reviewed and UserErrors replaced where appropriate. Shazam. Another check-in will remove the GATKException and restore the StingException. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4252 348d0f76-0448-11de-a6fe-93d51630548a	2010-09-10 15:25:30 +00:00
kiran	fd19c63aaf	A data structure that allows data to be collected over the course of a walker's computation and then written to a PrintStream such that it's human-readable, AWK-able, and R-friendly (given that you load it using the GATKReport loader module). This object is designed to be both the structure that holds data during the execution of the walker and the object that properly formats and emits the data so that it can be easily loaded into R. In the end, you get a table that looks like this: ##:GATKReport.v0.1 ErrorRatePerCycle : The error rate per sequenced position in the reads cycle errorrate.61PA8.7 qualavg.61PA8.7 0 0.007451835696110506 25.474613284804366 1 0.002362777171937477 29.844949954504095 2 9.087604507451836E-4 32.87590975254731 3 5.452562704471102E-4 34.498999090081895 4 9.087604507451836E-4 35.14831665150137 5 5.452562704471102E-4 36.07223435225619 6 5.452562704471102E-4 36.1217248908297 7 5.452562704471102E-4 36.1910480349345 8 5.452562704471102E-4 36.00345705967977 ... A GATKReport object can hold multiple tables, and the write() method will emit all tables in succession. Each table starts with its own ##:GATKReport.v0.1 table header, so each table can stand alone. This allows for tables to be mixed and matched in a single file, or for the output from different walkers to be combined into a single file with no ill effect. The display property of individual columns can be turned off. This is useful when a column stores intermediate results that are necessary for computing some later value but are not required in the final output file. Finally, the GATKReportTable allows for some simple element-wise and column-wise mathematical operations. For instance, two whole columns can be divided, with the result stored in a third column. This mimics the most basic of R operations, where whole vectors can be added, subtracted, multiplied or divided without requiring the developer to explicitly write a loop. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4159 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-29 05:39:24 +00:00
ebanks	bfcac33e80	Cleaning up playground utils and tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4136 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-27 01:25:47 +00:00
chartl	38e65f6e1b	Added: A VariantEval module that gives simple metrics by sample, and an abstract class that makes per-sample modules easy to write (but a little bit clunky, since a class needs to be defined for each data point -- see SimpleMetricsBySample as an example). AnalysisModuleScanner needed a slight update to pull in data points from parent classes for this to work (thanks Khalid for showing me how to do this). After a code review with Aaron (thanks) and confirming that integration tests pass, I am committing. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3939 348d0f76-0448-11de-a6fe-93d51630548a	2010-08-04 19:37:39 +00:00
aaron	3d049204ed	some refactoring for the variant eval output system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3576 348d0f76-0448-11de-a6fe-93d51630548a	2010-06-17 05:34:31 +00:00
aaron	c8d09a29ed	some quick changes to the VE output system - more to come. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3253 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-23 21:55:08 +00:00
ebanks	e9e844fbf5	1. Reverting: dbsnp automatically is a comp. 2. Fixing logic for min Qscore calculation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3230 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-21 18:51:35 +00:00
hanna	c1e53d407d	The copyright tag that I copied/pasted from a LaTeX document into IntelliJ had unicode quote characters embedded in it. These characters were invisible inside IntelliJ but cause compile warnings for Ryan and Aaron, who for whatever reason have a different default charset. Fixed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3203 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-20 15:26:32 +00:00
hanna	1bc26f69e9	An attempt to cleanup the Utils directory. Email to follow. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3198 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-19 23:00:08 +00:00
aaron	b54031fc86	adding an experimental format to VariantEval2 which, when you source() it from R, imports all VE2 output as individual tables with appropriate row and column names. More testing and feedback needed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3172 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-15 06:09:27 +00:00
aaron	9ca8e345fc	bye-bye, old junk. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3131 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-07 20:41:48 +00:00
aaron	8fd59c8823	Modified the report system based on Ryan's feedback: tables are now created independently to avoid the permutation problem when they were all compressed in rows, and removed our dependency on FreeMarker. The Grep format stays the same. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3130 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-07 20:39:55 +00:00
aaron	585cc880a2	changed jexl expressions to jexl names in the VariantEval2 output, fixed integration test, and fixed a problem where a line was getting dropped in CSV output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3108 348d0f76-0448-11de-a6fe-93d51630548a	2010-04-01 16:23:14 +00:00
aaron	a6e8687d71	implementing a clean way to import the template files into the GATK jar (they should not always get bundled). All further resources should be added to the gatk.resources path id in the build script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3094 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-30 04:20:19 +00:00
aaron	074ec77dcc	First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be specified with the reportType command line option in VE2. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-27 03:59:32 +00:00
aaron	60dfba997b	added some sample annotations to VariantEval2 analysis modules, and some changes to the report system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3067 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-24 05:40:10 +00:00
aaron	439c34ed38	clean-up before annotating VariantEval2 for output. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3055 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-22 07:39:20 +00:00
aaron	8a5f0b746e	some cleanup for the output system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3032 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-18 12:54:39 +00:00
aaron	10e76abbbc	adding some VE2 report infrastructure; work-in-progress. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3008 348d0f76-0448-11de-a6fe-93d51630548a	2010-03-16 03:57:42 +00:00
kshakir	3738b76320	Added a playground concordance analyzer for summarizing VariantEval across a group. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2867 348d0f76-0448-11de-a6fe-93d51630548a	2010-02-22 20:28:52 +00:00
ebanks	9da5cc25ad	More archiving (with permission from Andrey) plus a move to core. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2242 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-03 15:40:27 +00:00
ebanks	2c83f2f2bc	Move MSG - plus now obsolete classes which it depends on -- to oneoffprojects (with permission from Jared). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2224 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-02 20:04:22 +00:00
ebanks	084337087e	Removing deprecated code and walkers (for which I had the green light) from the repository. Moved piecemealannotator and secondarybases to archive. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2195 348d0f76-0448-11de-a6fe-93d51630548a	2009-12-01 05:58:20 +00:00
kiran	2225d8176e	A convenience class for maintaining a dynamically growing table of values with access to the elements by named row and column identifiers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1988 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-08 16:34:35 +00:00
ebanks	3a33401822	2nd stage of the genotyper output refactoring is complete. Now, all output is generalized and all of the intelligence lies where it is supposed to. Next stage is syncing up old and new models and making sure we're outputting exactly what we should. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1960 348d0f76-0448-11de-a6fe-93d51630548a	2009-11-02 22:43:08 +00:00
ebanks	3091443dc7	Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron. Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-29 03:46:41 +00:00
chartl	3b1fabeff0	Major code refactoring: @ Pooled utils & power - Removed two of the power walkers, leaving only PowerBelowFrequency; added some additional flags on PowerBelowFrequency to give it some of the behavior that PowerAndCoverage had - Removed a number of PoolUtils variables and methods that were used in those walkers or simply not used - Removed AnalyzePowerWalker (unnecessary) - Changed the location of Quad/SQuad/ReadOffsetQuad into poolseq @NQS - Deleted all walkers but the minimum NQS walker, refactored not to use LocalMapType @ BaseTransitionTable - Added a slew of new integration tests for different flaggable and integral parameters - (Scala) just a System.out that was added and commented out (no actual code change) - (Java) changed a < to <= and a boolean formula Chris git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1887 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-20 14:58:04 +00:00
chartl	ad777a9c14	@BasicPileup - made the counts public so they can be used @PoolUtils - split reads by indel/simple base @BaseTransitionTable - complete refactoring, nicer now @UnifiedArgumentCollection - added PoolSize as an argument @UnifiedGenotyper - checks to ensure pooled sequencing uses the appropriate model @GenotypeCalculationModel - instantiates with the new PoolSize argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1867 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 21:56:56 +00:00
ebanks	418e007ca6	A cleaner interface: now everyone can use UG's initialize method git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1860 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-16 14:09:16 +00:00
ebanks	a32470cea1	Deal with the fact that walkers can call UG's init/map functions directly. We need to filter contexts in that case since the calling walkers don't get UG's traversal-level filters. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1848 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-15 02:31:45 +00:00
ebanks	e740e7a7ce	Because walkers call UG's map function, we need to move the actual writing out to UG's reduce function. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1845 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-14 20:49:26 +00:00
chartl	ec68ae3bc5	Added a filter that will split the read set by a threshold of mapping quality (Request from Jason Flannick) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1812 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-12 20:58:37 +00:00
ebanks	a9f3d46fa8	Your time has come, SSG. Fare thee well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1799 348d0f76-0448-11de-a6fe-93d51630548a	2009-10-08 20:27:56 +00:00
depristo	392152f149	1000x performance improvements to MSG for crisis control git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1723 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-24 23:44:33 +00:00
asivache	3e289fcaa4	A little piece that PairMaker needs in order to compile ;) Iterates synchronously over two (name-ordered) single-end alignment SAM files with, possibly, multiple alignments per read and for each read name encountered returns pairs<all alignments for end1, all alignments for end2> git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1639 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 19:17:40 +00:00
chartl	b353bd6f81	Added a Quad toString() method. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1603 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 01:13:57 +00:00
chartl	2e237a12e9	This commit has a bunch to do with cleaning up the CoverageAndPowerWalker code: implementing some new printing options, but mostly altering the code so it's much more readable and understandable, and much less hacky-looking. ADDED: @Quad: This is just like Pair, except with four fields. In the original CoverageAndPowerWalker I often used a pair of pairs to hold things, which made the code nigh unreadable. @SQuad: An extension of Quad for when you want to store objects of the same type. Lets you simply declare new SQuad<X> rather than new Quad<X,X,X,X>. @ReadOffsetQuad: An extension of Quad specifically for holding two lists of reads and two lists of offsets. Supports construction from AlignmentContexts and conversion to AlignmentContexts (given a GenomeLoc). There are methods that make it very clear what the code is doing (getSecondRead() rather than the cryptic getThird()). @PowerAndCoverageWalker: The new version of CoverageAndPowerWalker. If the tests all go well, then I'll remove the old version. New to this version is the ability to give an output file directly to the walker, so that locus information prints to the file, while the final reduce prints to standard out. Bootstrap iterations are now a command line argument rather than a final int, and users can instruct the walker to print out the coverage/power statistics for both the original reads and those reads whose quality score exceeds a user-defined threshold. CHANGES: @PoolUtils: Altered methods to accept as arguments, and return, Quad objects. Added a random partition method for bootstrapping. @CoverageAndPowerWalker: Altered methods to work with the new PoolUtils methods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1602 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 01:00:04 +00:00
asivache	d9f3e9493f	Does not return 0-length cigar elements anymore (used to do so when previous cigar element ended exactly at the segment boundary) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1570 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:05:55 +00:00
depristo	a08c68362e	Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls AND then compares the geli MD5 sum to the expected one! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 12:39:06 +00:00
aaron	3c2ae55859	changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1529 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 05:31:15 +00:00
chartl	544900aa99	Migration of some core calculations (log-likelihood probabilities, etc.) from CoverageAndPowerWalker into static methods in PoolUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1527 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 21:43:29 +00:00
asivache	499b3536a4	Changed to use AlignmentUtils.isReadUnmapped() for better consistency with SAM spec; also, it is now explicitly enforced that unmapped reads have <NO_...> values set for ref contig and start upon "remapping" git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1519 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 16:45:07 +00:00
chartl	5130ca9b94	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1516 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 15:17:02 +00:00
jmaguire	e2780c17af	Checkin of the Multi-Sample SNP caller. Doesn't work yet; same command I used to use now causes GATK to throw an exception. Will check with Matt & Aaron tomorrow, then do a regression test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1509 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 00:23:28 +00:00
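
The exception split described in commits 8f1a32acae, 40e6179911 and 7880863eb7 (UserException for problems with the user's inputs, ReviewedStingException for deliberately chosen internal errors, StingException otherwise) can be illustrated with a minimal Java sketch. The class names come from the commit messages, but the inheritance chain and the `ErrorReporter.describe` handler are assumptions for illustration, not the actual GATK code.

```java
/** Generic internal error (assumed base class in this sketch). */
class StingException extends RuntimeException {
    StingException(String message) { super(message); }
}

/** Internal error whose type was chosen deliberately, after considering UserException. */
class ReviewedStingException extends StingException {
    ReviewedStingException(String message) { super(message); }
}

/** A problem with the user's inputs; the message is written for the user. (Parent class is assumed.) */
class UserException extends ReviewedStingException {
    UserException(String message) { super(message); }
}

/** Hypothetical top-level handler that words the report according to the exception type. */
final class ErrorReporter {
    static String describe(Throwable t) {
        // UserException is tested first because, in this sketch, it is also a ReviewedStingException.
        if (t instanceof UserException) {
            return "User error: " + t.getMessage();
        }
        if (t instanceof ReviewedStingException) {
            return "Internal error: " + t.getMessage();
        }
        return "Unexpected error: " + t.getMessage();
    }
}
```

The ordering of the `instanceof` checks matters only because of the assumed subclassing; with unrelated classes any order would work.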
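
The sequence-dictionary validation added in commit 40e6179911 checks whether the reference and the reads share any contigs, whether shared contigs have consistent headers, and whether they appear in the same order. A minimal Java sketch follows. It models each dictionary as an ordered map from contig name to length and treats a length difference for the same contig name as a sign of mismatched builds. `SeqDictCheck` and its `Result` values are hypothetical (not the GATK or Picard API), and the -U override is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

/** Hypothetical sketch: dictionaries are ordered maps of contig name to contig length. */
final class SeqDictCheck {
    enum Result { OK, NO_COMMON_CONTIGS, LENGTH_MISMATCH, OUT_OF_ORDER }

    static Result compare(LinkedHashMap<String, Integer> ref, LinkedHashMap<String, Integer> reads) {
        // Shared contigs, listed in reference order.
        List<String> inRefOrder = new ArrayList<>();
        for (String contig : ref.keySet()) {
            if (reads.containsKey(contig)) inRefOrder.add(contig);
        }
        if (inRefOrder.isEmpty()) return Result.NO_COMMON_CONTIGS;

        // Same name but different length: likely different builds (e.g. a b36 file against a b37 reference).
        for (String contig : inRefOrder) {
            if (!ref.get(contig).equals(reads.get(contig))) return Result.LENGTH_MISMATCH;
        }

        // Shared contigs, listed in the reads' order; the two orderings must agree.
        List<String> inReadsOrder = new ArrayList<>();
        for (String contig : reads.keySet()) {
            if (ref.containsKey(contig)) inReadsOrder.add(contig);
        }
        return inRefOrder.equals(inReadsOrder) ? Result.OK : Result.OUT_OF_ORDER;
    }
}
```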
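
The hidden columns and whole-column division described in commit fd19c63aaf can be sketched in Java as below. `ReportTable`, its methods, and the single-space column layout are hypothetical stand-ins rather than the real GATKReportTable API. Only the `##:GATKReport.v0.1 name : description` header line is taken from the commit message, and the sketch handles a single table rather than a multi-table GATKReport.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical single-table report; not the real GATKReportTable API. */
final class ReportTable {
    private final String name;
    private final String description;
    private final String keyColumn;                        // label printed above the row keys
    private final List<String> rowKeys = new ArrayList<>();
    private final Map<String, Map<String, Double>> columns = new LinkedHashMap<>();
    private final Set<String> hidden = new HashSet<>();     // columns kept for computation only

    ReportTable(String name, String description, String keyColumn) {
        this.name = name;
        this.description = description;
        this.keyColumn = keyColumn;
    }

    void addColumn(String column, boolean display) {
        columns.put(column, new HashMap<>());
        if (!display) hidden.add(column);
    }

    void set(String rowKey, String column, double value) {
        if (!rowKeys.contains(rowKey)) rowKeys.add(rowKey);
        columns.get(column).put(rowKey, value);
    }

    double get(String rowKey, String column) {
        return columns.get(column).getOrDefault(rowKey, Double.NaN); // NaN marks an unset cell
    }

    /** Element-wise result = numerator / denominator over every row, like R's vector division. */
    void divideColumns(String result, String numerator, String denominator) {
        if (!columns.containsKey(result)) addColumn(result, true);
        for (String row : rowKeys) {
            set(row, result, get(row, numerator) / get(row, denominator));
        }
    }

    /** Writes the header line, the visible column names, then one line per row. */
    void write(PrintStream out) {
        out.println("##:GATKReport.v0.1 " + name + " : " + description);
        List<String> visible = new ArrayList<>();
        for (String column : columns.keySet()) {
            if (!hidden.contains(column)) visible.add(column);
        }
        out.println(keyColumn + " " + String.join(" ", visible));
        for (String row : rowKeys) {
            StringBuilder line = new StringBuilder(row);
            for (String column : visible) line.append(' ').append(get(row, column));
            out.println(line);
        }
    }
}
```

For example, hidden `errors` and `bases` columns can be divided into a visible `errorrate` column, so only the ratio reaches the output file.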
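
Commit 2225d8176e describes a table that grows dynamically and is indexed by named rows and columns. One way to sketch that in Java is an insertion-ordered map of maps. `NamedTable` is a hypothetical name, and returning `null` for unset cells is an assumed convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical growable table addressed by row and column names. */
final class NamedTable<V> {
    private final Map<String, Map<String, V>> rows = new LinkedHashMap<>();
    private final Set<String> columns = new LinkedHashSet<>();   // every column name seen so far

    /** Stores a value, creating the row and/or column on first use. */
    void set(String row, String column, V value) {
        rows.computeIfAbsent(row, k -> new LinkedHashMap<>()).put(column, value);
        columns.add(column);
    }

    /** Returns the stored value, or null if the cell was never set. */
    V get(String row, String column) {
        Map<String, V> cells = rows.get(row);
        return cells == null ? null : cells.get(column);
    }

    List<String> rowNames() { return new ArrayList<>(rows.keySet()); }

    List<String> columnNames() { return new ArrayList<>(columns); }
}
```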
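
The filter from commit ec68ae3bc5 splits a read set at a mapping-quality threshold. A minimal sketch using `Collectors.partitioningBy` follows. The `Read` stand-in is hypothetical (a real walker would work with SAM records), and placing reads exactly at the threshold in the high-quality group is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class MappingQualitySplit {
    /** Hypothetical read stand-in holding only what this sketch needs. */
    static final class Read {
        final String name;
        final int mappingQuality;

        Read(String name, int mappingQuality) {
            this.name = name;
            this.mappingQuality = mappingQuality;
        }
    }

    /** Key true: MAPQ >= threshold (inclusive boundary assumed); key false: all other reads. */
    static Map<Boolean, List<Read>> split(List<Read> reads, int threshold) {
        return reads.stream()
                .collect(Collectors.partitioningBy(r -> r.mappingQuality >= threshold));
    }
}
```

`partitioningBy` always returns both the `true` and `false` keys and keeps each group in input order.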
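
Commit 3e289fcaa4 describes iterating in lockstep over two name-ordered single-end alignment files and returning, for each read name, all alignments for end 1 and all alignments for end 2. That is a merge-join on read name. The Java sketch below assumes both inputs are sorted by `String.compareTo` (real SAM queryname ordering may differ) and uses hypothetical `Aln` and `EndPair` types in place of SAM records. A read name seen in only one file yields an empty list for the other end.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical alignment record: read name plus a free-form locus label. */
final class Aln {
    final String readName;
    final String locus;

    Aln(String readName, String locus) {
        this.readName = readName;
        this.locus = locus;
    }
}

/** All alignments of one read name, grouped by end. */
final class EndPair {
    final String readName;
    final List<Aln> end1;
    final List<Aln> end2;

    EndPair(String readName, List<Aln> end1, List<Aln> end2) {
        this.readName = readName;
        this.end1 = end1;
        this.end2 = end2;
    }
}

/** Merge-join of two name-sorted alignment streams, one EndPair per distinct read name. */
final class SyncNameIterator implements Iterator<EndPair> {
    private final Iterator<Aln> it1;
    private final Iterator<Aln> it2;
    private Aln head1;   // next unconsumed record from stream 1, or null when exhausted
    private Aln head2;   // same for stream 2

    SyncNameIterator(Iterator<Aln> it1, Iterator<Aln> it2) {
        this.it1 = it1;
        this.it2 = it2;
        head1 = it1.hasNext() ? it1.next() : null;
        head2 = it2.hasNext() ? it2.next() : null;
    }

    @Override
    public boolean hasNext() { return head1 != null || head2 != null; }

    @Override
    public EndPair next() {
        if (!hasNext()) throw new NoSuchElementException();
        // Emit the smaller head name next; the other stream may contain no records for it.
        String name;
        if (head1 == null) name = head2.readName;
        else if (head2 == null) name = head1.readName;
        else name = head1.readName.compareTo(head2.readName) <= 0 ? head1.readName : head2.readName;

        List<Aln> end1 = new ArrayList<>();
        while (head1 != null && head1.readName.equals(name)) {
            end1.add(head1);
            head1 = it1.hasNext() ? it1.next() : null;
        }
        List<Aln> end2 = new ArrayList<>();
        while (head2 != null && head2.readName.equals(name)) {
            end2.add(head2);
            head2 = it2.hasNext() ? it2.next() : null;
        }
        return new EndPair(name, end1, end2);
    }
}
```

Each input record is read exactly once, so the pass is linear in the total number of alignments and never holds more than one read name's alignments in memory.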
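
Commits 2e237a12e9 and b353bd6f81 describe `Quad` (a four-field `Pair`), `SQuad` (a `Quad` whose four fields share one type), and `ReadOffsetQuad` (two read lists and two offset lists with descriptive getters). The Java sketch below is a reconstruction from those messages, not the original classes. The plural getter names, the `toString()` format, and the use of `String` in place of SAM records are assumptions; the field order (reads, offsets, reads, offsets) follows the commit's remark that `getSecondRead()` replaces `getThird()`.

```java
import java.util.List;

/** Four-field analogue of a Pair (illustrative reconstruction). */
class Quad<A, B, C, D> {
    private final A first;
    private final B second;
    private final C third;
    private final D fourth;

    Quad(A first, B second, C third, D fourth) {
        this.first = first;
        this.second = second;
        this.third = third;
        this.fourth = fourth;
    }

    A getFirst() { return first; }
    B getSecond() { return second; }
    C getThird() { return third; }
    D getFourth() { return fourth; }

    @Override
    public String toString() {
        return "(" + first + ", " + second + ", " + third + ", " + fourth + ")";
    }
}

/** Same-typed Quad: write new SQuad<X>(...) instead of new Quad<X, X, X, X>(...). */
class SQuad<X> extends Quad<X, X, X, X> {
    SQuad(X first, X second, X third, X fourth) { super(first, second, third, fourth); }
}

/** Two read lists and their offsets, with getters that say what each field holds. */
class ReadOffsetQuad extends Quad<List<String>, List<Integer>, List<String>, List<Integer>> {
    ReadOffsetQuad(List<String> firstReads, List<Integer> firstOffsets,
                   List<String> secondReads, List<Integer> secondOffsets) {
        super(firstReads, firstOffsets, secondReads, secondOffsets);
    }

    List<String> getFirstReads() { return getFirst(); }
    List<Integer> getFirstOffsets() { return getSecond(); }
    List<String> getSecondReads() { return getThird(); }   // clearer than calling getThird()
    List<Integer> getSecondOffsets() { return getFourth(); }
}
```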


123 Commits (14e19f460568fd6b244e18f335933e9bb2ae909e)