gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	5c2799554a	Refactor updateReadStates into PerSampleReadStateManager, add tracking of downsampling rate	2013-01-14 16:30:16 -05:00
Mark DePristo	a4334a67e0	SamplePartitioner optimizations and bugfixes -- Use a linked hash map instead of a hash map since we want to iterate through the map fairly often -- Ensure that we call doneSubmittingReads before getting reads for samples. This function call fell out before and since it wasn't enforced I only noticed the problem while writing comments -- Don't make unnecessary calls to contains for map. Just use get() and check that the result is null -- Use a LinkedList in PassThroughDownsampler, since this is faster for add() than the existing ArrayList, and we weren't using random access to any resulting	2013-01-14 16:30:16 -05:00
Mark DePristo	19288b007d	LIBS bugfix: kept reads now only (correctly) includes reads that at least passed the reservoir -- Added unit tests to ensure this behavior is correct	2013-01-14 16:30:16 -05:00
Mark DePristo	83fcc06e28	LIBS optimizations and performance tools -- Made LIBSPerformance a full featured CommandLineProgram, and it can be used to assess the LIBS performance by reading a provided BAM -- ReadStateManager now provides a clean interface to iterate in sample order the per-sample read states, allowing us to avoid many map.get calls -- Moved updateReadStates to ReadStateManager -- Removed the unnecessary wrapping of an iterator in ReadStateManager -- readStatesBySample is now a LinkedHashMap so that iteration occurs in LIBS sample order, allowing us to avoid many unnecessary calls to map.get iterating over samples. Now those are just map native iterations -- Restructured collectPendingReads for simplicity, removing redundant and consolidating common range checks. The new piece of code is much clearer and avoids several unnecessary function calls	2013-01-14 16:30:15 -05:00
Mark DePristo	ec05ecef60	getAdaptorBoundary returns an int, not an Integer, as this was taking 30% of the allocation effort for LIBS	2013-01-14 16:30:15 -05:00
Mark DePristo	3a6b4b43b7	Backporting LIBSPerformance improvements to original commit	2013-01-13 09:53:10 -05:00
Mark DePristo	f204908a94	Add some todos for future optimization to LIBS	2013-01-11 15:17:18 -05:00
Mark DePristo	e88dae2758	LocusIteratorByState operates natively on GATKSAMRecords now -- Updated code to reflect this new typing	2013-01-11 15:17:18 -05:00
Mark DePristo	94cb50d3d6	Retire LegacyLocusIteratorByState -- Left in the remaining infrastructure for David to remove, but the legacy downsampler is no longer a functional option in the GATK	2013-01-11 15:17:18 -05:00
Mark DePristo	cc0c1b752a	Delete old LocusIteratorByState, leaving only new LIBS and legacy	2013-01-11 15:17:18 -05:00
Mark DePristo	9e23c592e6	ReadBackedPileup cleanup -- Only ReadBackedPileupImpl (concrete class) and ReadBackedPileup (interface) live, moved all functionality of AbstractReadBackedPileup into the impl -- ReadBackedPileupImpl was literally a shell class after we removed extended events. A few bits of code cleanup and we reduced a bunch of class complexity in the gatk -- ReadBackedPileups no longer accept pre-cached values (size, nMapQ reads, etc) but now lazy load these values as needed -- Created optimized calculation routines to iterate over all of the reads in the pileup in whatever order is most efficient as well. -- New LIBS no longer calculates size, n mapq, and n deletion reads while making pileups. -- Added commons-collections for IteratorChain	2013-01-11 15:17:18 -05:00
Mark DePristo	e3e3ae29b2	Final documentation for LocusIteratorByState	2013-01-11 15:17:18 -05:00
Mark DePristo	6a91902aa2	Fix final merge conflicts	2013-01-11 15:17:18 -05:00
Mark DePristo	b9a33d3c66	Split original and optimized ART into largely independent pieces -- Allows us to cleanly run old and new art, which now have different traversal behavior (on purpose). Split unit tests as well.	2013-01-11 15:17:18 -05:00
Mark DePristo	02130dfde7	Cleanup ART -- Initialize routine captures essential information for running the traversal	2013-01-11 15:17:17 -05:00
Mark DePristo	9b2be795a7	Initial working version of new ActiveRegionTraversal based on the LocusIteratorByState read stream -- Implemented as a subclass of TraverseActiveRegions -- Passes all unit tests -- Will be very slow -- needs logical fixes	2013-01-11 15:17:17 -05:00
Mark DePristo	8b83f4d6c7	Near final cleanup of PileupElement -- All functions documented and unit tested -- New constructor interface -- Cleanup some uses of old / removed functionality	2013-01-11 15:17:17 -05:00
Mark DePristo	fb9eb3d4ee	PileupElement and LIBS cleanup -- function to create pileup elements in AlignmentStateMachine and LIBS -- Cleanup pileup element constructors, directing users to LIBS.createPileupFromRead() that really does the right thing	2013-01-11 15:17:17 -05:00
Mark DePristo	2f2a592c8e	Contracts and documentation for AlignmentStateMachine and LocusIteratorByState -- Add more unit tests for both as well	2013-01-11 15:17:17 -05:00
Mark DePristo	cc1d259cac	Implement get Length and Bases of OfImmediatelyFollowingIndel in PileupElement -- Added unit tests for this behavior. Updated users of this code	2013-01-11 15:17:17 -05:00
Mark DePristo	2c38310868	Create LIBS using new AlignmentStateMachine infrastructure -- Optimizations to AlignmentStateMachine -- Properly count deletions. Added unit test for counting routines -- AlignmentStateMachine.java is no longer recursive -- Traversals now use new LIBS, not the old one	2013-01-11 15:17:17 -05:00
Mark DePristo	80d9b7011c	Complete rewrite of low-level machinery of LIBS, not hooked up -- AlignmentStateMachine does what SAMRecordAlignmentState should really do. It's correct in that it's more accurate than the LIB_position tests themselves. This is a non-broken, correct implementation. Needs cleanup, contracts, etc. -- This version is like 6x slower than the original implementation (according to the google caliper benchmark here). Obvious optimizations for future commit	2013-01-11 15:17:16 -05:00
Mark DePristo	0ac4352614	LIBS can now (optionally) track the unique reads it uses from the underlying read iterator -- This capability is essential to provide an ordered set of used reads to downstream users of LIBS, such as ART, who want an efficient way to get the reads used in LIBS -- Vastly expanded the multi-read, multi-sample LIBS unit tests to make sure this capability is working -- Added createReadStream to ArtificialSAMUtils that makes it relatively easy to create multi-read, multi-sample read streams for testing	2013-01-11 15:17:16 -05:00
Mark DePristo	b3ecfbfce8	Refactor LIBS into component parts, expand unit tests, some code cleanup -- Split out all of the inner classes of LIBS into separate independent classes -- Split / add unit tests for many of these components. -- Radically expand unit tests for SAMRecordAlignmentState (the lowest level piece of code) making sure at least some of it works -- No need to change unit tests or integration tests. No change in functionality. -- Added (currently disabled) code to track all submitted reads to LIBS, but this isn't accessible or tested	2013-01-11 15:17:16 -05:00
Mark DePristo	b2990497e2	Refactor LIBS into utils.locusiterator before refactoring	2013-01-11 15:17:16 -05:00
Mauricio Carneiro	9ed922d562	Updating licenses to Eric's last commit - for now we're still running the script by hand, soon automated solution will be in place. GSATDG-5	2013-01-11 14:33:00 -05:00
Eric Banks	e7906713d9	Moving some random walkers back to public as requested by Mark. Mauricio will the licenses get updated automatically?	2013-01-11 02:03:43 -05:00
Ami Levy-Moonshine	352cb831d0	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-10 21:27:06 -05:00
Ami Levy-Moonshine	fac0bce916	add RunCoveredByNSamplesSites; changes in CoveredByNSamplesSites so it can work in parallel; also, move it to diagnostics	2013-01-10 21:26:49 -05:00
Mauricio Carneiro	2a4ccfe6fd	Updated all JAVA file licenses accordingly GSATDG-5	2013-01-10 17:06:41 -05:00
Ryan Poplin	487fb2afb4	Bug fix for the case of overlapping assembled and partially-assembled events created by the HC. Unfortunately the symbolic allele can't be combined with the indel allele because the reference basis will change.	2013-01-09 15:30:46 -05:00
Eric Banks	4fa439d89e	Move some classes back to public because they are used in the engine. Move some test classes to protected. We should have no more public->protected dependencies now	2013-01-09 11:06:10 -05:00
Eric Banks	676e79542a	Bring CombineVariants back to public since it's used for SG. I needed to break ChromosomeCountConstants out of ChromosomeCounts to make this work.	2013-01-09 10:39:48 -05:00
Ryan Poplin	c87ad8c0ef	Bug fixes related to HC's GGA mode. Tracking just the artificial allele isn't sufficient when there are multiple GGA records that change the reference basis. Also, duplicated records screw up the tracking of merged alleles.	2013-01-09 10:00:46 -05:00
Ami Levy-Moonshine	15ca5015cd	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-08 21:53:36 -05:00
Ami Levy-Moonshine	d6071728e8	add new walker to find sites with good coverage	2013-01-08 17:10:38 -05:00
Eric Banks	264cc9e78d	Resolve protected->public dependencies for BQSR by wrapping the BQSR-specific arguments in a new class. Instead of the GATK Engine creating a new BaseRecalibrator (not clean), it just keeps track of the arguments (clean). There are still some dependency issues, but it looks like they are related to Ami's code. Need to look into it further.	2013-01-08 16:23:29 -05:00
Eric Banks	f0bd1b5ae5	Okay, all public->protected dependencies are gone except for the BQSR arguments. I'll need to think through this but should be able to make that work too.	2013-01-08 15:46:32 -05:00
Eric Banks	245fcc8bb5	Merged bug fix from Stable into Unstable	2013-01-08 12:59:15 -05:00
Eric Banks	d6146d369a	Remove all of the references to ProgramElementDoc	2013-01-08 12:58:31 -05:00
Eric Banks	47d030a52d	Oops, move the covariates over too	2013-01-07 15:47:25 -05:00
Eric Banks	35699a8376	Move bqsr utils to protected	2013-01-07 15:41:21 -05:00
Eric Banks	5371613ad1	Tests seem to pass (can't be positive though because I ran before Tad's recent push), so I'm going to push now (this push touches so many files that I don't want to keep it around much longer). Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-07 15:27:43 -05:00
Ami Levy-Moonshine	d4b4f95e12	move CatVariants to public	2013-01-07 15:07:16 -05:00
Eric Banks	a0219acfaa	Collapse the PerReadAlleleLikelihoodMap classes into 1 now that Lite is gone	2013-01-07 14:55:21 -05:00
Eric Banks	35d9bd377c	Moved (nearly) all Walkers from public to protected and removed GATKLite utils	2013-01-07 14:42:40 -05:00
Eric Banks	b4e7b3d691	Fixed precision problem in the Bayesian calculation of Qemp: we need to cap below max integer because the MathUtils code adds +1. Added unit tests for handling large number of observations.	2013-01-07 13:07:36 -05:00
Tad Jordan	04e3978b04	Fixed VariantEval tests -Added sorting by rows to VariantEval	2013-01-07 12:45:32 -05:00
Ryan Poplin	4f95f850b3	Bug fix in the HC's allele mapping for multi-allelic events. Using the allele alone as a key isn't sufficient because alleles change when the reference allele changes during VariantContextUtils.simpleMerge for multi-allelic events.	2013-01-07 11:05:44 -05:00
Ami Levy-Moonshine	d3c2c97fb2	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-06 23:35:47 -05:00
Ami Levy-Moonshine	c554d9db25	add TODO	2013-01-06 23:04:38 -05:00
Ami Levy-Moonshine	81eef3aa37	merge development branches of log-less HMM and FastGatherer to master	2013-01-06 23:01:58 -05:00
Eric Banks	0249e1f497	Resolving merge conflicts from VCF move	2013-01-06 14:32:31 -05:00
Eric Banks	8822b8e7c8	Moving HelpConstants out of HelpUtils so that we stop getting these ProgramElementDoc errors when com.sun.javadoc cannot load on a user's system.	2013-01-06 14:30:45 -05:00
Eric Banks	ea21dc9cfb	I just committed this - why didn't it work before? Trying again...	2013-01-06 12:44:13 -05:00
Eric Banks	52067f0549	Handle merge conflicts	2013-01-06 12:29:12 -05:00
Eric Banks	bf25e151ff	Handle long->int precision in Bayesian estimate	2013-01-06 12:26:32 -05:00
Eric Banks	b73d72fe94	update docs for LeftAlignVariants	2013-01-06 01:56:57 -05:00
Mark DePristo	2ab55e4ee7	Fixing bug in TraverseDuplicates.printProgress call: only passes in single location of genome loc	2013-01-05 12:50:27 -05:00
Mark DePristo	69bf70c42e	Cleanup and more unit tests for RecalibrationTables in BQSR -- Added unit tests for combining RecalibrationTables. As a side effect now has serious tests for incrementDatumOrPutIfNecessary -- Removed unnecessary enum.index system from RecalibrationTables. -- Moved what were really static utility methods out of RecalibrationEngine and into RecalUtils.	2013-01-05 12:50:27 -05:00
Chris Hartl	9df30880cb	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-04 17:15:22 -05:00
Joel Thibault	01738e70c3	Archive the experimental Active Region Traversals	2013-01-04 17:05:31 -05:00
Chris Hartl	7b7efa0fff	Add in the AAL as an experimental covariate, in case it's wanted.	2013-01-04 16:47:26 -05:00
Chris Hartl	41bc416b65	Remove AAL and update MD5s.	2013-01-04 16:46:14 -05:00
Eric Banks	bce6fce58d	Resolving merge conflicts after Mark's latest push	2013-01-04 14:46:39 -05:00
Eric Banks	dd7f5e2be7	Hooking up the Bayesian estimate code for calculating Qemp in BQSR; various fixes after adding unit tests.	2013-01-04 14:43:11 -05:00
Ami Levy-Moonshine	80b531f695	emit all sites where more than 90% of the samples have good coverage	2013-01-04 14:27:50 -05:00
Tad Jordan	fe06912a87	Removed sorting by row from walkers	2013-01-04 11:52:33 -05:00
Mark DePristo	810e2da1d4	Cleanup and unit tests for EventType and ReadRecalibrationInfo in BQSR -- Added unit tests for EventType and ReadRecalibrationInfo -- Simplified interface of EventType. Previously this enum carried an index with it, but this is redundant with the enum.ordinal function. Now just using that function instead.	2013-01-04 11:39:25 -05:00
Mark DePristo	a5901cdd20	Bugfix for printProgress in TraverseReadsNano -- Must provide a single bp position (1:10) not the range of the read (1:1-50). ProgressMeter now checks at runtime for this problem as well.	2013-01-04 11:39:24 -05:00
Mark DePristo	bbdf9ee91b	BQSR cleanup: merge Advanced and Standard recalibration engine into just the RecalibrationEngine -- As we are no longer maintaining a public/protected system we need only have one RecalibrationEngine. -- Misc. code cleanup and docs along the way	2013-01-04 11:39:24 -05:00
Mark DePristo	7df47418d8	BQSR optimization: make RecalibrationTables thread-local, and merge results in onTraversalDone -- With the newer, faster BQSR, scaling was limited by the NestedIntegerArray. The solution to this is to make the entire table thread-local, so that each nct thread has its own data and doesn't have any collisions. -- Removed the previous partial solution of having a thread-local quality score table -- Added a new argument -lowMemory	2013-01-04 11:39:24 -05:00
Mark DePristo	1ba8d47a81	Unit tests for ProgressMeterDaemon	2013-01-04 11:39:24 -05:00
Joel Thibault	319d651e4a	Initial updates for ActiveRegionShard	2013-01-03 17:00:13 -05:00
Joel Thibault	e7553545ef	Initial updates for ReadShard	2013-01-03 17:00:13 -05:00
Joel Thibault	14a3ac0e3c	Enable the use of alternate shards	2013-01-03 17:00:13 -05:00
Joel Thibault	4cc372f53b	LocusShardDataProvider doesn't need its own GenomeLocParser	2013-01-03 17:00:13 -05:00
Joel Thibault	ffbd4d85f2	No need to pass fields as parameters	2013-01-03 17:00:12 -05:00
Tad Jordan	c1ba12d71a	Added unit test for outputting sorted GATKReport Tables - Made few small modifications to code - Replaced the two arguments in GATKReportTable constructor with an enum used to specify way of sorting the table	2013-01-03 16:53:59 -05:00
Ami Levy-Moonshine	10a705b27f	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-03 13:42:31 -05:00
Ami Levy-Moonshine	2018285a39	better error message	2013-01-03 13:41:03 -05:00
Eric Banks	c7039a9b71	Pushing in implementation of the Bayesian estimate of Qemp for the BQSR. This isn't hooked up yet with BQSR; it's just a static method used in my testing walker. I'll hook this into BQSR after more testing and the addition of unit tests. Most of the changes in this commit are actually documentation-related.	2013-01-02 15:21:44 -05:00
Joel Thibault	c515175313	Ensure that active region extensions stay on contig	2013-01-02 14:46:24 -05:00
Chris Hartl	e1d09ab0db	QD is now divided by the average length of the alternate allele (weighted by the allele count). The average length is stored in a related annotation, "AAL", which can be used to re-compute the "old" QD by simple multiplication. Integration tests should all pass.	2013-01-02 14:41:29 -05:00
Mark DePristo	12f4c6307e	AutoFormattingTime cleanup and complete unittests -- Underlying system now uses long nano times to be more consistent with standard java practice -- Updated a few places in the code that were converting from nanoseconds to double seconds to use the new nanoseconds interface directly -- Bringing us to 100% test coverage with clover with AutoFormattingTimeUnitTest	2013-01-02 11:29:25 -05:00
Mark DePristo	5558a6b8f7	Deleting / archiving classes that are no longer used -- AminoAcidTable and AminoAcid go to the archive -- Removing two unused SAMRecord classes	2012-12-29 14:34:17 -05:00
Mark DePristo	38cc496de8	Move SomaticIndelDetector and associated tools and libraries into private/andrey package -- Intermediate commit on the way to archiving SomaticIndelDetector and other tools. -- SomaticIndelDetector, PairMaker and RemapAlignments tools have been refactored into the private andrey package. All utility classes refactored into here as well. At this point, the SomaticIndelDetector builds in this version of the GATK. -- Subsequent commit will put this code into the archive so it no longer builds in the GATK	2012-12-29 14:34:08 -05:00
Ami Levy-Moonshine	f450cbc1a3	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-27 21:23:59 -05:00
Eric Banks	75d5b88a3d	Enabling the Recal Report unit test (which looks like it was never ever enabled)	2012-12-26 15:35:50 -05:00
Eric Banks	efceb0d48c	Check for well-encoded reads while fixing mis-encoded ones	2012-12-26 14:30:51 -05:00
Mark DePristo	af9746af52	Fix merge failure	2012-12-24 13:43:04 -05:00
Mark DePristo	04cc75aaec	Minor cleanup and expansion of the RecalDatum unit tests	2012-12-24 13:35:58 -05:00
Mark DePristo	7bf1f67273	BQSR optimization: read group x quality score calibration table is thread-local -- AdvancedRecalibrationEngine now uses a thread-local table for the quality score table, and in finalizeData merges these thread-local tables into the final table. Radically reduces the contention for RecalDatum in this very highly used table -- Refactored the utility function to combine two tables into RecalUtils, and created UnitTests for this function, as well as all of RecalibrationTables. Updated combine in RecalibrationReport to use this table combiner function -- Made several core functions in RecalDatum into final methods for performance -- Added RecalibrationTestUtils, a home for recalibration testing utilities	2012-12-24 13:35:58 -05:00
Mark DePristo	295455eee2	NanoScheduler optimizations and simplification -- The previous model was to enqueue individual map jobs (with a resolution of 1 map job per map call), to track the number of map calls submitted via a counter and a semaphore, and to use this information in each map job and reduce to control the number of map jobs, when reduce was complete, etc. All hideously complex. -- This new model is vastly simpler. The reducer basically knows nothing about the control mechanisms in the NanoScheduler. It just supports multi-threaded reduce. The NanoScheduler enqueues exactly nThread jobs to be run, which continually loop reading, mapping, and reducing until they run out of material to read, when they shut down. The master thread of the NS just holds a CountDownLatch, initialized to nThreads, and when each thread exits it reduces the latch by 1. The master thread gets the final reduce result when it's free by the latch reaching 0. It's all super super simple. -- Because this model uses vastly fewer synchronization primitives within the NS itself, it's naturally much faster at getting things done, without any of the overhead obvious in profiles of BQSR -nct 2.	2012-12-24 13:35:57 -05:00
Mark DePristo	aa3ee29929	Handle case where the ReadGroup is null in GATKSAMRecord	2012-12-24 13:35:57 -05:00
Mark DePristo	bf81db40f7	NanoScheduler reducer optimizations -- reduceAsMuchAsPossible no longer blocks threads via synchronization, but instead uses an explicit lock to manage access. If the lock is already held (because some thread is doing reduce) then the thread attempting to reduce immediately exits the call and continues doing productive work. This removes one major source of blocking contention in the NanoScheduler	2012-12-24 13:35:57 -05:00
Mark DePristo	940816f16a	GATKSamRecord now checks that the read group is a GATKReadGroupRecord, and if not makes one	2012-12-24 13:35:57 -05:00
Mark DePristo	14944b5d73	Incorporating clover into build.xml -- See http://gatkforums.broadinstitute.org/discussion/2002/clover-coverage-analysis-with-ant for use docs -- Fix for artificial reads not having proper read groups, causing NPE in some tests -- Added clover itself to private/resources	2012-12-24 13:35:57 -05:00
Mark DePristo	7796ba7601	Minor optimizations for NanoScheduler -- Reducer.maybeReleaseLatch is no longer synchronized -- NanoScheduler only prints progress every 100 or so map calls	2012-12-24 13:35:56 -05:00
Mark DePristo	0f04485c24	NanoScheduler optimization: don't use a PriorityBlockingQueue for the MapResultsQueue -- Created a separate, limited interface MapResultsQueue object that previously was set to the PriorityBlockingQueue. -- The MapResultsQueue is now backed by a synchronized ExpandingArrayList, since job ids are integers incrementing from 0 to N. This means we avoid the n log n sort in the priority queue which was generating a lot of cost in the reduce step -- Had to update ReducerUnitTest because the test itself was brittle, and broken when I changed the underlying code. -- A few bits of minor code cleanup through the system (removing unused constructors, local variables, etc) -- ExpandingArrayList called ensureCapacity so that we increase the size of the arraylist once to accommodate the upcoming size needs	2012-12-24 13:35:56 -05:00

2909 Commits (705cccaf63c57e42fb3fe7450ba998d85ef324f7)