gatk-3.8

Commit Graph

Author	SHA1	Message	Date
David Roazen	3744d1a596	Collapse the downsampling fork in the GATK engine With LegacyLocusIteratorByState deleted, the legacy downsampling implementation was already non-functional. This commit removes all remaining code in the engine belonging to the legacy implementation.	2013-01-28 01:50:30 -05:00
Mark DePristo	63913d516f	Add join call to Progress meter unit test so we actually know the daemon thread has finished	2013-01-27 16:52:45 -05:00
Mark DePristo	c97a361b5d	Added realistic BandPassFilterUnitTest that ensures quality results for 1000G phase I VCF and NA12878 VCF -- Helped ID more bugs in the ActivityProfile, necessitating a new algorithm for popping off active regions. This new algorithm requires that at least maxRegionSize + prob. propagation distance states have been examined. This ensures that the incremental results are the same as you get reading in an entire profile and running getRegions on the full profile -- TODO is to remove incremental search start algorithm, as this is no longer necessary, and nicely eliminates a state variable I was always uncomfortable with	2013-01-27 14:10:08 -05:00
Mauricio Carneiro	705cccaf63	Making SplitReads output FastQ's instead of BAM - eliminates one step in my pipeline - BAM is too finicky and maintaining parameters that wouldn't be useful was becoming a headache, better avoided.	2013-01-27 02:36:31 -05:00
Mauricio Carneiro	6ea7133d95	Updating licenses of latest moved files	2013-01-26 13:46:52 -05:00
Ami Levy-Moonshine	99cb8d68e9	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-25 16:07:38 -05:00
Mark DePristo	b8c0b05785	Add contract to ensure that getAdapterBoundary returns the right result -- Also renamed the function to getAdaptorBoundary for consistency across the codebase	2013-01-25 16:05:17 -05:00
Mark DePristo	e445c71161	LIBS optimization for adapter clipping -- GATKSAMRecords now cache the result of the getAdapterBoundary, allowing us to avoid repeating a lot of work in LIBS -- Added unittests to cover adapter clipping	2013-01-25 16:05:17 -05:00
Ami Levy-Moonshine	b4447cdca2	In cases where one uses VariantContextUtils.GenotypeMergeType.REQUIRE_UNIQUE we used to verify that the samples names are unique in VariantContextUtils.simpleMerge for each VCs. It couse to a bug that was reported on the forum (when a VCs had 2 VC from the same sample). Now we will check it only in CombineVariants.init using the headers. A new function was added to SamplesUtils with unitTests in CVunitTest.java.	2013-01-25 15:49:51 -05:00
Mark DePristo	592f90aaef	ActivityProfile now cuts intelligently at the best local minimum when in a larger than max size active region -- This new algorithm is essential to properly handle activity profiles that have many large active regions generated from lots of dense variant events. The new algorithm passes unit tests and passes visualize visual inspection of both running on 1000G and NA12878 -- Misc. commenting of the code -- Updated ActiveRegionExtension to include a min active region size -- Renamed ActiveRegionExtension to ActiveRegionTraversalParameters, as it carries more than just the traversal extension now	2013-01-24 13:48:00 -05:00
Mark DePristo	0c94e3d96e	Adaptively compute the band pass filter from the sigma, up to a maximum size of 50 bp -- Previously we allowed band pass filter size to be specified along with the sigma. But now that sigma is controllable from walkers and from the command line, we instead compute the filter size given the kernel from the sigma, including all kernel points with p > 1e-5 in the kernel. This means that if you use a smaller kernel you get a small band size and therefore faster ART -- Update, as discussed with Ryan, the sigma and band size to 17 bp for HC (default ART wide) and max band size of 50 bp	2013-01-24 13:47:59 -05:00
Mark DePristo	9e43a2028d	Making band pass filter size, sigma, active region max size and extension all accessible from the command line	2013-01-24 13:47:59 -05:00
Mark DePristo	ee8039bf25	Fix trivial call in unit test	2013-01-23 13:51:58 -05:00
Mark DePristo	8e8126506b	Renaming IncrementalActivityProfile to ActivityProfile -- Also adding a work in progress functionality to make it easy to visualize activity profiles and active regions in IGV	2013-01-23 13:46:01 -05:00
Mark DePristo	e917f56df8	Remove old ActivityProfile and old BandPassActivityProfile	2013-01-23 13:46:01 -05:00
Mark DePristo	7fd27a5167	Add band pass filtering activity profile -- Based on the new incremental activity profile -- Unit Tested! Fixed a few bugs with the old band pass filter -- Expand IncrementalActivityProfileUnitTest to test the band pass filter as well for basic properties -- Add new UnitTest for BandPassIncrementalActivityProfile -- Added normalizeFromRealSpace to MathUtils -- Cleanup unused code in new activity profiles	2013-01-23 13:46:01 -05:00
Mark DePristo	eb60235dcd	Working version of incremental active region traversals -- The incremental version now processes active regions as soon as they are ready to be processed, instead of waiting until the end of the shard as in the previous version. This means that ART walkers will now take much less memory than previously. On chr20 of NA12878 the majority of regions are processed with as few as 500 reads in memory. Over the whole chr20 only 5K reads were ever held in ART at one time. -- Fixed bug in the way active regions worked with shard boundaries. The new implementation no longer see shard boundaries in any meaningful way, and that uncovered a problem that active regions were always being closed across shard boundaries. This behavior was actually encoded in the unit tests, so those needed to be updated as well. -- Changed the way that preset regions work in ART. The new contract ensures that you get exactly the regions you requested. the isActive function is still called, but its result has no impact on the regions. With this functionality is should be possible to use the HC as a generic assembly by forcing it to operate over very large regions -- Added a few misc. useful functions to IncrementalActivityProfile	2013-01-23 13:46:00 -05:00
Mark DePristo	e050f649fd	IncrementalActivityProfile, complete with extensive unit tests -- This is an activity profile compatible with fetching its implied active regions incrementally, as activity profile states are added	2013-01-23 13:45:21 -05:00
Mark DePristo	8d9b0f1bd5	Restructure ActivityProfiler into root class ActivityProfile and derived class BandPassActivityProfile -- Required before I jump in an redo the entire activity profile so it's can be run imcrementally -- This restructuring makes the differences between the two functionalities clearer, as almost all of the functionality is in the base class. The only functionality provided by the BandPassActivityProfile is isolated to a finalizeProfile function overloaded from the base class. -- Renamed ActivityProfileResult to ActivityProfileState, as this is a clearer indication of its actual functionality. Almost all of the misc. walker changes are due to this name update -- Code cleanup and docs for TraverseActiveRegions -- Expanded unit tests for ActivityProfile and ActivityProfileState	2013-01-23 13:45:21 -05:00
Mark DePristo	42b807a5fe	Unit tests for ActivityProfileResult	2013-01-23 13:45:20 -05:00
Mauricio Carneiro	7b8b064165	Last manual license update (hopefully) if everyone updates their git hook accordingly, this will be the last time I have to manually run the script. GSATDG-5	2013-01-18 16:13:07 -05:00
Mark DePristo	738c24a3b1	Add tests to ensure that all insertion reads appear in the active region traversal	2013-01-16 16:25:36 -05:00
Mark DePristo	2a42b47e4a	Massive expansion of ActiveRegionTraversal unit tests, resulting in several bugfixes to ART -- UnitTests now include combinational tiling of reads within and spanning shard boundaries -- ART now properly handles shard transitions, and does so efficiently without requiring hash sets or other collections of reads -- Updating HC and CountReadsInActiveRegions integration tests	2013-01-16 15:30:00 -05:00
Mark DePristo	4d0e7b50ec	ArtificialBAMBuilder utility class for creating streams of GATKSAMRecords with a variety of properties -- Allows us to make a stream of reads or an index BAM file with read having the following properties (coming from n samples, of fixed read length and aligned to the genome with M operator, having N reads per alignment start, skipping N bases between each alignment start, starting at a given alignment start) -- This stream can be handed back to the caller immediately, or written to an indexed BAM file -- Update LocusIteratorByStateUnitTest to use this functionality (which was refactored from LIBS unit tests and ArtificialSAMUtils)	2013-01-16 15:29:59 -05:00
Eric Banks	ec1cfe6732	Oops, forgot to add 1 of my files	2013-01-16 15:05:49 -05:00
Eric Banks	e47a389b26	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-16 14:59:11 -05:00
Eric Banks	d18dbcbac1	Added tests for changing IUPAC bases to Ns, for failing on bad ref bases, and for the HaplotypeCaller not failing when running over a region with an IUPAC base. Out of curiosity, why does Picard's IndexedFastaSequenceFile allow one to query for start position 0? When doing so, that base is a line feed (-1 offset to the first base in the contig) which is an illegal base (and which caused me no end of trouble)...	2013-01-16 14:55:33 -05:00
Khalid Shakir	4ffb43079f	Re-committing the following changes from Dec 18: Refactored interval specific arguments out of GATKArgumentCollection into InvtervalArgumentCollection such that it can be used in other CommandLinePrograms. Updated SelectHeaders to print out full interval arguments. Added RemoteFile.createUrl(Date expiration) to enable creation of presigned URLs for download over http: or file:.	2013-01-16 12:43:15 -05:00
Eric Banks	392b5cbcdf	The CachingIndexedFastaSequenceFile now automatically converts IUPAC bases to Ns and errors out on other non-standard bases. This way walkers won't see anything except the standard bases plus Ns in the reference. Added option to turn off this feature (to maintain backwards compatibility). As part of this commit I cleaned up the BaseUtils code by adding a Base enum and removing all of the static indexes for each of the bases. This uncovered a bug in the way the DepthOfCoverage walker counts deletions (it was counting Ns instead!) that isn't covered by tests. Fortunately that walker is being deprecated soon...	2013-01-16 10:22:43 -05:00
Mark DePristo	3c37ea014b	Retire original TraverseActiveRegion, leaving only the new optimized version -- Required some updates to MD5s, which was unexpected, and will be sorted out later with more detailed unit tests	2013-01-15 10:24:45 -05:00
Mark DePristo	39bc9e999d	Add a test to LocusIteratorByState to ensure that we aren't holding reads anywhere -- Run an iterator with 100Ks of reads, each carrying MBs of byte[] data, through LIBS, all starting at the same position. Will crash with an out-of-memory error if we're holding reads anywhere in the system. -- Is there a better way to test this behavior?	2013-01-14 16:30:16 -05:00
Mark DePristo	5a5422e4f8	Refactor PerSampleReadStates into a separate class -- No longer update the total counts in each per-sample state manager, but instead return delta counts that are updated by the overall ReadStateManager -- One step on the way to improving the underlying representation of the data in PerSampleReadStateManager -- Make LocusIteratorByState final	2013-01-14 16:30:16 -05:00
Mark DePristo	19288b007d	LIBS bugfix: kept reads now only (correctly) includes reads that at least passed the reservoir -- Added unit tests to ensure this behavior is correct	2013-01-14 16:30:16 -05:00
Mark DePristo	83fcc06e28	LIBS optimizations and performance tools -- Made LIBSPerformance a full featured CommandLineProgram, and it can be used to assess the LIBS performance by reading a provided BAM -- ReadStateManager now provides a clean interface to iterate in sample order the per-sample read states, allowing us to avoid many map.get calls -- Moved updateReadStates to ReadStateManager -- Removed the unnecessary wrapping of an iterator in ReadStateManager -- readStatesBySample is now a LinkedHashMap so that iteration occurs in LIBS sample order, allowing us to avoid many unnecessary calls to map.get iterating over samples. Now those are just map native iterations -- Restructured collectPendingReads for simplicity, removing redundant and consolidating common range checks. The new piece is code is much clearer and avoids several unnecessary function calls	2013-01-14 16:30:15 -05:00
Mark DePristo	ec05ecef60	getAdaptorBoundary returns an int, not an Integer, as this was taking 30% of the allocation effort for LIBS	2013-01-14 16:30:15 -05:00
Mark DePristo	e88dae2758	LocusIteratorByState operates natively on GATKSAMRecords now -- Updated code to reflect this new typing	2013-01-11 15:17:18 -05:00
Mark DePristo	94cb50d3d6	Retire LegacyLocusIteratorByState -- Left in the remaining infrastructure for David to remove, but the legacy downsampler is no longer a functional option in the GATK	2013-01-11 15:17:18 -05:00
Mark DePristo	cc0c1b752a	Delete old LocusIteratorByState, leaving only new LIBS and legacy	2013-01-11 15:17:18 -05:00
Mark DePristo	bd03511e35	Updating AlignmentStateMachinePerformance to include some more useful performance assessments	2013-01-11 15:17:18 -05:00
Mark DePristo	9e23c592e6	ReadBackedPileup cleanup -- Only ReadBackedPileupImpl (concrete class) and ReadBackedPileup (interface) live, moved all functionality of AbstractReadBackedPileup into the impl -- ReadBackedPileupImpl was literally a shell class after we removed extended events. A few bits of code cleanup and we reduced a bunch of class complexity in the gatk -- ReadBackedPileups no longer accept pre-cached values (size, nMapQ reads, etc) but now lazy load these values as needed -- Created optimized calculation routines to iterator over all of the reads in the pileup in whatever order is most efficient as well. -- New LIBS no longer calculates size, n mapq, and n deletion reads while making pileups. -- Added commons-collections for IteratorChain	2013-01-11 15:17:18 -05:00
Mark DePristo	b9a33d3c66	Split original and optimized ART into largely independent pieces -- Allows us to cleanly run old and new art, which now have different traversal behavior (on purpose). Split unit tests as well.	2013-01-11 15:17:18 -05:00
Mark DePristo	02130dfde7	Cleanup ART -- Initialize routine captures essential information for running the traversal	2013-01-11 15:17:17 -05:00
Mark DePristo	9b2be795a7	Initial working version of new ActiveRegionTraversal based on the LocusIteratorByState read stream -- Implemented as a subclass of TraverseActiveRegions -- Passes all unit tests -- Will be very slow -- needs logical fixes	2013-01-11 15:17:17 -05:00
Mark DePristo	8b83f4d6c7	Near final cleanup of PileupElement -- All functions documented and unit tested -- New constructor interface -- Cleanup some uses of old / removed functionality	2013-01-11 15:17:17 -05:00
Mark DePristo	fb9eb3d4ee	PileupElement and LIBS cleanup -- function to create pileup elements in AlignmentStateMachine and LIBS -- Cleanup pileup element constructors, directing users to LIBS.createPileupFromRead() that really does the right thing	2013-01-11 15:17:17 -05:00
Mark DePristo	2f2a592c8e	Contracts and documentation for AlignmentStateMachine and LocusIteratorByState -- Add more unit tests for both as well	2013-01-11 15:17:17 -05:00
Mark DePristo	cc1d259cac	Implement get Length and Bases of OfImmediatelyFollowingIndel in PileupElement -- Added unit tests for this behavior. Updated users of this code	2013-01-11 15:17:17 -05:00
Mark DePristo	2c38310868	Create LIBS using new AlignmentStateMachine infrastructure -- Optimizations to AlignmentStateMachine -- Properly count deletions. Added unit test for counting routines -- AlignmentStateMachine.java is no longer recursive -- Traversals now use new LIBS, not the old one	2013-01-11 15:17:17 -05:00
Mark DePristo	80d9b7011c	Complete rewrite of low-level machinery of LIBS, not hooked up -- AlignmentStateMachine does what SAMRecordAlignmentState should really do. It's correct in that it's more accurate than the LIB_position tests themselves. This is a non-broken, correct implementation. Needs cleanup, contracts, etc. -- This version is like 6x slower than the original implementation (according to the google caliper benchmark here). Obvious optimizations for future commit	2013-01-11 15:17:16 -05:00
Mark DePristo	0ac4352614	LIBS can now (optionally) track the unique reads it uses from the underlying read iterator -- This capability is essential to provide an ordered set of used reads to downstream users of LIBS, such as ART, who want an efficient way to get the reads used in LIBS -- Vastly expanded the multi-read, multi-sample LIBS unit tests to make sure this capability is working -- Added createReadStream to ArtificialSAMUtils that makes it relatively easy to create multi-read, multi-sample read streams for testing	2013-01-11 15:17:16 -05:00
Mark DePristo	b3ecfbfce8	Refactor LIBS into component parts, expand unit tests, some code cleanup -- Split out all of the inner classes of LIBS into separate independent classes -- Split / add unit tests for many of these components. -- Radically expand unit tests for SAMRecordAlignmentState (the lowest level piece of code) making sure at least some of it works -- No need to change unit tests or integration tests. No change in functionality. -- Added (currently disabled) code to track all submitted reads to LIBS, but this isn't accessible or tested	2013-01-11 15:17:16 -05:00
Mark DePristo	2e5d38fd0e	Updating to latest google caliper code	2013-01-11 15:17:16 -05:00
Mark DePristo	b2990497e2	Refactor LIBS into utils.locusiterator before refactoring	2013-01-11 15:17:16 -05:00
Mauricio Carneiro	2a4ccfe6fd	Updated all JAVA file licenses accordingly GSATDG-5	2013-01-10 17:06:41 -05:00
Eric Banks	4fa439d89e	Move some classes back to public because they are used in the engine. Move some test classes to protected. We should have no more public->protected dependancies now	2013-01-09 11:06:10 -05:00
Eric Banks	b099e2b4ae	Moving integration tests to protected	2013-01-08 09:34:08 -05:00
Eric Banks	35d9bd377c	Moved (nearly) all Walkers from public to protected and removed GATKLite utils	2013-01-07 14:42:40 -05:00
Eric Banks	b4e7b3d691	Fixed precision problem in the Bayesian calculation of Qemp: we need to cap below max integer because the MathUtils code add +1. Added unit tests for handling large number of observations.	2013-01-07 13:07:36 -05:00
Eric Banks	ef638489d5	Fixing BQSR gatherer test to keep up to date with latest changes	2013-01-06 14:07:59 -05:00
Eric Banks	52067f0549	Handle merge conflicts	2013-01-06 12:29:12 -05:00
Eric Banks	bf25e151ff	Handle long->int precision in Bayesian estimate	2013-01-06 12:26:32 -05:00
Mark DePristo	b403c269e9	Make multi-threaded progress meter daemon unit test more robust	2013-01-05 12:59:18 -05:00
Mark DePristo	69bf70c42e	Cleanup and more unit tests for RecalibrationTables in BQSR -- Added unit tests for combining RecalibrationTables. As a side effect now has serious tests for incrementDatumOrPutIfNecessary -- Removed unnecessary enum.index system from RecalibrationTables. -- Moved what were really static utility methods out of RecalibrationEngine and into RecalUtils.	2013-01-05 12:50:27 -05:00
Chris Hartl	9df30880cb	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-04 17:15:22 -05:00
Joel Thibault	01738e70c3	Archive the experimental Active Region Traversals	2013-01-04 17:05:31 -05:00
Chris Hartl	41bc416b65	Remove AAL and update MD5s.	2013-01-04 16:46:14 -05:00
Eric Banks	bce6fce58d	Resolving merge conflicts after Mark's latest push	2013-01-04 14:46:39 -05:00
Eric Banks	dd7f5e2be7	Hooking up the Bayesian estimate code for calculating Qemp in BQSR; various fixes after adding unit tests.	2013-01-04 14:43:11 -05:00
Joel Thibault	ab5526b372	More TODOs	2013-01-04 14:09:02 -05:00
Tad Jordan	fe06912a87	Removed sorting by row from walkers	2013-01-04 11:52:33 -05:00
Mark DePristo	810e2da1d4	Cleanup and unit tests for EventType and ReadRecalibrationInfo in BQSR -- Added unit tests for EventType and ReadRecalibrationInfo -- Simplified interface of EventType. Previously this enum carried an index with it, but this is redundant with the enum.ordinal function. Now just using that function instead.	2013-01-04 11:39:25 -05:00
Mark DePristo	1ba8d47a81	Unit tests for ProgressMeterDaemon	2013-01-04 11:39:24 -05:00
Mark DePristo	fbee4c11f1	Unit tests for ProgressMeterData	2013-01-04 11:39:23 -05:00
Joel Thibault	319d651e4a	Initial updates for ActiveRegionShard	2013-01-03 17:00:13 -05:00
Joel Thibault	e7553545ef	Initial updates for ReadShard	2013-01-03 17:00:13 -05:00
Joel Thibault	14a3ac0e3c	Enable the use of alternate shards	2013-01-03 17:00:13 -05:00
Joel Thibault	47e620dfbc	Create BAM index to test shard boundaries	2013-01-03 17:00:12 -05:00
Tad Jordan	c1ba12d71a	Added unit test for outputting sorted GATKReport Tables - Made few small modifications to code - Replaced the two arguments in GATKReportTable constructor with an enum used to specify way of sorting the table	2013-01-03 16:53:59 -05:00
Joel Thibault	dcb7735d3c	Active Region extensions must stay on contig	2013-01-02 14:46:24 -05:00
Chris Hartl	09199366b7	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-02 14:44:49 -05:00
Chris Hartl	e1d09ab0db	QD is now divided by the average length of the alternate allele (weighted by the allele count). The average length is stored in a related annotation, "AAL", which can be used to re-compute the "old" QD by simple multiplication. Integration tests should all pass.	2013-01-02 14:41:29 -05:00
Joel Thibault	a15f368bdc	Re-enable testIsActiveRangeLow/High	2013-01-02 11:57:50 -05:00
Mark DePristo	12f4c6307e	AutoFormattingTime cleanup and complete unittests -- Underlying system now uses long nano times to be more consistent with standard java practice -- Updated a few places in the code that were converting from nanoseconds to double seconds to use the new nanoseconds interface directly -- Bringing us to 100% test coverage with clover with AutoFormattingTimeUnitTest	2013-01-02 11:29:25 -05:00
Joel Thibault	429567cd3f	Rename to TraverseActiveRegionsUnitTest	2013-01-01 19:20:30 -05:00
Joel Thibault	57d38aac8a	Temporarily disable due to unknown contracts problem	2013-01-01 19:20:04 -05:00
Joel Thibault	7748b3816f	Delete the test BAI file as well as the BAM	2013-01-01 19:20:02 -05:00
Joel Thibault	5afeb465aa	TODOs	2013-01-01 19:19:17 -05:00
Eric Banks	75d5b88a3d	Enabling the Recal Report unit test (which looks like it was never ever enabled)	2012-12-26 15:35:50 -05:00
Mark DePristo	04cc75aaec	Minor cleanup and expansion of the RecalDatum unit tests	2012-12-24 13:35:58 -05:00
Mark DePristo	7bf1f67273	BQSR optimization: read group x quality score calibration table is thread-local -- AdvancedRecalibrationEngine now uses a thread-local table for the quality score table, and in finalizeData merges these thread-local tables into the final table. Radically reduces the contention for RecalDatum in this very highly used table -- Refactored the utility function to combine two tables into RecalUtils, and created UnitTests for this function, as well as all of RecalibrationTables. Updated combine in RecalibrationReport to use this table combiner function -- Made several core functions in RecalDatum into final methods for performance -- Added RecalibrationTestUtils, a home for recalibration testing utilities	2012-12-24 13:35:58 -05:00
Mark DePristo	7d250a789a	ArtificialReadPileupTestProvider now creates GATKSamRecords with good header values	2012-12-24 13:35:57 -05:00
Mark DePristo	295455eee2	NanoScheduler optimizations and simplification -- The previous model was to enqueue individual map jobs (with a resolution of 1 map job per map call), to track the number of map calls submitted via a counter and a semaphore, and to use this information in each map job and reduce to control the number of map jobs, when reduce was complete, etc. All hideously complex. -- This new model is vastly simply. The reducer basically knows nothing about the control mechanisms in the NanoScheduler. It just supports multi-threaded reduce. The NanoScheduler enqueues exactly nThread jobs to be run, which continually loop reading, mapping, and reducing until they run out of material to read, when they shut down. The master thread of the NS just holds a CountDownLatch, initialized to nThreads, and when each thread exits it reduces the latch by 1. The master thread gets the final reduce result when its free by the latch reaching 0. It's all super super simple. -- Because this model uses vastly fewer synchronization primitives within the NS itself, it's naturally much faster at getting things done, without any of the overhead obvious in profiles of BQSR -nct 2.	2012-12-24 13:35:57 -05:00
Mark DePristo	bf81db40f7	NanoScheduler reducer optimizations -- reduceAsMuchAsPossible no longer blocks threads via synchronization, but instead uses an explicit lock to manage access. If the lock is already held (because some thread is doing reduce) then the thread attempting to reduce immediately exits the call and continues doing productive work. They removes one major source of blocking contention in the NanoScheduler	2012-12-24 13:35:57 -05:00
Mark DePristo	161487b4a4	MapResult compareTo() is now unit tested -- Thanks clover!	2012-12-24 13:35:57 -05:00
Mark DePristo	7796ba7601	Minor optimizations for NanoScheduler -- Reducer.maybeReleaseLatch is no longer synchronized -- NanoScheduler only prints progress every 100 or so map calls	2012-12-24 13:35:56 -05:00
Mark DePristo	0f04485c24	NanoScheduler optimization: don't use a PriorityBlockingQueue for the MapResultsQueue -- Created a separate, limited interface MapResultsQueue object that previously was set to the PriorityBlockingQueue. -- The MapResultsQueue is now backed by a synchronized ExpandingArrayList, since job ids are integers incrementing from 0 to N. This means we avoid the n log n sort in the priority queue which was generating a lot of cost in the reduce step -- Had to update ReducerUnitTest because the test itself was brittle, and broken when I changed the underlying code. -- A few bits of minor code cleanup through the system (removing unused constructors, local variables, etc) -- ExpandingArrayList called ensureCapacity so that we increase the size of the arraylist once to accommodate the upcoming size needs	2012-12-24 13:35:56 -05:00
Tad Jordan	b491c177ff	Added functionality of outputting sorted GATKReport Tables - Added an optional argument to BaseRecalibrator to produce sorted GATKReport Tables - Modified BSQR Integration Tests to include the optional argument. Tests now produce sorted tables	2012-12-20 14:02:21 -05:00
David Roazen	07b369ca7e	Move VCF/BCF2/VariantContext to new standalone org.broadinstitute.variant package This is an intermediate commit so that there is a record of these changes in our commit history. Next step is to isolate the test classes as well, and then move the entire package to the Picard repository and replace it with a jar in our repo. -Removed all dependencies on org.broadinstitute.sting (still need to do the test classes, though) -Had to split some of the utility classes into "GATK-specific" vs generic methods (eg., GATKVCFUtils vs. VCFUtils) -Placement of some methods and choice of exception classes to replace the StingExceptions and UserExceptions may need to be tweaked until everyone is happy, but this can be done after the move.	2012-12-19 10:25:22 -05:00
Mark DePristo	1ca13f9581	Fundamentally better model for the NanoScheduler -- Now each map job reads a value, performs map, and does as much reducing as possible. This ensures that we scale performance with the nct value, so -nct 2 should result in 2x performance, -nct 3 3x, etc. All of this is accomplished using exactly NCT% of the CPU of the machine. -- Has the additional value of actually simplifying the code -- Resolves a long-standing annoyance with the nano scheduler.	2012-12-19 09:31:31 -05:00
Joel Thibault	a29df3e094	oops	2012-12-18 19:03:12 -05:00

1 2 3 4 5 ...

1257 Commits (9e5a31b5958cfb78e4da4e75f4d6b9bf06f7151d)