gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Mark DePristo	0fb238b61e	TraverseActiveRegions Optimizations and Bugfixes: make sure to record position of current locus to discharge active regions when there's no data -- Now records the position of the current locus, as well as that of the last read. Necessary when passing through regions with no reads. The previous version would keep accumulating empty active regions, and never discharge them until end of traversal (if there were no reads in the future) or until a read was finally found -- Protected a call to logger.debug with if ( logger.isDebugEnabled()) to avoid a lot of overhead in writing unseen debug logging information	2013-01-27 14:10:06 -05:00
Mark DePristo	93d88cdc68	Optimization: LocusReferenceView now passes along the contig index to createGenomeLoc, speeding up their creation -- Also cleaned up some unused methods	2013-01-27 14:10:06 -05:00
Mauricio Carneiro	705cccaf63	Making SplitReads output FastQ's instead of BAM - eliminates one step in my pipeline - BAM is too finicky and maintaining parameters that wouldn't be useful was becoming a headache, better avoided.	2013-01-27 02:36:31 -05:00
Mauricio Carneiro	6ea7133d95	Updating licenses of latest moved files	2013-01-26 13:46:52 -05:00
Ami Levy-Moonshine	b4447cdca2	In cases where one uses VariantContextUtils.GenotypeMergeType.REQUIRE_UNIQUE we used to verify that the sample names are unique in VariantContextUtils.simpleMerge for each VCs. This caused a bug that was reported on the forum (when a VCs had 2 VCs from the same sample). Now we check it only in CombineVariants.init using the headers. A new function was added to SamplesUtils with unit tests in CVunitTest.java.	2013-01-25 15:49:51 -05:00
Mark DePristo	008b617577	Cleanup the getLIBS function in LocusIterator -- Now throws an UnsupportedOperationException in the base class. Only LocusView implements this function and actually returns the LIBS	2013-01-25 11:07:28 -05:00
Eric Banks	6dd0e1ddd6	Pulled out the --regenotype functionality from SelectVariants into its own tool: RegenotypeVariants. This allows us to move SelectVariants into the public suite of tools now.	2013-01-25 09:42:04 -05:00
Mark DePristo	592f90aaef	ActivityProfile now cuts intelligently at the best local minimum when in a larger than max size active region -- This new algorithm is essential to properly handle activity profiles that have many large active regions generated from lots of dense variant events. The new algorithm passes unit tests and passes visual inspection of both running on 1000G and NA12878 -- Misc. commenting of the code -- Updated ActiveRegionExtension to include a min active region size -- Renamed ActiveRegionExtension to ActiveRegionTraversalParameters, as it carries more than just the traversal extension now	2013-01-24 13:48:00 -05:00
Mark DePristo	0c94e3d96e	Adaptively compute the band pass filter from the sigma, up to a maximum size of 50 bp -- Previously we allowed band pass filter size to be specified along with the sigma. But now that sigma is controllable from walkers and from the command line, we instead compute the filter size given the kernel from the sigma, including all kernel points with p > 1e-5 in the kernel. This means that if you use a smaller kernel you get a small band size and therefore faster ART -- Update, as discussed with Ryan, the sigma and band size to 17 bp for HC (default ART wide) and max band size of 50 bp	2013-01-24 13:47:59 -05:00
Mark DePristo	9e43a2028d	Making band pass filter size, sigma, active region max size and extension all accessible from the command line	2013-01-24 13:47:59 -05:00
Eric Banks	6790e103e0	Moving lots of walkers back from protected to public (along with several of the VA annotations). Let's see whether Mauricio's automatic git hook really works!	2013-01-24 11:42:49 -05:00
Mark DePristo	09edc6baeb	TraverseActiveRegions now writes out very nice active region and activity profile IGV formatted files	2013-01-23 13:46:01 -05:00
Mark DePristo	8e8126506b	Renaming IncrementalActivityProfile to ActivityProfile -- Also adding a work in progress functionality to make it easy to visualize activity profiles and active regions in IGV	2013-01-23 13:46:01 -05:00
Mark DePristo	eb60235dcd	Working version of incremental active region traversals -- The incremental version now processes active regions as soon as they are ready to be processed, instead of waiting until the end of the shard as in the previous version. This means that ART walkers will now take much less memory than previously. On chr20 of NA12878 the majority of regions are processed with as few as 500 reads in memory. Over the whole chr20 only 5K reads were ever held in ART at one time. -- Fixed bug in the way active regions worked with shard boundaries. The new implementation no longer sees shard boundaries in any meaningful way, and that uncovered a problem that active regions were always being closed across shard boundaries. This behavior was actually encoded in the unit tests, so those needed to be updated as well. -- Changed the way that preset regions work in ART. The new contract ensures that you get exactly the regions you requested. The isActive function is still called, but its result has no impact on the regions. With this functionality it should be possible to use the HC as a generic assembly by forcing it to operate over very large regions -- Added a few misc. useful functions to IncrementalActivityProfile	2013-01-23 13:46:00 -05:00
Mark DePristo	8d9b0f1bd5	Restructure ActivityProfiler into root class ActivityProfile and derived class BandPassActivityProfile -- Required before I jump in and redo the entire activity profile so it can be run incrementally -- This restructuring makes the differences between the two functionalities clearer, as almost all of the functionality is in the base class. The only functionality provided by the BandPassActivityProfile is isolated to a finalizeProfile function overloaded from the base class. -- Renamed ActivityProfileResult to ActivityProfileState, as this is a clearer indication of its actual functionality. Almost all of the misc. walker changes are due to this name update -- Code cleanup and docs for TraverseActiveRegions -- Expanded unit tests for ActivityProfile and ActivityProfileState	2013-01-23 13:45:21 -05:00
Mauricio Carneiro	7b8b064165	Last manual license update (hopefully) if everyone updates their git hook accordingly, this will be the last time I have to manually run the script. GSATDG-5	2013-01-18 16:13:07 -05:00
Eric Banks	79bc818022	Bug fix for VariantsToVCF: old dbSNP files can have '-' as reference base and those records always need to be padded.	2013-01-16 16:15:58 -05:00
Mark DePristo	2a42b47e4a	Massive expansion of ActiveRegionTraversal unit tests, resulting in several bugfixes to ART -- UnitTests now include combinational tiling of reads within and spanning shard boundaries -- ART now properly handles shard transitions, and does so efficiently without requiring hash sets or other collections of reads -- Updating HC and CountReadsInActiveRegions integration tests	2013-01-16 15:30:00 -05:00
Mark DePristo	ddcb33fcf8	Cache result of getLocation() in Shard so we don't perform an expensive calculation over and over	2013-01-16 15:30:00 -05:00
Eric Banks	e47a389b26	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-16 14:59:11 -05:00
Khalid Shakir	4ffb43079f	Re-committing the following changes from Dec 18: Refactored interval specific arguments out of GATKArgumentCollection into IntervalArgumentCollection such that it can be used in other CommandLinePrograms. Updated SelectHeaders to print out full interval arguments. Added RemoteFile.createUrl(Date expiration) to enable creation of presigned URLs for download over http: or file:.	2013-01-16 12:43:15 -05:00
Eric Banks	392b5cbcdf	The CachingIndexedFastaSequenceFile now automatically converts IUPAC bases to Ns and errors out on other non-standard bases. This way walkers won't see anything except the standard bases plus Ns in the reference. Added option to turn off this feature (to maintain backwards compatibility). As part of this commit I cleaned up the BaseUtils code by adding a Base enum and removing all of the static indexes for each of the bases. This uncovered a bug in the way the DepthOfCoverage walker counts deletions (it was counting Ns instead!) that isn't covered by tests. Fortunately that walker is being deprecated soon...	2013-01-16 10:22:43 -05:00
Mark DePristo	3c37ea014b	Retire original TraverseActiveRegion, leaving only the new optimized version -- Required some updates to MD5s, which was unexpected, and will be sorted out later with more detailed unit tests	2013-01-15 10:24:45 -05:00
Mark DePristo	b8b2b9b2de	ManagingReferenceOrderedView optimization: don't allow a fresh RefMetaDataTracker in the frequent case where there's no reference meta data	2013-01-14 16:30:16 -05:00
Mark DePristo	7eea6b8f92	ReservoirDownsampler optimizations -- Add an option to not always allocate ArrayLists of targetSampleSize, but rather the previous size + MARGIN. This helps for LIBS as most of the time we don't need nearly so much space as we allow -- consumeFinalizedItems returns an empty list if the reservoir is empty, which is often true for our BAM files with low coverage -- Allow empty sample lists for SamplePartitioner as these are used by the RefTraversals and other non-read based traversals Make the reservoir downsampler use a linked list, rather than a fixed sized array list, in the expectFewOverflows case	2013-01-14 16:30:16 -05:00
Mark DePristo	a4334a67e0	SamplePartitioner optimizations and bugfixes -- Use a linked hash map instead of a hash map since we want to iterate through the map fairly often -- Ensure that we call doneSubmittingReads before getting reads for samples. This function call fell out before and since it wasn't enforced I only noticed the problem while writing comments -- Don't make unnecessary calls to contains for map. Just use get() and check that the result is null -- Use a LinkedList in PassThroughDownsampler, since this is faster for add() than the existing ArrayList, and we weren't using random access to any resulting	2013-01-14 16:30:16 -05:00
Mark DePristo	e88dae2758	LocusIteratorByState operates natively on GATKSAMRecords now -- Updated code to reflect this new typing	2013-01-11 15:17:18 -05:00
Mark DePristo	94cb50d3d6	Retire LegacyLocusIteratorByState -- Left in the remaining infrastructure for David to remove, but the legacy downsampler is no longer a functional option in the GATK	2013-01-11 15:17:18 -05:00
Mark DePristo	cc0c1b752a	Delete old LocusIteratorByState, leaving only new LIBS and legacy	2013-01-11 15:17:18 -05:00
Mark DePristo	6a91902aa2	Fix final merge conflicts	2013-01-11 15:17:18 -05:00
Mark DePristo	b9a33d3c66	Split original and optimized ART into largely independent pieces -- Allows us to cleanly run old and new art, which now have different traversal behavior (on purpose). Split unit tests as well.	2013-01-11 15:17:18 -05:00
Mark DePristo	02130dfde7	Cleanup ART -- Initialize routine captures essential information for running the traversal	2013-01-11 15:17:17 -05:00
Mark DePristo	9b2be795a7	Initial working version of new ActiveRegionTraversal based on the LocusIteratorByState read stream -- Implemented as a subclass of TraverseActiveRegions -- Passes all unit tests -- Will be very slow -- needs logical fixes	2013-01-11 15:17:17 -05:00
Mark DePristo	2c38310868	Create LIBS using new AlignmentStateMachine infrastructure -- Optimizations to AlignmentStateMachine -- Properly count deletions. Added unit test for counting routines -- AlignmentStateMachine.java is no longer recursive -- Traversals now use new LIBS, not the old one	2013-01-11 15:17:17 -05:00
Mark DePristo	0ac4352614	LIBS can now (optionally) track the unique reads it uses from the underlying read iterator -- This capability is essential to provide an ordered set of used reads to downstream users of LIBS, such as ART, who want an efficient way to get the reads used in LIBS -- Vastly expanded the multi-read, multi-sample LIBS unit tests to make sure this capability is working -- Added createReadStream to ArtificialSAMUtils that makes it relatively easy to create multi-read, multi-sample read streams for testing	2013-01-11 15:17:16 -05:00
Mark DePristo	b3ecfbfce8	Refactor LIBS into component parts, expand unit tests, some code cleanup -- Split out all of the inner classes of LIBS into separate independent classes -- Split / add unit tests for many of these components. -- Radically expand unit tests for SAMRecordAlignmentState (the lowest level piece of code) making sure at least some of it works -- No need to change unit tests or integration tests. No change in functionality. -- Added (currently disabled) code to track all submitted reads to LIBS, but this isn't accessible or tested	2013-01-11 15:17:16 -05:00
Mark DePristo	b2990497e2	Refactor LIBS into utils.locusiterator before refactoring	2013-01-11 15:17:16 -05:00
Mauricio Carneiro	9ed922d562	Updating licenses to Eric's last commit - for now we're still running the script by hand, soon automated solution will be in place. GSATDG-5	2013-01-11 14:33:00 -05:00
Eric Banks	e7906713d9	Moving some random walkers back to public as requested by Mark. Mauricio will the licenses get updated automatically?	2013-01-11 02:03:43 -05:00
Ami Levy-Moonshine	352cb831d0	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-10 21:27:06 -05:00
Ami Levy-Moonshine	fac0bce916	add RunCoveredByNSamplesSites; changes in CoveredByNSamplesSites so it can work in parallel; also, move it to diagnostics	2013-01-10 21:26:49 -05:00
Mauricio Carneiro	2a4ccfe6fd	Updated all JAVA file licenses accordingly GSATDG-5	2013-01-10 17:06:41 -05:00
Eric Banks	4fa439d89e	Move some classes back to public because they are used in the engine. Move some test classes to protected. We should have no more public->protected dependencies now	2013-01-09 11:06:10 -05:00
Eric Banks	676e79542a	Bring CombineVariants back to public since it's used for SG. I needed to break ChromosomeCountConstants out of ChromosomeCounts to make this work.	2013-01-09 10:39:48 -05:00
Ami Levy-Moonshine	15ca5015cd	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-08 21:53:36 -05:00
Ami Levy-Moonshine	d6071728e8	add new walker to find sites with good coverage	2013-01-08 17:10:38 -05:00
Eric Banks	264cc9e78d	Resolve protected->public dependencies for BQSR by wrapping the BQSR-specific arguments in a new class. Instead of the GATK Engine creating a new BaseRecalibrator (not clean), it just keeps track of the arguments (clean). There are still some dependency issues, but it looks like they are related to Ami's code. Need to look into it further.	2013-01-08 16:23:29 -05:00
Eric Banks	f0bd1b5ae5	Okay, all public->protected dependencies are gone except for the BQSR arguments. I'll need to think through this but should be able to make that work too.	2013-01-08 15:46:32 -05:00
Eric Banks	35d9bd377c	Moved (nearly) all Walkers from public to protected and removed GATKLite utils	2013-01-07 14:42:40 -05:00
Tad Jordan	04e3978b04	Fixed VariantEval tests -Added sorting by rows to VariantEval	2013-01-07 12:45:32 -05:00

2121 Commits (5003deafb69831c1f40eb7c83fd0ea496e1bd181)