gatk-3.8

Commit Graph

Author	SHA1	Message	Date
David Roazen	3744d1a596	Collapse the downsampling fork in the GATK engine With LegacyLocusIteratorByState deleted, the legacy downsampling implementation was already non-functional. This commit removes all remaining code in the engine belonging to the legacy implementation.	2013-01-28 01:50:30 -05:00
Mauricio Carneiro	5003deafb6	Fixing split-reads unit tests The new implementation calls for the number of bases to chop, not the chop index anymore, so 0 is no longer appropriate.	2013-01-27 23:38:46 -05:00
Mauricio Carneiro	1aee8f205e	Tool to calculate per base coverage distribution GSATDG-29 #resolve	2013-01-27 23:38:46 -05:00
Mark DePristo	63913d516f	Add join call to Progress meter unit test so we actually know the daemon thread has finished	2013-01-27 16:52:45 -05:00
Mark DePristo	f5473285d5	Update CountReadsInActiveRegions md5	2013-01-27 14:35:55 -05:00
Mark DePristo	14d8afe413	Remove startSearchAt state variable from ActivityProfile -- New algorithm will only try to create an active region if there are at least maxRegionSize + propagation distance states in the list. When that's true, we are guaranteed to actually find a region. So this algorithm is not only truly correct but also super fast, as we only ever do the search for the end of the region when we will certainly find one, and actually generate a region.	2013-01-27 14:10:08 -05:00
Mark DePristo	c97a361b5d	Added realistic BandPassFilterUnitTest that ensures quality results for 1000G phase I VCF and NA12878 VCF -- Helped ID more bugs in the ActivityProfile, necessitating a new algorithm for popping off active regions. This new algorithm requires that at least maxRegionSize + prob. propagation distance states have been examined. This ensures that the incremental results are the same as you get reading in an entire profile and running getRegions on the full profile -- TODO is to remove incremental search start algorithm, as this is no longer necessary, and nicely eliminates a state variable I was always uncomfortable with	2013-01-27 14:10:08 -05:00
Mark DePristo	72b2e77eed	Linearize the findEndOfRegion algorithm in ActivityProfile, radically improving its performance -- Previous algorithm was O(N^2) -- #resolve GSA-723 https://jira.broadinstitute.org/browse/GSA-723	2013-01-27 14:10:06 -05:00
Mark DePristo	0fb238b61e	TraverseActiveRegions Optimizations and Bugfixes: make sure to record position of current locus to discharge active regions when there's no data -- Now records the position of the current locus, as well as that of the last read. Necessary when passing through regions with no reads. The previous version would keep accumulating empty active regions, and never discharge them until end of traversal (if there were no reads in the future) or until a read was finally found -- Protected a call to logger.debug with if ( logger.isDebugEnabled()) to avoid a lot of overhead in writing unseen debugger logging information	2013-01-27 14:10:06 -05:00
Mark DePristo	804caf7a45	HaplotypeCaller Optimization: return an inactive (p = 0.0) activity if the context has no bases in the pileup -- Allows us to avoid doing a lot of misc. work to set up the genotype when we don't have any data to genotype. Valuable in the case where we are passing through large regions without any data	2013-01-27 14:10:06 -05:00
Mark DePristo	93d88cdc68	Optimization: LocusReferenceView now passes along the contig index to createGenomeLoc, speeding up their creation -- Also cleaned up some unused methods	2013-01-27 14:10:06 -05:00
Mark DePristo	52a28968a9	ART optimization: BandPassActivityProfile only applies the gaussian filter if the state probability > 0	2013-01-27 14:10:06 -05:00
Mauricio Carneiro	705cccaf63	Making SplitReads output FastQ's instead of BAM - eliminates one step in my pipeline - BAM is too finicky and maintaining parameters that wouldn't be useful was becoming a headache, better avoided.	2013-01-27 02:36:31 -05:00
Mauricio Carneiro	ae38cf3f72	Adding read directionality to SplitReads - directionality only influences 'chop' operation (since split will maintain all bases of the original read) - added directional unit test GSATDG-25 #resolved	2013-01-26 22:25:56 -05:00
Mauricio Carneiro	6ea7133d95	Updating licenses of latest moved files	2013-01-26 13:46:52 -05:00
Mauricio Carneiro	ef4cc742e5	Fixing the licensing scripts - Fixed shell glob limitation that was failing license updates on big commits - Hook will now force user to re-commit after updating the licenses (pre-commit hook can't update and commit in the same process) - Moved all scripts to bash/zsh - Separated the license utilities in a separate python module to avoid copying code GSATDG-28 #resolve	2013-01-26 13:42:49 -05:00
Mauricio Carneiro	e7c9e3639e	Making metrics a required parameter in MarkDuplicates As requested by user (forum)	2013-01-25 17:49:49 -05:00
Ami Levy-Moonshine	99cb8d68e9	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-25 16:07:38 -05:00
Mark DePristo	b8c0b05785	Add contract to ensure that getAdapterBoundary returns the right result -- Also renamed the function to getAdaptorBoundary for consistency across the codebase	2013-01-25 16:05:17 -05:00
Mark DePristo	e445c71161	LIBS optimization for adapter clipping -- GATKSAMRecords now cache the result of the getAdapterBoundary, allowing us to avoid repeating a lot of work in LIBS -- Added unittests to cover adapter clipping	2013-01-25 16:05:17 -05:00
Ami Levy-Moonshine	f50db01742	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-25 15:55:56 -05:00
Ami Levy-Moonshine	b4447cdca2	When using VariantContextUtils.GenotypeMergeType.REQUIRE_UNIQUE, we used to verify that the sample names are unique in VariantContextUtils.simpleMerge for each set of VCs. This caused a bug that was reported on the forum (when a set of VCs had 2 VCs from the same sample). Now we check it only in CombineVariants.init using the headers. A new function was added to SamplesUtils with unit tests in CVunitTest.java.	2013-01-25 15:49:51 -05:00
Khalid Shakir	c58e02a3bd	Added a QFunction.jobLocalDir for optionally tracking a node-local directory that may have faster intermediate storage, with SGF ensuring that if the directory happens to be on the same machine it gets a clone-specific sub-directory to avoid collisions.	2013-01-25 14:28:04 -05:00
Ami Levy-Moonshine	fc22a5c71c	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-25 11:47:38 -05:00
Ami Levy-Moonshine	eaf6279d48	adding RBP to the general calling pipeline and a few other small changes to it (to make it run with the current bundle file names)	2013-01-25 11:47:30 -05:00
Mark DePristo	3f95f39be3	Updating HC md5s for new cutting algorithm and default band pass filter parameters	2013-01-25 11:07:29 -05:00
Mark DePristo	008b617577	Cleanup the getLIBS function in LocusIterator -- Now throws an UnsupportedOperationException in the base class. Only LocusView implements this function and actually returns the LIBS	2013-01-25 11:07:28 -05:00
Eric Banks	f7b80116d6	Don't let users play with the different exact model implementations.	2013-01-25 10:52:02 -05:00
Eric Banks	6dd0e1ddd6	Pulled out the --regenotype functionality from SelectVariants into its own tool: RegenotypeVariants. This allows us to move SelectVariants into the public suite of tools now.	2013-01-25 09:42:04 -05:00
Mark DePristo	c7a29b1d39	Fixed NPE in ActiveRegionUnitTest by allowing null supporting states in ActiveRegion	2013-01-24 13:48:00 -05:00
Mark DePristo	592f90aaef	ActivityProfile now cuts intelligently at the best local minimum when in a larger than max size active region -- This new algorithm is essential to properly handle activity profiles that have many large active regions generated from lots of dense variant events. The new algorithm passes unit tests and passes visual inspection of runs on both 1000G and NA12878 -- Misc. commenting of the code -- Updated ActiveRegionExtension to include a min active region size -- Renamed ActiveRegionExtension to ActiveRegionTraversalParameters, as it carries more than just the traversal extension now	2013-01-24 13:48:00 -05:00
Mark DePristo	c96b64973a	Soft clip probability propagation is capped by the MAX_PROB_PROPAGATION_DISTANCE, which is 50 bp	2013-01-24 13:48:00 -05:00
Mark DePristo	0c94e3d96e	Adaptively compute the band pass filter from the sigma, up to a maximum size of 50 bp -- Previously we allowed band pass filter size to be specified along with the sigma. But now that sigma is controllable from walkers and from the command line, we instead compute the filter size given the kernel from the sigma, including all kernel points with p > 1e-5 in the kernel. This means that if you use a smaller kernel you get a small band size and therefore faster ART -- Update, as discussed with Ryan, the sigma and band size to 17 bp for HC (default ART wide) and max band size of 50 bp	2013-01-24 13:47:59 -05:00
Mark DePristo	9e43a2028d	Making band pass filter size, sigma, active region max size and extension all accessible from the command line	2013-01-24 13:47:59 -05:00
Mark DePristo	cd91e365f4	Optimize getCurrentContigLength and getLocForOffset in ActivityProfile	2013-01-24 13:47:59 -05:00
Eric Banks	26ef400f85	More reviews	2013-01-24 13:20:12 -05:00
Eric Banks	6790e103e0	Moving lots of walkers back from protected to public (along with several of the VA annotations). Let's see whether Mauricio's automatic git hook really works!	2013-01-24 11:42:49 -05:00
Mauricio Carneiro	9e003b3296	more updates to the licensing scripts	2013-01-24 00:04:27 -07:00
Mauricio Carneiro	e1c1a4de4c	Moving licensing scripts to bash instead of tcsh	2013-01-23 22:59:44 -07:00
Mauricio Carneiro	42b056e8ea	Forgot the unit test.	2013-01-23 21:18:27 -07:00
Mauricio Carneiro	36c7c418e6	Adding the licenses to the files	2013-01-23 21:15:06 -07:00
Mauricio Carneiro	243fcde840	Adding license to SplitReads I got caught !	2013-01-23 21:12:36 -07:00
Mauricio Carneiro	643a508564	Added atlassian intellij plugin file to .gitignore	2013-01-23 20:55:28 -07:00
Mauricio Carneiro	a4fbf9df1e	SplitReads walker implementation (for AGBT talk) - walker simulates sequencing with different lengths to evaluate mapping/alignment biases relative to read length - split : splits reads n-ways generating 2^n reads for each read of the same length. - chop : chops the right end tail of the read creating 1 smaller read as if the sequencer stopped short. - mate information is preserved for chopped reads, and re-indexed for split reads so that each split still points at the corresponding split on the mate. - added systematic unit tests GSATDG-23	2013-01-23 20:55:28 -07:00
Chris Hartl	a3b98daf1a	Merge branch 'master' of gsa2:/humgen/gsa-scr1/chartl/dev/unstable	2013-01-23 14:49:34 -05:00
Chris Hartl	7fcfa4668c	Since GenotypeConcordance is now a standalone walker, remove the old GenotypeConcordance evaluation module and the associated integration tests.	2013-01-23 14:47:23 -05:00
Mauricio Carneiro	fc54a5da55	Adding the new bash script GSATDG-9	2013-01-23 12:14:34 -07:00
Mauricio Carneiro	6588b4bacd	tcsh -> bash David is convinced that the error is because i'm using tcsh instead of bash. Let's see if he's right :-) GSATDG-9	2013-01-23 12:10:34 -07:00
Mauricio Carneiro	8e8993da27	oops... forgot to change sys.argv to filename GSATDG-9	2013-01-23 12:01:06 -07:00
Mauricio Carneiro	820bec5572	Dropping xargs - continuing the effort to reduce blob size GSATDG-9	2013-01-23 11:54:20 -07:00

11700 Commits (3744d1a5961ab117b2f67b6cc7c4f7bd2184591a)