gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	562f2406d7	Added check that BaseRecalibrator is not being run on a reduced bam. - Throws user exception if it is. - Can be turned off with --allow_bqsr_on_reduced_bams_despite_repeated_warnings argument. - Added test to check this is working. - Added docs to BQSRReadTransformer explaining why this check is not performed on PrintReads end. - Added small bug fix to GenomeAnalysisEngine that I uncovered in this process. - Added comment about not changing the program record name, as per reviewer comments. - Removed unused variable.	2013-02-06 10:14:27 -05:00
Eric Banks	4e5ff3d6f1	Bug fix for NPE in HC with --dbsnp argument. - I had added the framework in the VA engine but should not have hooked it up to the HC yet since the RefMetaDataTracker is always null. - Added contracts and docs to the relevant methods in the VA engine so that this doesn't happen in the future.	2013-02-05 21:59:19 -05:00
David Roazen	e7e76ed76e	Replace org.broadinstitute.variant with jar built from the Picard repo The migration of org.broadinstitute.variant into the Picard repo is complete. This commit deletes the org.broadinstitute.variant sources from our repo and replaces it with a jar built from a checkout of the latest Picard-public svn revision.	2013-02-05 17:24:25 -05:00
Mauricio Carneiro	f6bc5be6b4	Fixing license on Yossi's file Somebody needs to set up the license hook ;-)	2013-02-05 11:14:43 -05:00
MauricioCarneiro	050c4794a5	Merge pull request #11 from yfarjoun/per_sample2 -Added Per-Sample Contamination Removal to UnifiedGenotyper: Added an @A...	2013-02-05 08:04:29 -08:00
Eric Banks	00c98ff0cf	Need to reset the static counter before tests are run or else we won't be deterministic. Also need to give credit where credit is due: David was right that this was not a non-deterministic Bamboo failure...	2013-02-05 10:41:46 -05:00
Yossi Farjoun	de03f17be4	-Added Per-Sample Contamination Removal to UnifiedGenotyper: Added an @Advanced option to the StandardCallerArgumentCollection, a file which should contain two columns, Sample (String) and Fraction (Double) that form the Sample-Fraction map for the per-sample AlleleBiasedDownsampling. -Integration tests to UnifiedGenotyper (Using artificially contaminated BAMs created from a mixture of two broadly consented samples) were added -includes throwing an exception in HC if called using per-sample contamination file (not implemented); tested in a new integration test. -(Note: HaplotypeCaller already has "Flat" contamination--using the same fraction for all samples--what it doesn't have is _per-sample_ AlleleBiasedDownsampling, which is what has been added here to the UnifiedGenotyper.) -New class: DefaultHashMap (a Defaulting HashMap...) and new function: loadContaminationFile (which reads a Sample-Fraction file and returns a map). -Unit tests to the new class and function are provided. -Added tests to see that malformed contamination files are found and that spaces and tabs are now read properly. -Merged the integration tests that pertain to biased downsampling, whether HaplotypeCaller or UnifiedGenotyper, into a new IntegrationTest class.	2013-02-04 18:24:36 -05:00
Mark DePristo	a281fa6548	Resolves Genome Sequence Analysis GSA-750 Don't print an endless series of starting messages from the ProgressMeter -- The progress meter isn't started until the GATK actually calls execute on the microscheduler. Now we get a message saying "Creating shard strategy" while this (expensive) operation runs	2013-02-04 15:47:30 -05:00
Chris Hartl	3c99010be4	Part 1 of Variant Annotator Unit tests: PerReadAlleleLikelihoodMap - Added contract enforcement for public methods - Refactored the conversion from read -> (allele -> likelihood) to allele -> list[read] into its own method - added method documentation for non getters/setters - finals, finals everywhere - Add in a unit test for the PerReadAlleleLikelihoodMap. Complete coverage except for .clear() and a method that is a straight call into a separately-tested utility class.	2013-02-04 14:16:06 -05:00
Guillermo del Angel	5521bf3dd7	Fix bad contract implementation	2013-02-03 16:15:14 -05:00
Guillermo del Angel	f31bf37a6f	First step in better BQSR unit tests for covariates (not done yet): more test coverage in basic covariates, test logging several read groups/read lengths and more combinations simultaneously. Add basic Javadocs headers for PerReadAlleleLikelihoodMap.	2013-02-03 15:31:30 -05:00
Mark DePristo	8d08780582	GATKRunReport now tracks the errorMessage and errorThrown during post for later analysis -- This is primarily useful in the unit tests, as I now print out additional information on why a test might have failed, if it in fact did.	2013-02-02 19:24:31 -05:00
Mark DePristo	6382d5bdc9	Final cleanup and unit testing for GATKRunReport -- Bringing code up to document, style, and code coverage specs -- Move GATKRunReportUnitTest to private -- Fully expand GATKRunReportUnitTests to cover writing and reading GATKRunReport to local disk, to standard out, to AWS. -- Move documentation URL from GATKRunReport to UserException -- Delete a few unused files from s3GATKReport -- Added capabilities to GATKRunReport to make testing easier -- Added capabilities to deserialize GATKRunReports from an InputStream	2013-02-02 15:06:56 -05:00
Mark DePristo	eb17230c2f	Update AWS access and private keys to the new GATK2LogUploader user -- Updated EncryptAWSKeys to write the key into the correct resources directory	2013-02-02 15:06:56 -05:00
Eric Banks	03df5e6ee6	- Added more comprehensive tests for consensus creation to RR. Still need to add tests for I/D ops. - Added RR qual correctness tests (note that this is a case where we don't add code coverage but still need to test critical infrastructure). - Also added minor cleanup of BaseUtils	2013-02-01 15:37:19 -05:00
David Roazen	c6581e4953	Update MD5s to reflect version number change in the BAM header I've confirmed via a script that all of these differences only involve the version number bump in the BAM headers and nothing else: < @HD VN:1.0 GO:none SO:coordinate --- > @HD VN:1.4 GO:none SO:coordinate	2013-02-01 13:51:31 -05:00
David Roazen	c4b0ba4d45	Temporarily back out the Picard team's patches to GATKBAMIndex from December These patches to GATKBAMIndex are causing massive BAM index reading errors in combination with the latest version of Picard. The bug is either in the patches themselves or in the underlying SeekableBufferedStream class they rely on. Until the cause can be identified, we are temporarily backing out these changes so that we can continue to run with the latest Picard/Tribble. This reverts commits: 81483ec21e528790dfa719d18cdee27d577ca98e 68cf0309db490b79eecdabb4034987ff825ffea8 54bb68f28ad5fe1b3df01702e9c5e108106a0176	2013-02-01 13:51:31 -05:00
David Roazen	1fb182d951	Restore Utils.appendArray() This utility method was used by the PipelineTest class, and deleting it was causing tests to not compile.	2013-02-01 13:51:31 -05:00
Mark DePristo	6d9816f1a5	Cleanup unused utils functions, and add unit test for one (append)	2013-02-01 13:51:31 -05:00
Mark DePristo	206eab80e3	Expanded unit tests for AlignmentUtils -- Added JIRA entries for the remaining capabilities to be fixed up and unit tested	2013-02-01 13:51:31 -05:00
David Roazen	292037dfda	Rev picard, sam-jdk, and tribble This is a necessary prerequisite for the org.broadinstitute.variant migration. -Picard and sam-jdk go from version 1.67.1197 to 1.84.1337 -Picard-private goes from version 2375 to 2662 -Tribble goes from version 119 to 1.84.1337 -RADICALLY trimmed down the list of classes we extract from Picard-private (jar goes from 326993 bytes to 6445 bytes!)	2013-02-01 13:51:30 -05:00
Ryan Poplin	e07cefb058	Updating AlignmentUtils.consolidateCigar() to the GATK coding standards.	2013-02-01 13:51:30 -05:00
Mark DePristo	c3c4e2785b	UnitTest for calcNumHighQualityBases in AlignmentUtils	2013-01-31 13:57:23 -05:00
David Roazen	6ec1e613a2	Move AWS keys to a resources subdirectory within the phonehome package Resources must be in a subdirectory called "resources" in the package hierarchy to be picked up by the packaging system. Adding each resource manually to the jars in build.xml does not cause the resource to be added to the standalone GATK jar when we package the GATK, so it's best to always use this convention.	2013-01-31 11:56:34 -05:00
Ryan Poplin	496727ac5e	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-31 11:51:08 -05:00
Ryan Poplin	ac033ce41a	Intermediate commit of new bubble assembly graph traversal algorithm for the HaplotypeCaller. Adding functionality for a path from an assembly graph to calculate its own cigar string from each of the bubbles instead of doing a massive Smith-Waterman alignment between the path's full base composition and the reference.	2013-01-31 11:32:19 -05:00
Eric Banks	9c0207f8ef	Fixing BQSR/BAQ bug: If a read had an existing BAQ tag, was clipped by our engine, and couldn't have the BAQ recalculated (for whatever reason), then we would fail in the BQSR because we would default to using the old tag (which no longer matched the length of the read bases). The right thing to do here is to remove the old BAQ tag when RECALCULATE and ADD_TAG are the BAQ modes used but BAQ cannot be recalculated. Added a unit test to ensure that the tags are removed in such a case.	2013-01-31 11:03:17 -05:00
Mark DePristo	404ee9a6e4	More aggressive checking of AWS key quality upon startup in the GATK	2013-01-31 09:08:38 -05:00
Ryan Poplin	438c98035b	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-30 17:12:28 -05:00
Ryan Poplin	bb29bd7df7	Use base List and Map types in the HaplotypeCaller when possible.	2013-01-30 17:09:27 -05:00
Mark DePristo	b707331332	Encrypt GATK AWS keys using the GATK private key, and decrypt as needed as a resource when uploading to AWS logs -- Has the overall effect that the GATK user AWS keys are no longer visible in the gatk source as plain text. This will stop AWS from emailing me (they crawl the web looking for keys) -- Added utility EncryptAWSKeys that takes as command line arguments the GATK user AWS access and secret keys, encrypts them with the GATK private key, and writes out the resulting file to resources in phonehome. -- GATKRunReport now decrypts as needed these keys using the GATK public key as resources in the GATK bundle -- Refactored the essential function of Resource (reading the resource) from IOUtils into the class itself. Now how to get the data in the resource is straightforward -- Refactored md5 calculation code from a byte[] into Utils. Added unit tests -- Committing the encrypted AWS keys -- #resolves https://jira.broadinstitute.org/browse/GSA-730	2013-01-30 16:42:23 -05:00
David Roazen	591df2be44	Move additional VariantContext utility methods back to the GATK Thanks to Eric for his feedback	2013-01-30 13:58:17 -05:00
David Roazen	9985f82a7a	Move BaseUtils back to the GATK by request, along with associated utility methods	2013-01-30 13:09:44 -05:00
Mark DePristo	1ff78679ca	UnitTesting example for copying -- Example combinatorial unit tests, plus unit tests that create reads and bam files, pileups, variant context (from scratch and from a file), and genome locs	2013-01-30 11:19:08 -05:00
Eric Banks	d067c7f136	Resolving merge conflicts	2013-01-30 10:47:59 -05:00
Eric Banks	9025567cb8	Refactoring the SimpleGenomeLoc into the now public utility UnvalidatingGenomeLoc and the RR-specific FinishedGenomeLoc. Moved the merging utility methods into GenomeLoc and moved the unit tests around accordingly.	2013-01-30 10:45:29 -05:00
Mark DePristo	4852c7404e	GenomeLocs are already comparable, so I'm removing the less complete GenomeLocComparator class and updating ReduceReads and CompressionStash to use built-in comparator	2013-01-30 10:12:27 -05:00
Mark DePristo	45603f58cd	Refactoring and unit testing GenomeLocParser -- Moved previously inner class to MRUCachingSAMSequenceDictionary, and unit test to 100% coverage -- Fully document all functions in GenomeLocParser -- Unit tests for things like parsePosition (shocking it wasn't tested!) -- Removed function to specifically create GenomeLocs for VariantContexts. The fact that you must incorporate END attributes in the context means that createGenomeLoc(Feature) works correctly -- Deprecated (and moved functionality) of setStart, setStop, and incPos to GenomeLoc -- Unit test coverage at like 80%, moving to 100% with next commit	2013-01-30 09:47:47 -05:00
Mark DePristo	8562bfaae1	Optimize GenomeLocParser.createGenomeLoc -- The new version is roughly 2x faster than the previous version. The key here was to cleanup the workflow for validateGenomeLoc and remove the now unnecessary synchronization blocks from the CachingSequencingDictionary, since these are now thread local variables -- #resolves https://jira.broadinstitute.org/browse/GSA-724	2013-01-30 09:47:47 -05:00
Mark DePristo	69dd5cc902	AutoFormattingTimeUnitTest should be in utils	2013-01-30 09:47:47 -05:00
Mark DePristo	92c5635e19	Cleanup, document, and unit test ActiveRegion -- All functions tested. In the testing / review I discovered several bugs in the ActiveRegion routines that manipulate reads. New version should be correct -- Enforce correct ordering of supporting states in constructor -- Enforce read ordering when adding reads to an active region in add -- Fix bug in HaplotypeCaller map with new updating read spans. Now get the full span before clipping down reads in map, so that variants are correctly placed w.r.t. the full reference sequence -- Encapsulate isActive field with an accessor function -- Make sure that all state lists are unmodifiable, and that the docs are clear about this -- ActiveRegion equalsExceptReads is for testing only, so make it package protected -- ActiveRegion.hardClipToRegion must resort reads as they can become out of order -- Previous version of HC clipped reads but, due to clipping, these reads could no longer overlap the active region. The old version of HC kept these reads, while the enforced contracts on the ActiveRegion detected this was a problem and those reads are removed. Has a minor impact on PLs and RankSumTest values -- Updating HaplotypeCaller MD5s to reflect changes to ActiveRegions read inclusion policy	2013-01-30 09:47:12 -05:00
David Roazen	6449c320b4	Fix the CachingIndexedFastaSequenceFileUnitTest BaseUtils.convertIUPACtoN() no longer throws a UserException, since it's in org.broadinstitute.variant	2013-01-29 21:07:16 -05:00
Mauricio Carneiro	29fd536c28	Updating licenses manually Please check that your commit hook is properly pointing at ../../private/shell/pre-commit Conflicts: public/java/test/org/broadinstitute/variant/VariantBaseTest.java	2013-01-29 17:27:53 -05:00
David Roazen	a536e1da84	Move some VCF/VariantContext methods back to the GATK based on feedback -Moved some of the more specialized / complex VariantContext and VCF utility methods back to the GATK. -Due to this re-shuffling, was able to return things like the Pair class back to the GATK as well.	2013-01-29 16:56:55 -05:00
Ami Levy-Moonshine	a1908a0eca	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-29 16:33:20 -05:00
Ami Levy-Moonshine	4aaef495c6	correct the help message	2013-01-29 16:33:12 -05:00
Ryan Poplin	bf25196a0b	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-28 22:33:13 -05:00
Ryan Poplin	e9c3a0acdf	fix typo	2013-01-28 22:18:58 -05:00
Ami Levy-Moonshine	a8a68697f1	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-28 20:18:51 -05:00
Guillermo del Angel	5995f01a01	Big intermediate commit (mostly so that I don't have to go again through merge/rebase hell) in expanding BQSR capabilities. Far from done yet: a) Add option to stratify CalibrateGenotypeLikelihoods by repeat - will add integration test in next push. b) Simulator to produce BAM files with given error profile - for now only given SNP/indel error rate can be given. A bad context can be specified and if such context is present then error rate is increased to given value. c) Rewrote RepeatLength covariate to do the right thing - not fully working yet, work in progress. d) Additional experimental covariates to log repeat unit and combined repeat unit+length. Needs code refactoring/testing	2013-01-28 19:55:46 -05:00
Ami Levy-Moonshine	3f5c2e4989	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-28 19:04:52 -05:00
Ami Levy-Moonshine	c103623cf6	bug fix in my new function at SampleUtils.java	2013-01-28 19:04:39 -05:00
Ryan Poplin	d665a8ba0c	The Bayesian calculation of Qemp in the BQSR is now hierarchical. This fixes issues in which the covariate bins were very sparse and the prior estimate being used was the original quality score. This resulted in large correction factors for each covariate which breaks the equation. There is also now a new option, globalQScorePrior, which can be used to ignore the given (very high) quality scores and instead use this value as the prior.	2013-01-28 15:56:33 -05:00
Tad Jordan	8777e02aa5	R issue in Queue fixed. GSA-721	2013-01-28 14:42:20 -05:00
David Roazen	f63f27aa13	org.broadinstitute.variant refactor, part 2 -removed sting dependencies from test classes -removed org.apache.log4j dependency -misc cleanup	2013-01-28 09:03:46 -05:00
David Roazen	3744d1a596	Collapse the downsampling fork in the GATK engine With LegacyLocusIteratorByState deleted, the legacy downsampling implementation was already non-functional. This commit removes all remaining code in the engine belonging to the legacy implementation.	2013-01-28 01:50:30 -05:00
Mark DePristo	63913d516f	Add join call to Progress meter unit test so we actually know the daemon thread has finished	2013-01-27 16:52:45 -05:00
Mark DePristo	14d8afe413	Remove startSearchAt state variable from ActivityProfile -- New algorithm will only try to create an active region if there's at least maxRegionSize + propagation distance states in the list. When that's true, we are guaranteed to actually find a region. So this algorithm is not only truly correct but also super fast, as we only ever do the search for the end of the region when we will certainly find one, and actually generate a region.	2013-01-27 14:10:08 -05:00
Mark DePristo	c97a361b5d	Added realistic BandPassFilterUnitTest that ensures quality results for 1000G phase I VCF and NA12878 VCF -- Helped ID more bugs in the ActivityProfile, necessitating a new algorithm for popping off active regions. This new algorithm requires that at least maxRegionSize + prob. propagation distance states have been examined. This ensures that the incremental results are the same as you get reading in an entire profile and running getRegions on the full profile -- TODO is to remove incremental search start algorithm, as this is no longer necessary, and nicely eliminates a state variable I was always uncomfortable with	2013-01-27 14:10:08 -05:00
Mark DePristo	72b2e77eed	Linearize the findEndOfRegion algorithm in ActivityProfile, radically improving its performance -- Previous algorithm was O(N^2) -- #resolve GSA-723 https://jira.broadinstitute.org/browse/GSA-723	2013-01-27 14:10:06 -05:00
Mark DePristo	0fb238b61e	TraverseActiveRegions Optimizations and Bugfixes: make sure to record position of current locus to discharge active regions when there's no data -- Now records the position of the current locus, as well as that of the last read. Necessary when passing through regions with no reads. The previous version would keep accumulating empty active regions, and never discharge them until end of traversal (if there were no reads in the future) or until a read was finally found -- Protected a call to logger.debug with if ( logger.isDebugEnabled()) to avoid a lot of overhead in writing unseen debugger logging information	2013-01-27 14:10:06 -05:00
Mark DePristo	93d88cdc68	Optimization: LocusReferenceView now passes along the contig index to createGenomeLoc, speeding up their creation -- Also cleaned up some unused methods	2013-01-27 14:10:06 -05:00
Mark DePristo	52a28968a9	ART optimization: BandPassActivityProfile only applies the gaussian filter if the state probability > 0	2013-01-27 14:10:06 -05:00
Mauricio Carneiro	705cccaf63	Making SplitReads output FastQ's instead of BAM - eliminates one step in my pipeline - BAM is too finicky and maintaining parameters that wouldn't be useful was becoming a headache, better avoided.	2013-01-27 02:36:31 -05:00
Mauricio Carneiro	6ea7133d95	Updating licenses of latest moved files	2013-01-26 13:46:52 -05:00
Ami Levy-Moonshine	99cb8d68e9	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-25 16:07:38 -05:00
Mark DePristo	b8c0b05785	Add contract to ensure that getAdapterBoundary returns the right result -- Also renamed the function to getAdaptorBoundary for consistency across the codebase	2013-01-25 16:05:17 -05:00
Mark DePristo	e445c71161	LIBS optimization for adapter clipping -- GATKSAMRecords now cache the result of the getAdapterBoundary, allowing us to avoid repeating a lot of work in LIBS -- Added unittests to cover adapter clipping	2013-01-25 16:05:17 -05:00
Ami Levy-Moonshine	b4447cdca2	In cases where one uses VariantContextUtils.GenotypeMergeType.REQUIRE_UNIQUE we used to verify that the sample names are unique in VariantContextUtils.simpleMerge for each set of VCs. This caused a bug that was reported on the forum (when the VCs contained 2 VCs from the same sample). Now we check it only in CombineVariants.init using the headers. A new function was added to SampleUtils with unitTests in CVunitTest.java.	2013-01-25 15:49:51 -05:00
Ami Levy-Moonshine	fc22a5c71c	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-25 11:47:38 -05:00
Ami Levy-Moonshine	eaf6279d48	adding RBP to the general calling pipeline and a few other small changes to it (to make it run with the current bundle file names)	2013-01-25 11:47:30 -05:00
Mark DePristo	008b617577	Cleanup the getLIBS function in LocusIterator -- Now throws an UnsupportedOperationException in the base class. Only LocusView implements this function and actually returns the LIBS	2013-01-25 11:07:28 -05:00
Eric Banks	6dd0e1ddd6	Pulled out the --regenotype functionality from SelectVariants into its own tool: RegenotypeVariants. This allows us to move SelectVariants into the public suite of tools now.	2013-01-25 09:42:04 -05:00
Mark DePristo	c7a29b1d39	Fixed NPE in ActiveRegionUnitTest by allowing null supporting states in ActiveRegion	2013-01-24 13:48:00 -05:00
Mark DePristo	592f90aaef	ActivityProfile now cuts intelligently at the best local minimum when in a larger than max size active region -- This new algorithm is essential to properly handle activity profiles that have many large active regions generated from lots of dense variant events. The new algorithm passes unit tests and passes visual inspection of both running on 1000G and NA12878 -- Misc. commenting of the code -- Updated ActiveRegionExtension to include a min active region size -- Renamed ActiveRegionExtension to ActiveRegionTraversalParameters, as it carries more than just the traversal extension now	2013-01-24 13:48:00 -05:00
Mark DePristo	c96b64973a	Soft clip probability propagation is capped by the MAX_PROB_PROPAGATION_DISTANCE, which is 50 bp	2013-01-24 13:48:00 -05:00
Mark DePristo	0c94e3d96e	Adaptively compute the band pass filter from the sigma, up to a maximum size of 50 bp -- Previously we allowed band pass filter size to be specified along with the sigma. But now that sigma is controllable from walkers and from the command line, we instead compute the filter size given the kernel from the sigma, including all kernel points with p > 1e-5 in the kernel. This means that if you use a smaller kernel you get a small band size and therefore faster ART -- Update, as discussed with Ryan, the sigma and band size to 17 bp for HC (default ART wide) and max band size of 50 bp	2013-01-24 13:47:59 -05:00
Mark DePristo	9e43a2028d	Making band pass filter size, sigma, active region max size and extension all accessible from the command line	2013-01-24 13:47:59 -05:00
Mark DePristo	cd91e365f4	Optimize getCurrentContigLength and getLocForOffset in ActivityProfile	2013-01-24 13:47:59 -05:00
Eric Banks	6790e103e0	Moving lots of walkers back from protected to public (along with several of the VA annotations). Let's see whether Mauricio's automatic git hook really works!	2013-01-24 11:42:49 -05:00
Mark DePristo	ee8039bf25	Fix trivial call in unit test	2013-01-23 13:51:58 -05:00
Mark DePristo	09edc6baeb	TraverseActiveRegions now writes out very nice active region and activity profile IGV formatted files	2013-01-23 13:46:01 -05:00
Mark DePristo	8e8126506b	Renaming IncrementalActivityProfile to ActivityProfile -- Also adding a work in progress functionality to make it easy to visualize activity profiles and active regions in IGV	2013-01-23 13:46:01 -05:00
Mark DePristo	e917f56df8	Remove old ActivityProfile and old BandPassActivityProfile	2013-01-23 13:46:01 -05:00
Mark DePristo	7fd27a5167	Add band pass filtering activity profile -- Based on the new incremental activity profile -- Unit Tested! Fixed a few bugs with the old band pass filter -- Expand IncrementalActivityProfileUnitTest to test the band pass filter as well for basic properties -- Add new UnitTest for BandPassIncrementalActivityProfile -- Added normalizeFromRealSpace to MathUtils -- Cleanup unused code in new activity profiles	2013-01-23 13:46:01 -05:00
Mark DePristo	eb60235dcd	Working version of incremental active region traversals -- The incremental version now processes active regions as soon as they are ready to be processed, instead of waiting until the end of the shard as in the previous version. This means that ART walkers will now take much less memory than previously. On chr20 of NA12878 the majority of regions are processed with as few as 500 reads in memory. Over the whole chr20 only 5K reads were ever held in ART at one time. -- Fixed bug in the way active regions worked with shard boundaries. The new implementation no longer sees shard boundaries in any meaningful way, and that uncovered a problem that active regions were always being closed across shard boundaries. This behavior was actually encoded in the unit tests, so those needed to be updated as well. -- Changed the way that preset regions work in ART. The new contract ensures that you get exactly the regions you requested. The isActive function is still called, but its result has no impact on the regions. With this functionality it should be possible to use the HC as a generic assembly by forcing it to operate over very large regions -- Added a few misc. useful functions to IncrementalActivityProfile	2013-01-23 13:46:00 -05:00
Mark DePristo	ce160931d5	Optimize creation of reads in ArtificialBAMBuilder -- Now caches the reads so subsequent calls to makeReads() don't reallocate the reads from scratch each time	2013-01-23 13:46:00 -05:00
Mark DePristo	e050f649fd	IncrementalActivityProfile, complete with extensive unit tests -- This is an activity profile compatible with fetching its implied active regions incrementally, as activity profile states are added	2013-01-23 13:45:21 -05:00
Mark DePristo	8d9b0f1bd5	Restructure ActivityProfiler into root class ActivityProfile and derived class BandPassActivityProfile -- Required before I jump in and redo the entire activity profile so it can be run incrementally -- This restructuring makes the differences between the two functionalities clearer, as almost all of the functionality is in the base class. The only functionality provided by the BandPassActivityProfile is isolated to a finalizeProfile function overloaded from the base class. -- Renamed ActivityProfileResult to ActivityProfileState, as this is a clearer indication of its actual functionality. Almost all of the misc. walker changes are due to this name update -- Code cleanup and docs for TraverseActiveRegions -- Expanded unit tests for ActivityProfile and ActivityProfileState	2013-01-23 13:45:21 -05:00
Mark DePristo	42b807a5fe	Unit tests for ActivityProfileResult	2013-01-23 13:45:20 -05:00
Mauricio Carneiro	7b8b064165	Last manual license update (hopefully) if everyone updates their git hook accordingly, this will be the last time I have to manually run the script. GSATDG-5	2013-01-18 16:13:07 -05:00
Ami Levy-Moonshine	0fb7b73107	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-18 15:03:42 -05:00
Ami Levy-Moonshine	826c29827b	change the default VCFs gatherer of the GATK (not just the UG)	2013-01-18 15:03:12 -05:00
Eric Banks	6a903f2c23	I finally gave up on trying to get the Haplotype/Allele merging to work in the HaplotypeCaller. I've resigned myself instead to create a mapping from Allele to Haplotype. It's cheap so not a big deal, but really shouldn't be necessary. Ryan and I are talking about refactoring for GATK2.5.	2013-01-18 01:21:08 -05:00
Eric Banks	ded659232b	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2013-01-16 22:49:56 -05:00
Eric Banks	a623cca89a	Bug fix for HaplotypeCaller, as reported on the forum: when reduced reads didn't completely overlap a deletion call, we were incorrectly trying to find the reference position of a base on the read that didn't exist. Added integration test to cover this case.	2013-01-16 22:47:58 -05:00
Mark DePristo	738c24a3b1	Add tests to ensure that all insertion reads appear in the active region traversal	2013-01-16 16:25:36 -05:00
Eric Banks	79bc818022	Bug fix for VariantsToVCF: old dbSNP files can have '-' as reference base and those records always need to be padded.	2013-01-16 16:15:58 -05:00
Mark DePristo	2a42b47e4a	Massive expansion of ActiveRegionTraversal unit tests, resulting in several bugfixes to ART -- UnitTests now include combinational tiling of reads within and spanning shard boundaries -- ART now properly handles shard transitions, and does so efficiently without requiring hash sets or other collections of reads -- Updating HC and CountReadsInActiveRegions integration tests	2013-01-16 15:30:00 -05:00
Mark DePristo	ddcb33fcf8	Cache result of getLocation() in Shard so we don't perform an expensive calculation over and over	2013-01-16 15:30:00 -05:00

3427 Commits (65d31ba4adfecb5cfa7efbb4e30e60c7a7975c71)