gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Ryan Poplin	4093909a56	Updating VQSR docs. Removing references to old best practices pages.	2012-08-01 14:30:24 -04:00
Eric Banks	52b93cab62	Merged bug fix from Stable into Unstable	2012-08-01 13:17:36 -04:00
Eric Banks	22bf052828	Fixing BQSR GATK docs	2012-08-01 13:17:16 -04:00
Eric Banks	459832ee16	Fixed bug in FastaAlternateReferenceMaker when input VCF has overlapping deletions as reported a while back on GS	2012-08-01 10:45:04 -04:00
Eric Banks	a4a41458ef	Update docs of FastaAlternateReferenceMaker as promised in older GS thread	2012-08-01 10:33:41 -04:00
Eric Banks	38e5419b11	Merged bug fix from Stable into Unstable	2012-08-01 09:50:31 -04:00
Eric Banks	56f8afab97	Requested by Geraldine: adding a utility to register deprecated walkers (and the major version of the first release since they were removed) so that the User Error printed out for e.g. CountCovariates now states: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0.	2012-08-01 09:50:00 -04:00
Guillermo del Angel	0528337467	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-31 18:17:50 -04:00
Guillermo del Angel	4a23f3cd11	Simple cleanup of pool caller code - since usage is much more general than just calling pools, AF calculation models and GL calculation models are renamed from Pool -> GeneralPloidy. Also, don't have users specify special arguments for -glm and -pnrm. Instead, when running UG with sample ploidy != 2, the correct general ploidy modules are automatically detected and loaded. -glm now reverts to old [SNP|INDEL|BOTH] usage	2012-07-31 16:34:20 -04:00
Eric Banks	6cb10cef96	Fixed older GS reported bug. Actually, the problem really lies in Picard (can't set max records in RAM without it throwing an exception, reported on their JIRA) so I just masked out the problem by removing this never-used argument from this rarely-used tool.	2012-07-31 16:00:36 -04:00
Eric Banks	ab53d73459	Quick fix to user error catching	2012-07-31 15:50:32 -04:00
Eric Banks	10111450aa	Fixed AlignmentUtils bug for handling Ns in the CIGAR string. Added a UG integration test that calls a BAM with such reads (provided by a user on GetSatisfaction).	2012-07-31 15:37:22 -04:00
Mark DePristo	f7133ffc31	Cleanup syntax errors from BQSR reorganization	2012-07-31 08:11:05 -04:00
Mark DePristo	dad9bb1192	Changes order of writing BaseRecalibrator results so that if R blows up you still get a meaningful tree	2012-07-31 08:11:04 -04:00
Mark DePristo	0c4e729e13	Working version of adaptive context calculations -- Uses chi2 test for independence to determine if subcontext is worth representing. Gives excellent visual results -- Writes out analysis output file producing excellent results in R -- Trivial reformatting of MathUtils	2012-07-31 08:11:04 -04:00
Mark DePristo	93640b382e	Preliminary version of adaptive context covariate algorithm -- Works according to visual inspection of output tree	2012-07-31 08:11:04 -04:00
Mark DePristo	315d25409f	Improvement to RecalDatum and VisualizeContextTree -- Reorganize functions in RecalDatum so that error rate can be computed independently. Added unit tests. Removed equals() method, which is buggy without its associated implementation for hashcode -- New class RecalDatumTree based on QualIntervals that inherits from RecalDatum but includes the concept of sub data -- VisualizeContextTree now uses RecalDatumTree and can trivially compute the penalty function for merging nodes, which it displays in the graph	2012-07-31 08:11:04 -04:00
Mark DePristo	57b45bfb1e	Extensive unit tests, contracts, and documentation for RecalDatum	2012-07-31 08:11:03 -04:00
Mark DePristo	e00ed8bc5e	Cleanup BQSR classes -- Moved most of BQSR classes (which are used throughout the codebase) to utils.recalibration. It's better in my opinion to keep commonly used code in utils, and only specialized code in walkers. As code becomes embedded throughout GATK it should be refactored to live in utils -- Removed unnecessary imports of BQSR in VQSR v3 -- Now ready to refactor QualQuantizer and unit test into a subclass of RecalDatum, refactor unit tests into RecalDatum unit tests, and generalize into hierarchical recal datum that can be used in QualQuantizer and the analysis of adaptive context covariate -- Update PluginManager to sort the plugins and interfaces. This allows us to have a deterministic order in which the plugin classes come back, which caused BQSR integration tests to temporarily change because I moved my classes around a bit.	2012-07-31 08:11:03 -04:00
Mark DePristo	191294eedc	Initial cleanup of RecalDatum for move and further refactoring -- Moved Datum, the now unnecessary superclass, into RecalDatum -- Fixed some obviously dangerous synchronization errors in RecalDatum, though these may not have caused problems because they may not have been called in parallel mode	2012-07-31 08:11:03 -04:00
Mark DePristo	0670316288	Be clearer that dcov 50 is good for 4x, should use 200 for >30x	2012-07-31 08:11:02 -04:00
Mark DePristo	874dbf5b58	Maximum wait for GATK run report upload reduced to 10 seconds	2012-07-31 08:11:02 -04:00
Ryan Poplin	7ed06ee7b9	Updating FindCoveredIntervals to use the changes to the ActiveRegionWalker.	2012-07-30 12:16:27 -04:00
Ryan Poplin	13591b169f	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-30 12:13:24 -04:00
Eric Banks	0b30588d67	Catch yet another class of User Errors	2012-07-30 11:59:56 -04:00
Eric Banks	5743694196	Merged bug fix from Stable into Unstable	2012-07-30 11:35:28 -04:00
Eric Banks	79195b97a3	Adding categories for the remaining uncategorized walkers	2012-07-30 11:35:08 -04:00
Eric Banks	2b1b00ade5	All integration tests and VC/Allele unit tests are passing	2012-07-27 17:03:49 -04:00
Eric Banks	beb7610195	Resolving merge conflicts	2012-07-27 15:52:02 -04:00
Eric Banks	27e7e11ec0	Allele refactoring checkpoint #3 : all integration tests except for PoolCaller are passing now. Fixed a couple of bugs from old code that popped up during md5 difference review. Added VariantContextUtils.requiresPaddingBase() method for tools that create alleles to use for determining whether or not to add the ref padding base. One of the HaplotypeCaller tests wasn't passing because of RankSumTest differences, so I added a TODO for Ryan to look into this.	2012-07-27 15:48:40 -04:00
Ryan Poplin	22bb4804f0	HaplotypeCaller now uses an excessive number of high quality soft clips as a triggering signal in order to capture both end points of a large deletion in a single active region.	2012-07-27 12:44:02 -04:00
Ryan Poplin	a0890126a8	ActiveRegionWalker's isActive function returns a results object now instead of just a double.	2012-07-27 11:01:39 -04:00
Eric Banks	ef335b6213	Several more walkers have been brought up to use the new Allele representation.	2012-07-27 02:14:25 -04:00
Eric Banks	9e2209694a	Re-enable reverse trimming of alleles in UG engine when sub-selecting alleles after genotyping. UG integration tests now pass.	2012-07-27 00:47:15 -04:00
Eric Banks	baf3e33730	Allele refactoring checkpoint 2: all code finally compiles, AD and STR annotations are fixed, and most of the UG integration tests pass.	2012-07-26 23:27:11 -04:00
Ryan Poplin	35e803e110	Merged bug fix from Stable into Unstable	2012-07-26 14:00:04 -04:00
Ryan Poplin	4f741b4cd7	Smoothing in the BQSR bins should be one error observation and one non-error observation.	2012-07-26 13:59:02 -04:00
Guillermo del Angel	2ae890155c	Improvements to indel calling in pool caller: a) Compute per-read likelihoods in reference sample to determine whether a read is informative or not. b) Fixed bugs in unit tests. c) Fixed padding-related bugs when computing matches/mismatches in ErrorModel. d) Added a couple more integration tests to increase test coverage, including testing odd ploidy	2012-07-26 13:43:00 -04:00
Eric Banks	a694d1b5de	Merge branch 'master' into allelePadding	2012-07-26 01:53:14 -04:00
Eric Banks	32516a2f60	Initial checkpoint commit of VariantContext/Allele refactoring. There were just too many problems associated with the different representation of alleles in VCF (padded) vs. VariantContext (unpadded). We are moving VC to use the VCF representation. No more reference base for indels in VC and no more trimming and padding of alleles. Even reverse trimming has been stopped (the theory being that writers of VCF now know what they are doing and often want the reverse padding if they put it there; this has been requested on GetSatisfaction). Code compiles but presumably pretty much all tests with indels will fail at this point.	2012-07-26 01:50:39 -04:00
Mark DePristo	8c418a15da	Sorting out HMS error handling (fingers crossed) -- Check if a traversal error occurred in the last shard -- Catch ExecutionException from the TreeReducer and throw as our HMS exception -- ShardTraverser just throws the exception as formatted by the HMS, rather than wrapping it as a RuntimeException itself -- EngineFeaturesIntegrationTests now uses public exampleFASTA (faster), and does 1000x iterations (slower)	2012-07-25 23:13:12 -04:00
Mark DePristo	9242f63a4d	On the way to really sorting out HMS error handling -- Better error message when a traversal error occurs (a real bug) -- EngineFeaturesIntegrationTest runs the multi-threaded error testing routines 50 times -- A bit of cleanup in WalkerTest	2012-07-25 22:11:10 -04:00
Eric Banks	7eb3f54750	Added category docs for the remaining public walkers (I think I got them all). I removed a couple of totally unnecessary walkers.	2012-07-25 21:40:28 -04:00
Eric Banks	2982b24c4b	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2012-07-25 20:36:53 -04:00
Eric Banks	0a98a6aa8d	Adding extraDocs tag per Mauricio's request	2012-07-25 18:23:18 -04:00
Mauricio Carneiro	fce5cb9f35	Few category changes	2012-07-25 17:23:02 -04:00
Eric Banks	05fa377a8e	Adding GATK categories to standard walkers. Will add to remaining walkers after the next successful release (so that I can see which walkers are public and still need it).	2012-07-25 16:05:47 -04:00
Mauricio Carneiro	d46cf47bd1	Updating Read Filter documentation	2012-07-25 15:05:47 -04:00
Eric Banks	6a3bfa3811	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2012-07-25 14:11:11 -04:00
Eric Banks	357e0b35af	Register GATK-full-only walkers and rethrow the missing walker error as a not supported in GATK lite error	2012-07-25 14:11:03 -04:00
Roger Zurawicki	5b74763096	Removed Categories. We will use DocumentedGATKFeatures to create categories in our documentation. Eric I guess will be in charge of this. We need to remove walkers and think how to categorize everything. Tools can be hidden from GATKdocs with the @Hidden annotation Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2012-07-25 13:46:24 -04:00
Eric Banks	a5721a8846	Context covariate optimizations were not suited for multiple threads, so I removed them (since that ended up being much, much easier than trying to make the covariates thread local). Added -nt 2 layer to BQSR integration tests to confirm that it now works with multiple threads.	2012-07-25 13:38:07 -04:00
Eric Banks	e0c07f5567	Reverting old commits that made error handling better because ultimately they made things worse.	2012-07-25 12:37:59 -04:00
Mark DePristo	fcefa61bce	Remove reference dependence in BCF2Codec -- Adding BCF2Codec to VCF.jar and associated unit tests Signed-off-by: Mark DePristo <depristo@broadinstitute.org>	2012-07-25 08:56:38 -04:00
Mark DePristo	19a257a5c1	Multiple bugfixes -- VariantFiltration now properly sets passFilters in VC -- BCF2 writer now properly decodes lazy BCF genotype data that it uses. Improper use generated a horrible subtle bug but the good news is that the extra checks I put in (unnecessarily a few days ago) caught the bug! Signed-off-by: Mark DePristo <depristo@broadinstitute.org>	2012-07-25 08:56:38 -04:00
Mark DePristo	3066894215	Bugfix for BCF2 -- Always decode genotypes block when writing out a BCF file. If the header changes (and we currently don't know this easily) then the dictionary keys used in the genotypes block may be invalid. Temporarily added a private static boolean that turns off writing of the blocks until Eric and his team rewrite the header. Signed-off-by: Mark DePristo <depristo@broadinstitute.org>	2012-07-25 08:56:38 -04:00
Guillermo del Angel	eb55061fd0	a) Document BEAGLE codec, b) Bug fix: inbreeding coefficient shouldn't be computed for non-diploid organisms in current implementation	2012-07-24 12:16:15 -04:00
Mauricio Carneiro	348e86159e	Moving doclets to public	2012-07-23 23:52:14 -04:00
Mauricio Carneiro	5cd98a36b9	Making ForumAPIUtils public	2012-07-23 17:44:24 -04:00
Mauricio Carneiro	3d92f041f3	forgot to delete the merging line	2012-07-23 17:35:07 -04:00
Roger Zurawicki	f3c504769b	Added the ability to update the Forum GATKDocs looks for a key on gsa4, and updates the forum with new walker if it exists. More changes were made to the GATKDocs. Works nicely with bootstrap on and offline. Cleaned up the code as well Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2012-07-23 17:17:33 -04:00
Khalid Shakir	46ca49b63d	Removed 'Walker' suffix from packages/GATKEngine.xml that were breaking the packaged release. Archived AnalyzeCovariates scripts and removed references in build packages / GATK extensions.	2012-07-23 16:32:31 -04:00
Ryan Poplin	2a14bbe4f0	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-23 11:28:26 -04:00
Ryan Poplin	10d143c35c	Adding error model header names in the BQSR recal plot. Making the downsampling of points look a little nicer.	2012-07-23 11:28:17 -04:00
Eric Banks	675ccab2fa	Renaming BQSR to BaseRecalibrator	2012-07-23 10:17:17 -04:00
Ryan Poplin	2e486d83e2	Updating HaplotypeCaller docs and expanding integration tests.	2012-07-23 10:05:42 -04:00
Mauricio Carneiro	921eaad33f	Generalized the default platform parameter in BQSRv2 Parameter wasn't working outside of the BQSR walker. It now takes the information on the recalibration report in other tools (PrintReads for example) and treats all reads as coming from the defined default platform.	2012-07-20 17:29:13 -04:00
Mauricio Carneiro	5dc2143142	Removed support for walkers ending with "Walker" from the engine. If your walker has "Walker" in the name, you will have to use "Walker" on the -T to access it.	2012-07-20 17:27:11 -04:00
Mauricio Carneiro	d446d34227	GATK Error messages now point to the new website instead of GetSatisfaction.	2012-07-20 17:27:11 -04:00
Mauricio Carneiro	116885a450	Removed the "Walker" suffix from all walkers that had it. * Did not touch archived walkers... those can be named whatever. * Kept abstract classes that end in Walker untouched (e.g. LocusWalker, ReadWalker, ...) * Renamed a few inner classes due to conflict when stripping off Walker from their outer classes: ContigStats, FlagStats and FastaStats.	2012-07-20 17:27:11 -04:00
Christopher Hartl	3ee46cced2	Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-19 21:25:40 -04:00
Christopher Hartl	af383c30b5	Ensure that the gene summary has a header line	2012-07-19 21:24:04 -04:00
Mark DePristo	2ca5fc62a2	Support for MISSING BCF2 type -- Heng wants to use 0x0? to represent any missing type value, which in our implementation was invalid. Updated our codebase to support this construct. Heng said he'll update the BCF2 quick reference. -- Enabled integration test reading Heng's ex2.bcf file -- GATK now only warns in the case where the END info field isn't the same (or +1 due to padding) as the getEnd() function as determined by the GATK. Turns out there's a single record in the 1000G SV call set that doesn't have the right length -- VariantContextTestProvider now tests that X = Y where X -> writing -> reading -> writing -> reading = Y for a variety of variant context inputs X -- Added integration test reading 1000G SV chr1 calls (from Chris)	2012-07-19 16:14:26 -04:00
Guillermo del Angel	c16f9f2f15	a) Use new method to check for GATK Lite, b) minor improvements to indel pool caller (more to come): brain-dead, quick way to limit number of alt alleles to genotype. We can't process too many alt alleles because of the combinatorial explosion of GL values with high ploidy, and some STR validation targets had up to 12 alt alleles, resulting in GL vectors of > 1e8 elements. Can't use pileup elements since typically not many alleles will be in one pileup, and different alleles will appear in different samples, TBD a nicer solution. c) Commit to posterity scala script for large scale validation calling, still work in progress	2012-07-19 10:24:08 -04:00
Eric Banks	e370030e6c	As requested by Mark, I've broken out the code to pull out the protected subclass when available (and otherwise use the public version) into the GATKLiteUtils class. People should use this code instead of reimplementing all of the java reflection on their own.	2012-07-18 22:44:37 -04:00
Eric Banks	d46ccec04e	Adding Unit Tests to cover the exception catching for Picard errors: because we are using String matching, we want to ensure that we know if/when the exception text changes underneath us.	2012-07-18 21:48:58 -04:00
Mark DePristo	74e153ff4a	FisherStrand now uses RankSumTest isUsableBase to decide if a read should be included in testing -- Previously used hardcoded MAPQ > 20 && QUAL > 20 but now uses isUsableBase -- Updating MD5s as appropriate	2012-07-18 16:07:47 -04:00
Mark DePristo	dede3a30e9	Improvements to the validation report of VariantEval -- If eval has genotypes and comp has genotypes, then subset the genotypes of comp down to the samples being evaluated when considering TP, FP, FN, TN status. This is important in the case where you want to use this to assess, for example, the quality of calls on NA12878 but you have a CEU trio comp VCF. The previous version was counting sites polymorphic in mom against the calls in NA12878. -- Added testdata VCF and integrationtests to ensure this behavior continues in the future -- TODO: actually run integration tests when I have an internet connection	2012-07-18 16:07:47 -04:00
Mark DePristo	559a4826be	Improvements to the validation report of VariantEval -- If eval has genotypes and comp has genotypes, then subset the genotypes of comp down to the samples being evaluated when considering TP, FP, FN, TN status. This is important in the case where you want to use this to assess, for example, the quality of calls on NA12878 but you have a CEU trio comp VCF. The previous version was counting sites polymorphic in mom against the calls in NA12878. -- Added testdata VCF and integrationtests to ensure this behavior continues in the future	2012-07-18 16:07:46 -04:00
Mark DePristo	dc292c0317	FisherStrand now includes all reads and bases, regardless of mapping quality and base quality, just like other annotations -- This actually proved to be a problem with Ion Torrent data where the base quality can be quite low, and so we need to include Q15 bases for calling effectively.	2012-07-18 16:07:46 -04:00
Eric Banks	2c0f073ab1	Make -qq arg hidden for now since it's still very experimental	2012-07-18 15:43:25 -04:00
Eric Banks	b46c85e8b4	More bad BAM file catching	2012-07-18 15:26:31 -04:00
Eric Banks	659eee13a6	Handle NPE generated in UG when non-standard reference bases are present in the fasta	2012-07-18 15:16:27 -04:00
Eric Banks	9af2cfe283	Catch underlying file system problems that get masked as Tribble index errors. There's also a quick patch to the HMS that isn't really the ultimate fix needed; Mark and I will review at a later point.	2012-07-18 15:11:38 -04:00
Eric Banks	4c730542f0	Handle RuntimeExceptions thrown by Picard that are really User Errors. I will add unit tests for these as best I can later.	2012-07-18 13:56:35 -04:00
Eric Banks	ae08d35138	Catch 'too many open files' errors that show up when trying to read the bam index. All that needs to be done is to flesh out the original error message (because it will get caught later and rethrown correctly).	2012-07-18 12:57:34 -04:00
Eric Banks	f2fe59a9d4	Wow, there are a ton of errors captured having to do with being unable to merge the temp Tribble output. I'm expanding the error message a bit to help see if we can do anything going forward.	2012-07-18 12:31:59 -04:00
Eric Banks	e4db8dde91	Enabled a whole other bunch of integration tests for BQSRv2. While I was there I also changed the default context size for indels to 3 (from 8) since that's what works best in the current implementation (as suggested by Ryan). At this point, all of the new core tools (ReduceReads, BQSRv2, HaplotypeCaller, UG extensions) have been moved over to protected and should be stable. Looks like we are pretty much ready for GATK 2.0!	2012-07-17 23:36:43 -04:00
Eric Banks	a8d08ea18d	As a user pointed out, it is not valid for a GenomeLoc to have a start or stop equal to 0.	2012-07-17 22:18:43 -04:00
Guillermo del Angel	29273abab7	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-17 16:58:12 -04:00
Guillermo del Angel	731bbba2e6	Bug fixes for integration test, use correct new UG syntax	2012-07-17 16:57:59 -04:00
Eric Banks	33be41ecf5	Cleaning up integration test	2012-07-17 16:06:04 -04:00
Eric Banks	8dbc9cb29c	Add the ability to emit the original quals in the OQ tag	2012-07-17 15:52:56 -04:00
Guillermo del Angel	40b8c7172c	Pool Caller refactoring in preparation of GATK 2.0: a) PoolCallerUnifiedArgumentCollection disappeared, and arguments moved to UnifiedArgumentCollection. b) PoolCallerWalker is no longer needed and redundant, all functionality subsumed by UG. UG now checks if GATK is lite - if so, don't allow ploidy > 2. c) Moved pool classes from private to protected. d) Changed the way to specify ploidy. Instead of specifying samples per pool and having ploidy = 2*samplesPerPool, have user specify ploidy directly, which is cleaner. Update tests accordingly. We can now call triploid seedless grape genotypes correctly in theory. e) Renamed argument -reference to -reference_sample_calls since the former is ambiguous and it's not clear what it refers to.	2012-07-17 15:27:04 -04:00
Laurent Francioli	68d0e4dd6d	- Multi-allelic sites are now correctly ignored - Reporting of mendelian violations enhanced - Corrected TP overflow by capping it to Byte.MAX_VALUE - Updated integration tests to reflect changes in MVF file output Signed-off-by: Eric Banks <ebanks@broadinstitute.org>	2012-07-17 15:21:10 -04:00
Eric Banks	b0d99fd10d	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-17 15:12:28 -04:00
Eric Banks	305db8c0d1	Total rewrite of the isGATKLite() functionality with help of Khalid/David. PluginManager was not working for us.	2012-07-17 15:11:03 -04:00
Ryan Poplin	6efbcd99f1	HaplotypeCaller is now an AnnotatorCompatibleWalker with all the rights and privileges pertaining thereto. Enabling the ClippingRankSumTest after showing it was useful for 1000 Genomes calling.	2012-07-17 14:38:36 -04:00
Eric Banks	110886e8b9	Oops, got the logic wrong.	2012-07-17 13:37:11 -04:00
Eric Banks	a963b37424	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-17 13:15:37 -04:00

2202 Commits (06258c8a0154db2ac3fca6e20fe7036e27794485)