gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	9e2209694a	Re-enable reverse trimming of alleles in UG engine when sub-selecting alleles after genotyping. UG integration tests now pass.	2012-07-27 00:47:15 -04:00
Eric Banks	baf3e33730	Allele refactoring checkpoint 2: all code finally compiles, AD and STR annotations are fixed, and most of the UG integration tests pass.	2012-07-26 23:27:11 -04:00
Ryan Poplin	35e803e110	Merged bug fix from Stable into Unstable	2012-07-26 14:00:04 -04:00
Ryan Poplin	4f741b4cd7	Smoothing in the BQSR bins should be one error observation and one non-error observation.	2012-07-26 13:59:02 -04:00
Guillermo del Angel	2ae890155c	Improvements to indel calling in pool caller: a) Compute per-read likelihoods in reference sample to determine whether a read is informative or not. b) Fixed bugs in unit tests. c) Fixed padding-related bugs when computing matches/mismatches in ErrorModel, d) Added a couple more integration tests to increase test coverage, including testing odd ploidy	2012-07-26 13:43:00 -04:00
Eric Banks	a694d1b5de	Merge branch 'master' into allelePadding	2012-07-26 01:53:14 -04:00
Eric Banks	32516a2f60	Initial checkpoint commit of VariantContext/Allele refactoring. There were just too many problems associated with the different representation of alleles in VCF (padded) vs. VariantContext (unpadded). We are moving VC to use the VCF representation. No more reference base for indels in VC and no more trimming and padding of alleles. Even reverse trimming has been stopped (the theory being that writers of VCF now know what they are doing and often want the reverse padding if they put it there; this has been requested on GetSatisfaction). Code compiles but presumably pretty much all tests with indels will fail at this point.	2012-07-26 01:50:39 -04:00
Mark DePristo	8c418a15da	Sorting out HMS error handling (fingers crossed) -- Check if a traversal error occurred in the last shard -- Catch ExecutionException from the TreeReducer and throw as our HMS exception -- ShardTraverser just throws the exception as formatted by the HMS, rather than wrapping it as a RuntimeException itself -- EngineFeaturesIntegrationTests now uses public exampleFASTA (faster), and does 1000x iterations (slower)	2012-07-25 23:13:12 -04:00
Mark DePristo	9242f63a4d	On the way to really sorting out HMS error handling -- Better error message when a traversal error occurs (a real bug) -- EngineFeaturesIntegrationTest runs the multi-threaded error testing routines 50x times -- A bit of cleanup in WalkerTest	2012-07-25 22:11:10 -04:00
Mark DePristo	5671992db3	RMDTrackBuilderUnitTest now uses private/testdata file to avoid filesystem race conditions	2012-07-25 22:05:04 -04:00
Eric Banks	7eb3f54750	Added category docs for the remaining public walkers (I think I got them all). I removed a couple of totally unnecessary walkers.	2012-07-25 21:40:28 -04:00
Eric Banks	2982b24c4b	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2012-07-25 20:36:53 -04:00
Eric Banks	0a98a6aa8d	Adding extraDocs tag per Mauricio's request	2012-07-25 18:23:18 -04:00
Mauricio Carneiro	fce5cb9f35	Few category changes	2012-07-25 17:23:02 -04:00
Eric Banks	05fa377a8e	Adding GATK categories to standard walkers. Will add to remaining walkers after the next successful release (so that I can see which walkers are public and still need it).	2012-07-25 16:05:47 -04:00
Mauricio Carneiro	d46cf47bd1	Updating Read Filter documentation	2012-07-25 15:05:47 -04:00
Eric Banks	6a3bfa3811	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2012-07-25 14:11:11 -04:00
Eric Banks	357e0b35af	Register GATK-full-only walkers and rethrow the missing walker error as a not supported in GATK lite error	2012-07-25 14:11:03 -04:00
Roger Zurawicki	5b74763096	Removed Categories. We will use DocumentedGATKFeatures to create categories in our documentation. Eric I guess will be in charge of this. We need to remove walkers and think how to categorize everything. Tools can be hidden from GATKdocs with the @Hidden annotation Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2012-07-25 13:46:24 -04:00
Eric Banks	a5721a8846	Context covariate optimizations were not suited for multiple threads, so I removed them (since that ended up being much, much easier than trying to make the covariates thread local). Added -nt 2 layer to BQSR integration tests to confirm that it now works with multiple threads.	2012-07-25 13:38:07 -04:00
Eric Banks	e0c07f5567	Reverting old commits that made error handling better because ultimately they made things worse.	2012-07-25 12:37:59 -04:00
Mark DePristo	16947e93f2	Integration test to ensure VariantFiltration makes . -> PASS/FAIL like VQSR Signed-off-by: Mark DePristo <depristo@broadinstitute.org>	2012-07-25 08:56:39 -04:00
Mark DePristo	fcefa61bce	Remove reference dependence in BCF2Codec -- Adding BCF2Codec to VCF.jar and associated unit tests Signed-off-by: Mark DePristo <depristo@broadinstitute.org>	2012-07-25 08:56:38 -04:00
Mark DePristo	19a257a5c1	Multiple bugfixes -- VariantFiltration now properly sets passFilters in VC -- BCF2 writer now properly decodes lazy BCF genotype data that it uses. Improper use generated a horrible subtle bug but the good news is that the extra checks I put in (unnecessarily a few days ago) caught the bug! Signed-off-by: Mark DePristo <depristo@broadinstitute.org>	2012-07-25 08:56:38 -04:00
Mark DePristo	3066894215	Bugfix for BCF2 -- Always decode genotypes block when writing out a BCF file. If the header changes (and we currently don't know this easily) then the dictionary keys used in the genotypes block may be invalid. Temporarily added a private static boolean that turns off writing of the blocks until Eric and his team rewrite the header. Signed-off-by: Mark DePristo <depristo@broadinstitute.org>	2012-07-25 08:56:38 -04:00
Guillermo del Angel	eb55061fd0	a) Document BEAGLE codec, b) Bug fix: inbreeding coefficient shouldn't be computed for non-diploid organisms in current implementation	2012-07-24 12:16:15 -04:00
Mauricio Carneiro	348e86159e	Moving doclets to public	2012-07-23 23:52:14 -04:00
Mauricio Carneiro	5cd98a36b9	Making ForumAPIUtils public	2012-07-23 17:44:24 -04:00
Mauricio Carneiro	3d92f041f3	forgot to delete the merging line	2012-07-23 17:35:07 -04:00
Roger Zurawicki	f3c504769b	Added the ability to update the Forum GATKDocs looks for a key on gsa4, and updates the forum with new walker if it exists. More changes were made to the GATKDocs. Works nicely with bootstrap on and offline. Cleaned up the code as well Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>	2012-07-23 17:17:33 -04:00
Khalid Shakir	46ca49b63d	Removed 'Walker' suffix from packages/GATKEngine.xml that were breaking the packaged release. Archived AnalyzeCovariates scripts and removed references in build packages / GATK extensions.	2012-07-23 16:32:31 -04:00
Ryan Poplin	2a14bbe4f0	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-23 11:28:26 -04:00
Ryan Poplin	10d143c35c	Adding error model header names in the BQSR recal plot. Making the downsampling of points look a little nicer.	2012-07-23 11:28:17 -04:00
Eric Banks	675ccab2fa	Renaming BQSR to BaseRecalibrator	2012-07-23 10:17:17 -04:00
Ryan Poplin	2e486d83e2	Updating HaplotypeCaller docs and expanding integration tests.	2012-07-23 10:05:42 -04:00
Guillermo del Angel	39f45127f3	Fix md5's broken by recent changes to FisherStrand calculation	2012-07-21 14:41:38 -04:00
Mauricio Carneiro	65f4b67b86	Fixing walker unit test with the new naming convention	2012-07-20 17:50:29 -04:00
Mauricio Carneiro	921eaad33f	Generalized the default platform parameter in BQSRv2 Parameter wasn't working outside of the BQSR walker. It now takes the information on the recalibration report in other tools (PrintReads for example) and treats all reads as coming from the defined default platform.	2012-07-20 17:29:13 -04:00
Mauricio Carneiro	5dc2143142	Removed support for walkers ending with "Walker" from the engine. If your walker has "Walker" in the name, you will have to use "Walker" on the -T to access it.	2012-07-20 17:27:11 -04:00
Mauricio Carneiro	d446d34227	GATK Error messages now point to the new website instead of GetSatisfaction.	2012-07-20 17:27:11 -04:00
Mauricio Carneiro	116885a450	Removed the "Walker" suffix from all walkers that had it. * Did not touch archived walkers... those can be named whatever. * Kept abstract classes that end in Walker untouched (e.g. LocusWalker, ReadWalker, ...) * Renamed a few inner classes due to conflict when stripping off Walker from their outer classes: ContigStats, FlagStats and FastaStats.	2012-07-20 17:27:11 -04:00
Christopher Hartl	3ee46cced2	Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-19 21:25:40 -04:00
Christopher Hartl	af383c30b5	Ensure that the gene summary has a header line	2012-07-19 21:24:04 -04:00
Mark DePristo	2ca5fc62a2	Support for MISSING BCF2 type -- Heng wants to use 0x0? to represent any missing type value, which in our implementation was invalid. Updated our codebase to support this construct. Heng said he'll update the BCF2 quick reference. -- Enabled integration test reading Heng's ex2.bcf file -- GATK now only warns in the case where the END info field isn't the same (or +1 due to padding) as the getEnd() function as determined by the GATK. Turns out there's a single record in the 1000G SV call set that doesn't have the right length -- VariantContextTestProvider now tests that X = Y where X -> writing -> reading -> writing -> reading = Y for a variety of variant context inputs X -- Added integration test reading 1000G SV chr1 calls (from Chris)	2012-07-19 16:14:26 -04:00
Guillermo del Angel	c16f9f2f15	a) Use new method to check for GATK Lite, b) minor improvements to indel pool caller (more to come): brain-dead, quick way to limit number of alt alleles to genotype. We can't process too many alt alleles because of the combinatorial explosion of GL values with high ploidy, and some STR validation targets had up to 12 alt alleles, resulting in GL vectors of > 1e8 elements. Can't use pileup elements since typically not many alleles will be in one pileup, and different alleles will appear in different samples, TBD a nicer solution. c) Commit to posterity scala script for large scale validation calling, still work in progress	2012-07-19 10:24:08 -04:00
Eric Banks	5f5edeca63	Reverting move of BQSR tests to public, as per DR's email	2012-07-19 10:02:05 -04:00
Eric Banks	e370030e6c	As requested by Mark, I've broken out the code to pull out the protected subclass when available (and otherwise use the public version) into the GATKLiteUtils class. People should use this code instead of reimplementing all of the java reflection on their own.	2012-07-18 22:44:37 -04:00
Eric Banks	d46ccec04e	Adding Unit Tests to cover the exception catching for Picard errors: because we are using String matching, we want to ensure that we know if/when the exception text changes underneath us.	2012-07-18 21:48:58 -04:00
Eric Banks	9c1ab1b0c0	Move BQSR integration test and its dependent files into public; previously there was a protected->private dependency.	2012-07-18 21:11:33 -04:00
Mark DePristo	994c5c31c1	Enabling VariantEval integration tests for ValidationReport	2012-07-18 16:07:47 -04:00
Mark DePristo	74e153ff4a	FisherStrand now uses RankSumTest isUsableBase to decide if a read should be included in testing -- Previously used hardcoded MAPQ > 20 && QUAL > 20 but now uses isUsableBase -- Updating MD5s as appropriate	2012-07-18 16:07:47 -04:00
Mark DePristo	dede3a30e9	Improvements to the validation report of VariantEval -- If eval has genotypes and comp has genotypes, then subset the genotypes of comp down to the samples being evaluated when considering TP, FP, FN, TN status. This is important in the case where you want to use this to assess, for example, the quality of calls on NA12878 but you have a CEU trio comp VCF. The previous version was counting sites polymorphic in mom against the calls in NA12878. -- Added testdata VCF and integrationtests to ensure this behavior continues in the future -- TODO: actually run integration tests when I have an internet connection	2012-07-18 16:07:47 -04:00
Mark DePristo	559a4826be	Improvements to the validation report of VariantEval -- If eval has genotypes and comp has genotypes, then subset the genotypes of comp down to the samples being evaluated when considering TP, FP, FN, TN status. This is important in the case where you want to use this to assess, for example, the quality of calls on NA12878 but you have a CEU trio comp VCF. The previous version was counting sites polymorphic in mom against the calls in NA12878. -- Added testdata VCF and integrationtests to ensure this behavior continues in the future	2012-07-18 16:07:46 -04:00
Mark DePristo	dc292c0317	FisherStrand now includes all reads and bases, regardless of mapping quality and base quality, just like other annotations -- This actually proved to be a problem with Ion Torrent data where the base quality can be quite low, and so we need to include Q15 bases for calling effectively.	2012-07-18 16:07:46 -04:00
Eric Banks	2c0f073ab1	Make -qq arg hidden for now since it's still very experimental	2012-07-18 15:43:25 -04:00
Eric Banks	b46c85e8b4	More bad BAM file catching	2012-07-18 15:26:31 -04:00
Eric Banks	659eee13a6	Handle NPE generated in UG when non-standard reference bases are present in the fasta	2012-07-18 15:16:27 -04:00
Eric Banks	9af2cfe283	Catch underlying file system problems that get masked as Tribble index errors. There's also a quick patch to the HMS that isn't really the ultimate fix needed; Mark and I will review at a later point.	2012-07-18 15:11:38 -04:00
Eric Banks	4c730542f0	Handle RuntimeExceptions thrown by Picard that are really User Errors. I will add unit tests for these as best I can later.	2012-07-18 13:56:35 -04:00
Eric Banks	ae08d35138	Catch 'too many open files' errors that show up when trying to read the bam index. All that needs to be done is to flesh out the original error message (because it will get caught later and rethrown correctly).	2012-07-18 12:57:34 -04:00
Eric Banks	f2fe59a9d4	Wow, there are a ton of errors captured having to do with being unable to merge the temp Tribble output. I'm expanding the error message a bit to help see if we can do anything going forward.	2012-07-18 12:31:59 -04:00
Eric Banks	e4db8dde91	Enabled a whole other bunch of integration tests for BQSRv2. While I was there I also changed the default context size for indels to 3 (from 8) since that's what works best in the current implementation (as suggested by Ryan). At this point, all of the new core tools (ReduceReads, BQSRv2, HaplotypeCaller, UG extensions) have been moved over to protected and should be stable. Looks like we are pretty much ready for GATK 2.0!	2012-07-17 23:36:43 -04:00
Eric Banks	a8d08ea18d	As a user pointed out, it is not valid for a GenomeLoc to have a start or stop equal to 0.	2012-07-17 22:18:43 -04:00
Guillermo del Angel	29273abab7	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-17 16:58:12 -04:00
Guillermo del Angel	731bbba2e6	Bug fixes for integration test, use correct new UG syntax	2012-07-17 16:57:59 -04:00
Eric Banks	33be41ecf5	Cleaning up integration test	2012-07-17 16:06:04 -04:00
Eric Banks	8dbc9cb29c	Add the ability to emit the original quals in the OQ tag	2012-07-17 15:52:56 -04:00
Guillermo del Angel	40b8c7172c	Pool Caller refactoring in preparation of GATK 2.0: a) PoolCallerUnifiedArgumentCollection disappeared, and arguments moved to UnifiedArgumentCollection. b) PoolCallerWalker is no longer needed and redundant, all functionality subsumed by UG. UG now checks if GATK is lite - if so, don't allow ploidy > 2. c) Moved pool classes from private to protected. d) Changed the way to specify ploidy. Instead of specifying samples per pool and having ploidy = 2*samplesPerPool, have user specify ploidy directly, which is cleaner. Update tests accordingly. We can now call triploid seedless grape genotypes correctly in theory. e) Renamed argument -reference to -reference_sample_calls since the former is ambiguous and it's not clear what it refers to.	2012-07-17 15:27:04 -04:00
Laurent Francioli	68d0e4dd6d	- Multi-allelic sites are now correctly ignored - Reporting of mendelian violations enhanced - Corrected TP overflow by capping it to Byte.MAX_VALUE - Updated integrationtests to reflect changes in MVF file output Signed-off-by: Eric Banks <ebanks@broadinstitute.org>	2012-07-17 15:21:10 -04:00
Eric Banks	b0d99fd10d	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-17 15:12:28 -04:00
Eric Banks	305db8c0d1	Total rewrite of the isGATKLite() functionality with help of Khalid/David. PluginManager was not working for us.	2012-07-17 15:11:03 -04:00
Ryan Poplin	6efbcd99f1	HaplotypeCaller is now an AnnotatorCompatibleWalker with all the rights and privileges pertaining thereto. Enabling the ClippingRankSumTest after showing it was useful for 1000 Genomes calling.	2012-07-17 14:38:36 -04:00
Eric Banks	110886e8b9	Oops, got the logic wrong.	2012-07-17 13:37:11 -04:00
Eric Banks	a963b37424	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-17 13:15:37 -04:00
Eric Banks	3a64398d07	Cleaned up the isGATKLite check	2012-07-17 12:46:16 -04:00
Eric Banks	62c5228048	1) Revert previous change - indel recalibration is turned on by default and users of the Lite version will need to turn it off to avoid a User Error. 2) Implemented the engine.isGATKLite() method.	2012-07-17 12:23:40 -04:00
Chris Saunders	1913d1bbd0	Put RunReport S3 upload on timeout thread Move the RunReport S3 upload process onto a separate thread with a timeout allowing the parent to continue. Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2012-07-17 12:19:39 -04:00
Eric Banks	40618ac471	A bunch of BQSR changes: 1) by default we do not emit indel quals, but they can be turned on with --enable_indel_quals. 2) We check whether or not we are running in Lite mode (not done yet) and if so and the user is trying to recalibrate indels, we throw a User Error (not supported). 3) Like v1 we now allow the user to set the qual value below which we don't recalibrate (this was the remaining source of differences in the v1 vs. v2 plots).	2012-07-17 10:52:43 -04:00
Eric Banks	d5b3a2eabf	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-17 00:32:53 -04:00
Eric Banks	f657b8bda8	Complete overhaul of the BQSRv2 integration tests. Much more comprehensive. Still need to deal with a few tests that need some modifications before I'm done, but I'll take care of that sometime tomorrow.	2012-07-17 00:32:34 -04:00
Eric Banks	a003148d50	Move AnalyzeCovariates over too.	2012-07-16 16:11:56 -04:00
Eric Banks	0a89adbcdb	Add utility decorators so that classes can tell you which package source they come from if they want to (suggested by Khalid). Using those decorators, we can easily pull out the BQSR updateDataForPileupElement() method into a standard RecalibrationEngine and an AdvancedRecalibrationEngine and use the protected one (AdvancedRE) if available (otherwise, the public one).	2012-07-16 15:34:50 -04:00
Eric Banks	52baac1e16	Move BQSRv2 into public and v1 into the archive.	2012-07-16 14:23:38 -04:00
Khalid Shakir	07822d6c0f	Fixed input annotations for master/test files on DiffObjectsWalker.	2012-07-16 13:33:11 -04:00
Eric Banks	2a830939df	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-14 23:49:59 -04:00
Eric Banks	f29cadd7e2	By default, don't quantize quals in BQSRv2	2012-07-14 23:49:48 -04:00
Eric Banks	75543a3f22	ReadClipper.clipRead's claim that it doesn't modify the original read was false. Ultimately, GATKSAMRecord.clone (as documented) creates a soft copy of the read - so modifying e.g. the bases of the cloned read means that you modify the bases of the original read too. Because of this, when the BQSRv2 Context covariate was writing Ns over the low quality tails of the reads they got propagated out to the output BAM file (very bad). I've updated the ReadClipper docs and cleaned up the code (no reason to use a clone of the read anymore given that we are already modifying the original). For now, the simplest thing is to have the Context covariate store the original bases, overwrite low quality Ns, compute covariates, and rewrite the original bases; we can update later if needed.	2012-07-13 18:50:27 -04:00
Ryan Poplin	443f02ffc2	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-13 16:09:24 -04:00
Khalid Shakir	6dfcc486e8	In ApplyRecalibration marking filter as PASS instead of '.' when the site passes by calling .passFilters().	2012-07-13 15:40:56 -04:00
Ami Levy Moonshine	5d0a7335ea	remove unnecessary use in the PRIORITY list remove unneeded imports	2012-07-13 15:27:08 -04:00
Ryan Poplin	d70bb59182	HaplotypeCaller now calls insertion events that aren't fully assembled as symbolic alleles.	2012-07-10 14:22:23 -06:00
Guillermo del Angel	279dff9f81	Bug fix when specifying a JEXL expression for a field that doesn't exist: we should treat the whole expression as false, but we were rethrowing the JEXL exception in this case. Added integration test to cover this in SelectVariants	2012-07-10 13:59:00 -04:00
Mauricio Carneiro	7eb45b4038	Fixed BQSR IntegrationTests * BinaryTag covariate is Experimental, not Standard (this was breaking integration tests) * New parameter in the Recalibration report requires new MD5 for one of the integration tests.	2012-07-09 13:55:12 -04:00
Eric Banks	dd0c47ab7e	Don't cast to a specific walker type since any walker can use the VA engine	2012-07-09 10:25:58 -04:00
Mark DePristo	5b0ade67c8	Updates to VCF processing for better BCF processing -- getMetaData now split into getMetaDataInSortedOrder() [old functionality] and getMetaDataInOriginalOrder() [according to the header order]. Important as BCF uses the order of elements in the header in the offsets to keys, and we were automatically sorting the BCF2 header which is out of order in samtools and the whole system was going crazy -- Updating GATK code to use the appropriate header function (this is why so many files have changed) -- BCF2 code was busted in not differentiating PASS from . from FILTER in VC (tests coming that will actually stress this) -- Bugfix for adding contig lines to BCF2 header dictionary -- VCFHeader metaData no longer sorted internally. The system now maintains the data in header order, and only sorts output as requested in API -- VCFWriter and BCF2Writer now explicitly sort their header lines -- Don't allow filters to be added that are PASS in the contract	2012-07-08 15:44:33 -07:00
Mark DePristo	63f5262e45	mergeInfoWithMaxAC is no longer hidden in CombineVariants	2012-07-08 15:44:32 -07:00
Mark DePristo	66aee613e2	Bugfix for set key in mergeInfoWithMaxAC. -- Previous version was always setting set=source of info with highest AC. Should actually have been set to the set annotation value itself.	2012-07-08 15:44:32 -07:00
Mark DePristo	91f0ed8059	Fixed nasty Rscript typo in VariantRecalibrator when compactPDF is available	2012-07-08 15:44:32 -07:00
Mark DePristo	87b090c362	Update VariantRecalibrator error message to use -resource not old -B syntax	2012-07-08 15:44:31 -07:00
Mauricio Carneiro	125e6c1a47	added BinaryTagCovariate for ancient dna analysis	2012-07-06 15:03:20 -04:00
Mauricio Carneiro	e93b025b39	Fixing unit test with the new clipping behavior for weird cigars, we no longer can assert the final number of bases in the unit test, so I'm taking this bit off the unit test.	2012-07-06 12:08:09 -04:00
Mauricio Carneiro	f603d4c48c	Fixing PairHMMIndelErrorModel boundary issue When checking the limits of a read to clip, it wasn't considering reads that may already have been clipped before.	2012-07-06 11:48:04 -04:00
Eric Banks	dd571d9aa0	Added a --no_indel_quals argument that when used with -BQSR inhibits the writing of base insertion and base deletion quality tags.	2012-07-04 01:22:20 -04:00
Eric Banks	33306d2e20	Changing the logic of the -standard argument; the way it stands currently one can never turn off the cycle or context covariates. Now they are on by default and users must opt out of them to turn them off.	2012-07-04 00:21:21 -04:00
Eric Banks	7d30558e6f	Only 'pad' the cycle covariate for indels, not substitutions	2012-07-03 23:47:01 -04:00
Mauricio Carneiro	17efbbf8b1	Fixed ReadClipperUnitTest The behavior of the clipping on weird cigar strings such as 1I1S1H and 9S56H has changed, and the test has to change accordingly.	2012-07-03 16:38:51 -04:00
Eric Banks	22f1afddaa	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-03 14:55:59 -04:00
Eric Banks	617eebd204	More misc cleanup	2012-07-03 14:55:37 -04:00
Eric Banks	344c3aeb1d	Cleanup from previous commit	2012-07-03 14:42:44 -04:00
Ryan Poplin	9e8e78de15	Adding the model name to the VQSR filter lines so that they don't get clobbered with consecutive VQSR runs for SNPs and then indels.	2012-07-03 14:30:37 -04:00
Eric Banks	0b37d44b0d	Optimizations for the RecalDatum to make BQSR (Count Covariates) much faster. Needs some cleanup.	2012-07-03 13:05:11 -04:00
Eric Banks	031322ff00	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-03 00:12:59 -04:00
Eric Banks	a4670113bd	Refactored/renamed the nested integer array; cleaned up code a bit.	2012-07-03 00:12:33 -04:00
Ryan Poplin	f92139dd82	Ooops, UG VA path for rank sum tests aren't happy with empty lists. Disabling clipping rank sum test for now.	2012-07-02 21:12:42 -04:00
Ryan Poplin	7e7b4cd1b9	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-02 16:37:54 -04:00
Ryan Poplin	b807ff63ef	HaplotypeCaller now creates MNP and complex substitutions by using LD information to decide if events segregate together on haplotypes. Added unit test.	2012-07-02 16:37:39 -04:00
Mauricio Carneiro	3cea080aa8	Cache SoftStart() and SoftEnd() in the GATKSAMRecord these are costly operations when done repeatedly on the same read.	2012-07-02 16:22:00 -04:00
Mauricio Carneiro	88a02fa2cb	Fixing bug for reads with cigars like 9S54H When hard-clipping predict when the read is going to be fully hard clipped to the point where only soft/hard-clips are left in the read and preemptively eliminate the read before the SAMRecord mathematics on malformed cigars kills the GATK.	2012-07-02 16:22:00 -04:00
Mark DePristo	1b0a775773	Disabling bcf2 reading from samtools because it's 1-based; updating select variants integrationtest	2012-07-02 15:55:42 -04:00
Eric Banks	cac72bce91	Initial version of int indexed mapping for BQSR. Will be cleaned up in a bit.	2012-07-02 14:33:33 -04:00
Mark DePristo	602729c09d	Moved parallel tests from SelectVariants to separate SelectVariantsParallelIntegrationTest -- Enabled previous tests -- all now working -- Added modern test against new VCF as well	2012-07-02 11:40:28 -04:00
Mark DePristo	bcd2e13d8b	Adding duplicate header line keys is a logger.debug not logger.warn message now	2012-07-02 11:39:34 -04:00
Mark DePristo	01e04992f8	Fixed compatibilities in AbstractVCFCodec that resulted in key=; being parsed as written as key; in VCF output	2012-07-02 11:38:59 -04:00
Eric Banks	c94c8a9c09	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-02 08:53:01 -04:00
Mark DePristo	7aff4446d4	Added unit tests for header repairing capabilities in the GATK engine	2012-07-01 15:38:10 -04:00
Mark DePristo	480b32e759	BCF2 is now officially zero-based open-interval, and that's how the GATK does it now	2012-07-01 14:59:27 -04:00
Ryan Poplin	b6093ff02c	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-07-01 10:32:37 -04:00
Mark DePristo	9b87dcda4f	Fixing remaining integration test errors. Adding missing ex2.bcf	2012-06-30 16:23:11 -04:00
Mark DePristo	5ad9a98a15	Minor bugfixes / consistency fixes to filter strings of Genotypes and AC/AF annotations -- GenotypeBuilder now sorts the list of filter strings so that the output is in a consistent order -- calculateChromosomeCounts removes the AC/AF fields entirely when there are no alt alleles, to be on VCF spec for A defined info field values	2012-06-30 11:22:49 -04:00
Mark DePristo	385a3c630f	Added check in VariantContext.validate to ensure that getEnd() == END value when present -- Fixed bug in VariantDataManager that this validation mode was intended to detect going forward -- Still no VariantRecalibrationWalkersIntegrationTest for indels with BCF2 but that's because LowQual is missing from test VCF	2012-06-30 11:22:48 -04:00
Mark DePristo	893630af53	Enabling symbolic alleles in BCF2 -- Bugfix for VCFDiffableReader: don't add null filters to object -- BCF2Codec uses new VCFAlleleClipper to handle clipping / unclipping of alleles -- AbstractVCFCodec: decodeLoc uses full decode() [still doesn't decode genotypes] to avoid dangerous code duplication. Refactored code that clipped alleles and determined end position into updateBuilderAllelesAndStop method that uses new VCFAlleleClipper. Fixed bug by ensuring the VCF codec always uses the END field in the INFO when it's provided, not just in the case where there's a biallelic symbolic allele -- Brand new home for allele clipping / padding routines in VCFAlleleClipper. Actually documented this code, which results in lots of **** negative comments on the code quality. Eric has promised that he and Ami are going to rethink this code from scratch. Fixed many nasty bugs in here, cleaning up unnecessary branches, etc. Added UnitTests in VCFAlleleClipper that actually test the code fully. In the process of testing I discovered lots of edge cases that don't work, and I've commented out failing tests or manually skipped them, noting how these tests need to be fixed. Even introduced some minor optimizations -- VariantContext: validateAllele was broken in the case where there were mixed symbolic and concrete alleles, failing validation for no reason. Fixed. -- Added computeEndFromAlleles() function to VariantContextUtils and VariantContextBuilder for convenience calculating where the VC really ends given alleles	2012-06-30 11:22:48 -04:00
Mark DePristo	16276f81a1	BCF2 with support for symbolic alleles -- refactored allele clipping / padding code into VCFAlleleClipping class, and added much needed docs and TODOs for methods dev guys -- Added real unit tests for (some) clipping operations in VCFUtilsUnitTest	2012-06-30 11:22:48 -04:00
Mark DePristo	86feea917e	Updating MD5s to reflect new FT fixed count of 1 not UNBOUNDED	2012-06-30 11:22:47 -04:00
Mark DePristo	6bea28ae6f	Genotype filters are now just Strings, not Set<String>	2012-06-30 11:22:47 -04:00
Guillermo del Angel	f631be8d80	UnifiedGenotyperEngine.calculateGenotypes() is not only used in UG but in other walkers - vc attributes shouldn't be inherited by default or it may cause undefined behaviour in those walkers, so only inherit attributes from input vc in case of UG calling this function	2012-06-29 23:51:52 -04:00
Guillermo del Angel	65037b87da	Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-06-29 11:08:44 -04:00
Guillermo del Angel	5a9a37ba01	Pool caller improvements: a) Log ref sample depth at every called site (will add more ref-related annotations later), b) Make -glm POOLBOTH work in case we want to genotype snp's and indels together, c) indel bug fix (pool and non-pool): prevent a bad GenomeLoc from being formed if we're running GGA and incoming alleles are larger than ref window size (typically 400 bp)	2012-06-29 11:08:16 -04:00
Eric Banks	96ea334bf2	Disable caching in BQSR for now since it significantly slows down computation; will look into this in a bit.	2012-06-28 15:27:44 -04:00
Ryan Poplin	05791ebf80	Adding the Clipping rank sum test: If alternate-supporting reads have more hard clipping than reference-supporting reads this is evidence for error.	2012-06-28 13:22:56 -04:00
Ryan Poplin	d12ec92a55	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-06-28 12:57:59 -04:00
Ryan Poplin	5bb0693888	Bug fix for HC GGA mode. Shouldn't try to add an indel into the haplotype if that haplotype already contains the event of interest. Misc minor assembly param changes. Turning off capping of base qualities by base indel qualities until we can evaluate that change.	2012-06-28 12:57:51 -04:00
Khalid Shakir	1ce0b9d519	Throwing UnknownTribbleType exception instead of CommandLineException when an unknown tribble type is specified.	2012-06-28 11:28:04 -04:00
Mark DePristo	734bb5366b	Special case the situation where we have ploidy == 0 (no GT values) to implicitly assume we have diploid samples -- numLikelihoods no longer allows even ploidy == 0 in requires -- VCFCompoundHeaderLine handles the case where ploidy == 0 => implicit ploidy == 2	2012-06-28 10:06:07 -04:00
Mark DePristo	064cc56335	Update integration tests to reflect new FT header line standard and new DiagnoseTargets field names	2012-06-28 10:06:06 -04:00
Mark DePristo	64d7e93209	Massive bugfixes -- Previous version was reading the size of the encoded genotypes vector for each genotype. This only worked because I never wrote out genotype field values with > 15 elements. Mauricio's killer DiagnoseTargets VCF uncovered the bug. Unfortunately since symbolic allele clipping is still busted those tests are still disabled -- GenotypeContext getMaxPloidy was returning -1 in the case where there are no genotypes, but the answer should be 0.	2012-06-28 10:06:06 -04:00
Mark DePristo	7144154f53	VCFWriter and BCFWriter no longer allow missing samples in the VC compared to their header -- They now throw an error, as it's really unsafe to write out ./. as a special case in the VCFWriter as occurred previously. -- Added convenience method in VariantContextUtils.addMissingSamples(vc, allSamples) that returns a complete VC where samples are given ./. Genotype objects -- This allows us to properly pass tests of creating / writing / reading VCFs and BCFs, which previously differed because the VC from the VCF would actually be different from its original VC -- Updated UG, UGEngine, GenotypeAndValidateWalker, CombineVariants, and VariantsToVCF to manage the master list of samples they are writing out and addMissingSamples via the VCU function	2012-06-28 10:06:06 -04:00
Mark DePristo	4811a00891	GENOTYPE_FILTER_KEY is now a VCFStandardHeaderLine	2012-06-28 10:06:05 -04:00
Mark DePristo	93426a44b1	Fixes for DiagnoseTargets to be VCF/BCF2 spec compliant -- Don't use DP for average interval depth but rather AVG_INTERVAL_DP, which is a float now, not an int -- Don't add PASS filter value to genotypes, as this is actually considered failing filters in the GATK. Genotype filters should be empty for PASSing sites	2012-06-28 10:06:05 -04:00
Eric Banks	dc7636b923	Refactor the ContextCovariate to significantly reduce runtime	2012-06-28 02:29:35 -04:00
Eric Banks	1fafd9f6c8	NestedHashMap-based implementation of BQSRv2 along with a few minor optimizations. Not a huge runtime upgrade over the long bitset approach, but it allows us to implement further optimizations going forward. Integration test change because the original version had a bug in the quantized qual table creation.	2012-06-27 16:55:49 -04:00

2525 Commits (4cbd11faf5a7331e1f95daf66d0f86f65c8f4833)