gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Ryan Poplin	eb63221875	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable	2012-08-30 09:19:35 -04:00
Eric Banks	150a969279	Be careful with String manipulation when constructing alleles in SomaticIndelDetector	2012-08-29 15:13:28 -04:00
Eric Banks	3d476487c6	LIBS is totally busted for deletions. Putting a check in AD for bad pileup event bases so that we don't produce busted alleles. We must fix LIBS ASAP.	2012-08-27 12:13:12 -04:00
Mark DePristo	dcc972a557	Usability cleanup for BQSR -- I'm seeing a lot of people trying to use BinaryTagCovariate in the community. They really shouldn't do this, so I moved it to private. -- Throw an exception if its required bintag argument is missing -- Check explicitly if user is requesting DinucCovariate and tell them that it's been retired in favor of ContextCovariate -- Show the type (Required, Experimental, Standard) of the covariates when running --list	2012-08-25 14:53:00 -04:00
Ryan Poplin	5f8574bd15	Fixing typo in error message.	2012-08-24 10:48:41 -04:00
Ryan Poplin	e5cfdb4811	Bug fix for popular _Duplicate allele added to VariantContext_ error reported on the forum. It seems to be due to lower case bases in the reference being treated as reference mismatches. We would try to turn these mismatches into SNP events, for example c/C. We now uppercase the result from IndexedFastaSequenceFile.getSubsequenceAt()	2012-08-22 14:39:35 -04:00
Eric Banks	03017855e4	WTF - why is support for whole-read insertions all messed up in LIBS? I've pushed a temporary patch for now (the right solution should certainly not be implemented in stable; LIBS needs to be better thought out). Added another unit test.	2012-08-22 00:24:01 -04:00
Eric Banks	40d5efc804	Fix for Adam K's reported bug: we weren't handling reads that were entirely insertions properly in LIBS. Specifically, the event bases were off-by-one (which was disastrous in Adam's case with a 1bp read). Added a unit test to cover this case.	2012-08-20 23:12:41 -04:00
Ryan Poplin	5db3bd6fd2	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-20 15:28:57 -04:00
Ryan Poplin	464d49509a	Pulling out common caller arguments into its own StandardCallerArgumentCollection base class so that every caller isn't exposed to the unused arguments from every other caller.	2012-08-20 15:28:39 -04:00
Eric Banks	4450d66c64	Fixing the docs for DP and AD	2012-08-20 15:10:24 -04:00
Ryan Poplin	c67d708c51	Bug fix in HaplotypeCaller for non-regular bases in the reference or reads. Those events don't get created any more. Bug fix for advanced GenotypeFullActiveRegion mode: custom variant annotations created by the HC don't make sense when in this mode so don't try to calculate them.	2012-08-20 13:41:08 -04:00
Eric Banks	154f65e0de	Temporarily disabling multi-threaded usage of BaseRecalibrator for performance reasons.	2012-08-20 12:43:17 -04:00
Eric Banks	97b191f578	Thanks to Guillermo I was able to isolate an instance of where the MLEAC > AN. It turns out that this is valid, e.g. when PLs are all 0s for a sample we no-call it but it's allowed to factor into the MLE (since that's the contract with the exact model). Removing the check in UG and instead protecting for it in the AlleleCount stratification.	2012-08-20 01:16:23 -04:00
Mark DePristo	7fa76f719b	Print "Parsing data stream with BCF version BCFx.y" in BCF2 codec as .debug not .info	2012-08-19 10:32:55 -04:00
Mark DePristo	9121b98167	CombineVariants outputs the first non-MISSING qual, not the maximum -- When merging multiple VCF records at a site, the combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value. The previous behavior was to take the max QUAL, which resulted in sometimes strange downstream confusion.	2012-08-19 10:29:38 -04:00
Mauricio Carneiro	d16cb68539	Updated and more thorough version of the BadCigar read filter * No reads with Hard/Soft clips in the middle of the cigar * No reads starting with deletions (with or without preceding clips) * No reads ending in deletions (with or without follow-up clips) * No reads that are fully hard or soft clipped * No reads that have consecutive indels in the cigar (II, DD, ID or DI) Also added systematic test for good cigars and iterative test for bad cigars.	2012-08-17 17:05:27 -04:00
Mark DePristo	980685af16	Fix GSA-137: Having both DataSource.REFERENCE and DataSource.REFERENCE_BASES is confusing to end users. -- Removed REFERENCE_BASES option. You only have REFERENCE now. There's no efficiency savings for the REFERENCE_BASES option any longer, since the reference bases are loaded lazily, so if you don't use them there's effectively no cost to making the RefContext that could load them.	2012-08-17 14:55:38 -04:00
Eric Banks	2676b7fc2e	Put in a sanity check that MLEAC <= AN	2012-08-17 11:49:53 -04:00
Mark DePristo	daa26cc64e	Print to logger not to System.out in CachingIndexFastaSequenceFile when profiling cache performance	2012-08-17 11:49:02 -04:00
Mark DePristo	be0f8beebb	Fixed GSA-434: GATK should generate error when gzipped FASTA is passed in. -- The GATK sort of handles this now, but only if you have the exactly correct sequence dictionary and FAI files associated with the reference. If you do, the file can be .gz. If not, the GATK will fail on creating the FAI and DICT files. Added an error message that handles this case and clearly says what to do.	2012-08-17 11:49:02 -04:00
Mark DePristo	a3d2764d11	Fixed: GSA-392 @arguments with just a short name get the wrong argument bindings -- Now blows up if an argument begins with -. Implementation isn't pretty, as it actually blows up during Queue extension creation with a somewhat obscure error message but at least it's something.	2012-08-17 11:49:01 -04:00
Mark DePristo	4c0f198d48	Potential fix for GSA-484: Incomplete writing of temp BCF when running CombineVariants in parallel -- Keep reading from BCF2 input stream when read(byte[]) returns < number of needed bytes -- It's possible (I think) that the failure in GSA-484 is due to multi-threading writing/reading of BCF2 records where the underlying stream is not yet flushed so read(byte[]) returns a partial result. Now loops until we get all of the needed bytes or EOF is encountered	2012-08-17 11:49:01 -04:00
Mark DePristo	de3be45806	Proper function call in BCF2Decoder to validateReadBytes	2012-08-17 11:49:01 -04:00
Eric Banks	53383e82ec	Hmm, not good. Fixing the math in PBT resulted in changed MD5s for integration tests that look like significant changes. I am reverting and will report this to Laurent.	2012-08-16 21:41:18 -04:00
Eric Banks	65c594afff	Better error message for reads that begin/end with a deletion in LIBS	2012-08-16 21:27:07 -04:00
Mark DePristo	6a2862e8bc	GSA-483: Bug in GATKdocs for Enums -- Fixed to no longer show constants in enums as constant values in the gatkdocs	2012-08-16 16:24:17 -04:00
Eric Banks	3253fc216b	FindBugs 'Maintainability' fixes	2012-08-16 15:53:06 -04:00
Eric Banks	05cbf1c8c0	FindBugs 'Efficiency' fixes	2012-08-16 15:40:52 -04:00
Mark DePristo	d8071c66ed	Removing SlowGenotype object from GATK	2012-08-16 15:23:06 -04:00
Eric Banks	a22e7a5358	Should've run 'ant clean' instead of just 'ant'. In any event, these are 2 cases where we are setting a class's internal static variable directly. Very dangerous.	2012-08-16 15:07:32 -04:00
Eric Banks	47b4f7b7e5	One final FindBugs related fix. I think it's safe to consider these changes 'fixes' that are allowed to go in during a code freeze.	2012-08-16 14:59:05 -04:00
Eric Banks	ded0e11b45	Killing off some FindBugs 'Reliability' issues	2012-08-16 14:00:48 -04:00
Eric Banks	dac3958461	Killing off some FindBugs 'Usability' issues	2012-08-16 13:32:44 -04:00
Eric Banks	611d9b61e2	Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-08-16 13:05:36 -04:00
Eric Banks	2df04dc48a	Fix for performance problem in GGA mode related to previous --regenotype commit. Instead of trying to hack around the determination of the calculation model when it's not needed, just simply overload the calculateGenotypes() method to add one that does simple genotyping. Re-enabling the Pool Caller integration tests.	2012-08-16 13:05:17 -04:00
Mark DePristo	132cdfd9c1	GSA-488: MLEAC > AN error when running variant eval fixed	2012-08-16 13:03:14 -04:00
Mark DePristo	4e42988c66	GSA-485: Remove repairVCFHeader from GATK codebase -- Removed half-a*ssed attempt to automatically repair VCF files with bad headers, which allowed users to provide a replacement header overwriting the file's actual header on the fly. Not a good idea, really. Eric has promised to create a utility that walks through a VCF file and creates a meaningful header field based on the file's contents (if this ever becomes a priority)	2012-08-16 13:03:13 -04:00
Mark DePristo	52bfe8db8a	Make sure the storage writer is closed before running mergeInfo in multi-threaded output management -- It's not clear this is cause of GSA-484 but it will help confirm that it's not the cause	2012-08-16 13:03:13 -04:00
Mark DePristo	7a247df922	Added -bcf argument to VCFWriter output to force BCF regardless of file extension -- Now possible to do -o /dev/stdout -bcf -l DEBUG > tmp.bcf and create a valid BCF2 file -- Cleanup code to make sure extensions easier by moving to a setX model in VariantContextWriterStub	2012-08-16 13:03:13 -04:00
Mark DePristo	28c8e3e6d7	Cleanup BCF2Codec -- Remove FORBID_SYMBOLIC global that is no longer necessary -- all error handling goes via error() function	2012-08-16 13:03:13 -04:00
Mark DePristo	9dc694b2e9	Meaningful error message and keeping tmp file when mergeInfo fails -- BCF2 is failing for some reason when merging tmp. files with parallel combine variants. ThreadLocalOutputTracker no longer sets deleteOnExit on the tmp file, as this prevents debugging. And it's unnecessary because each mergeInto was deleting files as appropriate -- MergeInfo in VariantContextWriterStorage only deletes the intermediate output if an error occurs	2012-08-16 13:03:13 -04:00
Eric Banks	f368e568db	Implementing support in BaseRecalibrator for SOLiD no call strategies other than throwing an exception. For some reason we never transferred these capabilities into BQSRv2 earlier.	2012-08-15 22:52:56 -04:00
Eric Banks	9d09230c26	Better docs for verbose output of Pileup	2012-08-15 21:55:08 -04:00
Mark DePristo	c0a31b2e5b	CombineVariants parallel integration tests -- All tests but one (using old bad VCF3 input) run unmodified with parallel code. -- Disabled UNSAFE_VCF_PROCESSING for all but that test, which changes md5s because the output files have fixed headers -- Minor optimizations to simpleMerge	2012-08-15 21:13:16 -04:00
Mark DePristo	669c43031a	BCF2 optimizations; parallel CombineVariants -- BCF2 now determines whether it can safely write out raw genotype blocks, which is true in the case where the VCF header of the input is a complete, ordered subset of the output header. Added utilities to determine this and extensive unit tests (headerLinesAreOrderedConsistently) -- Cleanup collapseStringList and exploreStringList for new unit tests of BCF2Utils. Fixed bug in edge case that never occurred in practice -- VCFContigHeaderLine now provides its own key (VCFHeader.CONTIG_KEY) directly instead of requiring the user to provide it (and hoping it's right) -- More ways to access the data in VCFHeader -- BCF2Writer uses a cache to avoid recomputing unnecessarily whether raw genotype blocks can be emitted directly into the output -- Optimization of fullyDecodeAttributes -- attributes.size() is expensive and unnecessary. We just guess that on average we need ~10 elements for the attribute map -- CombineVariants optimization -- filters are online HashSet but are sorted at the end by creating a TreeSet -- makeCombinations is now makePermutations, and you can request to create the permutations with or without replacement	2012-08-15 21:13:16 -04:00
Mark DePristo	ae4d4482ac	Parallel combine variants! -- CombineVariants is now TreeReducible! -- Integration tests running in parallel all pass except one (will fix) due to incorrect use of db=0 flag on input from old VCF format	2012-08-15 21:13:15 -04:00
Mark DePristo	bd7ed0d028	Enable efficient parallel output of BCF2 -- Previous IO stub was hardcoded to write VCF. So when you ran -nt 2 -o my.bcf you actually created intermediate VCF files that were then encoded single threaded as BCF. Now we emit natively per thread BCF, and use the fast mergeInfo code to read BCF -> write BCF. Upcoming optimizations to avoid decoding genotype data unnecessarily will enable us to really quickly process BCF2 in parallel -- VariantContextWriterStub forces BCF output for intermediate files -- Nicer debug log message in BCF2Codec -- Turn off debug logging of BCF2LazyGenotypesDecoder -- BCF2FieldWriterManager now uses .debug not .info, so you won't see all of that field manager debugging info with BCF2 any longer -- VariantContextWriterFactory.isBCFOutput now has version that accepts just a file path, not path + options	2012-08-15 21:13:15 -04:00
Mark DePristo	9459e6203a	Clean, documented implementation of ThreadFactory that monitors running / blocking / waiting time of threads it creates -- Expanded unit tests -- Support for clean logging of results to logger -- Refactored MyTime into AutoFormattingTime in Utils, out of TraversalEngine, for cleanliness and reuse -- Added docs and contracts to StateMonitoringThreadFactory	2012-08-15 21:13:15 -04:00
Mark DePristo	be3230a1fd	Initial implementation of ThreadFactory that monitors running / blocking / waiting time of threads it creates -- Created makeCombinations utility function (very useful!). Moved template from VariantContextTestProvider -- UnitTests for basic functionality	2012-08-15 21:13:15 -04:00
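
Commit 4c0f198d48 above describes looping on read(byte[]) until all needed bytes arrive or EOF is hit. The following is only a minimal sketch of that general technique, not the actual BCF2Decoder code: the class name ReadFullyExample, the method readUntilFullOrEof, and the trickle helper (which simulates partial reads) are all hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyExample {
    /**
     * Hypothetical helper: keeps calling read() until the buffer is full
     * or the stream reports EOF. Returns the number of bytes actually read,
     * which is less than buffer.length only if EOF came first.
     */
    public static int readUntilFullOrEof(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            // A single read() may legally return fewer bytes than requested.
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // EOF before the buffer was filled
            }
            total += n;
        }
        return total;
    }

    /** Hypothetical test stream that hands back at most one byte per read call. */
    public static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[4];
        int n = readUntilFullOrEof(trickle(new byte[] {1, 2, 3, 4, 5}), buffer);
        System.out.println("bytes read: " + n);
    }
}
```

A caller that needs exactly buffer.length bytes would compare the return value against the buffer size and treat a shortfall as a truncated record.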


2267 Commits (b9dab068eebd962a7af3fd73a18430c118cfb9fd)
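
The rules listed in commit d16cb68539 for the BadCigar read filter can be sketched as a standalone check. This is an illustrative approximation that works on plain CIGAR strings, not the GATK filter itself: the class BadCigarSketch and its methods are hypothetical, and it covers only the five rules named in that commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BadCigarSketch {
    // One CIGAR element: a length followed by a SAM operator character.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Extracts the operator characters in order; rejects malformed strings. */
    static List<Character> operators(String cigar) {
        List<Character> ops = new ArrayList<>();
        Matcher m = ELEMENT.matcher(cigar);
        int end = 0;
        while (m.find()) {
            if (m.start() != end) {
                throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
            }
            ops.add(m.group(2).charAt(0));
            end = m.end();
        }
        if (end != cigar.length()) {
            throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
        }
        return ops;
    }

    static boolean isClip(char op) { return op == 'H' || op == 'S'; }

    static boolean isIndel(char op) { return op == 'I' || op == 'D'; }

    /** Returns true if the CIGAR passes all five rules from the commit message. */
    public static boolean isGoodCigar(String cigar) {
        List<Character> ops = operators(cigar);
        int first = 0;
        int last = ops.size() - 1;
        // Skip leading and trailing clips to find the core of the alignment.
        while (first <= last && isClip(ops.get(first))) first++;
        while (last >= first && isClip(ops.get(last))) last--;
        if (first > last) return false;            // fully clipped (or empty)
        if (ops.get(first) == 'D') return false;   // starts with a deletion
        if (ops.get(last) == 'D') return false;    // ends with a deletion
        for (int i = first; i <= last; i++) {
            char op = ops.get(i);
            if (isClip(op)) return false;          // clip in the middle
            if (i > first && isIndel(op) && isIndel(ops.get(i - 1))) {
                return false;                      // II, DD, ID or DI
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String c : new String[] {"10M", "5S10M5H", "10M5S10M", "5S2D10M", "5M2I2D5M"}) {
            System.out.println(c + " -> " + (isGoodCigar(c) ? "good" : "bad"));
        }
    }
}
```

Stripping the clips first lets the start and end deletion rules apply "with or without" surrounding clips, as the commit message words them.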