Commit Graph

10548 Commits (bfbf1686cd0f71c94dea59c84b6c74c71f0ae1af)

Author SHA1 Message Date
Ryan Poplin c67d708c51 Bug fix in HaplotypeCaller for non-regular bases in the reference or reads. Those events don't get created any more. Bug fix for advanced GenotypeFullActiveRegion mode: custom variant annotations created by the HC don't make sense when in this mode so don't try to calculate them. 2012-08-20 13:41:08 -04:00
Guillermo del Angel 5b5fee56cf Next iteration of new VA interface: extend changes to per-genotype annotations as well. Will allow to have AD correctly implemented at last (that change not done yet) 2012-08-20 12:52:15 -04:00
Eric Banks 154f65e0de Temporarily disabling multi-threaded usage of BaseRecalibrator for performance reasons. 2012-08-20 12:43:17 -04:00
Menachem Fromer 37dd7209df Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-20 12:31:34 -04:00
Guillermo del Angel c384677917 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-20 10:27:25 -04:00
Eric Banks 97b191f578 Thanks to Guillermo I was able to isolate an instance of where the MLEAC > AN. It turns out that this is valid, e.g. when PLs are all 0s for a sample we no-call it but it's allowed to factor into the MLE (since that's the contract with the exact model). Removing the check in UG and instead protecting for it in the AlleleCount stratification. 2012-08-20 01:16:23 -04:00
Guillermo del Angel 963ad03f8b Second step of interface cleanup for variant annotator: several bug fixes, don't hash pileup elements to Maps because the hashCode() for a pileup element is not implemented and strange things can happen. Still several things to do, not done yet 2012-08-19 21:18:18 -04:00
Mark DePristo 7fa76f719b Print "Parsing data stream with BCF version BCFx.y" in BCF2 codec as .debug not .info 2012-08-19 10:32:55 -04:00
Mark DePristo 9121b98167 CombineVariants outputs the first non-MISSING qual, not the maximum
-- When merging multiple VCF records at a site, the combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value.  The previous behavior was to take the max QUAL, which sometimes resulted in strange downstream confusion.
2012-08-19 10:29:38 -04:00
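
The QUAL-merging rule described in this commit can be illustrated with a minimal sketch. This is not the CombineVariants code: the class and method names are hypothetical, and a MISSING QUAL is modeled as `null` purely for illustration.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical illustration only; not taken from the GATK codebase.
final class QualMergeSketch {
    // New behavior: return the first QUAL that is not missing (null here).
    static OptionalDouble firstNonMissingQual(List<Double> quals) {
        for (Double q : quals) {
            if (q != null) {
                return OptionalDouble.of(q);
            }
        }
        return OptionalDouble.empty(); // every record had a missing QUAL
    }

    // Previous behavior, for comparison: the maximum of the non-missing QUALs.
    static OptionalDouble maxQual(List<Double> quals) {
        OptionalDouble best = OptionalDouble.empty();
        for (Double q : quals) {
            if (q != null && (!best.isPresent() || q > best.getAsDouble())) {
                best = OptionalDouble.of(q);
            }
        }
        return best;
    }
}
```

With the inputs (missing, 30.0, 99.0), the first method yields 30.0 while the older max-based rule yields 99.0, so the merged QUAL now depends on record order rather than on the largest value.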
Guillermo del Angel d9641e3d57 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-19 09:23:21 -04:00
David Roazen 342a5b68ed Bring bamboo performance test runner script under version control 2012-08-18 21:08:29 -04:00
Mark DePristo d3206e35e0 Cleanup and expansion of GATKPerformanceOverTime
-- Does BQSR parallelism test
-- Does CountLoci parallelism test
-- Updated R script
2012-08-18 18:47:26 -04:00
Mauricio Carneiro d16cb68539 Updated and more thorough version of the BadCigar read filter
* No reads with Hard/Soft clips in the middle of the cigar
* No reads starting with deletions (with or without preceding clips)
* No reads ending in deletions (with or without follow-up clips)
* No reads that are fully hard or soft clipped
* No reads that have consecutive indels in the cigar (II, DD, ID or DI)

Also added systematic test for good cigars and iterative test for bad cigars.
2012-08-17 17:05:27 -04:00
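
The five rules listed in this commit can be sketched as checks on a CIGAR string. This is a hedged illustration, not the BadCigar filter itself: the class and method names are hypothetical, and the sketch assumes a well-formed CIGAR string such as `5S10M2D3M` rather than GATK's read objects.

```java
// Hypothetical illustration of the listed rules; not the GATK BadCigar filter.
final class BadCigarSketch {
    // Reduce "5S10M2D3M" to its operator sequence "SMDM".
    static String operators(String cigar) {
        return cigar.replaceAll("[0-9]+", "");
    }

    static boolean isGoodCigar(String cigar) {
        String ops = operators(cigar);
        // Strip clips at both ends; whatever is left is the aligned core.
        String core = ops.replaceAll("^[HS]+", "").replaceAll("[HS]+$", "");
        if (core.isEmpty()) {
            return false; // fully hard or soft clipped
        }
        if (core.contains("H") || core.contains("S")) {
            return false; // clip in the middle of the cigar
        }
        if (core.startsWith("D") || core.endsWith("D")) {
            return false; // starts or ends with a deletion, ignoring clips
        }
        if (core.matches(".*[ID][ID].*")) {
            return false; // consecutive indels: II, DD, ID or DI
        }
        return true;
    }
}
```

Stripping the end clips first lets the deletion checks cover both the "with clips" and "without clips" cases in a single test.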
Mark DePristo 980685af16 Fix GSA-137: Having both DataSource.REFERENCE and DataSource.REFERENCE_BASES is confusing to end users.
-- Removed REFERENCE_BASES option.  You only have REFERENCE now.  There's no efficiency savings for the REFERENCE_BASES option any longer, since the reference bases are loaded lazily so if you don't use them there's effectively no cost to making the RefContext that could load them.
2012-08-17 14:55:38 -04:00
Eric Banks 2676b7fc2e Put in a sanity check that MLEAC <= AN 2012-08-17 11:49:53 -04:00
Mark DePristo 0a706c9105 Add support for CombineVariants nt option in GATKPerformanceOverTime
-- Also includes some nicer PDF formatting
2012-08-17 11:49:02 -04:00
Mark DePristo bf6c0aaa57 Fix for missing formatter in R 2.15
-- VariantCallQC now works on newest ESP call set
2012-08-17 11:49:02 -04:00
Mark DePristo daa26cc64e Print to logger not to System.out in CachingIndexFastaSequenceFile when profiling cache performance 2012-08-17 11:49:02 -04:00
Mark DePristo be0f8beebb Fixed GSA-434: GATK should generate error when gzipped FASTA is passed in.
-- The GATK sort of handles this now, but only if you have the exactly correct sequence dictionary and FAI files associated with the reference.  If you do, the file can be .gz.  If not, the GATK will fail on creating the FAI and DICT files.  Added an error message that handles this case and clearly says what to do.
2012-08-17 11:49:02 -04:00
Mark DePristo a3d2764d11 Fixed: GSA-392 @arguments with just a short name get the wrong argument bindings
-- Now blows up if an argument begins with -.  Implementation isn't pretty, as it actually blows up during Queue extension creation with a somewhat obscure error message but at least it's something.
2012-08-17 11:49:01 -04:00
Mark DePristo 4c0f198d48 Potential fix for GSA-484: Incomplete writing of temp BCF when running CombineVariants in parallel
-- Keep reading from BCF2 input stream when read(byte[]) returns < number of needed bytes
-- It's possible (I think) that the failure in GSA-484 is due to multi-threading writing/reading of BCF2 records where the underlying stream is not yet flushed so read(byte[]) returns a partial result.  Now loops until we get all of the needed bytes or EOF is encountered
2012-08-17 11:49:01 -04:00
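
The read loop described in this commit follows a standard pattern for `java.io.InputStream`, whose `read` method may return fewer bytes than requested. The following is a minimal sketch of that pattern, not the BCF2 codec code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; not the BCF2Decoder implementation.
final class ReadLoopSketch {
    // Keep reading until buf is full or EOF is reached.
    // Returns the number of bytes actually read, which is less than
    // buf.length only if the stream ended early.
    static int readUntilFullOrEof(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // EOF before all needed bytes arrived
            }
            total += n;
        }
        return total;
    }
}
```

A caller can compare the return value against `buf.length` to distinguish a complete record from a truncated stream.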
Mark DePristo de3be45806 Proper function call in BCF2Decoder to validateReadBytes 2012-08-17 11:49:01 -04:00
Mark DePristo 67ebd65512 Bugfix for potential SEGFAULT with JNA getting execution hosts for LSF with multiple hosts 2012-08-17 11:49:01 -04:00
Mark DePristo 54e7302daf Improvements to GATKPerformanceOverTime
-- CombineVariants parallelism test
-- Easy way to ask for specific runs with enum argument
-- Update for R to handle new outputs
2012-08-17 11:49:01 -04:00
Eric Banks 53383e82ec Hmm, not good. Fixing the math in PBT resulted in changed MD5s for integration tests that look like significant changes. I am reverting and will report this to Laurent. 2012-08-16 21:41:18 -04:00
Eric Banks 65c594afff Better error message for reads that begin/end with a deletion in LIBS 2012-08-16 21:27:07 -04:00
Guillermo del Angel b61ecc7c19 Fix merge conflicts 2012-08-16 20:45:52 -04:00
Guillermo del Angel d26183e0ec First preliminary big refactoring of UG annotation engine. Goals: a) Remove gigantic hack that cached per-read haplotype likelihoods in a static array so that annotations would go back and retrieve them, b) unify interface for annotations between HaplotypeCaller and UnifiedGenotyper, c) as a consequence, removed and cleaned duplicated code. As a bonus, annotations now have more relevant info to help them compute values.
The major idea is that per-read haplotype likelihoods are now stored in a single unified object of class PerReadAlleleLikelihoodMap. Class implementation in theory hides internal storage details from outside work (still may need work cleaning up interface), and this object (or rather, a Map from Sample->perReadAlleleLikelihoodMap) is produced by UGCalcLikelihoods. The genotype calculation is also able to potentially use this info if needed. All InfoFieldAnnotations now get an extra argument with this map. Currently, this map is only produced for indels in UG, or for all variants within HaplotypeCaller. If this map is absent (SNPs in UG), the old Pileup interface is used, but it's avoided whenever possible. FORMAT annotations are not yet changed but will be the focus of the second step. The major benefit will be that annotations will be able to very easily discard non-informative reads for certain events. HaplotypeCaller also uses this new class, and no longer hard-codes the mapping of allele ->list(reads) but instead uses the same objects and interfaces as the rest of the modules. Code still needs further testing/cleaning/reviewing/debugging
2012-08-16 20:36:53 -04:00
Mark DePristo 6a2862e8bc GSA-483: Bug in GATKdocs for Enums
-- Fixed to no longer show constants in enums as constant values in the gatkdocs
2012-08-16 16:24:17 -04:00
Eric Banks 3253fc216b FindBugs 'Maintainability' fixes 2012-08-16 15:53:06 -04:00
Eric Banks 05cbf1c8c0 FindBugs 'Efficiency' fixes 2012-08-16 15:40:52 -04:00
Mark DePristo d8071c66ed Removing SlowGenotype object from GATK 2012-08-16 15:23:06 -04:00
Eric Banks a22e7a5358 Should've run 'ant clean' instead of just 'ant'. In any event, these are 2 cases where we are setting a class's internal static variable directly. Very dangerous. 2012-08-16 15:07:32 -04:00
Eric Banks 47b4f7b7e5 One final FindBugs related fix. I think it's safe to consider these changes 'fixes' that are allowed to go in during a code freeze. 2012-08-16 14:59:05 -04:00
Eric Banks ded0e11b45 Killing off some FindBugs 'Reliability' issues 2012-08-16 14:00:48 -04:00
Eric Banks dac3958461 Killing off some FindBugs 'Usability' issues 2012-08-16 13:32:44 -04:00
Eric Banks 611d9b61e2 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-16 13:05:36 -04:00
Eric Banks 2df04dc48a Fix for performance problem in GGA mode related to previous --regenotype commit. Instead of trying to hack around the determination of the calculation model when it's not needed, just simply overload the calculateGenotypes() method to add one that does simple genotyping. Re-enabling the Pool Caller integration tests. 2012-08-16 13:05:17 -04:00
Mark DePristo 132cdfd9c1 GSA-488: MLEAC > AN error when running variant eval fixed 2012-08-16 13:03:14 -04:00
Mark DePristo 4e42988c66 GSA-485: Remove repairVCFHeader from GATK codebase
-- Removed half-a*ssed attempt to automatically repair VCF files with bad headers, which allowed users to provide a replacement header overwriting the file's actual header on the fly.  Not a good idea, really.  Eric has promised to create a utility that walks through a VCF file and creates a meaningful header field based on the file's contents (if this ever becomes a priority)
2012-08-16 13:03:13 -04:00
Mark DePristo 52bfe8db8a Make sure the storage writer is closed before running mergeInfo in multi-threaded output management
-- It's not clear this is cause of GSA-484 but it will help confirm that it's not the cause
2012-08-16 13:03:13 -04:00
Mark DePristo 7a247df922 Added -bcf argument to VCFWriter output to force BCF regardless of file extension
-- Now possible to do -o /dev/stdout -bcf -l DEBUG > tmp.bcf and create a valid BCF2 file
-- Cleanup code to make extensions easier by moving to a setX model in VariantContextWriterStub
2012-08-16 13:03:13 -04:00
Mark DePristo 28c8e3e6d7 Cleanup BCF2Codec
-- Remove FORBID_SYMBOLIC global that is no longer necessary
-- all error handling goes via error() function
2012-08-16 13:03:13 -04:00
Mark DePristo 9dc694b2e9 Meaningful error message and keeping tmp file when mergeInfo fails
-- BCF2 is failing for some reason when merging tmp. files with parallel combine variants.  ThreadLocalOutputTracker no longer sets deleteOnExit on the tmp file, as this prevents debugging.  And it's unnecessary because each mergeInto was deleting files as appropriate
-- MergeInfo in VariantContextWriterStorage only deletes the intermediate output if an error occurs
2012-08-16 13:03:13 -04:00
Mark DePristo a9a1c499fd Update md5 in VariantRecalibrationWalkers test for BCF2 -- only encoding differences 2012-08-16 13:03:13 -04:00
Eric Banks 04be0c92bf Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-08-15 23:13:32 -04:00
Eric Banks 9035b554fb Adding tests for the --solid_nocall_strategy argument 2012-08-15 23:13:24 -04:00
David Roazen fa7605c643 Convert external.build.dir and external.dist.dir back to paths
The previous push fixed the external classpath issue but broke external
builds in a new way by changing the above from paths to properties. This
was a mistake, since external builds require absolute, not relative, paths

Thanks to akiezun for the bug report and patch
2012-08-15 23:04:10 -04:00
Eric Banks f368e568db Implementing support in BaseRecalibrator for SOLiD no call strategies other than throwing an exception. For some reason we never transferred these capabilities into BQSRv2 earlier. 2012-08-15 22:52:56 -04:00
Eric Banks 9d09230c26 Better docs for verbose output of Pileup 2012-08-15 21:55:08 -04:00