Guillermo del Angel
1aa856e0e3
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-22 15:53:47 -04:00
Guillermo del Angel
e29469eeeb
Forgot to update 2 integration test md5's (in these cases, the changes are legitimate because of the AD code revamp: it's simpler not to output AD when a site is not variant, since genotype DP conveys the same information)
2012-08-22 15:53:33 -04:00
Ryan Poplin
fe3069b278
Merged bug fix from Stable into Unstable
2012-08-22 14:40:34 -04:00
Ryan Poplin
e5cfdb4811
Bug fix for the frequently reported _Duplicate allele added to VariantContext_ error on the forum. It seems to be caused by lowercase bases in the reference being treated as reference mismatches, which we would then try to turn into SNP events, for example c/C. We now uppercase the result from IndexedFastaSequenceFile.getSubsequenceAt().
2012-08-22 14:39:35 -04:00
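A minimal sketch of the uppercasing fix described above, assuming the reference subsequence is available as an ASCII `byte[]`. `RefBases`, `toUpperCaseInPlace` and `isMismatch` are hypothetical names used for illustration; the real fix uppercases the result of `IndexedFastaSequenceFile.getSubsequenceAt()`.

```java
// Hypothetical helper: normalizes soft-masked (lowercase) reference bases so
// that 'c' and 'C' are no longer treated as two different alleles.
final class RefBases {
    private RefBases() {}

    /** Uppercases ASCII letters in place and returns the same array. */
    static byte[] toUpperCaseInPlace(final byte[] bases) {
        for (int i = 0; i < bases.length; i++) {
            final byte b = bases[i];
            if (b >= 'a' && b <= 'z') {
                bases[i] = (byte) (b - ('a' - 'A'));
            }
        }
        return bases;
    }

    /** True if the two bases still differ after case normalization. */
    static boolean isMismatch(final byte refBase, final byte readBase) {
        final byte[] pair = toUpperCaseInPlace(new byte[] {refBase, readBase});
        return pair[0] != pair[1];
    }
}
```

Without normalization, a lowercase `c` in the reference compared against a read `C` would look like a mismatch and could be turned into a bogus c/C SNP.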
Ryan Poplin
63213e8eb5
Expanding the HaplotypeCaller integration tests to cover a wider range of data
2012-08-22 14:18:44 -04:00
Eric Banks
944e1c299d
Docs for --keepOriginalAC were wrong in SelectVariants
2012-08-22 13:07:13 -04:00
Eric Banks
2409aa9bfd
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-22 12:54:43 -04:00
Eric Banks
94540ccc27
Using the simple VCBuilder constructor and then subsequently trying to modify attributes was throwing an NPE. This is easily solved (without a performance hit) by initializing the attributes map to an immutable Collections.emptyMap(). Added unit test to cover this case.
2012-08-22 12:54:29 -04:00
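A minimal sketch of the copy-on-first-write pattern described above: start from the shared immutable `Collections.emptyMap()` (no allocation, never null) and switch to a private mutable copy the first time an attribute is modified. `AttributeHolder` is a hypothetical stand-in, not the actual VCBuilder code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a builder's attribute handling.
final class AttributeHolder {
    // Shared immutable empty map: costs nothing and is never null (no NPE).
    private Map<String, Object> attributes = Collections.emptyMap();
    // Whether 'attributes' is already a private, mutable copy.
    private boolean attributesCanBeModified = false;

    AttributeHolder attribute(final String key, final Object value) {
        makeAttributesModifiable();
        attributes.put(key, value);
        return this;
    }

    AttributeHolder rmAttribute(final String key) {
        makeAttributesModifiable();
        attributes.remove(key);
        return this;
    }

    Map<String, Object> getAttributes() {
        return Collections.unmodifiableMap(attributes);
    }

    // Replaces the immutable map with a mutable copy on the first write only.
    private void makeAttributesModifiable() {
        if (!attributesCanBeModified) {
            attributes = new HashMap<>(attributes);
            attributesCanBeModified = true;
        }
    }
}
```

Builders that never touch attributes pay no allocation cost, which is why this avoids a performance hit.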
Guillermo del Angel
901f47d8af
Final step (for now) in VA refactoring: update MD5s because a) since it's not guaranteed that we'll iterate through reads/pileups in the same order, the rank sum dithering will change annotations, b) FS uses a new generic threshold to distinguish uninformative reads (it used to use ad-hoc thresholds), c) the AD definition changed and now throws away uninformative reads, d) general ploidy integration tests were shortened for quicker debugging. I may have missed some MD5s in the update, so there may still be lingering test failures.
2012-08-22 11:38:51 -04:00
Guillermo del Angel
7df0abf49b
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-22 11:36:41 -04:00
Eric Banks
9e76e8aa0b
Just noticed that the efficient conversion to uppercase method is redundant since it's already implemented efficiently in Picard; let's just have a single implementation.
2012-08-22 11:26:08 -04:00
Christopher Hartl
20601f034e
Updating the checkType() function to include the new StructuralIndel variant type. Fixes outstanding broken integration test.
2012-08-22 07:33:10 -07:00
Eric Banks
c7ce3e1cf5
Merged bug fix from Stable into Unstable
2012-08-22 00:24:40 -04:00
Eric Banks
03017855e4
WTF - why is support for whole-read insertions all messed up in LIBS? I've pushed a temporary patch for now (the right solution should certainly not be implemented in stable; LIBS needs to be better thought out). Added another unit test.
2012-08-22 00:24:01 -04:00
Mark DePristo
6ce8016ae7
GSA-491: Add hidden tag to GATK that propagates to the GATK logs
2012-08-21 14:44:18 -04:00
Guillermo del Angel
6a8cf1c84a
Enable and adapt HaplotypeScore and MappingQualityZero as active region annotations now that we have per-read likelihoods passed in to annotations
2012-08-21 14:35:40 -04:00
Guillermo del Angel
d0644b3565
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-21 10:35:23 -04:00
Ryan Poplin
94e7f677ad
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-21 10:21:47 -04:00
Guillermo del Angel
418ace463a
More merge conflict resolution
2012-08-21 10:15:52 -04:00
Ryan Poplin
10961db3ce
Another round of FindBugs fixes. Object returns its internal reference to an externally mutable array. Very dangerous.
2012-08-21 09:35:55 -04:00
Ryan Poplin
605acaae9c
Another round of FindBugs fixes. Object internally stores a reference to an externally mutable array. Very dangerous.
2012-08-21 09:33:58 -04:00
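The two FindBugs warnings above (returning an internal array, and storing a caller's array) are usually fixed with defensive copies. A minimal sketch; `QualityHolder` is a hypothetical class used only for illustration.

```java
import java.util.Arrays;

// Hypothetical class illustrating both fixes: copy arrays on the way in
// (constructor) and on the way out (getter).
final class QualityHolder {
    private final byte[] quals;

    QualityHolder(final byte[] quals) {
        // Later changes to the caller's array cannot affect our state.
        this.quals = Arrays.copyOf(quals, quals.length);
    }

    byte[] getQuals() {
        // Callers get their own copy and cannot mutate our internal array.
        return Arrays.copyOf(quals, quals.length);
    }
}
```

The copies cost an allocation per call, so hot paths sometimes document an "owned, do not modify" contract instead; the copies are the safe default.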
Ryan Poplin
55b7949d68
Another round of FindBugs fixes. Comparator doesn't implement Serializable.
2012-08-21 09:20:55 -04:00
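Why a `Comparator` should implement `Serializable`: sorted collections such as `TreeSet` and `TreeMap` store their comparator, so they can only be serialized if it is serializable. A minimal sketch with a hypothetical comparator.

```java
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical comparator: orders strings by length, then lexically.
// Implementing Serializable lets a TreeSet/TreeMap that holds it be serialized.
final class ByLengthThenLexical implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(final String a, final String b) {
        final int byLength = Integer.compare(a.length(), b.length());
        return byLength != 0 ? byLength : a.compareTo(b);
    }
}
```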
Christopher Hartl
ba8622ff0d
A number of stashed changes are lurking in here. In order of importance:
- Fix for M_Trieb's error report on the forum, and addition of integration tests to cover the walker.
- Addition of StructuralIndel as a class of variation within the VariantContext. These are for variants with a full alt allele that's >150bp in length.
- Adaptation of the MVLikelihoodRatio to work for a set of trios (takes the max over the trios of the MVLR)
- InsertSizeDistribution changed to use the new gatk report output (it was previously broken)
- RetrogeneDiscovery changed to be compatible with the new gatk report
- A maxIndelSize argument added to SelectVariants
- ByTranscriptEvaluator rewritten for cleanliness
- VariantRecalibrator modified to not exclude structural indels from recalibration if the mode is INDEL
- Documentation added to DepthOfCoverageIntegrationTest (no, don't yell at chartl ;_; )
Also, sorry for the long commit history behind this, which is the result of fixing merge conflicts. Because this *also* fixes a conflict (from git stash apply), for some reason I can't rebase all of them away. I'm pretty sure some of the commit notes say "this note isn't important because I'm going to rebase it anyway".
2012-08-21 07:08:58 -04:00
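A minimal sketch of the StructuralIndel rule stated above (an indel whose full alt allele is longer than 150bp). The enum, constant and method names are hypothetical and simplified; this is not the VariantContext type-determination code.

```java
// Hypothetical classifier applying the ">150bp full alt allele" rule.
final class IndelClassifier {
    enum Kind { SNP, MNP, INDEL, STRUCTURAL_INDEL }

    // Threshold taken from the commit message: alt allele strictly > 150bp.
    static final int MAX_INDEL_ALT_LENGTH = 150;

    static Kind classify(final String ref, final String alt) {
        if (ref.length() == alt.length()) {
            return ref.length() == 1 ? Kind.SNP : Kind.MNP;
        }
        return alt.length() > MAX_INDEL_ALT_LENGTH ? Kind.STRUCTURAL_INDEL : Kind.INDEL;
    }
}
```

Note that, read literally, the rule looks at the alt allele only, so a long deletion with a short alt allele stays an ordinary indel in this sketch.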
Eric Banks
3dfe8df262
Merged bug fix from Stable into Unstable
2012-08-20 23:12:58 -04:00
Eric Banks
40d5efc804
Fix for Adam K's reported bug: we weren't properly handling reads that were entirely insertions in LIBS. Specifically, the event bases were off by one (which was disastrous in Adam's case with a 1bp read). Added a unit test to cover this case.
2012-08-20 23:12:41 -04:00
Eric Banks
286b658fab
Re-enabling parallelism in the BaseRecalibrator now that the release is out.
2012-08-20 21:25:14 -04:00
Guillermo del Angel
7bbd2a7a20
Fixing merge conflicts
2012-08-20 20:38:25 -04:00
Guillermo del Angel
2041cb853c
New implementation of AD: non-informative reads are now ignored, based on per-read likelihoods
2012-08-20 20:31:34 -04:00
Ryan Poplin
77fbaec044
Another round of FindBugs fixes. Class implements its own compareTo() but uses base Object.equals() which can lead to unpredictable behavior.
2012-08-20 16:55:00 -04:00
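The FindBugs issue above arises when `compareTo()` can return 0 for objects that `Object.equals()` considers different, so sorted and hashed collections disagree. A minimal sketch of a consistent implementation; `Position` is a hypothetical class.

```java
import java.util.Objects;

// Hypothetical genomic position. compareTo(), equals() and hashCode() use the
// same fields, so x.compareTo(y) == 0 exactly when x.equals(y).
final class Position implements Comparable<Position> {
    private final String contig;
    private final int start;

    Position(final String contig, final int start) {
        this.contig = Objects.requireNonNull(contig);
        this.start = start;
    }

    @Override
    public int compareTo(final Position other) {
        // Lexicographic contig order, for illustration only; real code would
        // use the reference's contig order.
        final int byContig = contig.compareTo(other.contig);
        return byContig != 0 ? byContig : Integer.compare(start, other.start);
    }

    @Override
    public boolean equals(final Object o) {
        if (this == o) return true;
        if (!(o instanceof Position)) return false;
        final Position p = (Position) o;
        return start == p.start && contig.equals(p.contig);
    }

    @Override
    public int hashCode() {
        return Objects.hash(contig, start);
    }
}
```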
Ryan Poplin
5e28bca630
Another round of FindBugs fixes. Should be static inner class.
2012-08-20 16:15:48 -04:00
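A non-static inner class keeps a hidden reference to its enclosing instance even when it never uses it; making it a static nested class removes that reference. A minimal sketch with hypothetical class names.

```java
// Hypothetical example of the "should be a static inner class" fix.
final class PileupSummary {
    private int depth;

    // Declared static: no implicit PileupSummary.this reference is captured,
    // and it can be constructed without an enclosing instance.
    static final class Counts {
        final int ref;
        final int alt;

        Counts(final int ref, final int alt) {
            this.ref = ref;
            this.alt = alt;
        }

        int total() {
            return ref + alt;
        }
    }

    Counts summarize(final int ref, final int alt) {
        depth = ref + alt;
        return new Counts(ref, alt);
    }

    int getDepth() {
        return depth;
    }
}
```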
Ryan Poplin
5db3bd6fd2
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-20 15:28:57 -04:00
Ryan Poplin
464d49509a
Pulling out common caller arguments into their own StandardCallerArgumentCollection base class so that every caller isn't exposed to the unused arguments from every other caller.
2012-08-20 15:28:39 -04:00
Eric Banks
4450d66c64
Fixing the docs for DP and AD
2012-08-20 15:10:24 -04:00
Ryan Poplin
c67d708c51
Bug fix in HaplotypeCaller for non-regular bases in the reference or reads: those events are no longer created. Bug fix for the advanced GenotypeFullActiveRegion mode: custom variant annotations created by the HC don't make sense in this mode, so we no longer try to calculate them.
2012-08-20 13:41:08 -04:00
Guillermo del Angel
5b5fee56cf
Next iteration of the new VA interface: extend the changes to per-genotype annotations as well. This will finally allow AD to be implemented correctly (that change is not done yet).
2012-08-20 12:52:15 -04:00
Eric Banks
154f65e0de
Temporarily disabling multi-threaded usage of BaseRecalibrator for performance reasons.
2012-08-20 12:43:17 -04:00
Guillermo del Angel
c384677917
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-20 10:27:25 -04:00
Eric Banks
97b191f578
Thanks to Guillermo, I was able to isolate an instance where MLEAC > AN. It turns out that this is valid, e.g. when PLs are all 0s for a sample we no-call it, but it's allowed to factor into the MLE (since that's the contract with the exact model). Removing the check in UG and instead protecting against it in the AlleleCount stratification.
2012-08-20 01:16:23 -04:00
Guillermo del Angel
963ad03f8b
Second step of interface cleanup for the variant annotator: several bug fixes; don't use pileup elements as Map keys, because hashCode() is not implemented for pileup elements and strange things can happen. Several things still remain to be done.
2012-08-19 21:18:18 -04:00
Mark DePristo
7fa76f719b
Print "Parsing data stream with BCF version BCFx.y" in BCF2 codec as .debug not .info
2012-08-19 10:32:55 -04:00
Mark DePristo
9121b98167
CombineVariants outputs the first non-MISSING qual, not the maximum
-- When merging multiple VCF records at a site, the combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value. The previous behavior was to take the max QUAL, which sometimes resulted in strange downstream confusion.
2012-08-19 10:29:38 -04:00
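A minimal sketch of the new merge rule: walk the records in input order and return the first QUAL that is not missing. Here a missing QUAL is modeled as `Double.NaN`, which is an assumption; the real VCF code uses its own missing-value sentinel. `QualMerger` is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch of "first non-MISSING QUAL wins" (not "max QUAL wins").
final class QualMerger {
    // Assumed sentinel for a missing QUAL in this sketch only.
    static final double MISSING_QUAL = Double.NaN;

    static double mergedQual(final List<Double> qualsInInputOrder) {
        for (final double q : qualsInInputOrder) {
            if (!Double.isNaN(q)) {
                return q; // first non-missing value, even if a later one is higher
            }
        }
        return MISSING_QUAL;
    }
}
```

For inputs `[missing, 30.0, 99.0]` the merged QUAL is 30.0, whereas the previous max-based behavior would have produced 99.0.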
Guillermo del Angel
d9641e3d57
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-19 09:23:21 -04:00
Mauricio Carneiro
d16cb68539
Updated and more thorough version of the BadCigar read filter
* No reads with Hard/Soft clips in the middle of the cigar
* No reads starting with deletions (with or without preceding clips)
* No reads ending in deletions (with or without follow-up clips)
* No reads that are fully hard or soft clipped
* No reads that have consecutive indels in the cigar (II, DD, ID or DI)
Also added systematic test for good cigars and iterative test for bad cigars.
2012-08-17 17:05:27 -04:00
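The BadCigar rules listed above can be restated as checks over the sequence of CIGAR operators. A minimal string-based sketch; `BadCigarCheck` is hypothetical, and the real filter operates on Picard `Cigar` objects rather than strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical re-statement of the BadCigar rules; only operators are checked.
final class BadCigarCheck {
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    private static List<Character> operators(final String cigar) {
        final List<Character> ops = new ArrayList<>();
        final Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            ops.add(m.group(2).charAt(0));
        }
        return ops;
    }

    private static boolean isClip(final char op) { return op == 'S' || op == 'H'; }
    private static boolean isIndel(final char op) { return op == 'I' || op == 'D'; }

    static boolean isBad(final String cigar) {
        final List<Character> ops = operators(cigar);
        // Skip leading and trailing clips to find the first/last "real" operators.
        int first = 0;
        while (first < ops.size() && isClip(ops.get(first))) first++;
        int last = ops.size() - 1;
        while (last >= 0 && isClip(ops.get(last))) last--;

        if (first > last) return true;          // fully hard/soft clipped
        if (ops.get(first) == 'D') return true; // starts with a deletion
        if (ops.get(last) == 'D') return true;  // ends with a deletion
        for (int i = first; i <= last; i++) {
            if (isClip(ops.get(i))) return true; // clip in the middle
            if (i > first && isIndel(ops.get(i)) && isIndel(ops.get(i - 1))) {
                return true;                     // II, DD, ID or DI
            }
        }
        return false;
    }
}
```

For example, `5H3S90M2S` passes, while `10M5S10M` (clip in the middle) and `3S2D50M` (deletion after leading clips) are rejected.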
Mark DePristo
980685af16
Fix GSA-137: Having both DataSource.REFERENCE and DataSource.REFERENCE_BASES is confusing to end users.
-- Removed the REFERENCE_BASES option. You only have REFERENCE now. There are no efficiency savings from the REFERENCE_BASES option any longer: the reference bases are loaded lazily, so if you don't use them there's effectively no cost to making the RefContext that could load them.
2012-08-17 14:55:38 -04:00
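The reason REFERENCE_BASES no longer saves anything is lazy loading: the context is cheap to build and only fetches bases on first use. A minimal sketch; `LazyRefContext` and its `Supplier`-based loader are hypothetical, not the RefContext implementation.

```java
import java.util.function.Supplier;

// Hypothetical sketch: reference bases are fetched only when first requested,
// so constructing the context costs almost nothing if they are never used.
final class LazyRefContext {
    private final Supplier<byte[]> loader;
    private byte[] bases; // null until first requested

    LazyRefContext(final Supplier<byte[]> loader) {
        this.loader = loader;
    }

    byte[] getBases() {
        if (bases == null) {
            bases = loader.get(); // load once, then cache
        }
        return bases;
    }
}
```

This sketch is not thread-safe; a shared context would need synchronization or a per-thread instance.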
Eric Banks
2676b7fc2e
Put in a sanity check that MLEAC <= AN
2012-08-17 11:49:53 -04:00
Mark DePristo
daa26cc64e
Print to logger not to System.out in CachingIndexFastaSequenceFile when profiling cache performance
2012-08-17 11:49:02 -04:00
Mark DePristo
be0f8beebb
Fixed GSA-434: GATK should generate error when gzipped FASTA is passed in.
-- The GATK sort of handles this now, but only if you have the exactly correct sequence dictionary and FAI files associated with the reference. If you do, the file can be .gz. If not, the GATK will fail on creating the FAI and DICT files. Added an error message that handles this case and clearly says what to do.
2012-08-17 11:49:02 -04:00
Mark DePristo
a3d2764d11
Fixed: GSA-392 @arguments with just a short name get the wrong argument bindings
-- Now blows up if an argument begins with -. The implementation isn't pretty, as it actually blows up during Queue extension creation with a somewhat obscure error message, but at least it's something.
2012-08-17 11:49:01 -04:00
Mark DePristo
4c0f198d48
Potential fix for GSA-484: Incomplete writing of temp BCF when running CombineVariants in parallel
-- Keep reading from BCF2 input stream when read(byte[]) returns < number of needed bytes
-- It's possible (I think) that the failure in GSA-484 is due to multi-threaded writing/reading of BCF2 records where the underlying stream is not yet flushed, so read(byte[]) returns a partial result. Now loops until we get all of the needed bytes or EOF is encountered.
2012-08-17 11:49:01 -04:00
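`InputStream.read(byte[], int, int)` may legitimately return fewer bytes than requested, which is the partial-read case described above. A minimal sketch of the "loop until full or EOF" fix; `StreamUtils.readFully` is a hypothetical helper, not the BCF2Decoder code.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: keep calling read() until the buffer is full or EOF.
final class StreamUtils {
    /** Returns the number of bytes read; less than buffer.length only at EOF. */
    static int readFully(final InputStream in, final byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            final int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // EOF before the buffer was filled
            }
            total += n;
        }
        return total;
    }
}
```

A caller can then treat `readFully(...) < needed` as a truncated record instead of silently decoding a half-filled buffer.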
Mark DePristo
de3be45806
Proper function call in BCF2Decoder to validateReadBytes
2012-08-17 11:49:01 -04:00