Guillermo del Angel
1aa856e0e3
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-22 15:53:47 -04:00
Guillermo del Angel
e29469eeeb
Forgot to update 2 integration test md5's (in these cases the changes are legit because of the code revamp of AD: it's simpler if AD is not output when a site is not variant, as genotype DP conveys the same information)
2012-08-22 15:53:33 -04:00
Menachem Fromer
b1b9c0b132
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-22 15:26:39 -04:00
Ryan Poplin
fe3069b278
Merged bug fix from Stable into Unstable
2012-08-22 14:40:34 -04:00
Ryan Poplin
e5cfdb4811
Bug fix for popular _Duplicate allele added to VariantContext_ error reported on the forum. It seems to be due to lower case bases in the reference being treated as reference mismatches. We would try to turn these mismatches into SNP events, for example c/C. We now uppercase the result from IndexedFastaSequenceFile.getSubsequenceAt()
2012-08-22 14:39:35 -04:00
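
A minimal sketch of the problem described in the commit above, assuming bases are plain ASCII bytes. The class and method names here are hypothetical and are not the GATK or Picard implementation; the sketch only shows why a lowercase reference base looks like a mismatch until the reference is uppercased.

```java
// Hypothetical helper for illustration only; not GATK or Picard code.
final class ReferenceBases {
    private ReferenceBases() {}

    // Returns an uppercased copy of ASCII base bytes (e.g. 'c' becomes 'C').
    static byte[] toUpperCase(byte[] bases) {
        byte[] out = new byte[bases.length];
        for (int i = 0; i < bases.length; i++) {
            byte b = bases[i];
            out[i] = (b >= 'a' && b <= 'z') ? (byte) (b - ('a' - 'A')) : b;
        }
        return out;
    }

    // Counts positions where two equal-length sequences differ byte-for-byte.
    static int countMismatches(byte[] ref, byte[] read) {
        if (ref.length != read.length) {
            throw new IllegalArgumentException("sequences must have equal length");
        }
        int mismatches = 0;
        for (int i = 0; i < ref.length; i++) {
            if (ref[i] != read[i]) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

With a soft-masked reference such as "acGT" and a read "ACGT", a byte-wise comparison reports two spurious mismatches (a/A and c/C); comparing against the uppercased reference reports none.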
Ryan Poplin
63213e8eb5
Expanding the HaplotypeCaller integration tests to cover a wider range of data
2012-08-22 14:18:44 -04:00
Eric Banks
944e1c299d
Docs for --keepOriginalAC were wrong in SelectVariants
2012-08-22 13:07:13 -04:00
Eric Banks
2409aa9bfd
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-22 12:54:43 -04:00
Eric Banks
94540ccc27
Using the simple VCBuilder constructor and then subsequently trying to modify attributes was throwing a NPE. This is easily solved (without a performance hit) by initializing the attributes map to an immutable Collections.emptyMap(). Added unit test to cover this case.
2012-08-22 12:54:29 -04:00
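
A rough sketch of the idea in the commit above, using a hypothetical builder rather than the real VCBuilder class. The commit only states that the attributes map is initialized to an immutable Collections.emptyMap(); the copy-on-first-write step shown here is an assumption about how later modification could be made safe.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical builder for illustration; not the actual GATK class.
final class SketchBuilder {
    // Shared immutable empty map: never null, and no allocation per builder.
    private Map<String, Object> attributes = Collections.emptyMap();
    private boolean attributesMutable = false;

    // Adds one attribute, switching to a private mutable map on first write (assumed design).
    SketchBuilder attribute(String key, Object value) {
        if (!attributesMutable) {
            attributes = new HashMap<>(attributes);
            attributesMutable = true;
        }
        attributes.put(key, value);
        return this;
    }

    // Read-only view, so callers cannot modify the builder's state behind its back.
    Map<String, Object> getAttributes() {
        return Collections.unmodifiableMap(attributes);
    }
}
```

Because Collections.emptyMap() returns a shared singleton, the simple constructor path stays as cheap as leaving the field null, while reads and the first write no longer risk a NullPointerException.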
Guillermo del Angel
901f47d8af
Final step (for now) in VA refactoring: update MD5's because, a) since it's not guaranteed that we'll iterate through reads/pileups in the same order, the rank sum dithering will change annotations, b) FS uses new generic threshold to distinguish uninformative reads (it used to use ad-hoc thresholds), c) AD definition changed and throws away uninformative reads, d) shortened general ploidy integration tests for quicker debugging. May have missed some MD5's in the update so there may be lingering test failures still
2012-08-22 11:38:51 -04:00
Guillermo del Angel
7df0abf49b
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-22 11:36:41 -04:00
Eric Banks
9e76e8aa0b
Just noticed that the efficient conversion to uppercase method is redundant since it's already implemented efficiently in Picard; let's just have a single implementation.
2012-08-22 11:26:08 -04:00
Christopher Hartl
20601f034e
Updating the checkType() function to include the new StructuralIndel variant type. Fixes outstanding broken integration test.
2012-08-22 07:33:10 -07:00
Eric Banks
c7ce3e1cf5
Merged bug fix from Stable into Unstable
2012-08-22 00:24:40 -04:00
Eric Banks
03017855e4
WTF - why is support for whole-read insertions all messed up in LIBS? I've pushed a temporary patch for now (the right solution should certainly not be implemented in stable; LIBS needs to be better thought out). Added another unit test.
2012-08-22 00:24:01 -04:00
Mark DePristo
1acf18aa25
run_performance_tests use bsub and gsa
-- Confirmed that running on gsa queue is fine with sufficient iterations (3)
2012-08-21 16:26:12 -04:00
Mark DePristo
cb9ba4f660
Expand intervals processed for many GATKPerformanceOverTime commands
-- For the high NT tests the total runtime may be too short to really assess nt efficiency vs. start up costs. Reworked underlying test data and intervals so that most tests run in 10-20 hrs for -nt 1.
2012-08-21 16:25:31 -04:00
Mark DePristo
1d707e7b31
Linear and Quadratic fits for GATKPerformanceOverTime.R
2012-08-21 14:44:18 -04:00
Mark DePristo
e43ff31eab
GATKPerformance over time lives (mark II)
-- Now uses new tagging capabilities so that 2.x runs will tag their logs as GATKPerformanceOverTime
-- Update bamboo runs, reverting back to gsa4 (it's slower but the results are less variable -- you were right david!).
2012-08-21 14:44:18 -04:00
Mark DePristo
df36b4384c
Update analyzeRunReports.py to handle new GATK tag argument
2012-08-21 14:44:18 -04:00
Mark DePristo
6ce8016ae7
GSA-491: Add hidden tag to GATK that propagates to the GATK logs
2012-08-21 14:44:18 -04:00
Mark DePristo
9eec33ec3b
Complete GSA-497: Let Queue write out runInfo on the fly, after each job group finishes running
-- Queue will now incrementally write out its jobReport.txt file whenever jobs finish running (FAIL or DONE)
-- This makes it far easier to track what's going on, or to incrementally analyze performance results coming out of Queue
-- Generally cleaned up the QJobsReporting code, creating a new clean class QJobsReporter that holds all of the information on what to log and where to put it, which was previously scattered in QCommandLine and QJobReport
2012-08-21 14:44:18 -04:00
Guillermo del Angel
6a8cf1c84a
Enable and adapt HaplotypeScore and MappingQualityZero as active region annotations now that we have per-read likelihoods passed in to annotations
2012-08-21 14:35:40 -04:00
Guillermo del Angel
d0644b3565
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-21 10:35:23 -04:00
Ryan Poplin
94e7f677ad
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-21 10:21:47 -04:00
Guillermo del Angel
418ace463a
More merge conflict resolution
2012-08-21 10:15:52 -04:00
Ryan Poplin
10961db3ce
Another round of FindBugs fixes. Object returns its internal reference to an externally mutable array. Very dangerous.
2012-08-21 09:35:55 -04:00
Ryan Poplin
605acaae9c
Another round of FindBugs fixes. Object internally stores a reference to an externally mutable array. Very dangerous.
2012-08-21 09:33:58 -04:00
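
The two array-related FindBugs warnings above (returning an internal array, and storing a caller's array) are usually addressed with defensive copies. The following is a generic sketch with a hypothetical class name; it is not taken from the GATK code that these commits changed.

```java
// Hypothetical value class for illustration; not a GATK class.
final class Likelihoods {
    private final double[] values;

    // Copy on the way in: later changes to the caller's array do not affect this object.
    Likelihoods(double[] values) {
        this.values = values.clone();
    }

    // Copy on the way out: callers cannot mutate the internal array through the return value.
    double[] getValues() {
        return values.clone();
    }
}
```

The trade-off is an extra allocation per call, so for hot paths a read-only accessor such as a get(int index) method may be preferable to returning a copy.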
Ryan Poplin
55b7949d68
Another round of FindBugs fixes. Comparator doesn't implement Serializable.
2012-08-21 09:20:55 -04:00
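
For context on the warning above: a sorted collection such as TreeSet or TreeMap keeps a reference to its comparator, so the collection can only be serialized if the comparator is serializable too. A generic sketch, with a hypothetical comparator that is not from the GATK code base:

```java
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical comparator for illustration: orders strings by length.
final class LengthComparator implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(String a, String b) {
        return Integer.compare(a.length(), b.length());
    }
}
```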
Christopher Hartl
ba8622ff0d
A number of stashed changes are lurking in here. In order of importance:
- Fix for M_Trieb's error report on the forum, and addition of integration tests to cover the walker.
- Addition of StructuralIndel as a class of variation within the VariantContext. These are for variants with a full alt allele that's >150bp in length.
- Adaptation of the MVLikelihoodRatio to work for a set of trios (takes the max over the trios of the MVLR)
- InsertSizeDistribution changed to use the new gatk report output (it was previously broken)
- RetrogeneDiscovery changed to be compatible with the new gatk report
- A maxIndelSize argument added to SelectVariants
- ByTranscriptEvaluator rewritten for cleanliness
- VariantRecalibrator modified to not exclude structural indels from recalibration if the mode is INDEL
- Documentation added to DepthOfCoverageIntegrationTest (no, don't yell at chartl ;_; )
Also sorry for the long commit history behind this that is the result of fixing merge conflicts. Because this *also* fixes a conflict (from git stash apply), for some reason I can't rebase all of them away. I'm pretty sure some of the commit notes say "this note isn't important because I'm going to rebase it anyway".
2012-08-21 07:08:58 -04:00
Eric Banks
3dfe8df262
Merged bug fix from Stable into Unstable
2012-08-20 23:12:58 -04:00
Eric Banks
40d5efc804
Fix for Adam K's reported bug: we weren't handling reads that were entirely insertions properly in LIBS. Specifically, the event bases were off-by-one (which was disastrous in Adam's case with a 1bp read). Added a unit test to cover this case.
2012-08-20 23:12:41 -04:00
Khalid Shakir
3514fb6e66
Changed the default memory limit from none to 2GB upon suggestions from delangel, carneiro, and depristo.
2012-08-20 21:41:13 -04:00
Eric Banks
286b658fab
Re-enabling parallelism in the BaseRecalibrator now that the release is out.
2012-08-20 21:25:14 -04:00
Eric Banks
5b1781fdac
Merge remote-tracking branch 'unstable/master'
2012-08-20 21:18:54 -04:00
Guillermo del Angel
7bbd2a7a20
Fixing merge conflicts
2012-08-20 20:38:25 -04:00
Guillermo del Angel
2041cb853c
New implementation of AD - now ignores non-informative reads based on per-read likelihoods
2012-08-20 20:31:34 -04:00
Ryan Poplin
77fbaec044
Another round of FindBugs fixes. Class implements its own compareTo() but uses base Object.equals() which can lead to unpredictable behavior.
2012-08-20 16:55:00 -04:00
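
To illustrate the warning above: if compareTo() says two objects are equal but the inherited Object.equals() says they are not, a TreeSet and a HashSet will disagree about duplicates. A generic sketch with a hypothetical class (not the GATK class that was fixed), keeping equals(), hashCode(), and compareTo() consistent:

```java
// Hypothetical class for illustration; not a GATK class.
final class Position implements Comparable<Position> {
    private final String contig;
    private final int start;

    Position(String contig, int start) {
        this.contig = contig;
        this.start = start;
    }

    // Orders by contig name, then by start coordinate.
    @Override
    public int compareTo(Position other) {
        int byContig = contig.compareTo(other.contig);
        return byContig != 0 ? byContig : Integer.compare(start, other.start);
    }

    // Equal exactly when compareTo() returns 0, so sorted and hashed collections agree.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Position)) return false;
        return compareTo((Position) o) == 0;
    }

    // Uses the same fields as equals(), as the hashCode() contract requires.
    @Override
    public int hashCode() {
        return 31 * contig.hashCode() + start;
    }
}
```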
Ryan Poplin
5e28bca630
Another round of FindBugs fixes. Should be static inner class.
2012-08-20 16:15:48 -04:00
Ryan Poplin
a9472c1980
Another round of FindBugs fixes. Inefficient use of keySet iterator instead of entrySet iterator.
2012-08-20 16:11:45 -04:00
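
For reference, the pattern this FindBugs warning targets is iterating keySet() and calling get(key) for each key, which performs a second lookup per element. A generic sketch of the entrySet() form, using a hypothetical helper that is not from the GATK code base:

```java
import java.util.Map;

// Hypothetical helper for illustration; not a GATK class.
final class DepthTotals {
    private DepthTotals() {}

    // Sums all values by iterating entries, so each key/value pair is read once without a get() lookup.
    static int sum(Map<String, Integer> depthBySample) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : depthBySample.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }
}
```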
Ryan Poplin
5db3bd6fd2
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-20 15:28:57 -04:00
Ryan Poplin
464d49509a
Pulling out common caller arguments into their own StandardCallerArgumentCollection base class so that each caller isn't exposed to the unused arguments from every other caller.
2012-08-20 15:28:39 -04:00
Eric Banks
4450d66c64
Fixing the docs for DP and AD
2012-08-20 15:10:24 -04:00
Ryan Poplin
c67d708c51
Bug fix in HaplotypeCaller for non-regular bases in the reference or reads. Those events don't get created any more. Bug fix for advanced GenotypeFullActiveRegion mode: custom variant annotations created by the HC don't make sense when in this mode so don't try to calculate them.
2012-08-20 13:41:08 -04:00
Guillermo del Angel
5b5fee56cf
Next iteration of new VA interface: extend changes to per-genotype annotations as well. This will allow AD to be implemented correctly at last (that change is not done yet)
2012-08-20 12:52:15 -04:00
Eric Banks
154f65e0de
Temporarily disabling multi-threaded usage of BaseRecalibrator for performance reasons.
2012-08-20 12:43:17 -04:00
Menachem Fromer
37dd7209df
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-20 12:31:34 -04:00
Guillermo del Angel
c384677917
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-20 10:27:25 -04:00
Eric Banks
97b191f578
Thanks to Guillermo I was able to isolate an instance where MLEAC > AN. It turns out that this is valid, e.g. when PLs are all 0s for a sample we no-call it but it's allowed to factor into the MLE (since that's the contract with the exact model). Removing the check in UG and instead protecting for it in the AlleleCount stratification.
2012-08-20 01:16:23 -04:00
Guillermo del Angel
963ad03f8b
Second step of interface cleanup for variant annotator: several bug fixes, don't hash pileup elements to Maps because the hashCode() for a pileup element is not implemented and strange things can happen. Still several things to do, not done yet
2012-08-19 21:18:18 -04:00