Mark DePristo
1200848bbf
Part II of GSA-462: Consistent RODBinding access across Ref and Read trackers
-- Deleted ReadMetaDataTracker
-- Added function to ReadShard to give us the span from the leftmost position of the reads in the shard to the rightmost, which is needed for the new view
2012-08-30 10:15:10 -04:00
Ryan Poplin
57d997f06f
Fixing bug from when FragmentUtils merging function moved over to the soft clipped start instead of the unclipped start
2012-08-30 10:10:43 -04:00
Ryan Poplin
35baf0b155
This along with Mauricio's previous commit (thanks!) fixes GSA-522. There are no longer any modifications to reads in the map calls of ActiveRegion walkers. Added the bam which identified this error as a new integration test.
2012-08-30 09:07:36 -04:00
Ryan Poplin
e12ae65d33
Changing the commenting style in the BQSR
2012-08-29 11:27:45 -04:00
Ryan Poplin
18eca3544e
Initial commit of the delocalized BQSR written as a read walker.
2012-08-28 15:24:20 -04:00
Mark DePristo
0f4acaae1b
Update MD5s with new FS score
2012-08-28 08:06:47 -04:00
Mark DePristo
b3fd74f0c4
HaplotypeCaller forbids BAQ
2012-08-24 13:25:05 -04:00
Ryan Poplin
fe3069b278
Merged bug fix from Stable into Unstable
2012-08-22 14:40:34 -04:00
Ryan Poplin
e5cfdb4811
Bug fix for popular _Duplicate allele added to VariantContext_ error reported on the forum. It seems to be due to lower case bases in the reference being treated as reference mismatches. We would try to turn these mismatches into SNP events, for example c/C. We now uppercase the result from IndexedFastaSequenceFile.getSubsequenceAt()
2012-08-22 14:39:35 -04:00
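The fix comes down to normalizing case before comparing reference and read bases: FASTA files often soft-mask repeats in lowercase, so a raw byte comparison treats `c` and `C` as different bases. A minimal Java sketch of the idea; the class and method names are hypothetical and this is not the GATK code:

```java
/** Hypothetical illustration, not GATK code: soft-masked (lowercase) reference
 *  bases must be uppercased before comparison, or 'c' vs 'C' looks like a SNP. */
class RefCaseDemo {
    /** Returns an uppercased copy of ASCII bases; the input array is left untouched. */
    static byte[] toUpperCase(final byte[] bases) {
        final byte[] out = new byte[bases.length];
        for (int i = 0; i < bases.length; i++) {
            final byte b = bases[i];
            out[i] = (b >= 'a' && b <= 'z') ? (byte) (b - ('a' - 'A')) : b;
        }
        return out;
    }

    /** Counts positions (over the shorter array) where read and reference bases differ. */
    static int countMismatches(final byte[] ref, final byte[] read) {
        int mismatches = 0;
        final int n = Math.min(ref.length, read.length);
        for (int i = 0; i < n; i++) {
            if (ref[i] != read[i]) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

With reference `acGT` and read `ACGT`, the raw comparison reports two mismatches; after uppercasing the reference it reports none.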
Ryan Poplin
63213e8eb5
Expanding the HaplotypeCaller integration tests to cover a wider range of data
2012-08-22 14:18:44 -04:00
Guillermo del Angel
901f47d8af
Final step (for now) in VA refactoring: update MD5s because a) since it's not guaranteed that we'll iterate through reads/pileups in the same order, the rank sum dithering will change annotations, b) FS uses a new generic threshold to distinguish uninformative reads (it used to use ad-hoc thresholds), c) the AD definition changed and now throws away uninformative reads, d) shortened general ploidy integration tests for quicker debugging. I may have missed some MD5s in the update, so some tests may still fail.
2012-08-22 11:38:51 -04:00
Guillermo del Angel
6a8cf1c84a
Enable and adapt HaplotypeScore and MappingQualityZero as active region annotations now that we have per-read likelihoods passed in to annotations
2012-08-21 14:35:40 -04:00
Guillermo del Angel
d0644b3565
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-21 10:35:23 -04:00
Ryan Poplin
94e7f677ad
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-21 10:21:47 -04:00
Guillermo del Angel
418ace463a
More merge conflict resolution
2012-08-21 10:15:52 -04:00
Ryan Poplin
10961db3ce
Another round of FindBugs fixes. Object returns its internal reference to an externally mutable array. Very dangerous.
2012-08-21 09:35:55 -04:00
Ryan Poplin
605acaae9c
Another round of FindBugs fixes. Object internally stores a reference to an externally mutable array. Very dangerous.
2012-08-21 09:33:58 -04:00
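The two FindBugs patterns fixed in the last two commits (reported as EI_EXPOSE_REP and EI_EXPOSE_REP2) share one remedy: defensive copies. A minimal Java sketch using a hypothetical class that holds base qualities; it is not one of the classes actually touched by these commits:

```java
/** Hypothetical example class showing the defensive-copy fix for both patterns. */
final class QualityHolder {
    private final byte[] quals;

    /** Copy on the way in: later changes to the caller's array cannot leak into this object. */
    QualityHolder(final byte[] quals) {
        this.quals = quals.clone();
    }

    /** Copy on the way out: callers cannot mutate this object's internal state. */
    byte[] getQuals() {
        return quals.clone();
    }
}
```

The cost is one array copy per call; for very hot paths an unmodifiable view or an indexed getter avoids the copy.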
Ryan Poplin
55b7949d68
Another round of FindBugs fixes. Comparator doesn't implement Serializable.
2012-08-21 09:20:55 -04:00
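FindBugs flags this as SE_COMPARATOR_SHOULD_BE_SERIALIZABLE: a sorted collection such as `TreeSet` serializes its comparator, so a non-serializable comparator makes the whole collection unserializable. A minimal Java sketch with a hypothetical comparator:

```java
import java.io.Serializable;
import java.util.Comparator;

/** Hypothetical comparator; implementing Serializable lets sorted collections
 *  built with it (e.g. a TreeSet) be serialized as well. */
class ByLengthComparator implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(final String a, final String b) {
        return Integer.compare(a.length(), b.length());
    }
}
```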
Eric Banks
286b658fab
Re-enabling parallelism in the BaseRecalibrator now that the release is out.
2012-08-20 21:25:14 -04:00
Guillermo del Angel
7bbd2a7a20
Fixing merge conflicts
2012-08-20 20:38:25 -04:00
Ryan Poplin
77fbaec044
Another round of FindBugs fixes. Class implements its own compareTo() but uses base Object.equals() which can lead to unpredictable behavior.
2012-08-20 16:55:00 -04:00
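FindBugs reports this as EQ_COMPARETO_USE_OBJECT_EQUALS. The unpredictability comes from collections disagreeing: a `TreeSet` deduplicates via `compareTo()`, while a `HashSet` uses `equals()`/`hashCode()`, which default to object identity. A minimal Java sketch with a hypothetical position class; ordering contigs lexicographically is a simplifying assumption (real tools order contigs by the sequence dictionary):

```java
import java.util.Objects;

/** Hypothetical genomic position whose equals()/hashCode() agree with compareTo(). */
final class Position implements Comparable<Position> {
    final String contig;
    final int start;

    Position(final String contig, final int start) {
        this.contig = contig;
        this.start = start;
    }

    @Override
    public int compareTo(final Position o) {
        // Assumption: lexicographic contig order, then start coordinate.
        final int byContig = contig.compareTo(o.contig);
        return byContig != 0 ? byContig : Integer.compare(start, o.start);
    }

    @Override
    public boolean equals(final Object o) {
        if (this == o) return true;
        if (!(o instanceof Position)) return false;
        final Position p = (Position) o;
        return start == p.start && contig.equals(p.contig);
    }

    @Override
    public int hashCode() {
        return Objects.hash(contig, start);
    }
}
```

Without the `equals()`/`hashCode()` overrides, adding two equal positions would leave one element in the `TreeSet` but two in the `HashSet`.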
Ryan Poplin
a9472c1980
Another round of FindBugs fixes. Inefficient use of keySet iterator instead of entrySet iterator.
2012-08-20 16:11:45 -04:00
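This FindBugs pattern (WMI_WRONG_MAP_ITERATOR) is iterating over `keySet()` and then calling `get()` for each key, which repeats a lookup the iterator has already done. A minimal Java sketch of the before and after; the class and method names are hypothetical:

```java
import java.util.Map;

class EntrySetDemo {
    /** Inefficient: each get() repeats a hash lookup for a key we already hold. */
    static long totalViaKeySet(final Map<String, Integer> counts) {
        long total = 0;
        for (final String key : counts.keySet()) {
            total += counts.get(key);
        }
        return total;
    }

    /** Preferred: entrySet() yields key and value together, with no extra lookups. */
    static long totalViaEntrySet(final Map<String, Integer> counts) {
        long total = 0;
        for (final Map.Entry<String, Integer> entry : counts.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }
}
```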
Ryan Poplin
5db3bd6fd2
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-20 15:28:57 -04:00
Ryan Poplin
464d49509a
Pulling out common caller arguments into their own StandardCallerArgumentCollection base class so that no caller is exposed to the unused arguments of every other caller.
2012-08-20 15:28:39 -04:00
Ryan Poplin
c67d708c51
Bug fix in HaplotypeCaller for non-regular bases in the reference or reads. Those events don't get created any more. Bug fix for advanced GenotypeFullActiveRegion mode: custom variant annotations created by the HC don't make sense when in this mode so don't try to calculate them.
2012-08-20 13:41:08 -04:00
Eric Banks
154f65e0de
Temporarily disabling multi-threaded usage of BaseRecalibrator for performance reasons.
2012-08-20 12:43:17 -04:00
Guillermo del Angel
963ad03f8b
Second step of interface cleanup for variant annotator: several bug fixes, don't hash pileup elements to Maps because the hashCode() for a pileup element is not implemented and strange things can happen. Still several things to do, not done yet
2012-08-19 21:18:18 -04:00
Guillermo del Angel
b61ecc7c19
Fix merge conflicts
2012-08-16 20:45:52 -04:00
Guillermo del Angel
d26183e0ec
First preliminary big refactoring of UG annotation engine. Goals: a) Remove gigantic hack that cached per-read haplotype likelihoods in a static array so that annotations would go back and retrieve them, b) unify interface for annotations between HaplotypeCaller and UnifiedGenotyper, c) as a consequence, removed and cleaned duplicated code. As a bonus, annotations now have more relevant info to help them compute values.
Major idea is that per-read haplotype likelihoods are now stored in a single unified object of class PerReadAlleleLikelihoodMap. In theory, the class implementation hides internal storage details from outside code (the interface may still need cleaning up), and this object (or rather, a Map from Sample -> PerReadAlleleLikelihoodMap) is produced by UGCalcLikelihoods. The genotype calculation can also use this info if needed. All InfoFieldAnnotations now get an extra argument with this map. Currently, this map is only produced for indels in UG, or for all variants within HaplotypeCaller. If this map is absent (SNPs in UG), the old Pileup interface is used, but it's avoided whenever possible. FORMAT annotations are not yet changed but will be the focus of the second step. The major benefit will be that annotations can very easily discard non-informative reads for certain events. HaplotypeCaller also uses this new class, and no longer hard-codes the mapping of allele -> list(reads) but instead uses the same objects and interfaces as the rest of the modules. Code still needs further testing/cleaning/reviewing/debugging.
2012-08-16 20:36:53 -04:00
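The central structure described here can be pictured as a nested map from read to allele to likelihood. The following is a hypothetical, heavily simplified stand-in (the class name, string keys, and informativeness rule are all assumptions for illustration, not the GATK class or its actual threshold): it stores a log10 likelihood per read and allele, and shows how an annotation could skip reads whose best allele barely beats the runner-up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in: read name -> (allele -> log10 likelihood). */
class SimplePerReadAlleleLikelihoodMap {
    private final Map<String, Map<String, Double>> likelihoods = new LinkedHashMap<>();

    void add(final String read, final String allele, final double log10Likelihood) {
        likelihoods.computeIfAbsent(read, r -> new HashMap<>()).put(allele, log10Likelihood);
    }

    /** Assumed rule: informative if the best allele beats the runner-up by more than the threshold (log10 units). */
    boolean isInformative(final String read, final double log10Threshold) {
        double best = Double.NEGATIVE_INFINITY;
        double second = Double.NEGATIVE_INFINITY;
        for (final double l : likelihoods.getOrDefault(read, Collections.emptyMap()).values()) {
            if (l > best) {
                second = best;
                best = l;
            } else if (l > second) {
                second = l;
            }
        }
        return best - second > log10Threshold; // NaN (no alleles) compares false
    }

    /** AD-like tally: informative reads whose most likely allele is the given one. */
    int countInformativeFor(final String allele, final double log10Threshold) {
        int n = 0;
        for (final Map.Entry<String, Map<String, Double>> e : likelihoods.entrySet()) {
            if (!isInformative(e.getKey(), log10Threshold)) {
                continue;
            }
            String bestAllele = null;
            double best = Double.NEGATIVE_INFINITY;
            for (final Map.Entry<String, Double> a : e.getValue().entrySet()) {
                if (a.getValue() > best) {
                    best = a.getValue();
                    bestAllele = a.getKey();
                }
            }
            if (allele.equals(bestAllele)) {
                n++;
            }
        }
        return n;
    }
}
```

A per-sample version would wrap this in a `Map<String, SimplePerReadAlleleLikelihoodMap>` keyed by sample name.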
Eric Banks
05cbf1c8c0
FindBugs 'Efficiency' fixes
2012-08-16 15:40:52 -04:00
Eric Banks
dac3958461
Killing off some FindBugs 'Usability' issues
2012-08-16 13:32:44 -04:00
Eric Banks
2df04dc48a
Fix for performance problem in GGA mode related to previous --regenotype commit. Instead of trying to hack around the determination of the calculation model when it's not needed, just simply overload the calculateGenotypes() method to add one that does simple genotyping. Re-enabling the Pool Caller integration tests.
2012-08-16 13:05:17 -04:00
Eric Banks
9035b554fb
Adding tests for the --solid_nocall_strategy argument
2012-08-15 23:13:24 -04:00
Mark DePristo
3556c36668
Disable general ploidy integration tests because they are running forever
2012-08-15 21:13:16 -04:00
Mark DePristo
243af0adb1
Expanded the BQSR reporting script
-- Includes header page
-- Table of arguments (Arguments)
-- Summary of counts (RecalData0)
-- Summary of counts by qual (RecalData1)
-- Fixed bug in output that resulted in covariates list always being null (updated md5s accordingly)
-- BQSR.R now loads all relevant libraries, including gplots, grid, and gsalib, to run correctly
2012-08-12 13:45:14 -04:00
Eric Banks
eca9613356
Adding support for X and = CIGAR operators to the GATK
2012-08-10 14:54:07 -04:00
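Per the SAM specification, `=` (sequence match) and `X` (sequence mismatch) consume both read and reference bases, exactly like `M`, so code that computes a read's reference span must treat all three alike. A minimal Java sketch with a hypothetical helper (it does not use or mirror any GATK or Picard API):

```java
/** Hypothetical helper: reference span of a CIGAR string. Per the SAM spec,
 *  M, D, N, = and X consume reference bases; I, S, H and P do not. */
class CigarSpan {
    static int referenceLength(final String cigar) {
        int length = 0;
        int num = 0;
        for (int i = 0; i < cigar.length(); i++) {
            final char c = cigar.charAt(i);
            if (c >= '0' && c <= '9') {
                num = num * 10 + (c - '0'); // accumulate the operator's length
                continue;
            }
            switch (c) {
                case 'M': case 'D': case 'N': case '=': case 'X':
                    length += num;
                    break;
                case 'I': case 'S': case 'H': case 'P':
                    break;
                default:
                    throw new IllegalArgumentException("Unknown CIGAR operator: " + c);
            }
            num = 0;
        }
        return length;
    }
}
```

For example, `10=1X5=` spans 16 reference bases, the same as the equivalent `16M`.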
Ryan Poplin
2a113977a9
Resolving merge conflicts with the new MD5s
2012-08-10 11:47:00 -04:00
Ryan Poplin
5f82ffd5d8
Adding LowQual filter to the output of the HaplotypeCaller.
2012-08-10 11:25:14 -04:00
Ryan Poplin
9887bc4410
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-09 16:31:06 -04:00
Mauricio Carneiro
abb168e1ba
Merged bug fix from Stable into Unstable
2012-08-09 16:09:58 -04:00
Mauricio Carneiro
67d4148b32
Fixing bug reported by Thomas in the forum where reads were soft-clipped beyond the limits of the contig and ReduceReads was failing with a NoSuchElement exception. Now we hard clip anything that goes beyond the boundaries of the contig.
2012-08-09 15:58:18 -04:00
Mauricio Carneiro
58420098ac
Merged bug fix from Stable into Unstable
2012-08-09 13:02:23 -04:00
Mauricio Carneiro
c6132ebe26
Fixed divide by zero bug when downsampler goes over regions where reads are all filtered out. Added Guillermo's bug report as an integration test
2012-08-09 13:02:11 -04:00
Ryan Poplin
e48727dae3
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-09 10:31:10 -04:00
Guillermo del Angel
5be7e0621d
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-09 09:58:34 -04:00
Guillermo del Angel
71ee8d87b3
Rename per-sample ML allelic fractions and counts so that they don't have the same name as the per-site INFO fields, and clarify wording in VCF header
2012-08-09 09:58:20 -04:00
Mauricio Carneiro
250ffd2ad7
Merged bug fix from Stable into Unstable
2012-08-08 15:50:07 -04:00
Mauricio Carneiro
78c1556186
Fixing ReduceReads downsampling bug -- downsampled reads were not being excluded from the read window, causing them to trail back and get caught by the sliding window exception
2012-08-08 15:49:31 -04:00
Ryan Poplin
1223d77546
Removing argument from HaplotypeCaller that was made unnecessary by recent improvements to triggering around large events
2012-08-08 15:13:20 -04:00
Eric Banks
4b2e3cec0b
Quick pass of FindBugs 'inefficient use of keySet iterator instead of entrySet iterator' fixes for core tools.
2012-08-08 14:29:41 -04:00
Guillermo del Angel
3e2752667c
Intermediate checkin for ReducedReads with HaplotypeCaller - change min read count over k-mer to average count over k-mer when doing assembly of a reduced read (not optimal, currently trying max and then will decide on best approach), fix merge conflicts
2012-08-08 12:07:33 -04:00
Eric Banks
2c76f71a03
Update -maxAlleles argument in integration tests
2012-08-06 22:48:04 -04:00
Guillermo del Angel
c66a896b8e
Fix UG integration test broken by new -maxAltAlleles nomenclature
2012-08-06 21:29:21 -04:00
Guillermo del Angel
97c5ed4feb
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 20:22:31 -04:00
Guillermo del Angel
238d55cb61
Fixes for running HaplotypeCaller with reduced reads: a) minor refactoring, pulled out code to compute mean representative count to ReadUtils, b) Don't use min representative count over kmer when constructing de Bruijn graph: this creates many paths with multiplicity=1 and makes us lose a lot of SNPs at the edge of capture targets. Use mean instead.
2012-08-06 20:22:12 -04:00
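The difference between the two choices is easy to see on a toy span: a single base with a low representative count drags the minimum down to 1, while the mean stays close to the typical count. A minimal Java sketch with hypothetical names; how the real code rounds or stores the mean is not shown in the log, so this returns a `double`:

```java
/** Hypothetical helper contrasting min vs. mean representative count over a k-mer span. */
class KmerMultiplicity {
    /** Minimum representative count over counts[start, start + k). */
    static int minCount(final int[] counts, final int start, final int k) {
        checkSpan(counts, start, k);
        int min = Integer.MAX_VALUE;
        for (int i = start; i < start + k; i++) {
            min = Math.min(min, counts[i]);
        }
        return min;
    }

    /** Mean representative count over counts[start, start + k). */
    static double meanCount(final int[] counts, final int start, final int k) {
        checkSpan(counts, start, k);
        long sum = 0;
        for (int i = start; i < start + k; i++) {
            sum += counts[i];
        }
        return (double) sum / k;
    }

    private static void checkSpan(final int[] counts, final int start, final int k) {
        if (k <= 0 || start < 0 || start + k > counts.length) {
            throw new IllegalArgumentException("k-mer span out of range");
        }
    }
}
```

With per-base counts `{10, 1, 10}`, the minimum is 1 (a multiplicity-1 path in the graph) while the mean is 7.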
Ryan Poplin
d85b38e4da
Updating HaplotypeCaller integration tests
2012-08-06 12:02:19 -04:00
Ryan Poplin
b8709d8c67
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-06 11:41:28 -04:00
Ryan Poplin
973d1d47ed
Merging together the computeDiploidHaplotypeLikelihoods functions in the HaplotypeCaller's LikelihoodEngine so they both benefit from the ReducedRead's RepresentativeCount
2012-08-06 11:40:07 -04:00
Ryan Poplin
b7eec2fd0e
Bug fixes related to the changes in allele padding. If a haplotype started with an insertion it led to array index out of bounds. Haplotype allele insert function is now very simple because all alleles are treated the same way. HaplotypeUnitTest now uses a variant context instead of creating Allele objects directly.
2012-08-05 12:29:10 -04:00
Guillermo del Angel
d2e8eb7b23
Fixed 2 haplotype caller unit tests: a) new interface for addReadLikelihoods() including read counts, b) disabled the test of basic DeBruijn graph assembly, not ready yet
2012-08-03 14:26:51 -04:00
Ryan Poplin
c3b6e2b143
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-03 13:14:43 -04:00
Ryan Poplin
ff80f17721
Using PathComparatorTotalScore in the assembly graph traversal does a better job of capturing low frequency branches that are inside high frequency haplotypes.
2012-08-03 13:14:37 -04:00
Guillermo del Angel
6f8e7692d4
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-08-03 12:24:37 -04:00
Guillermo del Angel
9e25b209e0
First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes: a) Active region: scale PLs by representative count to determine whether region is active. b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in the read; TBD whether this is good enough to avoid biases in GLs. c) DeBruijn assembler inserts kmers N times in the graph, where N is the min representative count of the read over the kmer span; TBD again whether this is the best approach. d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is a discrepancy of overlaps between unclipped/soft-clipped bases. Didn't affect things before, but RR makes hard-clipped bases in CIGARs more prevalent, so this was exposed. e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with the PairHMMIndelErrorModel code, as well as refactored to support arbitrary ploidy in HaplotypeCaller
2012-08-03 12:24:23 -04:00
Ryan Poplin
3ece4c4993
Merged bug fix from Stable into Unstable
2012-08-02 11:41:36 -04:00
Ryan Poplin
cb8bc18aeb
Fix for error in HaplotypeCaller. HC has a UG argument collection for the UG engine but some of those arguments aren't appropriate to set.
2012-08-02 11:41:06 -04:00
Guillermo del Angel
9ac72dbd4d
Merged bug fix from Stable into Unstable
2012-08-01 10:56:45 -04:00
Guillermo del Angel
01265f78e6
Add sanity check and possible bug fix for forum user: if haplotypes cannot be created from the given alleles when genotyping indels in pool mode (e.g. too close to contig boundary, etc.), return an empty allele list, signifying that the site can't be genotyped
2012-08-01 10:50:00 -04:00
Guillermo del Angel
4a23f3cd11
Simple cleanup of pool caller code - since usage is much more general than just calling pools, AF calculation models and GL calculation models are renamed from Pool -> GeneralPloidy. Also, don't have users specify special arguments for -glm and -pnrm. Instead, when running UG with sample ploidy != 2, the correct general ploidy modules are automatically detected and loaded. -glm now reverts to old [SNP|INDEL|BOTH] usage
2012-07-31 16:34:20 -04:00
Mark DePristo
e00ed8bc5e
Cleanup BQSR classes
-- Moved most of BQSR classes (which are used throughout the codebase) to utils.recalibration. It's better in my opinion to keep commonly used code in utils, and only specialized code in walkers. As code becomes embedded throughout GATK it should be refactored to live in utils
-- Removed unnecessary imports of BQSR in VQSR v3
-- Now ready to refactor QualQuantizer and unit test into a subclass of RecalDatum, refactor unit tests into RecalDatum unit tests, and generalize into hierarchical recal datum that can be used in QualQuantizer and the analysis of adaptive context covariate
-- Update PluginManager to sort the plugins and interfaces. This gives a deterministic order in which the plugin classes come back; without it, BQSR integration tests temporarily changed because I moved my classes around a bit.
2012-07-31 08:11:03 -04:00
Guillermo del Angel
e6b326c189
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-30 21:32:19 -04:00
Guillermo del Angel
6c9d3ec155
Remerge after changes to allele construction code. More cleanups/fixes to artificial read pileup provider
2012-07-30 21:32:03 -04:00
Ryan Poplin
3dabb90eb0
Updating example active region walker integration test.
2012-07-30 21:26:16 -04:00
Ryan Poplin
c2b57ee444
updating HC integration tests after these changes.
2012-07-30 12:41:40 -04:00
Ryan Poplin
13591b169f
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-30 12:13:24 -04:00
Ryan Poplin
48b9495460
Fixes to the likelihood based LD calculation for deciding when to combine consecutive events.
2012-07-30 12:12:56 -04:00
Ryan Poplin
9002758ede
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-30 12:10:00 -04:00
Ryan Poplin
7a73042cd3
Bug fix for the case of merging two VCs when a deletion deletes the padding base for a consecutive indel. Added unit test to cover this case.
2012-07-30 12:09:23 -04:00
Eric Banks
5743694196
Merged bug fix from Stable into Unstable
2012-07-30 11:35:28 -04:00
Eric Banks
79195b97a3
Adding categories for the remaining uncategorized walkers
2012-07-30 11:35:08 -04:00
Guillermo del Angel
5b9a1af7fe
Intermediate fix for pool GL unit test: a) fix up artificial read pileup provider to give consistent data. b) Increase downsampling in pool integration tests with reference sample, and shorten MT tests so they don't last too long
2012-07-30 09:56:10 -04:00
Eric Banks
99b15b2b3a
Final checkpoint: all tests pass. Note that there were bugs in the PoolGenotypeLikelihoodsUnitTest that needed fixing and eventually led to my needing to disable one of the tests (with a note for Guillermo to look into it). Also note that while I have moved over the GATK to use the new non-null representation of Alleles, I didn't remove all of the now-superfluous code throughout to do padding checking on merges; we'll need to do this on a subsequent push.
2012-07-29 01:07:59 -04:00
Eric Banks
2b1b00ade5
All integration tests and VC/Allele unit tests are passing
2012-07-27 17:03:49 -04:00
Eric Banks
beb7610195
Resolving merge conflicts
2012-07-27 15:52:02 -04:00
Eric Banks
27e7e11ec0
Allele refactoring checkpoint #3 : all integration tests except for PoolCaller are passing now. Fixed a couple of bugs from old code that popped up during md5 difference review. Added VariantContextUtils.requiresPaddingBase() method for tools that create alleles to use for determining whether or not to add the ref padding base. One of the HaplotypeCaller tests wasn't passing because of RankSumTest differences, so I added a TODO for Ryan to look into this.
2012-07-27 15:48:40 -04:00
Ryan Poplin
22bb4804f0
HaplotypeCaller now uses an excessive number of high quality soft clips as a triggering signal in order to capture both end points of a large deletion in a single active region.
2012-07-27 12:44:02 -04:00
Ryan Poplin
a0890126a8
ActiveRegionWalker's isActive function returns a results object now instead of just a double.
2012-07-27 11:01:39 -04:00
Eric Banks
baf3e33730
Allele refactoring checkpoint 2: all code finally compiles, AD and STR annotations are fixed, and most of the UG integration tests pass.
2012-07-26 23:27:11 -04:00
Ryan Poplin
35e803e110
Merged bug fix from Stable into Unstable
2012-07-26 14:00:04 -04:00
Ryan Poplin
4f741b4cd7
Smoothing in the BQSR bins should be one error observation and one non-error observation.
2012-07-26 13:59:02 -04:00
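Adding one pseudo error and one pseudo non-error observation is add-one (Laplace) smoothing of the error rate, (mismatches + 1) / (observations + 2), before conversion to the Phred scale. A minimal Java sketch of that formula; the class name is hypothetical, and the log does not show the exact code the commit changed:

```java
/** Hypothetical sketch: smoothed empirical quality for one recalibration bin. */
class SmoothedQual {
    static double empiricalQuality(final long mismatches, final long observations) {
        // One pseudo error and one pseudo non-error observation.
        final double errorRate = (mismatches + 1.0) / (observations + 2.0);
        return -10.0 * Math.log10(errorRate);
    }
}
```

An empty bin therefore gets an error rate of 1/2 (about Q3) rather than an undefined value, and a bin with 998 observations and no mismatches gets 1/1000, i.e. Q30.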
Guillermo del Angel
2ae890155c
Improvements to indel calling in pool caller: a) Compute per-read likelihoods in reference sample to determine whether a read is informative or not. b) Fixed bugs in unit tests. c) Fixed padding-related bugs when computing matches/mismatches in ErrorModel, d) Added a couple more integration tests to increase test coverage, including testing odd ploidy
2012-07-26 13:43:00 -04:00
Eric Banks
a694d1b5de
Merge branch 'master' into allelePadding
2012-07-26 01:53:14 -04:00
Eric Banks
32516a2f60
Initial checkpoint commit of VariantContext/Allele refactoring. There were just too many problems associated with the different representation of alleles in VCF (padded) vs. VariantContext (unpadded). We are moving VC to use the VCF representation. No more reference base for indels in VC and no more trimming and padding of alleles. Even reverse trimming has been stopped (the theory being that writers of VCF now know what they are doing and often want the reverse padding if they put it there; this has been requested on GetSatisfaction). Code compiles but presumably pretty much all tests with indels will fail at this point.
2012-07-26 01:50:39 -04:00
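For readers unfamiliar with the two conventions: VCF writes an indel with a shared leading reference base (POS points at that base), while the unpadded form drops it, leaving one allele empty. A hypothetical Java sketch converting a single simple indel from unpadded to padded form; the names are illustrative, and multi-allelic and complex events are ignored:

```java
/** Hypothetical illustration of VCF-style padding for a simple indel. */
final class PaddedIndel {
    final int pos;      // 1-based position of the padding base
    final String ref;
    final String alt;

    PaddedIndel(final int pos, final String ref, final String alt) {
        this.pos = pos;
        this.ref = ref;
        this.alt = alt;
    }

    /** True when the allele pair is in unpadded form (one side empty). */
    static boolean needsPadding(final String ref, final String alt) {
        return ref.isEmpty() || alt.isEmpty();
    }

    /** Prepends the preceding reference base to both alleles and moves POS one base left. */
    static PaddedIndel pad(final int unpaddedPos, final String ref, final String alt,
                           final char precedingRefBase) {
        return new PaddedIndel(unpaddedPos - 1, precedingRefBase + ref, precedingRefBase + alt);
    }
}
```

A deletion of `TC` at position 101 preceded by reference base `A` becomes POS 100, REF `ATC`, ALT `A`.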
Eric Banks
7eb3f54750
Added category docs for the remaining public walkers (I think I got them all). I removed a couple of totally unnecessary walkers.
2012-07-25 21:40:28 -04:00
Eric Banks
0a98a6aa8d
Adding extraDocs tag per Mauricio's request
2012-07-25 18:23:18 -04:00
Eric Banks
05fa377a8e
Adding GATK categories to standard walkers. Will add to remaining walkers after the next successful release (so that I can see which walkers are public and still need it).
2012-07-25 16:05:47 -04:00
Eric Banks
a5721a8846
Context covariate optimizations were not suited for multiple threads, so I removed them (since that ended up being much, much easier than trying to make the covariates thread local). Added -nt 2 layer to BQSR integration tests to confirm that it now works with multiple threads.
2012-07-25 13:38:07 -04:00
Eric Banks
675ccab2fa
Renaming BQSR to BaseRecalibrator
2012-07-23 10:17:17 -04:00
Ryan Poplin
2e486d83e2
Updating HaplotypeCaller docs and expanding integration tests.
2012-07-23 10:05:42 -04:00
Mauricio Carneiro
df965d4a5a
Fixing BQSR integration test
2012-07-21 11:11:45 -04:00
Mauricio Carneiro
116885a450
Removed the "Walker" suffix from all walkers that had it.
* Did not touch archived walkers... those can be named whatever.
* Kept abstract classes that end in Walker untouched (e.g. LocusWalker, ReadWalker, ...)
* Renamed a few inner classes due to conflict when stripping off Walker from their outer classes: ContigStats, FlagStats and FastaStats.
2012-07-20 17:27:11 -04:00
Ryan Poplin
1592841c93
New function for merging nearby events into MNPs or complex substitutions. Added extensive unit tests.
2012-07-19 13:16:33 -04:00
Guillermo del Angel
c16f9f2f15
a) Use new method to check for GATK Lite, b) minor improvements to indel pool caller (more to come): brain-dead, quick way to limit the number of alt alleles to genotype. We can't process too many alt alleles because of the combinatorial explosion of GL values with high ploidy, and some STR validation targets had up to 12 alt alleles, resulting in GL vectors of > 1e8 elements. Can't use pileup elements since typically not many alleles will be in one pileup, and different alleles will appear in different samples; TBD a nicer solution. c) Committed for posterity a Scala script for large-scale validation calling, still a work in progress
2012-07-19 10:24:08 -04:00
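The explosion is easy to quantify: with ploidy P and A alleles (reference plus alternates), the number of unordered genotypes, and hence the length of a GL vector, is the multiset coefficient C(P + A - 1, A - 1). A short Java sketch (class name hypothetical); the pool ploidy of 24 in the usage note is an illustrative assumption, not a figure from the log:

```java
/** Number of unordered genotypes for a given ploidy and allele count (ref + alts). */
class GenotypeCount {
    static long count(final int ploidy, final int numAlleles) {
        if (ploidy < 0 || numAlleles < 1) {
            throw new IllegalArgumentException("need ploidy >= 0 and at least one allele");
        }
        final int n = ploidy + numAlleles - 1;
        final int k = numAlleles - 1;
        long result = 1;
        for (int i = 1; i <= k; i++) {
            // After step i, result == C(n - k + i, i), so the division is exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }
}
```

A diploid biallelic site has 3 genotypes; with 12 alternate alleles plus the reference at ploidy 24 the count is roughly 1.25e9.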
Eric Banks
5f5edeca63
Reverting move of BQSR tests to public, as per DR's email
2012-07-19 10:02:05 -04:00
Eric Banks
9c1ab1b0c0
Move BQSR integration test and its dependent files into public; previously there was a protected->private dependency.
2012-07-18 21:11:33 -04:00
Mark DePristo
74e153ff4a
FisherStrand now uses RankSumTest isUsableBase to decide if a read should be included in testing
-- Previously used hardcoded MAPQ > 20 && QUAL > 20 but now uses isUsableBase
-- Updating MD5s as appropriate
2012-07-18 16:07:47 -04:00
Eric Banks
e4db8dde91
Enabled a whole other bunch of integration tests for BQSRv2. While I was there I also changed the default context size for indels to 3 (from 8) since that's what works best in the current implementation (as suggested by Ryan). At this point, all of the new core tools (ReduceReads, BQSRv2, HaplotypeCaller, UG extensions) have been moved over to protected and should be stable. Looks like we are pretty much ready for GATK 2.0!
2012-07-17 23:36:43 -04:00
Guillermo del Angel
731bbba2e6
Bug fixes for integration test, use correct new UG syntax
2012-07-17 16:57:59 -04:00
Guillermo del Angel
40b8c7172c
Pool Caller refactoring in preparation of GATK 2.0: a) PoolCallerUnifiedArgumentCollection disappeared, and arguments moved to UnifiedArgumentCollection. b) PoolCallerWalker is no longer needed and redundant, all functionality subsumed by UG. UG now checks if GATK is lite - if so, don't allow ploidy > 2. c) Moved pool classes from private to protected. d) Changed the way to specify ploidy. Instead of specifying samples per pool and having ploidy = 2*samplesPerPool, have user specify ploidy directly, which is cleaner. Update tests accordingly. We can now call triploid seedless grape genotypes correctly in theory. e) Renamed argument -reference to -reference_sample_calls since the former is ambiguous and it's not clear what it refers to.
2012-07-17 15:27:04 -04:00
Eric Banks
b0d99fd10d
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 15:12:28 -04:00
Eric Banks
305db8c0d1
Total rewrite of the isGATKLite() functionality with help of Khalid/David. PluginManager was not working for us.
2012-07-17 15:11:03 -04:00
Ryan Poplin
bf2d5efe4d
Moving HaplotypeCaller integration and unit tests over to protected as well.
2012-07-17 14:51:26 -04:00
Ryan Poplin
c55934043e
Moving HaplotypeCaller from private to protected
2012-07-17 14:41:19 -04:00
Eric Banks
3a64398d07
Cleaned up the isGATKLite check
2012-07-17 12:46:16 -04:00
Eric Banks
62c5228048
1) Revert previous change - indel recalibration is turned on by default and users of the Lite version will need to turn it off to avoid a User Error. 2) Implemented the engine.isGATKLite() method.
2012-07-17 12:23:40 -04:00
Eric Banks
40618ac471
A bunch of BQSR changes: 1) by default we do not emit indel quals, but they can be turned on with --enable_indel_quals. 2) We check whether or not we are running in Lite mode (not done yet) and if so and the user is trying to recalibrate indels, we throw a User Error (not supported). 3) Like v1 we now allow the user to set the qual value below which we don't recalibrate (this was the remaining source of differences in the v1 vs. v2 plots).
2012-07-17 10:52:43 -04:00
Eric Banks
d5b3a2eabf
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-07-17 00:32:53 -04:00
Eric Banks
f657b8bda8
Complete overhaul of the BQSRv2 integration tests. Much more comprehensive. Still need to deal with a few tests that need some modifications before I'm done, but I'll take care of that sometime tomorrow.
2012-07-17 00:32:34 -04:00
Eric Banks
0a89adbcdb
Add utility decorators so that classes can tell you which package source they come from if they want to (suggested by Khalid). Using those decorators, we can easily pull out the BQSR updateDataForPileupElement() method into a standard RecalibrationEngine and an AdvancedRecalibrationEngine and use the protected one (AdvancedRE) if available (otherwise, the public one).
2012-07-16 15:34:50 -04:00
Eric Banks
52baac1e16
Move BQSRv2 into public and v1 into the archive.
2012-07-16 14:23:38 -04:00
Joel Thibault
6c6a324583
Loosen a restriction on isOriginalRead()
* no longer needs to satisfy ReadAndIntervalOverlap.OVERLAP_CONTAINED
2012-07-16 14:07:10 -04:00
Eric Banks
d7bf74fb7e
Updating default value for -mindel to the one used by Khalid in the pipeline and me in my tests.
2012-07-10 02:04:26 -04:00
Mauricio Carneiro
6c17c50fa2
Updates to ReduceReads
* Added optional parameter to not hard clip on the interval border
* Made not clipping the default behavior (hence integration tests changed)
* Updated integration tests.
2012-07-09 13:46:51 -04:00
Mauricio Carneiro
12d1c594df
Moving ReduceReads into protected
2012-06-25 17:01:33 -04:00
David Roazen
9c6bccfd8b
build system overhaul
* Added support for a protected directory whose contents are only made public in binary form
* Simplified and reorganized build.xml to improve readability and maintainability
* build.xml now autodetects most build properties:
-Includes private/protected if they exist
-No more STING_BUILD_TYPE or specialized targets for public-only, etc.
* Build targets have changed! There are now two main build options:
"ant" build everything (GATK and Queue)
"ant gatk" build just the GATK
It was too hard to build everything before -- now it is the default.
* To run tests with debugging, use -Dtest.debug=true -Dtest.debug.port=XXXX on the command line.
Much better than the old comment/uncomment method!
2012-05-17 15:16:29 -04:00