Mark DePristo
8c0718a7c9
Fixed missing import
2012-03-30 15:31:55 -04:00
Mark DePristo
976bac0452
BaseTest now has a global variable to turn off network connection requirement
2012-03-30 15:31:55 -04:00
Mark DePristo
097ed4ecc4
Memory usage optimizations and safety improvements to StratNode and StratificationManager
-- Added memory and safety optimizations to StratNode and StratificationManager. Fresh, immutable HashMaps are allocated for final data structures, so they are exactly the correct size and cannot be changed by users.
-- Added the ability for a stratification to specify incompatible evaluations. The two strats using this are AC and Sample, which are incompatible with VariantSummary: it computes per-sample averages, so combining them results in an O(n^2) memory requirement. Added an integration test to cover incompatible strats and evals.
2012-03-30 15:31:55 -04:00
Mark DePristo
b335c22f6d
Fully refactored, mostly cleaned up version of VariantEval using StratificationManager
2012-03-30 15:31:55 -04:00
Mark DePristo
c8086a79e3
New StratificationManager based VariantEval passes unmodified integration tests
-- Now needs cleanup and optimizations
2012-03-30 15:31:55 -04:00
Mark DePristo
d37f31e349
First version of VariantEval that runs (approximately correctly) with new StratificationManager
2012-03-30 15:31:54 -04:00
Mark DePristo
8971b54b21
Phase II of Stratification manager
-- Renamed and reorganized infrastructure
-- StratificationManager is now a Map from List<Object> -> V. All key functions are implemented; less commonly used ones are still TODO
-- Ready for hookup to VE
2012-03-30 15:31:54 -04:00
Mark DePristo
9f1cd0ff66
Lots of new functionality for StratificationStates manager
-- Really working according to unit tests
-- Added nCombination utils
2012-03-30 15:31:54 -04:00
Mark DePristo
a3d896d80e
Part I of creating a fast state space lookup for VE
-- Created a unit-tested tree mapping from a List<String> -> integer (StratificationStates). This class is the key infrastructure needed to create a complete static mapping from all stratification combinations to an offset in a vector of EvaluationContexts for update in map.
-- Minor code cleanup throughout VE (removing unused headers, for example)
2012-03-30 15:31:53 -04:00
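The StratificationStates idea above (a tree from a List<String> of stratification states to an integer offset) can be sketched roughly as follows. This is a minimal illustration under assumptions: the class StratTreeSketch, its methods, and the choice to build the complete tree eagerly in the constructor are hypothetical, not the actual GATK code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; class and method names are illustrative, not the GATK API.
public class StratTreeSketch {
    // One tree level per stratifier; leaves carry the dense offset.
    private static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        int offset = -1; // stays -1 on internal nodes
    }

    private final Node root = new Node();
    private final int size;

    // Builds the complete tree up front, so every valid key has a fixed offset.
    public StratTreeSketch(List<List<String>> statesPerStrat) {
        int[] next = {0};
        build(root, statesPerStrat, 0, next);
        size = next[0];
    }

    private static void build(Node node, List<List<String>> states, int depth, int[] next) {
        if (depth == states.size()) {
            node.offset = next[0]++; // leaf: claim the next free slot
            return;
        }
        for (String state : states.get(depth)) {
            Node child = new Node();
            node.children.put(state, child);
            build(child, states, depth + 1, next);
        }
    }

    // Walks one level per key element; returns -1 for unknown or incomplete keys.
    public int offsetOf(List<String> key) {
        Node node = root;
        for (String state : key) {
            node = node.children.get(state);
            if (node == null) return -1;
        }
        return node.offset;
    }

    // Number of leaves, i.e. the length of the evaluation-context vector to allocate.
    public int size() {
        return size;
    }
}
```

With strats {all, known, novel} and {SNP, INDEL}, size() is 6 and the key ["known", "INDEL"] maps to offset 3, so an update can index straight into a preallocated array.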
Eric Banks
533c283783
Deprecating AlignmentContext.getExtendedEventPileup(). At this point the only walkers left with any reliance on extended events are Guillermo's pooled code (he'll update soon) and the Pileup walker. David, I'll leave that last one for you (it should be easy). We can now officially rip the extended event code out of the engine.
2012-03-30 10:37:14 -04:00
Eric Banks
6b49af253b
Removing dependence on extended events from the RealignerTargetCreator. Did some minor refactoring while I was in there.
2012-03-30 10:33:30 -04:00
Eric Banks
b467cd1dae
Removing dependence on extended events for the remaining Variant Annotator modules.
2012-03-30 09:05:26 -04:00
Eric Banks
b21889812d
Removing some more usages of extended events. Not done yet, but almost there.
2012-03-30 01:51:37 -04:00
Eric Banks
ad6ace2439
Resolving merge conflicts
2012-03-30 01:51:09 -04:00
Eric Banks
16bef191c6
UG integration tests updated. A handful of sites are lost because there are only 5 indels and one of them starts at the beginning of the read, so the site no longer passes our min threshold (now consistent with GGA); mostly, though, the depth changes ever so slightly once in a while between extended and normal pileups (I think the normal pileups are correct). I have looked thoroughly in IGV at ALL differences and am happy with the new results. As an aside, AD is now calculated more accurately for indels.
2012-03-30 01:35:49 -04:00
Eric Banks
f4d4969f23
Don't ever return null for the list of GL models
2012-03-30 00:22:40 -04:00
Eric Banks
44ac49aa34
Removing dependencies in the annotations on extended events. Some refactoring involved in this.
2012-03-30 00:17:02 -04:00
Mauricio Carneiro
962fc352ae
unnecessary substitution.
2012-03-29 18:01:43 -04:00
Mauricio Carneiro
b7c59d5d43
this was a dummy test I was using to figure out what the problem was. Deleting it.
2012-03-29 18:00:25 -04:00
Mauricio Carneiro
cbd21c6339
Nasty, nasty.....
VariantEval is overly abusive of the GATKReport (lack of) spec.
1. It converts numeric values (longs, integers and doubles) to strings before sending them to the report, then expects the report to figure out that those were actually numbers.
2. Worse, instead of sending the actual values to the report table, the stratification modules send the string "unknown" and then abuse the GATKReport spec to replace those "unknown" placeholders with numbers. Then again, they expect the report to know those are numbers, not strings.
Now that the GATKReport HAS specs, VariantEval needs to be overhauled to conform to them. In the meantime, I have added special ad-hoc handling for these broken contracts. It works, and the integration tests all pass without changing any MD5s, but right after Mark and Ryan commit their VariantEval refactors, I will step in to change the way it interacts with the GATKReport, so we can clean up the GATKReport.
No wonder the printing needed to be O(n^2).
2012-03-29 17:49:53 -04:00
Eric Banks
c2e27729c7
Renaming PileupElement.isBeforeDeletion() to PileupElement.isBeforeDeletedBase() so that it's more clear that it can still be true while inside a deletion. Added PileupElement.isBeforeDeletionStart() to cover the case that I want where we only trigger before the actual deletion event. Similarly for after a deletion. Updated counting code in ConsensusAlleleCounter accordingly.
2012-03-29 17:08:25 -04:00
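To make the renamed semantics concrete, here is a minimal sketch that models a read as one op per reference position ('M' aligned, 'D' deleted). The class and all four method names are hypothetical stand-ins, not the PileupElement implementation; only the distinction they draw (inside a deletion vs. the true deletion start, and the mirror case after a deletion) comes from the commit message.

```java
// Hypothetical sketch: ops[i] is 'M' for an aligned base or 'D' for a deleted base
// at reference position i of one read. Not the GATK PileupElement API.
public class DeletionFlagsSketch {
    // True if the next position is deleted, even when pos is itself inside a deletion.
    public static boolean isBeforeDeletedBase(char[] ops, int pos) {
        return pos + 1 < ops.length && ops[pos + 1] == 'D';
    }

    // True only at the aligned base immediately before a deletion event begins.
    public static boolean isBeforeDeletionStart(char[] ops, int pos) {
        return ops[pos] != 'D' && isBeforeDeletedBase(ops, pos);
    }

    // Mirror case: the previous position was deleted (may still be inside the deletion).
    public static boolean isAfterDeletedBase(char[] ops, int pos) {
        return pos > 0 && ops[pos - 1] == 'D';
    }

    // True only at the first aligned base right after a deletion ends.
    public static boolean isAfterDeletionEnd(char[] ops, int pos) {
        return ops[pos] != 'D' && isAfterDeletedBase(ops, pos);
    }
}
```

For the layout "MMDDM", position 2 (first deleted base) is before a deleted base but not before a deletion start; only position 1 triggers isBeforeDeletionStart.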
Ryan Poplin
6da9571829
resolving merge conflicts.
2012-03-29 16:16:28 -04:00
Ryan Poplin
ca96544ed0
All the zero-quality N bases in the SOLiD reads were adding lots of extra paths to the assembly graph. We now require a minimum base quality for every base in a kmer before adding it to the graph. The large number of SOLiD reads with unmapped mates was also triggering the active region traversal at every base; we now ignore that check for SOLiD reads.
2012-03-29 16:14:29 -04:00
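A minimal sketch of the kmer filter described above, assuming reads are given as a base string plus a parallel byte array of qualities. The class name, method signature, and threshold parameter are hypothetical; the real assembler's kmer extraction is not shown in this log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emit only kmers whose every base meets a minimum base quality.
public class KmerQualFilterSketch {
    public static List<String> qualifyingKmers(String bases, byte[] quals, int k, byte minBaseQual) {
        if (bases.length() != quals.length) {
            throw new IllegalArgumentException("bases/quals length mismatch");
        }
        List<String> kmers = new ArrayList<>();
        for (int start = 0; start + k <= bases.length(); start++) {
            boolean allPass = true;
            for (int i = start; i < start + k; i++) {
                if (quals[i] < minBaseQual) { // one low-quality base (e.g. a Q0 N) rejects the kmer
                    allPass = false;
                    break;
                }
            }
            if (allPass) {
                kmers.add(bases.substring(start, start + k));
            }
        }
        return kmers;
    }
}
```

A Q0 N in the middle of a read now suppresses every kmer that overlaps it, instead of each of those kmers adding a spurious path to the graph.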
Eric Banks
e4469a83ee
First attempt at removing all traces of extended events from UG; integration tests are expected to fail.
2012-03-29 14:59:29 -04:00
Eric Banks
e61e162c81
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-03-29 12:33:13 -04:00
Mauricio Carneiro
cf364f26a0
Fixing alignment issue with the GATKReportColumn algorithm
Numeric columns were being left-aligned when they should be right-aligned. Fixed it.
2012-03-29 12:28:49 -04:00
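A sketch of the alignment rule in this fix, assuming a cell counts as numeric when its value is a java.lang.Number. The class and method are hypothetical; the real GATKReportColumn may decide numeric-ness differently (for example, by parsing strings).

```java
// Hypothetical sketch of padding one report cell to a column width.
public class ColumnAlignSketch {
    public static String pad(Object value, int width) {
        String text = String.valueOf(value);
        boolean numeric = value instanceof Number;
        int w = Math.max(1, width); // "%0s" would be parsed as a zero-pad flag and throw
        // "%5s" pads on the left (right-aligns); "%-5s" pads on the right (left-aligns).
        return String.format((numeric ? "%" : "%-") + w + "s", text);
    }
}
```

Right-aligning numbers keeps their digits in the same columns, which makes a plain-text report much easier to scan.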
Mauricio Carneiro
f80bd4276a
fixed estimated Q reported calculation in the gatherer
2012-03-29 12:28:43 -04:00
Mauricio Carneiro
8a9fb514b6
simplifying GATKReportColumn constructor logic
2012-03-29 12:28:37 -04:00
Eric Banks
e861106398
Accidentally erased important line
2012-03-29 11:08:54 -04:00
Eric Banks
e4a225ed09
Move the code to subset a Variant Context to fewer alleles (including restructuring the PLs appropriately) into VariantContextUtils where it can be used generally.
2012-03-29 11:07:37 -04:00
Guillermo del Angel
c9c3f6b0fc
Minor UG Engine refactoring/cleanup: instead of passing in the # of samples separately from sample set, pass in ploidy instead and compute # of chromosomes internally - will help later on with code clarity
2012-03-29 11:05:42 -04:00
Ryan Poplin
9684a2efb0
HaplotypeCaller: Variants found on the same haplotype are now written out with phased genotypes. There are serious eval issues with MNPs, so they are disabled for now.
2012-03-29 09:41:29 -04:00
Guillermo del Angel
a0843f125e
Forgot to add file itself for new unit test
2012-03-28 21:08:18 -04:00
Guillermo del Angel
250adca350
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-03-28 21:01:49 -04:00
Guillermo del Angel
e0ab4e4b30
Refactoring so that ConsensusAlleleCounter can use regular pileups and can operate correctly. This involved adding utility functions to ReadBackedPileup to count # of insertions/deletions right after current position. Added unit test for IndelGenotypeLikelihoods, esp. ConsensusAlleleCounter logic
2012-03-28 21:01:31 -04:00
Mauricio Carneiro
8f0e9d74ce
GATKReportTable output refactor
writing out a GATKReportTable was O(n^2)!!!!!
New implementation is O(n). What a difference, when N = 2^16...
2012-03-28 17:19:12 -04:00
Guillermo del Angel
62ee31afba
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-03-28 16:00:38 -04:00
Guillermo del Angel
1eee9d512d
a) Make computeConsensusAlleles protected inside IndelGenotypeLikelihoodsCalculationModel so we can use it in unit tests; b) make ConsensusAlleleCounter work if no extended event pileup is present (necessary for ext. event removal)
2012-03-28 15:41:39 -04:00
Mauricio Carneiro
bb36cd4adf
Quick fixes to BQSRGatherer and GATKReportTable
* when gathering, be aware that some keys will be missing from some tables.
* when a gatktable has no elements, it should still output the header so we know it had no records
2012-03-28 09:07:54 -04:00
Roger Zurawicki
63cf7ec7ec
Added more primitives to GATK Report Column Type
- The Integer column type now accepts bytes and shorts
- Updated Unit Tests and added a new testParse() test
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-03-28 09:07:54 -04:00
Guillermo del Angel
d2586911a4
Forgot to add tolerance to new MathUtils unit tests
2012-03-28 08:18:36 -04:00
Guillermo del Angel
08f7d47d7c
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-03-28 07:42:09 -04:00
Mark DePristo
12aa72f200
Merged bug fix from Stable into Unstable
2012-03-27 22:43:00 -04:00
Mark DePristo
979a84a252
Bugfix for thread unsafe PL cache
-- See https://getsatisfaction.com/gsa/topics/unifiedgenotyper_error_indel?utm_content=topic_link&utm_medium=email&utm_source=new_topic
-- Solution is to use a fixed cache that's never updated on the fly. My changes limit us to no more than 500 alleles at a site, which I hope is OK, but it's easy enough to raise that to a ridiculously large number.
2012-03-27 22:42:30 -04:00
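A sketch of the kind of fixed, read-only cache described above, assuming it maps a diploid PL index to its allele pair using the VCF genotype ordering (the index of genotype j/k with j <= k is k*(k+1)/2 + j). The class name and API are hypothetical; only the 500-allele limit and the never-updated-on-the-fly design come from the commit. Because the table is filled once in a static initializer and never mutated afterwards, concurrent readers need no locking.

```java
// Hypothetical sketch of a precomputed, immutable PL-index cache.
public final class PLIndexCacheSketch {
    public static final int MAX_ALLELES = 500; // limit mentioned in the commit
    private static final int[][] PAIRS;        // PAIRS[plIndex] = {allele j, allele k}

    static {
        int n = MAX_ALLELES * (MAX_ALLELES + 1) / 2;
        PAIRS = new int[n][];
        // VCF ordering: genotype j/k (j <= k) lives at index k*(k+1)/2 + j
        for (int k = 0; k < MAX_ALLELES; k++) {
            for (int j = 0; j <= k; j++) {
                PAIRS[k * (k + 1) / 2 + j] = new int[] {j, k};
            }
        }
    }

    private PLIndexCacheSketch() {}

    public static int[] allelePair(int plIndex) {
        if (plIndex < 0 || plIndex >= PAIRS.length) {
            throw new IllegalArgumentException("PL index outside cache (too many alleles?): " + plIndex);
        }
        return PAIRS[plIndex].clone(); // copy so callers cannot corrupt the shared table
    }
}
```

The tradeoff is the one the commit names: a hard cap on alleles per site in exchange for a table that threads can share without synchronization.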
Guillermo del Angel
8f34412fb8
First Pool Caller exact model: silly, straightforward math implementation of the biallelic pool caller exact likelihood model, no attempt at any smartness or optimization, no support yet for the generalized multiallelic form, just hooking it up for testing
2012-03-27 20:59:44 -04:00
Guillermo del Angel
ed322bd73f
Fix merge issues again
2012-03-27 15:03:13 -04:00
Guillermo del Angel
b4a7c0d98d
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-03-27 15:01:03 -04:00
Guillermo del Angel
343a061b1c
Fix merge issues when incorporating new AF calculations changes
2012-03-27 15:00:44 -04:00
Mauricio Carneiro
1b75663178
BQSR Gatherer implementation and integration tests
* restructured the hash tables into one class (RecalibrationReport) that has all the functionality for the different tables and key managers
* optimized empirical qual calculation when merging recalibration reports
* centralized the quality score quantization functionalities
* unified the creating/loading of all the key manager/hash table structures.
* added unit tests for the gatherer (disabled because gatk report needs to be sorted for automated testing)
* added integration tests for BQSR and on-the-fly recalibration
2012-03-27 13:50:22 -05:00
Ryan Poplin
5dbd3625cd
Initial algorithm for choosing best alternate haplotypes to genotype based on the likelihoods from all samples instead of choosing for each sample independently. Simple tradeoff of penalty for increasing model complexity and likelihood of the data.
2012-03-27 13:38:52 -04:00
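One simple way to express a tradeoff between data likelihood and a model-complexity penalty is greedy forward selection, sketched below. This is an assumption for illustration, not the HaplotypeCaller's algorithm: the class name, the logLik[sample][haplotype] input, the linear per-haplotype penalty, and letting each sample be explained by its single best chosen haplotype (rather than a diploid pair) are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: choose haplotypes jointly across samples, adding one at a time
// while the gain in summed log-likelihood exceeds a fixed per-haplotype penalty.
public class HaplotypeSelectionSketch {
    // Each sample is scored by its best haplotype among those chosen (a simplification).
    static double score(double[][] logLik, List<Integer> chosen, double penalty) {
        double total = 0;
        for (double[] sample : logLik) {
            double best = Double.NEGATIVE_INFINITY;
            for (int h : chosen) best = Math.max(best, sample[h]);
            total += best;
        }
        return total - penalty * chosen.size();
    }

    public static List<Integer> select(double[][] logLik, double penalty) {
        int nHaps = logLik[0].length;
        List<Integer> chosen = new ArrayList<>();
        double current = Double.NEGATIVE_INFINITY; // empty set explains nothing
        while (true) {
            int bestHap = -1;
            double bestScore = current;
            for (int h = 0; h < nHaps; h++) {
                if (chosen.contains(h)) continue;
                chosen.add(h);
                double s = score(logLik, chosen, penalty);
                chosen.remove(chosen.size() - 1); // undo the trial addition
                if (s > bestScore) {
                    bestScore = s;
                    bestHap = h;
                }
            }
            if (bestHap < 0) return chosen; // no remaining haplotype pays for its penalty
            chosen.add(bestHap);
            current = bestScore;
        }
    }
}
```

With a small penalty, every haplotype that some sample strongly prefers gets added; with a large penalty, only the single best haplotype survives.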