gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Chris Hartl	199476eae1	Three squashed commits: 1) Add in checks for input parameters in MathUtils method. I was careful to use the bottom-level methods whenever possible, so that parameters don't needlessly go through multiple checks (so for instance, the parameters n and k for a binomial aren't checked on log10binomial, but rather in the log10binomialcoefficient subroutine). This addresses JIRA GSA-767 Unit tests pass (we'll let bamboo deal with the integrations) 2) Address reviewer comments (change UserExceptions to IllegalArgumentExceptions). 3) .isWellFormedDouble() tests for infinity and not strictly positive infinity. Allow negative-infinity values for log10sumlog10 (as these just correspond to p=0). After these commits, unit and integration tests now pass, and GSA-767 is done. rebase and fix conflict: public/java/src/org/broadinstitute/sting/utils/MathUtils.java	2013-05-31 00:26:50 -04:00
Ryan Poplin	61af37d0d2	Create a new normalDistributionLog10 function that is unit tested for use in the VQSR.	2013-05-30 16:00:08 -04:00
Eric Banks	58424e56be	Setting the reduce reads count tag was all wrong in a previous commit; fixing. RR counts are represented as offsets from the first count, but that wasn't being done correctly when counts are adjusted on the fly. Also, we were triggering the expensive conversion and writing to binary tags even when we weren't going to write the read to disk. The code has been updated so that unconverted counts are passed to the GATKSAMRecord and it knows how to encode the tag correctly. Also, there are now methods to write to the reduced counts array without forcing the conversion (and methods that do force the conversion). Also: 1. counts are now maintained as ints whenever possible. Only the GATKSAMRecord knows about the internal encoding. 2. as discussed in meetings today, we updated the encoding so that it can now handle a range of values that extends to 255 instead of 127 (and is backwards compatible). 3. tests have been moved from SyntheticReadUnitTest to GATKSAMRecordUnitTest accordingly.	2013-04-30 13:45:42 -04:00
Eric Banks	379a9841ce	Various bug fixes for recent Reduce Reads additions plus solution implemented for low MQ reads. 1. Using cumulative binomial probability was not working at high coverage sites (because p-values quickly got out of hand) so instead we use a hybrid system for determining significance: at low coverage sites use binomial prob and at high coverage sites revert to using the old base proportions. Then we get the best of both worlds. As a note, coverage refers to just the individual base counts and not the entire pileup. 2. Reads were getting lost because of the comparator being used in the SlidingWindow. When read pairs had the same alignment end position the 2nd one encountered would get dropped (but added to the header!). We now use a PriorityQueue instead of a TreeSet to allow for such cases. 3. Each consensus keeps track of its own number of softclipped bases. There was no reason that that number should be shared between them. 4. We output consensus filtered (i.e. low MQ) reads whenever they are present for now. Don't lose that information. Maybe we'll decide to change this in the future, but for now we are conservative. 5. Also implemented various small performance optimizations based on profiling. Added unit tests to cover these changes; systematic assessment now tests against low MQ reads too.	2013-04-24 18:18:50 -04:00
Eric Banks	3477e092ea	Minor: bump up the amount of cached log10 data in MathUtils so that Monkol can actually call 50K samples.	2013-04-19 08:39:08 -04:00
Eric Banks	5bce0e086e	Refactored binomial probability code in MathUtils. * Moved redundant code out of UGEngine * Added overloaded methods that assume p=0.5 for speed efficiency * Added unit test for the binomialCumulativeProbability method	2013-04-16 18:19:07 -04:00
Eric Banks	df189293ce	Improve compression in Reduce Reads by incorporating probabilistic model and global het compression The Problem: Exomes seem to be more prone to base errors and one error in 20x coverage (or below, like most regions in an exome) causes RR (with default settings) to consider it a variant region. This seriously hurts compression performance. The Solution: 1. We now use a probabilistic model for determining whether we can create a consensus (in other words, whether we can error correct a site) instead of the old ratio threshold. We calculate the cumulative binomial probability of seeing the given ratio and trigger consensus creation if that pvalue is lower than the provided threshold (0.01 by default, so rather conservative). 2. We also allow het compression globally, not just at known sites. So if we cannot create a consensus at a given site then we try to perform het compression; and if we cannot perform het compression that we just don't reduce the variant region. This way very wonky regions stay uncompressed, regions with one errorful read get fully compressed, and regions with one errorful locus get het compressed. Details: 1. -minvar is now deprecated in favor of -min_pvalue. 2. Added integration test for bad pvalue input. 3. -known argument still works to force het compression only at known sites; if it's not included then we allow het compression anywhere. Added unit tests for this. 4. This commit includes fixes to het compression problems that were revealed by systematic qual testing. Before finalizing het compression, we now check for insertions or other variant regions (usually due to multi-allelics) which can render a region incompressible (and we back out if we find one). We were checking for excessive softclips before, but now we add these tests too. 5. We now allow het compression on some but not all of the 4 consensus reads: if creating one of the consensuses is not possible (e.g. because of excessive softclips) then we just back that one consensus out instead of backing out all of them. 6. We no longer create a mini read at the stop of the variant window for het compression. Instead, we allow it to be part of the next global consensus. 7. The coverage test is no longer run systematically on all integration tests because the quals test supercedes it. The systematic quals test is now much stricter in order to catch bugs and edge cases (very useful!). 8. Each consensus (both the normal and filtered) keep track of their own mapping qualities (before the MQ for a consensus was affected by good and bad bases/reads). 9. We now completely ignore low quality bases, unless they are the only bases present in a pileup. This way we preserve the span of reads across a region (needed for assembly). Min base qual moved to Q15. 10.Fixed long-standing bug where sliding window didn't do the right thing when removing reads that start with insertions from a header. Note that this commit must come serially before the next commit in which I am refactoring the binomial prob code in MathUtils (which is failing and slow).	2013-04-16 18:19:06 -04:00
Chris Hartl	73d1c319bf	Rarely-occurring logic bugfix for GenotypeConcordance, streamlining and testing of MathUtils Currently, the multi-allelic test is covering the following case: Eval A T,C Comp A C reciprocate this so that the reverse can be covered. Eval A C Comp A T,C And furthermore, modify ConcordanceMetrics to more properly handle the situation where multiple alternate alleles are available in the comp. It was possible for an eval C/C sample to match a comp T/T sample, so long as the C allele were also present in at least one other comp sample. This comes from the fact that "truth" reference alleles can be paired with any allele also present in the truth VCF, while truth het/hom var sites are restricted to having to match only the alleles present in the genotype. The reason that truth ref alleles are special case is as follows, imagine: Eval: A G,T 0/0 2/0 2/2 1/1 Comp: A C,T 0/0 1/0 0/0 0/0 Even though the alt allele of the comp is a C, the assessment of genotypes should be as follows: Sample1: ref called ref Sample2: alleles don't match (the alt allele of the comp was not assessed in eval) Sample3: ref called hom-var Sample4: alleles don't match (the alt allele of the eval was not assessed in comp) Before this change, Sample2 was evaluated as "het called het" (as the T allele in eval happens to also be in the comp record, just not in the comp sample). Thus: apply current logic to comp hom-refs, and the more restrictive logic ("you have to match an allele in the comp genotype") when the comp is not reference. Also in this commit,major refactoring and testing for MathUtils. A large number of methods were not used at all in the codebase, these methods were removed: - dotProduct(several types). logDotProduct is used extensively, but not the real-space version. - vectorSum - array shuffle, random subset - countOccurances (general forms, the char form is used in the codebase) - getNMaxElements - array permutation - sorted array permutation - compare floats - sum() (for integer arrays and lists). Final keyword was extensively added to MathUtils. The ratio() and percentage() methods were revised to error out with non-positive denominators, except in the case of 0/0 (which returns 0.0 (ratio), or 0.0% (percentage)). Random sampling code was updated to make use of the cleaner implementations of generating permutations in MathUtils (allowing the array permutation code to be retired). The PaperGenotyper still made use of one of these array methods, since it was the only walker it was migrated into the genotyper itself. In addition, more extensive tests were added for - logBinomialCoefficient (Newton's identity should always hold) - logFactorial - log10sumlog10 and its approximation All unit tests pass	2013-03-28 23:25:28 -04:00
Mark DePristo	3b67aa8aee	Final edge case bug fixes to QualityUtil routines -- log10 functions in QualityUtils allow -Infinity to allow log10(0.0) values -- Fix edge condition of log10OneMinusX failing with Double.MIN_VALUE -- Fix another edge condition of log10OneMinusX failing with a small but not min_value double	2013-02-16 07:31:38 -08:00
Mark DePristo	9e28d1e347	Cleanup and unit tests for QualityUtils -- Fixed a few conversion bugs with edge case quals (ones that were very high) -- Fixed a critical bug in the conversion of quals that was causing near capped quals to fall below their actual value. Will undoubtedly need to fix md5s -- More precise prob -> qual calculations for very high confidence events in phredScaleCorrectRate, trueProbToQual, and errorProbToQual. Very likely to improve accuracy of many calculations in the GATK -- Added errorProbToQual and trueProbToQual calculations that accept an integer cap, and perform the (tricky) conversion from int to byte correctly. -- Full docs and unit tests for phredScaleCorrectRate and phredScaleErrorRate. -- Renamed probToQual to trueProbToQual -- Added goodProbability and log10OneMinusX to MathUtils -- Went through the GATK and cleaned up many uses of QualityUtils -- Cleanup constants in QualityUtils -- Added full docs for all of the constants -- Rename MAX_QUAL_SCORE to MAX_SAM_QUAL_SCORE for clarity -- Moved MAX_GATK_USABLE_Q_SCORE to RecalDatum, as it's s BQSR specific feature -- Convert uses of QualityUtils.errorProbToQual(1-x) to QualityUtils.trueProbToQual(x) -- Cleanup duplicate quality score routines in MathUtils. Moved and renamed MathUtils.log10ProbabilityToPhredScale => QualityUtils.phredScaleLog10ErrorRate. Removed 3 routines from MathUtils, and remapped their usages into the better routines in QualityUtils	2013-02-16 07:31:37 -08:00
Mark DePristo	9e43a2028d	Making band pass filter size, sigma, active region max size and extension all accessible from the command line	2013-01-24 13:47:59 -05:00
Mark DePristo	7fd27a5167	Add band pass filtering activity profile -- Based on the new incremental activity profile -- Unit Tested! Fixed a few bugs with the old band pass filter -- Expand IncrementalActivityProfileUnitTest to test the band pass filter as well for basic properties -- Add new UnitTest for BandPassIncrementalActivityProfile -- Added normalizeFromRealSpace to MathUtils -- Cleanup unused code in new activity profiles	2013-01-23 13:46:01 -05:00
Mauricio Carneiro	2a4ccfe6fd	Updated all JAVA file licenses accordingly GSATDG-5	2013-01-10 17:06:41 -05:00
Mark DePristo	99c9031cb4	Merge AFCalcResultTracker into StateTracker, cleanup -- These two classes were really the same, and now they are actually the same! -- Cleanuped the interfaces, removed duplicate data -- Added lots of contracts, some of which found numerical issues with GeneralPloidyExactAFCalc (which have been patched over but not fixed) -- Moved goodProbability and goodProbabilityVector utilities to MathUtils. Very useful for contracts!	2012-10-21 12:42:31 -04:00
Mark DePristo	c74d7061fe	Added AFCalcResultUnitTest -- Ensures that the posteriors remain within reasonable ranges. Fixed bug where normalization of posteriors = {-1e30, 0.0} => {-100000, 0.0} which isn't good. Now tests ensure that the normalization process preserves log10 precision where possible -- Updated MathUtils to make this possible	2012-10-16 08:11:06 -04:00
Ryan Poplin	25be94fbb8	Increasing the precision of MathUtils.approximateLog10SumLog10 from 1E-3 to 1E-4. Genotyper integration tests change as a result. Expanding the unit tests of MathUtils.log10sumLog10.	2012-10-15 13:24:32 -04:00
Mark DePristo	6b639f51f0	Finalizing new exact model and tests -- New capabilities in IndependentAllelesDiploidExactAFCalc to actually apply correct theta^n.alt.allele prior. -- Tests that theta^n.alt.alleles is being applied correctly -- Bugfix: keep in logspace when computing posterior probability in toAFCalcResult in AFCalcResultTracker.java -- Bugfix: use only the alleles used in genotyping when assessing if an allele is polymorphic in a sample in UnifiedGenotyperEngine	2012-10-15 07:53:57 -04:00
Mark DePristo	cb857d1640	AFCalcs must be made by factory method now -- AFCalcFactory is the only way to make AFCalcs now. There's a nice ordered enum there describing the models and their ploidy and max alt allele restrictions. The factory makes it easy to create them, and to find models that work for you given your ploidy and max alt alleles. -- AFCalc no longer has UAC constructor -- only AFCalcFactory does. Code cleanup throughout -- Enabling more unit tests, all of which almost pass now (except for IndependentAllelesDiploidExactAFCalc which will be fixed next) -- It's now possible to run the UG / HC with any of the exact models currently in the system. -- Code cleanup throughout the system, reorganizing the unit tests in particular	2012-10-15 07:53:56 -04:00
Mark DePristo	6bbe750e03	Continuing work on IndependentAllelesDiploidExactAFCalc -- Continuing to get IndependentAllelesDiploidExactAFCalc working correctly. A long way towards the right answer now, but still not there -- Restored (but not tested) OriginalDiploidExactAFCalc, the clean diploid O(N) version for Ryan -- MathUtils.normalizeFromLog10 no longer returns -Infinity when kept in log space, enforces the min log10 value there -- New convenience method in VariantContext that looks up the allele index in the alleles	2012-10-15 07:53:56 -04:00
Mark DePristo	91aeddeb5a	Steps on the way to a fully described and semantically meaningful AFCalcResult -- AFCalcResult now sports a isPolymorphic and getLog10PosteriorAFGt0ForAllele functions that allow you to ask individually whether specific alleles we've tried to genotype are polymorphic given some confidence threshold -- Lots of contracts for AFCalcResult -- Slowly killing off AFCalcResultsTracker -- Fix for the way UG checks for alt alleles being polymorphic, which is now properly conditioned on the alt allele -- Change in behavior for normalizeFromLog10 in MathUtils: now sets the log10 for 0 values to -10000, instead of -Infinity, since this is really better to ensure that we don't have -Infinity values traveling around the system -- ExactAFCalculationModelUnitTest now checks for meaningful pNonRef values for each allele, uncovering a bug in the GeneralPloidy (not fixed, related to Eric's summation issue from long ago that was reverted) in that we get different results for diploid and general-ploidy == 2 models for multi-allelics.	2012-10-15 07:53:56 -04:00
Mark DePristo	c82aa01e0e	Generalize testing infrastructure to allow us to run specific n.samples calculation	2012-10-15 07:53:55 -04:00
David Roazen	cb84a6473f	Downsampling: experimental engine integration -Off by default; engine fork isolates new code paths from old code paths, so no integration tests change yet -Experimental implementation is currently BROKEN due to a serious issue involving file spans. No one can/should use the experimental features until I've patched this issue. -There are temporarily two independent versions of LocusIteratorByState. Anyone changing one version should port the change to the other (if possible), and anyone adding unit tests for one version should add the same unit tests for the other (again, if possible). This situation will hopefully be extremely temporary, and last only until the experimental implementation is proven.	2012-09-06 15:03:27 -04:00
Guillermo del Angel	b61ecc7c19	Fix merge conflicts	2012-08-16 20:45:52 -04:00
Guillermo del Angel	d26183e0ec	First preliminary big refactoring of UG annotation engine. Goals: a) Remove gigantic hack that cached per-read haplotype likelihoods in a static array so that annotations would go back and retrieve them, b) unify interface for annotations between HaplotypeCaller and UnifiedGenotyper, c) as a consequence, removed and cleaned duplicated code. As a bonus, annotations have now more relevant info to help them compute values. Major idea is that per-read haplotype likelihoods are now stored in a single unified object of class PerReadAlleleLikelihoodMap. Class implementation in theory hides internal storage details from outside work (still may need work cleaning up interface), and this object(or rather, a Map from Sample->perReadAlleleLikelihoodMap) is produced by UGCalcLikelihoods. The genotype calculation is also able to potentially use this info if needed. All InfoFieldAnnotations now get an extra argument with this map. Currently, this map is only produced for indels in UG, or for all variants within HaplotypeCaller. If this map is absent (SNPs in UG), the old Pileup interface is used, but it's avoided whenever possible. FORMAT annotations are not yet changed but will be focus of second step. Major benefit will be that annotations will be able to very easily discard non-informative reads for certain events. HaplotypeCaller also uses this new class, and no longer hard-codes the mapping of allele ->list(reads) but instead uses the same objects and interfaces as the rest of the modules. Code still needs further testing/cleaning/reviewing/debugging	2012-08-16 20:36:53 -04:00
Eric Banks	dac3958461	Killing off some FindBugs 'Usability' issues	2012-08-16 13:32:44 -04:00
Guillermo del Angel	9e25b209e0	First pass of implementation of Reduced Reads with HaplotypeCaller. Main changes: a) Active region: scale PL's by representative count to determine whether region is active. b) Scale per-read, per-haplotype likelihoods by read representative counts. A read representative count is (temporarily) defined as the average representative count over all bases in read, TBD whether this is good enough to avoid biases in GL's. c) DeBruijn assembler inserts kmers N times in graph, where N is min representative count of read over kmer span - TBD again whether this is the best approach. d) Bug fixes in FragmentUtils: logic to merge fragments was wrong in cases where there is discrepancy of overlaps between unclipped/soft clipped bases. Didn't affect things before but RR makes prevalence of hard-clipped bases in CIGARs more prevalent so this was exposed. e) Cache read representative counts along with read likelihoods associated with a Haplotype. Code can/should be cleaned up and unified with PairHMMIndelErrorModelCode, as well as refactored to support arbitrary ploidy in HaplotypeCaller	2012-08-03 12:24:23 -04:00
Ryan Poplin	a0890126a8	ActiveRegionWalker's isActive function returns a results object now instead of just a double.	2012-07-27 11:01:39 -04:00
Khalid Shakir	c3c7f17d90	Updated hard limit MathUtils.MAXN number of samples from 11,000 to 50,000. Instead of creating a supposed network temporary directory locally which then fails when remote nodes try to access the non-existant dir, now checking to see if they network directory is available and throwing a SkipException to bypass the test when it cannot be run. TODO: Throw similar SkipExceptions when fastas are not available. Right now instead of skipping the test or failing fast the REQUIRE_NETWORK_CONNECTION=false means that the errors popup later when the networked fastas aren't found.	2012-05-29 11:18:22 -04:00
Guillermo del Angel	429800a192	Fix corner case rounding issue in MathUtils unit test: 10^logFactorial(4)) was 23.999999... which if cast directly yielded 23 - so, do pre-rounding to ensure correct integer result if caller will cast value.	2012-05-02 09:57:06 -04:00
Guillermo del Angel	76a95fdedf	Full implementation of multiallelic exact model for pools. Still super-linear so not useable at scale but it should be a gold standard to compare to. Unit tests are not exhaustive yet, will be expanded to provide better test coverage. Small inconsequential optimization in MathUtils: we're already caching log10(factorial(n)) for large n, so might as well use the cached values to compute binomial and multinomial coefficients instead of the log-gamma approximation which is more expensive (doesn't seem to save much time either in PoolCaller nor in UG though).	2012-05-02 09:24:28 -04:00
Guillermo del Angel	730208133b	Several fixes and improvements to Pool caller with ancillary test functions (not done yet): a) Utility class called Probability Vector that holds a log-probability vector and has the ability to clip ends that deviate largely from max value. b) Used this class to hold site error model, since likelihoods of error model away from peak are so far down that it's not worth computing with them and just wastes time. c) Expand unit tests and add an exhaustive test for ErrorModel class. d) Corrected major math bug in ErrorModel uncovered by exhaustive test: log(e^x) is NOT x if log's base = 10. e) Refactored utility functions that created artificial pileups for testing into separate class ArtificialPileupTestProvider. Right now functionality is limited (one artificial contig of 10 bp), can only specify pileups in one position with a given number of matches and mismatches to ref) but functionality will be expanded in future to cover more test cases. f) Use this utility class for IndelGenotypeLikelihoods unit test and for PoolGenotypeLikelihoods unit test (the latter testing functionality still not done). g) Linearized implementation of biallelic exact model (very simple approach, similar to diploid exact model, just abort if we're past the max value of AC distribution and below a threshold). Still need to add unit tests for this and to expand to multiallelic model. h) Update integration test md5's due to minor differences stemming from linearized exact model and better error model math	2012-04-27 14:41:17 -04:00
Ryan Poplin	dcc4871468	minor misc optimizations to PairHMM	2012-04-18 15:02:26 -04:00
Eric Banks	6ddf2170b6	More efficient implementation of the sum of the allele frequency posteriors matrix using a pre-allocated cache as discussed in group meeting last week. Now, when the cache is filled, we safely collapse down to a single value in real space and put the un-re-centered log10 value back into the front of the cache. Thanks to all for the help and advice.	2012-04-09 11:46:16 -04:00
Guillermo del Angel	8f34412fb8	First Pool Caller exact model: silly straightforward math implementation of biallelic pool caller exact likelihood model, no attempt and any smartness or optimization, no support yet for generalized multiallelic form, just hooking up for testing	2012-03-27 20:59:44 -04:00
Guillermo del Angel	343a061b1c	Fix merge issues when incorporating new AF calculations changes	2012-03-27 15:00:44 -04:00
Guillermo del Angel	1c424c0daf	Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable	2012-03-26 15:15:50 -04:00
Ryan Poplin	019145175b	Major optimizations to graph construction through better use of built in graph.containsVertex and vertex.equals methods. Minor optimizations to MathUtils.approximateLog10SumLog10 method	2012-03-26 11:32:44 -04:00
Guillermo del Angel	deb4586559	Next intermediate commit for new pool caller structure: a) Bug fixes in pool GL computation. Now, correct GL's are returned per each pool to the UG engine. Work still needs to be done in redoing interface with exact model. b) Added unit tests for new MathUtils dot product and logDotProduct functions. c) Refactorings of UnifiedGentotyperEngine since N (size of prior/posterior arrays) is no longer necessarily nSamples+1 but, in general, nSamplesPerPool*nPools+1	2012-03-24 21:49:43 -04:00
Guillermo del Angel	f198cec5e2	Temp commit: new structure for pool caller, now all work is in the same framework as in UG. There's a new genotype calculation model, PoolGenotypeCalculationModel, that does all the work and plugs into UnifiedGenotyperEngine. A new AF module for pools is upcoming. Old pool caller will be removed once all work is migrated	2012-03-22 15:46:39 -04:00
Ryan Poplin	a29fc6311a	New debug option to output the assembly graph in dot format. Merge nodes in assembly graph when possible.	2012-03-21 15:48:55 -04:00
Mauricio Carneiro	ca11ab39e7	BitSets keys to lower BQSR's memory footprint Infrastructure: * Generic BitSet implementation with any precision (up to long) * Two's complement implementation of the bit set handles negative numbers (cycle covariate) * Memoized implementation of the BitSet utils for better performance. * All exponents are now calculated with bit shifts, fixing numerical precision issues with the double Math.pow. * Replace log/sqrt with bitwise logic to get rid of numerical issues BQSR: * All covariates output BitSets and have the functionality to decode them back into Object values. * Covariates are responsible for determining the size of the key they will use (number of bits). * Generalized KeyManager implementation combines any arbitrary number of covariates into one bitset key with event type * No more NestedHashMaps. Single key system now fits in one hash to reduce hash table objects overhead Tests: * Unit tests added to every method of BitSetUtils * Unit tests added to the generalized key system infrastructure of BQSRv2 (KeyManager) * Unit tests added to the cycle and context covariates (will add unit tests to all covariates)	2012-03-16 13:01:48 -04:00
Eric Banks	2314787767	Generalizing to avoid JDK 1.7 incompatibilities	2012-03-12 22:50:59 -04:00
Ryan Poplin	14a77b1e71	Getting rid of redundant methods in MathUtils. Adding unit tests for approximateLog10SumLog10 and normalizeFromLog10. Increasing the precision of the Jacobian approximation used by approximateLog10SumLog which changes the UG+HC integration tests ever so slightly.	2012-03-05 12:28:32 -05:00
Mauricio Carneiro	d379c3763a	DNA Sequence to BitSet and vice-versa conversion tools * Turns DNA sequences (for context covariates) into bit sets for maximum compression * Allows variable context size representation guaranteeing uniqueness. * Works with long precision, so it is limited to a context size of 31 bases (can be extended with BigNumber precision if necessary). * Unit Tests added	2012-02-29 19:25:20 -05:00
Mauricio Carneiro	75783af6fc	int <-> BitSet conversion utils for MathUtils * added unit tests.	2012-02-21 14:10:36 -05:00
Mauricio Carneiro	4a57add6d0	First implementation of DiagnoseTargets * calculates and interprets the coverage of a given interval track * allows to expand intervals by specified number of bases * classifies targets as CALLABLE, LOW_COVERAGE, EXCESSIVE_COVERAGE and POOR_QUALITY. * outputs text file for now (testing purposes only), soon to be VCF. * filters are overly aggressive for now.	2012-02-03 17:12:43 -05:00
Mauricio Carneiro	3dd6a1f962	Adding some generic sum and average functions to MathUtils	2012-02-03 17:12:43 -05:00
Guillermo del Angel	966387ca0b	Next intermediate commit in the pool caller. Lots of bug fixes and now we can emit true vcf's with calls in discovery mode (still of unknown quality) - old validation mode is temporarily broken,will be fixed in next refactoring.	2012-01-23 09:22:31 -05:00
Guillermo del Angel	b123416c4c	Resolve stale merge changes	2012-01-18 20:56:36 -05:00
Guillermo del Angel	2eb45340e1	Initial, raw, mostly untested version of new pool caller that also does allele discovery. Still needs debugging/refining. Main modification is that there is a new operation mode, set by argument -ALLELE_DISCOVERY_MODE, which if true will determine optimal alt allele at each computable site and will compute AC distribution on it. Current implementation is not working yet if there's more than one pool and it will only output biallelic sites, no functionality for true multi-allelics yet	2012-01-18 20:54:10 -05:00


67 Commits (50b4c130ca8270209e85f260c60383969ecd5061)