Commit Graph

943 Commits (f968b8a58bd75139d8f8af0c4c709269f23de83f)

Author SHA1 Message Date
Khalid Shakir de13f41fc3 One step closer to a proper test-utils artifact. Using the maven-jar-plugin to create a test classifer, excluding actual tests, until we can properly separate the classes into separate artifacts/modules. 2014-02-03 13:50:46 -05:00
Khalid Shakir caa76cdac4 Added maven pom.xmls for various artifacts. 2014-02-03 13:50:46 -05:00
Khalid Shakir 1e25a758f5 Moved files to maven directories.
Here are the git moved directories in case other files need to be moved during a merge:
  git-mv private/java/src/        private/gatk-private/src/main/java/
  git-mv private/R/scripts/       private/gatk-private/src/main/resources/
  git-mv private/java/test/       private/gatk-private/src/test/java/
  git-mv private/testdata/        private/gatk-private/src/test/resources/
  git-mv private/scala/qscript/   private/queue-private/src/main/qscripts/
  git-mv private/scala/src/       private/queue-private/src/main/scala/
  git-mv protected/java/src/      protected/gatk-protected/src/main/java/
  git-mv protected/java/test/     protected/gatk-protected/src/test/java/
  git-mv public/java/src/         public/gatk-framework/src/main/java/
  git-mv public/java/test/        public/gatk-framework/src/test/java/
  git-mv public/testdata/         public/gatk-framework/src/test/resources/
  git-mv public/scala/qscript/    public/queue-framework/src/main/qscripts/
  git-mv public/scala/src/        public/queue-framework/src/main/scala/
  git-mv public/scala/test/       public/queue-framework/src/test/scala/
2014-02-03 13:50:44 -05:00
Valentin Ruano-Rubio 89c4e57478 gVCF <NON_REF> in all vcf lines including variant ones when –ERC gVCF is requested.
Changes:
-------

  <NON_REF> likelihood in variant sites is calculated as the maximum possible likelihood for an unseen alternative allele: for reach read is calculated as the second best likelihood amongst the reported alleles.

  When –ERC gVCF, stand_conf_emit and stand_conf_call are forcefully set to 0. Also dontGenotype is set to false for consistency sake.

  Integration test MD5 have been changed accordingly.

Additional fix:
--------------

  Specially after adding the <NON_REF> allele, but also happened without that, QUAL values tend to go to 0 (very large integer number in log 10) due to underflow when combining GLs (GenotypingEngine.combineGLs). To fix that combineGLs has been substituted by combineGLsPrecise that uses the log-sum-exp trick.

  In just a few cases this change results in genotype changes in integration tests but after double-checking using unit-test and difference between combineGLs and combineGLsPrecise in the affected integration test, the previous GT calls were either border-line cases and or due to the underflow.
2014-01-30 11:23:33 -05:00
Valentin Ruano-Rubio 748d2fdf92 Added Integration test to verify the bugs are not there anymore as reported in pivotracker 2014-01-26 23:29:31 -05:00
Valentin Ruano-Rubio 9e7bf75e89 Fix for the PairHMM transition probability miscalculation.
Problem:

matchToMatch transition calculation was wrong resulting in transition probabilites coming out of the Match state that added more than 1.

Reports:

https://www.pivotaltracker.com/s/projects/793457/stories/62471780
https://www.pivotaltracker.com/s/projects/793457/stories/61082450

Changes:

The transition matrix update code has been moved to a common place in PairHMMModel to dry out its multiple copies.

MatchToMatch transtion calculation has been fixed and implemented in PairHMMModel.

Affected integration test md5 have been updated, there were no differences in GT fields and example differences always implied
small changes in likelihoods that is what is expected.
2014-01-26 16:30:36 -05:00
Ryan Poplin bdd06ebfc2 Merge pull request #478 from broadinstitute/eb_generalize_hc_values_as_args
Pulled out some hard-coded values from the read-threading and isActive c...
2014-01-21 09:01:54 -08:00
Eric Banks 9e858270d7 Moving this test up one level to where it actually belongs. 2014-01-19 02:33:11 -05:00
Eric Banks 64d5bf650e Pulled out some hard-coded values from the read-threading and isActive code of the HC, and made them into a single argument.
In unifying the arguments it was clear that the values were inconsistent throughout the code, so now there's a
single value that is intended to be more liberal in what it allows in (in an attempt to increase sensitivity).

Very little code actually changes here, but just about every md5 in the HC integration tests are different (as
expected).  Added another integration test for the new argument.

To be used by David R to test his per-branch QC framework: does this commit make the HC look better against the KB?
2014-01-19 01:15:13 -05:00
Eric Banks de56134579 Fixed up and refactored what seems to be a useful private tool to create simulated reads around a VCF.
It didn't completely work before (it was hard-coded for a particular long-lost data set) but it should work now.
Since I thought that it might prove useful to others, I moved it to protected and added integration tests.

GERALDINE: NEW TOOL ALERT!
2014-01-15 13:49:31 -05:00
Eric Banks 9f1ab0087a Added in a check for what would be an empty allele after trimming. 2014-01-15 11:04:19 -05:00
Ryan Poplin 201ad398ac Merge pull request #473 from broadinstitute/eb_fix_qd_indel_normalization
The QD normalization for indels was busted and is now fixed.
2014-01-14 08:56:19 -08:00
Eric Banks e4fdc5ac44 Merge pull request #474 from broadinstitute/eb_fix_haplotype_resolver_PT63333488
Fixing the Haplotype Resolver so that it doesn't complain about missing header lines
2014-01-14 07:36:53 -08:00
Eric Banks fd511d12a2 Fixing the Haplotype Resolver so that it doesn't complain about missing header lines.
The code comments very clearly state that INFO fields shouldn't be propagated into the output,
but someone must have accidentally changed it afterwards.  This is just a simple one-line fix
to make sure the code adhered to the comments.

Delivers #63333488.
2014-01-13 22:47:43 -05:00
Geraldine Van der Auwera 8fcad6680b Assorted fixes and improvements to gatkdocs
-Added docs for ERC mode in HC
 -Move RecalibrationPerformance walker since to private since it is experimental and unsupported
 -Updated VR docs and restored percentBad/numBad (but @Hidden) to enable deprecation alert if users try to use them
 -Improved error msg for conflict between per-interval aggregation and -nt
 -Minor clean up in exception docs
 -Added Toy Walkers category for devs and dev supercat (to build out docs for developers)
 -Added more detailed info to GenotypeConcordance doc based on Chris forum post
 -Added system to include min/max argument values in gatkdocs (build gatkdocs with 'ant gatkdocs' to test it, see engine and DoC args for in situ examples)
 -Added tentative min/max argument annotations to DepthOfCoverage and CommandLineGATK arguments (and improved docs while at it)
 -Added gotoDev annotation to GATKDocumentedFeature to track who is the go-to person in GSA for questions & issues about specific walkers/tools (now discreetly indicated in each gatkdoc)
2014-01-13 17:46:22 -05:00
Eric Banks c7e08965d0 The QD normalization for indels was busted and is now fixed.
It is true that indels of length > 1 have higher QUALS than those of length = 1.  But for the HC those
QUALS are not that much higher, and it doesn't continue scaling up as the indels get larger.  So we no
longer normalize by indel length (which massively over-penalizes larger events and effectively drops their
QD to 0).

For the UG the previous normalization also wasn't perfect.  Now we divide the indel length by a factor
of 3 to make sure that QD is consistent over the range of indel lengths.

Integration tests change because QD is different for indels.
Also, got permission from Valentin to archive a failing test that no longer applies.

Thanks to Kurt on the GATK forum for pointing this all out.
2014-01-13 15:23:36 -05:00
droazen 7cd304fb41 Merge pull request #470 from broadinstitute/mf_new_RBP
Mf new rbp
2014-01-13 08:46:27 -08:00
MauricioCarneiro 50cd6781b3 Merge pull request #465 from broadinstitute/eb_improvements_to_ref_confidence_merger
Improvements to ref confidence merger
2014-01-08 10:51:01 -08:00
Eric Banks f172c349f6 Adding the functionality to enable users to input a file of VCFs for -V.
To do this I have added a RodBindingCollection which can represent either a VCF or a
file of VCFs.  Note that e.g. SelectVariants allows a list of RodBindingCollections so
that one can intermix VCFs and VCF lists.

For VariantContext tags with a list, by default the tags for the -V argument are applied
unless overridden by the individual line.  In other words, any given line can have either
one token (the file path) or two tokens (the new tags and the file path).  For example:
foo.vcf
VCF,name=bar bar.vcf

Note that a VCF list file name must end with '.list'.

Added this functionality to CombineVariants, CombineReferenceCalculationVariants, and VariantRecalibrator.
2014-01-08 00:45:00 -05:00
Eric Banks c133909d32 Fixed edge condition in the realigner where a realigned read can sometimes get partially aligned off the end of the contig.
Now we ignore such reads (which is much easier than trying to figure out when to soft-clip).
Added unit test.
2014-01-08 00:37:28 -05:00
Menachem Fromer e33d3dafc6 Add documentation for RBP, and also update the MD5 for the tests now that the output uses HP tags instead of '|', which is now reserved for trio-based phasing 2014-01-03 12:04:47 -05:00
Menachem Fromer d1275651ae Merge remote-tracking branch 'origin/master' into mf_new_RBP 2014-01-03 01:13:40 -05:00
Ryan Poplin 856c1f87c1 Allow for additional input data to be used in the VQSR for clustering but don't carry it forward into the output VCF file.
-- New -a argument in the VQSR for specifying additional data to be used in the clustering
-- New NA12878KB walker which creates ROC curves by partitioning the data along VQSLOD and calculating how many KB TP/FP's are called.
2014-01-02 14:46:04 -05:00
amilev f81a38f596 Merge pull request #446 from broadinstitute/ami-RNAseq-tools
Write a new tool for spliting reads that have N cigar string.
2014-01-01 21:06:25 -08:00
MauricioCarneiro 1223345726 Merge pull request #459 from broadinstitute/eb_fix_bad_hmm_clipping
Fixed up edge condition for clipping long reads in the HMM.
2014-01-01 20:00:34 -08:00
Ami Levy-Moonshine 6da53aea09 Write a new tool for spliting reads that have N cigar string.
For example, this tool can be used for processing bowtie RNA-seq data.
Each read with k N-cigar elemments is plit to k+1 reads. The split is done by hard clipping the bases rest of the bases.

In order to do it, few changes were introduced to some other clipping methods:
- make a segnificant change in ClippingOp.hardClip() that prevent the spliting of read with cigar: 1M2I1N1M3I.
- change getReadCoordinateForReferenceCoordinate in ReadUtil to recognize Ns

create unitTests for that walker:
- change ReadClipperTestUtils to be more general in order to use its code and avoid code duplication
- move some useful methods from ReadClipperTestUtils to CigarUtils

create integration test for that class

small change in a comment in FullProcessingPipeline

last commit:

Address review comments:
- move to protected under walkers/rnaseq
- change the read splitting methods to be more readable and more efficiant
- change (minor changes) some methods in ReadClipper to allow the changes in split reads
- add (minor change) one method to CigarUtils to allow the changes in split reads
- change ReadUtils.getReadCoordinateForReferenceCoordinate to include possible N in the cigar
- address the rest of the review comments (minor changes)

- fix ReadUtilsUnitTest.testReadWithNs acoording to the defult behaviour of getReadCoordinateForReferenceCoordinate (in case of refernce index that fall into deletion, return the read index of the base before the deletion).
- add another test to ReadUtilsUnitTest.testReadWithNs

- Allow the user to print the split positions (not working proparly currently)
2014-01-01 22:21:36 -05:00
Eric Banks bb4c4b1fcd Fixed up edge condition for clipping long reads in the HMM.
MD5s change because some reads were incorrectly getting clipped before.

[delivers #62584746]
2014-01-01 19:05:09 -05:00
Mauricio Carneiro d52bd44867 Move CompareBAMs to private
This is a tool that we use internally validate the ReduceReads development. I think it should be
private. There is no need to improve docs.

[delivers #54703398]
2014-01-01 14:33:23 -05:00
Eric Banks 9665f75ad4 Don't fail in annotations if the wrong tools are calling them, just silently skip them.
This is important for cases when users want to use annotation groups (like all experimental annotations).
2013-12-31 23:45:21 -05:00
Eric Banks 83e09b1f64 Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline.
Basically, it does 3 things (as opposed to having to call into 3 separate walkers):
1. merge the records at any given position into a single one with all alleles and appropriate PLs
2. re-genotype the record using the exact AF calculation model
3. re-annotate the record using the VariantAnnotatorEngine

In the course of this work it became clear that we couldn't just use the simpleMerge() method used
by CombineVariants; combining HC-based gVCFs is really a complicated process.  So I added a new
utility method to handle this merging and pulled any related code out of CombineVariants.  I tried
to clean up a lot of that code, but ultimately that's out of the scope of this project.

Added unit tests for correctness testing.
Integration tests cannot be used yet because the HC doesn't output correct gVCFs.
2013-12-31 12:07:56 -05:00
Menachem Fromer 48ef7a1a2f Merge remote-tracking branch 'origin/master' into mf_new_RBP 2013-12-19 10:42:20 -05:00
Valentin Ruano-Rubio 5db520c6fa Fixed issue > 0 log likelihoods using GraphBased likelihood engine reported by Mauricio
Added some integration test to check on the fix
2013-12-13 11:19:57 -05:00
Eric Banks ab33db625f Merge pull request #449 from broadinstitute/eb_move_calc_posteriors_to_protected
Moved CalculatePosteriors from private to protected, in preparation for 3.0
2013-12-07 22:18:46 -08:00
Eric Banks f1970b923e Moved CalculatePosteriors from private to protected, in preparation for 3.0.
Renamed it CalculateGenotypePosteriors.
Also, moved the utility code to a proper utility class instead of where Chris left it.
No actual code modifications made in this commit.
2013-12-08 00:08:34 -05:00
David Roazen 932cd3ada7 Fix 3rd-party library dependency issues in the HC/PairHMM tests
In general, test classes cannot use 3rd-party libraries that are not
also dependencies of the GATK proper without causing problems when,
at release time, we test that the GATK jar has been packaged correctly
with all required dependencies.

If a test class needs to use a 3rd-party library that is not a GATK
dependency, write wrapper methods in the GATK utils/* classes, and
invoke those wrapper methods from the test class.
2013-12-06 13:16:55 -05:00
Eric Banks e022db4690 Added docs for the minPruning argument in the HC 2013-12-05 11:50:56 -05:00
Geraldine Van der Auwera 3ab2f4edb2 Fixed documentation for -deletions argument in the UAC 2013-12-04 19:55:24 -05:00
amilev 0d94019bd6 Merge pull request #434 from broadinstitute/mc_dt_gccontent
Add GC Content to DiagnoseTargets
2013-12-04 09:42:26 -08:00
Joel Thibault 5fe0531b4d Throw a GVCFIndexException when the user doesn't specify the optimal indexing strategy 2013-12-03 23:12:14 -05:00
Mauricio Carneiro 701ede2817 Add GC Content to DiagnoseTargets 2013-12-03 23:04:40 -05:00
Eric Banks 6bee6a1b53 Change the behavior of SelectVariants for PL/AD when it encounters a record that has lost one or more alternate alleles.
Previously, we would strip out the PLs and AD values since they were no longer accurate.  However, this is not ideal because
then that information is just lost and 1) users complain on the forum and post it as a bug and 2) it gives us problems in both
the current and future (single sample) calling pipelines because we subset samples/alleles all the time and lose info.

Now the PLs and AD get correctly selected down.

While I was in there I also refactored some related code in subsetDiploidAlleles().  There were no real changes there - I just
broke it out into smaller chunks as per our best practices.

Added unit tests and updated integration tests.
Addressed reviews.
2013-12-03 09:23:03 -05:00
Valentin Ruano-Rubio 0f99778a59 Adding Graph-based likelihood ratio calculation to HC
To active this feature add '--likelihoodCalculationEngine GraphBased' to the HC command line.

New HC Options (both Advanced and Hidden):
==========================================

  --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM)

Specifies what engine should be used to generate read vs haplotype likelihoods.

  PairHMM : standard full-PairHMM approach.
  GraphBased : using the assembly graph to accelarate the process.
  Random : generate random likelihoods - used for benchmarking purposes only.

  --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN)

It idicates how to merge haplotypes produced using different kmerSizes.
Only has effect when used in combination with (--likelihooCalculationEngine GraphBased)

  COMBO_MIN : use the smallest kmerSize with all haplotypes.
  COMBO_MAX : use the larger kmerSize with all haplotypes.
  MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it.
  MAX_ONLY : use the larger kmerSize with haplotypes asembled using it.

Major code changes:
===================

 * Introduce multiple likelihood calculation engines (before there was just one).

 * Assembly results from different kmerSies are now packed together using the AssemblyResultSet class.

 * Added yet another PairHMM implementation with a different API in order to spport
   local PairHMM calculations, (e.g. a segment of the read vs a segment of the haplotype).

Major components:
================

 * FastLoglessPairHMM: New pair-hmm implemtation using some heuristic to speed up partial PairHMM calculations

 * GraphBasedLikelihoodCalculationEngine: delegates onto GraphBasedLikelihoodCalculationEngineInstance the exectution
     of the graph-based likelihood approach.

 * GraphBasedLikelihoodCalculationEngineInstance: one instance per active-region, implements the graph traversals
     to calcualte the likelihoods using the graph as an scafold.

 * HaplotypeGraph: haplotype threading graph where build from the assembly haplotypes. This structure is the one
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

 * ReadAnchoring and KmerSequenceGraphMap: contain information as how a read map on the HaplotypeGraph that is
     used by GraphBasedLikelihoodCalcuationEngineInstance to do its work.

Remove mergeCommonChains from HaplotypeGraph creation

Fixed bamboo issues with HaplotypeGraphUnitTest

Fixed probrems with HaplotypeCallerIntegrationTest

Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest

Fixed ReadThreadingLikelihoodCalculationEngine issues

Moved event-block iteration outside GraphBased*EngineInstance

Removed unecessary parameter from ReadAnchoring constructor.
Fixed test problem

Added a bit more documentation to EventBlockSearchEngine

Fixing some private - protected dependency issues

Further refactoring making GraphBased*Instance and HaplotypeGraph slimmer. Addressed last pull request commit comments

Fixed FastLoglessPairHMM public -> protected dependency

Fixed probrem with HaplotypeGraph unit test

Adding Graph-based likelihood ratio calculation to HC

  To active this feature add '--likelihoodCalculationEngine GraphBased' to the HC command line.

New HC Options (both Advanced and Hidden):
==========================================

  --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM)

Specifies what engine should be used to generate read vs haplotype likelihoods.

  PairHMM : standard full-PairHMM approach.
  GraphBased : using the assembly graph to accelarate the process.
  Random : generate random likelihoods - used for benchmarking purposes only.

  --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN)

It idicates how to merge haplotypes produced using different kmerSizes.
Only has effect when used in combination with (--likelihooCalculationEngine GraphBased)

  COMBO_MIN : use the smallest kmerSize with all haplotypes.
  COMBO_MAX : use the larger kmerSize with all haplotypes.
  MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it.
  MAX_ONLY : use the larger kmerSize with haplotypes asembled using it.

Major code changes:
===================

 * Introduce multiple likelihood calculation engines (before there was just one).

 * Assembly results from different kmerSies are now packed together using the AssemblyResultSet class.

 * Added yet another PairHMM implementation with a different API in order to spport
   local PairHMM calculations, (e.g. a segment of the read vs a segment of the haplotype).

Major components:
================

 * FastLoglessPairHMM: New pair-hmm implemtation using some heuristic to speed up partial PairHMM calculations

 * GraphBasedLikelihoodCalculationEngine: delegates onto GraphBasedLikelihoodCalculationEngineInstance the exectution
     of the graph-based likelihood approach.

 * GraphBasedLikelihoodCalculationEngineInstance: one instance per active-region, implements the graph traversals
     to calcualte the likelihoods using the graph as an scafold.

 * HaplotypeGraph: haplotype threading graph where build from the assembly haplotypes. This structure is the one
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

 * ReadAnchoring and KmerSequenceGraphMap: contain information as how a read map on the HaplotypeGraph that is
     used by GraphBasedLikelihoodCalcuationEngineInstance to do its work.

Remove mergeCommonChains from HaplotypeGraph creation

Fixed bamboo issues with HaplotypeGraphUnitTest

Fixed probrems with HaplotypeCallerIntegrationTest

Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest

Fixed ReadThreadingLikelihoodCalculationEngine issues

Moved event-block iteration outside GraphBased*EngineInstance

Removed unecessary parameter from ReadAnchoring constructor.
Fixed test problem

Added a bit more documentation to EventBlockSearchEngine

Fixing some private - protected dependency issues

Further refactoring making GraphBased*Instance and HaplotypeGraph slimmer. Addressed last pull request commit comments

Fixed FastLoglessPairHMM public -> protected dependency

Fixed probrem with HaplotypeGraph unit test
2013-12-02 19:37:19 -05:00
Eric Banks 84ddfb41b5 Merge pull request #438 from broadinstitute/rp_vqsr_num_bad_stability_fixes_and_runtime_optimizations
Various VQSR optimizations in runtime and stability.
2013-12-02 08:37:37 -08:00
Ryan Poplin 6a922e7aca Merge pull request #435 from broadinstitute/eb_fix_ug_bug_for_long_deletions
Bug fix for something Guillermo added to UG before he left to support calling indels from reduced reads.
2013-12-02 08:09:23 -08:00
Ryan Poplin b57054c63c Various VQSR optimizations in both runtime and accuracy.
-- For very large whole genome datasets with over 2M variants overlapping the training data randomly downsample the training set that gets used to build the Gaussian mixture model.
-- Annotations are ordered by the difference in means between known and novel instead of by their standard deviation.
-- Removed the training set quality score threshold.
-- Now uses 2 gaussians by default for the negative model.
-- Num bad argument has been removed and the cutoffs are now chosen by the model itself by looking at the LOD scores.
-- Model plots are now generated much faster.
-- Stricter threshold for determining model convergence.
-- All VQSR integration tests change because of these changes to the model.
-- Add test for downsampling of training data.
2013-11-29 13:04:46 -05:00
Eric Banks df6499e58c Bug fix for RR: stop (incorrectly) pulling the MQ out of the SAMRecord as a byte instead of an int.
For reads with high MQs (greater than max byte) the MQ was being treated as negative and failing
the min MQ filter.

Added unit test.

Delivers PT#61567540.
2013-11-27 18:55:03 -05:00
Eric Banks 51d1a26725 Bug fix for something Guillermo added to UG before he left to support calling indels from reduced reads.
His code was excessively clipping reads because it was looking at their cigar string instead of just
the read length.  This meant that it was basically impossible to call large deletions in UG even with
perfect evidence in the reads (as reported by Craig D).

Integration tests change because (IMO after looking at sites in IGV) reads with indels similar to the one
being genotyped used to be given too much likelihood and now give less.

Added unit tests for new methods.
2013-11-27 13:54:39 -05:00
Chris Hartl 1f777c4898 Introducing the latest-and-greatest in genotyping: CalculatePosteriors.
CalculatePosteriors enables the user to calculate genotype likelihood posteriors (and set genotypes accordingly) given one or more panels containing allele counts (for instance, calculating NA12878 genotypes based on 1000G EUR frequencies). The uncertainty in allele frequency is modeled by a Dirichlet distribution (parameters being the observed allele counts across each allele), and the genotype state is modeled by assuming independent draws (Hardy-Weinberg Equilibrium). This leads to the Dirichlet-Multinomial distribution.

Currently this is implemented only for ploidy=2. It should be straightforward to generalize. In addition there's a parameter for "EM" that currently does nothing but throw an exception -- another extension of this method is to run an EM over the Maximum A-Posteriori (MAP) allele count in the input sample as follows:
 while not converged:
  * AC = [external AC] + [sample AC]
  * Prior = DirichletMultinomial[AC]
  * Posteriors = [sample GL + Prior]
  * sample AC = MLEAC(Posteriors)

This is more useful for large callsets with small panels than for small callsets with large panels -- the latter of these being the more common usecase.

Fully unit tested.

Reviewer (Eric) jumped in to address many of his own comments plus removed public->protected dependencies.
2013-11-27 13:00:45 -05:00
Eric Banks 0fac4fb3b6 Make the reference model calculation work with reduced reads.
It's just a matter of using PileupElement.getRepresentativeCount() instead of '++'.
2013-11-21 10:53:33 -05:00
Eric Banks adb77b406f Fixed poor implementation of isRefSource() and isRefSink() among others.
There was already a note in the code about how wrong the implementation was.
The bad code was causing a single-node graph to get cleaned up into nothing when pruning tails.
Delivers PT #61069820.
2013-11-21 10:53:27 -05:00