70 Commits (fc5ce4b66274e1a1c06a8f06ca4d7ffd8fcec4a2)

Author SHA1 Message Date
Valentin Ruano-Rubio fc5ce4b662 Created the stand-alone AC and AF annotation AlleleCountBySample
Story:

  https://www.pivotaltracker.com/story/show/77250524

Changes:

  - Removed the annotating code in the GeneralPloidyExactAFCalc (GPEAFC) class.
  - Added asAlleleList to the GenotypeAlleleCounts class and got GPEAFC to use it instead of implementing its own (nicer and more reusable code).
  - Removed the explicit addition of AlleleCountBySample fields to the VCF header by the walker initialize.
  - Added utility methods in Utils to wrap an int[] array into a List<Integer>, and a double[] array into a List<Double>, efficiently.

Test:

  - Added unit-testing for asAlleleList in GenotypeAlleleCountsUnitTest (within testFirst and testNext).
  - Added unit-testing for new methods in Utils : asList(int[]) and asList(double[])
  - Changed UG General Ploidy test to add explicitly those annotations.
  - Non-trivial changes in integration tests involving non-diploid runs (namely haploid and tetraploid) as they are not showing
    those annotations any longer, so the MD5s have been changed accordingly.
2014-08-22 20:33:25 -04:00
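The asList(int[]) utility mentioned in commit fc5ce4b662 could plausibly be a zero-copy view over the array. The sketch below is only an illustration under that assumption: the class name ArrayViews and its internals are hypothetical, and the actual Utils code is not shown in this log.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical helper; not the actual GATK Utils implementation.
final class ArrayViews {
    private ArrayViews() {}

    /** Returns a fixed-size List view backed directly by the array (no copying). */
    static List<Integer> asList(final int[] values) {
        if (values == null) {
            throw new IllegalArgumentException("values must not be null");
        }
        return new AbstractIntView(values);
    }

    // AbstractList only needs get() and size() for an unmodifiable list.
    private static final class AbstractIntView extends AbstractList<Integer> implements RandomAccess {
        private final int[] values;

        AbstractIntView(final int[] values) {
            this.values = values;
        }

        @Override
        public Integer get(final int index) {
            return values[index]; // boxing happens only on access
        }

        @Override
        public int size() {
            return values.length;
        }
    }
}
```

Because the list is a view, later changes to the array are visible through it, and wrapping costs O(1) regardless of array length. A double[] variant would follow the same pattern.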
Eric Banks 36bdfa3918 Merge pull request #712 from broadinstitute/eb_physical_phasing_bug_PT77248992
Fixing bug in the physical phasing code, found by Valentin.
2014-08-21 15:25:51 -04:00
Eric Banks b1cb6196be Fixing bug in the physical phasing code, found by Valentin.
It turns out that there can be some really complex situations even with a single sample where
there are lots of unphasable hets around a hom.  Previously we were trying to phase each of the
hets against the hom, but that wasn't correct.  Instead we now detect that situation and don't
attempt to phase anything.
Added a unit test to cover this situation.
2014-08-21 15:24:09 -04:00
Laura Gauthier 9a5da41dd4 Add bells and whistles for Genotype Refinement Pipeline
New annotation for low- and high-confidence de novos (only annotates biallelics)
FamilyLikelihoodsUtils now add joint likelihood and joint posterior annotations
Restrict population priors based on discovered allele count to be valid for 10 or more samples.
2014-08-21 11:20:40 -04:00
Valentin Ruano-Rubio d31c5536aa Fixed the bug, first by indicating the actual possible number of alternative alleles considering the extra <NON_REF>, and second by resizing the StateTracker capacity when invoked by GeneralPloidyExactAFCalc deep within its implementation of computeLog10PNonRef, which is ultimately what gets rid of the exception.
Story:

  https://www.pivotaltracker.com/story/show/74471252
2014-08-20 14:42:42 -04:00
Laura Gauthier b512c7eac9 Refactor StrandBiasTest (using template method) and add warnings for when annotations may not be calculated successfully.
VariantAnnotator/FS behavior changes slightly: VA used to output zeros for FS if there was no strand bias info, now skips FS output (but will still show FS in header)
2014-08-20 08:18:53 -04:00
Valentin Ruano-Rubio 8d9a55ae60 Moving new omniploidy likelihood calculation classes to their final package (as far as this pull-request is concerned) in org.broadinstitute.gatk.tools.walkers.genotyper 2014-08-19 11:54:29 -04:00
Valentin Ruano-Rubio 611b7f25ea Adds unit-test and integration test for new omniploidy likelihood calculation components
Added md5 to HaplotypeCallerIntegrationTest.testHaplotypeCallerSingleSampleWithDbsnp
2014-08-19 11:53:19 -04:00
Valentin Ruano-Rubio 9ee9da36bb Generalize the calculation of the genotype likelihoods in HC to cope with haploid and multiploidy
Changes in several walkers to use the new sample and allele closed lists and the new GenotypingEngine constructor signatures

Rebase adoption of new calculation system in walkers
2014-08-19 11:53:06 -04:00
Valentin Ruano-Rubio f08dcbc160 Added the genotype likelihoods model interface and implementation for the random specimen sample from an infinite population with homogeneous ploidy across samples. 2014-08-19 11:50:13 -04:00
Valentin Ruano-Rubio 4f993e8dbe Added read-likelihoods array base structure to substitute existing Map-of-Map-of-Maps. 2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio 242cd0e58f Added genotype allele counts and likelihood calculator utilities for arbitrary ploidy and number of alleles 2014-08-19 11:50:12 -04:00
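For context on what "arbitrary ploidy and number of alleles" implies: the number of distinct unordered genotypes for ploidy p and a alleles is the multiset coefficient C(p + a - 1, p). The following is a minimal sketch of that count only; the class and method names are hypothetical and are not taken from the commit.

```java
// Hypothetical illustration; not the GATK GenotypeAlleleCounts code.
final class GenotypeCountSketch {
    private GenotypeCountSketch() {}

    /** Number of unordered genotypes: C(ploidy + alleleCount - 1, ploidy). */
    static long genotypeCount(final int ploidy, final int alleleCount) {
        if (ploidy < 0 || alleleCount < 1) {
            throw new IllegalArgumentException("ploidy must be >= 0 and alleleCount >= 1");
        }
        final int n = ploidy + alleleCount - 1;
        final int k = Math.min(ploidy, alleleCount - 1); // C(n, k) == C(n, n - k)
        long result = 1;
        for (int i = 1; i <= k; i++) {
            // After step i, result == C(n - k + i, i); the division is always exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }
}
```

A diploid biallelic site gives 3 genotypes (hom-ref, het, hom-var), while a tetraploid site with 3 alleles gives 15.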
Valentin Ruano-Rubio b0a4cb9f0c Added closed sample and allele list data-structures and utility classes 2014-08-19 11:50:12 -04:00
Eric Banks d3f06024f8 Updated the physical phasing in the Haplotype Caller to address requests from ATGU.
1. It is now turned on by default
2. It now phases homozygous variants
3. Most importantly, it also phases variants that are always on opposite haplotypes

Changed the INFO keys to be PID and PGT, as described in the header.
2014-08-18 14:38:29 -04:00
Eric Banks 7e0c326e1c Merge pull request #706 from broadinstitute/vrr_reduce_hc_integration_test_time
Reduce intervals of integration tests in HaplotypeCallerIntegrationTest ...
2014-08-15 17:37:57 -04:00
Valentin Ruano-Rubio 2f79042dee Reduce intervals of integration tests in HaplotypeCallerIntegrationTest class
Story:

   https://www.pivotaltracker.com/story/show/74858854

Changes:

    Intervals have been shrunk so that the tests run in 15s or less.
2014-08-15 14:20:10 -04:00
Eric Banks eb84091702 Update the --keepOriginalAC functionality in SelectVariants to work for sites that lose alleles in the selection. 2014-08-14 15:34:09 -04:00
Ryan Poplin 3a9a78c785 Removing an assumption that ADs were in the same order if the number of alleles matched. This happens for example when one sample is C->T and another sample is C->G. 2014-08-13 13:26:40 -04:00
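The C->T versus C->G case in commit 3a9a78c785 can be illustrated by remapping allele depths (AD) by allele identity rather than by position. This is a hedged sketch, not the code from the commit: the class name, the use of plain strings for alleles, and the choice of 0 for alleles a sample never saw are all assumptions.

```java
import java.util.List;

// Hypothetical illustration of allele-keyed AD remapping.
final class AlleleDepthRemapper {
    private AlleleDepthRemapper() {}

    /** Reorders sourceAD to follow targetAlleles; alleles absent from the source get depth 0 (assumption). */
    static int[] remap(final List<String> sourceAlleles, final int[] sourceAD, final List<String> targetAlleles) {
        if (sourceAlleles.size() != sourceAD.length) {
            throw new IllegalArgumentException("one AD value is required per source allele");
        }
        final int[] result = new int[targetAlleles.size()];
        for (int i = 0; i < targetAlleles.size(); i++) {
            final int sourceIndex = sourceAlleles.indexOf(targetAlleles.get(i));
            result[i] = sourceIndex >= 0 ? sourceAD[sourceIndex] : 0;
        }
        return result;
    }
}
```

With source alleles [C, T] and target alleles [C, G], both lists have two entries, yet copying AD by index would wrongly credit the T reads to G; the lookup by allele avoids that.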
Eric Banks 27193c5048 Merge pull request #700 from broadinstitute/eb_phase_HC_variants_PT74816060
Initial implementation of functionality to add physical phasing informat...
2014-08-13 12:30:32 -04:00
Eric Banks 4512940e87 Initial implementation of functionality to add physical phasing information to the output of the HaplotypeCaller.
If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF.
Can be enabled with the --tryPhysicalPhasing argument.
2014-08-13 12:25:31 -04:00
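The co-occurrence rule in commit 4512940e87 can be pictured as a check over the set of used haplotypes. The sketch below assumes each haplotype is represented as a set of variant identifiers and reads "together" as "every haplotype carries both variants or neither"; both the representation and that reading are assumptions, and the names are hypothetical.

```java
import java.util.Collection;
import java.util.Set;

// Hypothetical illustration; not the HaplotypeCaller phasing code.
final class PhasingCheckSketch {
    private PhasingCheckSketch() {}

    /** True if the two variants are seen at least once and never appear one without the other. */
    static boolean alwaysTogether(final Collection<Set<String>> haplotypes, final String first, final String second) {
        boolean seenTogether = false;
        for (final Set<String> haplotype : haplotypes) {
            final boolean hasFirst = haplotype.contains(first);
            final boolean hasSecond = haplotype.contains(second);
            if (hasFirst != hasSecond) {
                return false; // one variant without the other breaks the phase
            }
            seenTogether |= hasFirst;
        }
        return seenTogether;
    }
}
```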
Valentin Ruano-Rubio b39508cd15 ReadLikelihoods class introduction final changes before merging
Stories:

        https://www.pivotaltracker.com/story/show/70222086
        https://www.pivotaltracker.com/story/show/67961652

Changes:

  Made some changes that I had missed to make sure that all PairHMM implementations use the same interface; as a consequence of that omission, we were always running the standard PairHMM.
  Fixed some additional bugs detected when running it on full WGS single-sample and exome multi-sample data sets.
  Updated some integration test md5s.
2014-08-11 17:47:25 -04:00
Valentin Ruano-Rubio 9a9a68409e ReadLikelihoods class introduction final changes before merging
Stories:

        https://www.pivotaltracker.com/story/show/70222086
        https://www.pivotaltracker.com/story/show/67961652

Changes:

  Made some changes that I had missed to make sure that all PairHMM implementations use the same interface; as a consequence of that omission, we were always running the standard PairHMM.
  Fixed some additional bugs detected when running it on full WGS single-sample and exome multi-sample data sets.
  Updated some integration test md5s.

Fixing GraphBased bugs with new master code
Fixed a difficult-to-spot bug in ReadLikelihoods.changeReads.
Changed PairHMM interface to fix a bug
Fixed missing changes for various PairHMM implementations to get them to use the new structure.
Fixed various bugs only detectable when running with full sample(s).
Believed to have fixed the lack of annotations in UG runs
Fixed integration test MD5s
Updating some md5s
Fixed yet another md5 probably left out by mistake
2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio 0b472f6bff Added new test to verify the functionality of ReadLikelihoods.java and its use in HC. Updated existing integration test md5s.
Stories:

    https://www.pivotaltracker.com/story/show/70222086
    https://www.pivotaltracker.com/story/show/67961652
2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio 2914ecb585 Change the Map-of-maps-of-maps for an array based implementation ReadLikelihoods to hold read likelihoods.
The array structure should be faster to populate and query (not properly benchmarked) and reduce the memory footprint considerably.
    Nevertheless, with the PairHMM factor removed (using likelihoodEngine Random), it only achieves a speed-up of 15% on an example WGS dataset,
    i.e. there are other, bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change.

    Stories:

      https://www.pivotaltracker.com/story/show/70222086
      https://www.pivotaltracker.com/story/show/67961652

    Changes:

       - ReadLikelihoods added to substitute  Map<String,PerSampleReadLikelihoods>
       - Operations that involve changes in full sets of ReadLikelihoods have been moved into that class.
       - Simplified a bit the code that handles the downsampling of reads based on contamination

    Caveats:

       - We still keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators; didn't feel like changing the interface of so many public classes in this pull-request.
2014-08-11 17:46:28 -04:00
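To show the general idea behind commit 2914ecb585, replacing nested maps with arrays, here is a minimal sketch of an array-backed likelihood store. It is not the ReadLikelihoods class itself: the class name, the [sample][allele][read] layout, and the use of strings as keys are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the GATK ReadLikelihoods implementation.
final class LikelihoodMatrixSketch {
    private final Map<String, Integer> sampleIndex = new HashMap<>();
    private final Map<String, Integer> alleleIndex = new HashMap<>();
    private final double[][][] values; // indexed as [sample][allele][read]

    LikelihoodMatrixSketch(final List<String> samples, final List<String> alleles, final int[] readCounts) {
        if (samples.size() != readCounts.length) {
            throw new IllegalArgumentException("one read count is required per sample");
        }
        for (int s = 0; s < samples.size(); s++) {
            sampleIndex.put(samples.get(s), s);
        }
        for (int a = 0; a < alleles.size(); a++) {
            alleleIndex.put(alleles.get(a), a);
        }
        values = new double[samples.size()][alleles.size()][];
        for (int s = 0; s < samples.size(); s++) {
            for (int a = 0; a < alleles.size(); a++) {
                values[s][a] = new double[readCounts[s]]; // one slot per read, no per-entry objects
            }
        }
    }

    void set(final String sample, final String allele, final int read, final double log10Likelihood) {
        values[sampleIndex.get(sample)][alleleIndex.get(allele)][read] = log10Likelihood;
    }

    double get(final String sample, final String allele, final int read) {
        return values[sampleIndex.get(sample)][alleleIndex.get(allele)][read];
    }
}
```

The name lookups happen once per sample and allele rather than once per read, and the primitive double arrays avoid boxing each likelihood, which is the kind of saving the commit message attributes to the array structure.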
Ryan Poplin c56e493f98 Merge pull request #622 from broadinstitute/ldg_SORanalysis
Add StrandOddsRatio to default annotations produced by GenotypeGVCFs
2014-08-11 09:45:27 -04:00
Tim Fennell 5695f22da8 Changed the default GVCF Q Bands from 5,20,60 to be 1..60 by 1s, 60...90 by 10s and 99 in order to give finer resolution
for homref PLs and ADs at lower confidences and somewhat higher resolution at higher confidences.
2014-08-08 14:31:35 -04:00
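The band layout described in commit 5695f22da8 (1..60 by 1s, 60..90 by 10s, and 99) can be enumerated as below. This is a sketch of the resulting list only; the class and method names are hypothetical and the commit's own code is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the default band boundaries described above.
final class GqBandsSketch {
    private GqBandsSketch() {}

    static List<Integer> defaultBands() {
        final List<Integer> bands = new ArrayList<>();
        for (int gq = 1; gq <= 60; gq++) {
            bands.add(gq); // fine resolution at lower confidences
        }
        for (int gq = 70; gq <= 90; gq += 10) {
            bands.add(gq); // coarser steps above 60 (60 itself is already present)
        }
        bands.add(99);
        return bands;
    }
}
```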
Laura Gauthier 35de598e4b Modify StrandOddsRatio calculation to take on lower values in cases where reference +/- reads are skewed but alt reads are not. Add SOR to default annotations produced by GenotypeGVCFs. Add jitter to minimum SOR values 2014-08-07 12:09:19 -04:00
Laura Gauthier f532f1f843 Fix nullPointerException 2014-08-07 10:13:17 -04:00
Laura Gauthier 74affcc077 Update inbreeding coefficient calculation to give a better estimate for multiallelic sites
Add unit test for compound het and for multiallelic hets
2014-08-07 08:12:47 -04:00
Eric Banks b9486f5b4d Merge pull request #693 from broadinstitute/ldg_SORfromHC
Allow SOR to be calculated from HC
2014-08-06 21:48:09 -04:00
Phillip Dexheimer 593663d9b6 Improved detection of missing argument values
In particular, it was possible to specify arguments for Files or Compound types without values
 Added a special "none" value for annotations, since a bare "-A" is no longer allowed
 Delivers PT 71792842 and 59360374
2014-08-05 20:31:31 -04:00
Laura Gauthier 5533199402 Allow SOR to be calculated from HC
Refactor StrandBiasTest classes
2014-08-01 20:47:58 -04:00
Ryan Poplin 63b3f7dfd3 Fixing typos in AnalyzeCovariates 2014-07-31 10:36:18 -04:00
Valentin Ruano-Rubio 750eb4b5a6 Add diploid only support message to HaplotypeCaller
Story:

  https://www.pivotaltracker.com/story/show/73440292

Changes:

  - Just add the conditional in HaplotypeCaller#initialize

Testing:

  - Nothing added, checked locally, trivial change that would eventually be removed anyway.
2014-07-29 17:05:36 -04:00
David Roazen 0798a4b768 Update pom versions to mark the start of GATK 3.3 development 2014-07-17 12:09:33 -04:00
David Roazen 323f22f852 Update pom versions for the 3.2 release 2014-07-17 12:06:22 -04:00
Eric Banks 98d88eb07e Fixed IndexOutOfBounds error associated with tail merging.
Don't expand out source nodes for tail merging, since that's a head merging action only.
This shows up as a bug only because we now allow merging tails against non-reference paths.
2014-07-17 12:04:22 -04:00
Geraldine Van der Auwera a6f632874b Various documentation improvements
- Edited intervals merging docs for correctness & clarity
- Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests)
- Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs
- Moved GenotypeGVCFs to Variant Discovery category and clarified a few points
- Clarified that the -resource argument depends on using the -V:tag format
- Clarified how the pcr indel model works
- Added caveat for -U ALLOW_N_CIGAR_READS
- Added MathJax support for displaying equations in GATKDocs
- Updated HC example commands and caveats
2014-07-14 12:03:03 -04:00
Eric Banks ecefcb383d Disable the complex variant merging for now, as requested by ATGU 2014-07-11 17:27:40 -04:00
droazen b8751ad598 Merge pull request #680 from broadinstitute/ldg_VQSRscript
Update VQSR Rnd BQSR  script generation code for compatibility with late...
2014-07-11 10:16:37 -04:00
Eric Banks 1d97b4a191 Improved tail merging: now tails can be merged to branches that are not entirely reference.
This is useful for e.g. cases where there are SNPs on insertions.  Before tails were forced to be merged
(incorrectly) only to a reference node, but now they can be merged to any path in the graph from which they
directly branch.

Also, I've transferred over Ryan's code to refuse to process kmer sizes for which the reference sequence
contains non-unique kmers.
2014-07-10 08:57:01 -04:00
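The kmer-size check mentioned in commit 1d97b4a191 amounts to testing whether any kmer of a given size repeats in the reference sequence. A minimal sketch of such a test follows; it is not Ryan's code, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not the assembler's actual check.
final class KmerUniquenessSketch {
    private KmerUniquenessSketch() {}

    /** True if every kmer of size k occurs at most once in the reference. */
    static boolean hasOnlyUniqueKmers(final String reference, final int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        final Set<String> seen = new HashSet<>();
        for (int start = 0; start + k <= reference.length(); start++) {
            if (!seen.add(reference.substring(start, start + k))) {
                return false; // add() returns false when the kmer was already present
            }
        }
        return true;
    }
}
```

A caller would skip any kmer size for which this returns false and try the next candidate size instead.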
Ryan Poplin 5eee065133 Merge pull request #674 from broadinstitute/rp_improve_genotyping
Improvements to genotyping accuracy.
2014-07-09 16:03:09 -04:00
Laura Gauthier 99026eb51b Update VQSR Rnd BQSR script generation code for compatibility with latest ggplot version. Update queueJobReport.R and public/gsalib/src/R/R/gsa.variantqc.utils.R also 2014-07-09 15:36:58 -04:00
Ryan Poplin 74a7674d70 Improvements to genotyping accuracy.
-- Global mismapping penalty was only applied to the reference haplotype. This led to problems with overlapping events, mostly STR haplotypes. Now the penalty is applied to every haplotype.
-- We subset the reads down to only those which overlap the event (after assembly based realignment) for likelihood calculations.
2014-07-09 13:11:07 -04:00
David Roazen 719e685759 Remove junit imports in the test suite 2014-07-09 12:09:27 -04:00
Eric Banks bad7865078 When converting a haplotype to a set of variants we now check for cases that are overly complex.
In these cases, where the alignment contains multiple indels, we output a single complex
variant instead of the multiple partial indels.

We also re-enable dangling tail recovery by default.
2014-07-01 14:18:59 -04:00
Ryan Poplin e14bff212d SB tables should be created even if the ref or alt columns have no counts. This is so that FS/SOR will still be calculated when the variant is extremely high or low frequency.
-- Removed long running HC integration test... sorry
2014-06-30 15:19:15 -04:00
Ryan Poplin 0127799cba Reads are now realigned to the most likely haplotype before being used by the annotations.
-- AD,DP will now correspond directly to the reads that were used to construct the PLs
-- RankSumTests, etc. will use the bases from the realigned reads instead of the original alignments
-- There is now no additional runtime cost to realign the reads when using bamout or GVCF mode
-- bamout mode no longer sets the mapping quality to zero for uninformative reads, instead the read will not be given an HC tag
2014-06-30 10:35:50 -04:00
Phillip Dexheimer 06d619e9aa Removed redundant SelectVariantsIntegrationTest, merged its only test into protected version 2014-06-24 18:59:59 -04:00
Eric Banks 2df2a153e6 Merge pull request #658 from broadinstitute/ldg_PbyTwithPriors
Updated CalculateGenotypePosteriors to compute genotype posteriors using...
2014-06-18 15:04:39 -04:00