gatk-3.8

Commit Graph

Author	SHA1	Message	Date
droazen	5c4a3eb89c	Merge pull request #727 from broadinstitute/ks_gatk_queue_package_test_updates Various fixes for package tests.	2014-09-05 10:17:32 -04:00
Ryan Poplin	a45acdfb89	StrandOddsRatio is now a standard annotation.	2014-09-05 08:33:37 -04:00
Khalid Shakir	376592f423	Various fixes for package tests. Explicitly including gatk/queue test-jar artifacts in package test classpaths. SelectVariantsIntegrationTest#testInvalidJexl now resets the JexlEngine silent flag that VariantFiltration.initialize() toggles. External example no longer tries to unpack nonexistent gatk artifact jars during package tests.	2014-09-04 15:30:31 -04:00
Ryan Poplin	1b809268d5	fixing a few small typos in the HaplotypeCaller and related classes	2014-09-04 14:48:27 -04:00
droazen	5c087a6e1f	Merge pull request #724 from broadinstitute/ks_remove_test_qscript_symbolic_links Removed symlink creation for tests and qscripts	2014-09-04 09:10:54 -04:00
Eric Banks	538537dbf1	Merge pull request #718 from broadinstitute/mf_rbp_fix Fix MNP merging code to work with explicit HP phase representation	2014-09-02 20:39:22 -04:00
Eric Banks	01e725cd1a	Merge pull request #723 from broadinstitute/eb_fix_rna_splitting_PT77878554 Make sure that the OverhangFixingManager (used for splitting RNA reads) ...	2014-09-02 20:39:01 -04:00
Menachem Fromer	10f9001738	Fix MNP merging code to work with explicit HP phase representation	2014-09-02 17:25:08 -04:00
Eric Banks	ff91ab8ba2	Make sure that the OverhangFixingManager (used for splitting RNA reads) handles unmapped reads.	2014-09-02 16:56:17 -04:00
Valentin Ruano Rubio	c7925f6e5c	Merge pull request #719 from broadinstitute/vrr_generalize_ploidy_in_genotype_gvcfs Adds support for omniploidy to GenotypeGVCFs and CombineGVCFs.	2014-09-02 16:51:02 -04:00
Valentin Ruano-Rubio	d363725b4b	Adds support for omniploidy to GenotypeGVCFs and CombineGVCFs. Same changes fixed the problem for GenotypeGVCFs and CombineGVCFs. Stories: - https://www.pivotaltracker.com/story/show/77626044 - https://www.pivotaltracker.com/story/show/77626854 Changes: - Generalized the code for the merging in GATKVariantContextUtils to cope with ploidy != 2. - GenotypeGVCFs now checks that the input's ploidy conforms to the '-ploidy' argument. - Moved out Reference Confidence VC merging code from GATKVariantContextUtils so that we can keep new code in protected. Caveats: - GenotypeGVCFs can only deal with input files that have the same ploidy in all positions; the one that the user MUST indicate in the -ploidy argument (if different from the default 2). - CombineGVCFs won't necessarily complain if it is passed mixed-ploidy inputs, but you won't be able to genotype the result with GenotypeGVCFs. Test: - Removed deprecated unit tests for GATKVariantContextUtils. - Moved unit-tests regarding GVCF merging from GATKVariantContextUtilsUnitTest to ReferenceConfidenceVariantContextUtilsUnitTest. - Added unit test for new code for mapping genotype indices between allele index encoding in GenotypeLikelihoodCalculator. - GenotypeGVCFs and CombineGVCFs original integration tests are unaffected by the change. - Added tetraploid run integration tests to check on non-diploid execution of GenotypeGVCFs and CombineGVCFs.	2014-09-02 15:06:47 -04:00
Khalid Shakir	fcb0eca203	Now passing in the path to the GATK directory to tests. Changed tests and scripts to use gatkdir full path instead of relative testdata/qscripts symbolic links. Although symlinks not created, left the symlink deletion script execution with a comment about future removal. Re-enabled example UG pipeline queue test. Replaced all hardcoded strings of {public,private}/testdata with BaseTest variables. Refactored temp list creation method from ListFileUtilsUnitTest to BaseTest.createTempListFile. Removed list files with hardcoded paths, now using createTempListFile instead with private test dir variable.	2014-09-02 01:40:59 +08:00
Khalid Shakir	2d28972c88	The 'after' files are @Input files and committed in git, so don't delete them after tests.	2014-08-30 03:04:54 +08:00
Eric Banks	5b087c9897	Changed the functionality of the physical phasing in the HC: now hom vars are output as 0|1. We do this for technical reasons, mostly because we don't genotype in the HC anymore; it's all done downstream by GenotypeGVCFs so we can't be sure that the genotype will be hom var. Also, there are steps in the downstream pipeline where genotypes can change, so assuming anything in the HC is a bad idea, and if we have phasing info in the het state, we want to propagate that forward. Now, PGT tag fixing happens downstream in GenotypeGVCFs. While I was in there I also cleaned up the code a bit and fixed a bug where annotation was happening before genotype creation when using the --includeNonVariantSites argument. Added tests accordingly.	2014-08-25 21:40:14 -04:00
Valentin Ruano-Rubio	6dc5cf0be0	Fixes some mis-merged md5 updates from a previous merge into master	2014-08-24 20:47:07 -04:00
Eric Banks	9009c1e996	Merge pull request #715 from broadinstitute/vrr_disable_physical_phasing_for_nondiploid_hc Disable physical phasing for non-diploid HC calling.	2014-08-23 20:58:51 -04:00
Valentin Ruano-Rubio	6695aeafd9	Disable physical phasing for non-diploid HC calling. Story: https://www.pivotaltracker.com/story/show/77452256 Changes: If ploidy != 2, disable physical phasing and log an info message to let the user know. Tests: Change md5s affected by this change.	2014-08-23 10:52:07 -04:00
Phillip Dexheimer	931890915f	Add the --sample_name argument to HaplotypeCaller * This is a shortcut for people who have multi-sample BAMs but would like to use GVCF mode. Rather than creating single-sample BAMs with PrintReads, one could use the --sample_name argument to HaplotypeCaller to specify the single sample to make calls on * Completes PT 73075482	2014-08-22 23:22:03 -04:00
Valentin Ruano-Rubio	fc5ce4b662	Created the stand-alone AC and AF annotation AlleleCountBySample Story: https://www.pivotaltracker.com/story/show/77250524 Changes: - Remove the annotating code in GeneralPloidyExactAFCalc (GPEAFC) class. - Added the asAlleleList to GenotypeAlleleCounts class and get (GPEAFC) to use that instead of implementing its own (nicer and more reusable code). - Removed the explicit addition of AlleleCountBySample fields to the VCF header by the walker initialize. - Added utility methods in Utils to wrap an int[] array into a List<Integer>, and a double[] array into a List<Double> efficiently. Test: - Added unit-testing for asAlleleList in GenotypeAlleleCountsUnitTest (within testFirst and testNext). - Added unit-testing for new methods in Utils: asList(int[]) and asList(double[]). - Changed UG General Ploidy test to add explicitly those annotations. - Non-trivial changes in integration tests involving non-diploid runs (namely haploid and tetraploid) as they are not showing those annotations any longer, so the MD5s have been changed accordingly.	2014-08-22 20:33:25 -04:00
Eric Banks	36bdfa3918	Merge pull request #712 from broadinstitute/eb_physical_phasing_bug_PT77248992 Fixing bug in the physical phasing code, found by Valentin.	2014-08-21 15:25:51 -04:00
Eric Banks	b1cb6196be	Fixing bug in the physical phasing code, found by Valentin. It turns out that there can be some really complex situations even with a single sample where there are lots of unphasable hets around a hom. Previously we were trying to phase each of the hets against the hom, but that wasn't correct. Instead we now detect that situation and don't attempt to phase anything. Added a unit test to cover this situation.	2014-08-21 15:24:09 -04:00
Laura Gauthier	9a5da41dd4	Add bells and whistles for Genotype Refinement Pipeline. New annotation for low- and high-confidence de novos (only annotates biallelics). FamilyLikelihoodsUtils now adds joint likelihood and joint posterior annotations. Restrict population priors based on discovered allele count to be valid for 10 or more samples.	2014-08-21 11:20:40 -04:00
Valentin Ruano-Rubio	d31c5536aa	Fixed the bug first by indicating the actual possible number of alternative alleles considering the extra <NON_REF> and second by resizing the StateTracker capacity when invoked by GeneralPloidyExactAFCalc deep within its implementation of computeLog10PNonRef, which is ultimately what gets rid of the exception. Story: https://www.pivotaltracker.com/story/show/74471252	2014-08-20 14:42:42 -04:00
Laura Gauthier	b512c7eac9	Refactor StrandBiasTest (using template method) and add warnings for when annotations may not be calculated successfully. VariantAnnotator/FS behavior changes slightly: VA used to output zeros for FS if there was no strand bias info, now skips FS output (but will still show FS in header)	2014-08-20 08:18:53 -04:00
Valentin Ruano-Rubio	8d9a55ae60	Moving new omniploidy likelihood calculation classes to their final package (as far as this pull-request is concerned) in org.broadinstitute.gatk.tools.walkers.genotyper	2014-08-19 11:54:29 -04:00
Valentin Ruano-Rubio	611b7f25ea	Adds unit-test and integration test for new omniploidy likelihood calculation components Added md5 to HaplotypeCallerIntegrationTest.testHaplotypeCallerSingleSampleWithDbsnp	2014-08-19 11:53:19 -04:00
Valentin Ruano-Rubio	9ee9da36bb	Generalize the calculation of the genotype likelihoods in HC to cope with haploid and multiploidy Changes in several walker to use new sample, allele closed lists and new GenotypingEngine constructors signatures Rebase adoption of new calculation system in walkers	2014-08-19 11:53:06 -04:00
Valentin Ruano-Rubio	f08dcbc160	Added the genotype likelihoods model interface and implementation for the random specimen sample from an infinite population with homogeneous ploidy across samples.	2014-08-19 11:50:13 -04:00
Valentin Ruano-Rubio	4f993e8dbe	Added read-likelihoods array base structure to substitute existing Map-of-Map-of-Maps.	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	242cd0e58f	Added genotype allele counts and likelihood calculator utilities for arbitrary ploidy and number of alleles	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	b0a4cb9f0c	Added close sample and allele list data-structures and utility classes	2014-08-19 11:50:12 -04:00
Eric Banks	d3f06024f8	Updated the physical phasing in the Haplotype Caller to address requests from ATGU. 1. It is now turned on by default 2. It now phases homozygous variants 3. Most importantly, it also phases variants that are always on opposite haplotypes Changed the INFO keys to be PID and PGT, as described in the header.	2014-08-18 14:38:29 -04:00
Eric Banks	7e0c326e1c	Merge pull request #706 from broadinstitute/vrr_reduce_hc_integration_test_time Reduce intervals of integration tests in HaplotypeCallerIntegrationTest ...	2014-08-15 17:37:57 -04:00
Valentin Ruano-Rubio	2f79042dee	Reduce intervals of integration tests in HaplotypeCallerIntegrationTest class Story: https://www.pivotaltracker.com/story/show/74858854 Changes: Intervals have been shrunk so that the test run in 15s or less.	2014-08-15 14:20:10 -04:00
Eric Banks	eb84091702	Update the --keepOriginalAC functionality in SelectVariants to work for sites that lose alleles in the selection.	2014-08-14 15:34:09 -04:00
Ryan Poplin	3a9a78c785	Removing an assumption that ADs were in the same order if the number of alleles matched. This happens for example when one sample is C->T and another sample is C->G.	2014-08-13 13:26:40 -04:00
Eric Banks	27193c5048	Merge pull request #700 from broadinstitute/eb_phase_HC_variants_PT74816060 Initial implementation of functionality to add physical phasing informat...	2014-08-13 12:30:32 -04:00
Eric Banks	4512940e87	Initial implementation of functionality to add physical phasing information to the output of the HaplotypeCaller. If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF. Can be enabled with the --tryPhysicalPhasing argument.	2014-08-13 12:25:31 -04:00
Valentin Ruano-Rubio	b39508cd15	ReadLikelihoods class introduction final changes before merging Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: Done some changes that I missed in relation with making sure that all PairHMM implementations use the same interface; as a consequence we were running always the standard PairHMM. Fixed some additional bugs detected when running it on full WGS single-sample and exome multi-sample data sets. Updated some integration test md5s.	2014-08-11 17:47:25 -04:00
Valentin Ruano-Rubio	9a9a68409e	ReadLikelihoods class introduction final changes before merging Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: Done some changes that I missed in relation with making sure that all PairHMM implementations use the same interface; as a consequence we were running always the standard PairHMM. Fixed some additional bugs detected when running it on full WGS single-sample and exome multi-sample data sets. Updated some integration test md5s. Fixing GraphBased bugs with new master code. Fixed ReadLikelihoods.changeReads difficult to spot bug. Changed PairHMM interface to fix a bug. Fixed missing changes for various PairHMM implementations to get them to use the new structure. Fixed various bugs only detectable when running with full sample(s). Believe to have fixed the lack of annotations in UG runs. Fixed integration test MD5s. Updating some md5s. Fixed yet another md5 probably left out by mistake.	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	0b472f6bff	Added new test to verify the functionality of ReadLikelihoods.java and its use in HC. Updated existing integration test md5s. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	2914ecb585	Change the Map-of-maps-of-maps for an array based implementation ReadLikelihoods to hold read likelihoods. The array structure should be faster to populate and query (not properly benchmarked) and reduce memory footprint considerably. Nevertheless removing PairHMM factor (using likelihoodEngine Random) it only achieves a speed up of 15% in some example WGS dataset, i.e. there are other bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: - ReadLikelihoods added to substitute Map<String,PerSampleReadLikelihoods> - Operations that involve changes in full sets of ReadLikelihoods have been moved into that class. - Simplified a bit the code that handles the downsampling of reads based on contamination Caveats: - Still we keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators..., didn't feel like changing the interface of so many public classes in this pull-request.	2014-08-11 17:46:28 -04:00
Ryan Poplin	c56e493f98	Merge pull request #622 from broadinstitute/ldg_SORanalysis Add StrandOddsRatio to default annotations produced by GenotypeGVCFs	2014-08-11 09:45:27 -04:00
Tim Fennell	5695f22da8	Changed the default GVCF Q Bands from 5,20,60 to be 1..60 by 1s, 60...90 by 10s and 99 in order to give finer resolution for homref PLs and ADs at lower confidences and somewhat higher resolution at higher confidences.	2014-08-08 14:31:35 -04:00
Laura Gauthier	35de598e4b	Modify StrandOddsRatio calculation to take on lower values in cases where reference +/- reads are skewed but alt reads are not. Add SOR to default annotations produced by GenotypeGVCFs. Add jitter to minimum SOR values	2014-08-07 12:09:19 -04:00
Laura Gauthier	f532f1f843	Fix NullPointerException	2014-08-07 10:13:17 -04:00
Laura Gauthier	74affcc077	Update inbreeding coefficient calculation to give a better estimate for multiallelic sites. Add unit test for compound het and for multiallelic hets.	2014-08-07 08:12:47 -04:00
Eric Banks	b9486f5b4d	Merge pull request #693 from broadinstitute/ldg_SORfromHC Allow SOR to be calculated from HC	2014-08-06 21:48:09 -04:00
Phillip Dexheimer	593663d9b6	Improved detection of missing argument values In particular, it was possible to specify arguments for Files or Compound types without values Added a special "none" value for annotations, since a bare "-A" is no longer allowed Delivers PT 71792842 and 59360374	2014-08-05 20:31:31 -04:00
Laura Gauthier	5533199402	Allow SOR to be calculated from HC Refactor StrandBiasTest classes	2014-08-01 20:47:58 -04:00
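
Commit 5695f22da8 above describes the new default GVCF Q bands as "1..60 by 1s, 60...90 by 10s and 99". The following is a minimal sketch of one reading of that description, in which the band lower bounds are 1 through 60, then 70, 80, 90, then 99. The class name GvcfBands and the method defaultBands are hypothetical and are not taken from the GATK source; the actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not GATK code: enumerates the band boundaries
// described in commit 5695f22da8 under the reading stated above.
final class GvcfBands {
    static List<Integer> defaultBands() {
        List<Integer> bands = new ArrayList<>();
        // 1..60 in steps of 1: fine resolution at lower confidences
        for (int gq = 1; gq <= 60; gq++) {
            bands.add(gq);
        }
        // 60..90 in steps of 10; 60 is already present, so add 70, 80, 90
        for (int gq = 70; gq <= 90; gq += 10) {
            bands.add(gq);
        }
        // final band at the maximum reported value, 99
        bands.add(99);
        return bands;
    }

    public static void main(String[] args) {
        System.out.println(defaultBands());
    }
}
```

Under this reading there are 64 boundaries, compared with the three (5, 20, 60) used before the commit.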


1178 Commits (5c4a3eb89c7df76a1e6f5ed397500e5b09ea3a3a)