gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Valentin Ruano-Rubio	fc5ce4b662	Created the stand-alone AC and AF annotation AlleleCountBySample Story: https://www.pivotaltracker.com/story/show/77250524 Changes: - Remove the annotating code in GeneralPloidyExactAFCalc (GPEAFC) class. - Added the asAlleleList to GenotypeAlleleCounts class and get (GPEAFC) to use that instead of implementing its own (nicer and more reusable code). - Removed the explicit addition of AlleleCountBySample fields to the VCF header by the walker initialize - Added utility methods in Utils to wrap an int[] array into a List<Integer>, and double[] array into a List<Double> efficiently. Test: - Added unit-testing for asAlleleList in GenotypeAlleleCountsUnitTest (within testFirst and testNext). - Added unit-testing for new methods in Utils : asList(int[]) and asList(double[]) - Changed UG General Ploidy test to add explicitly those annotations. - Non-trivial changes in integration tests involving non-diploid runs (namely haploid and tetraploid) as they are not showing those annotations any longer, so the MD5s have been changed accordingly.	2014-08-22 20:33:25 -04:00
Valentin Ruano-Rubio	8d9a55ae60	Moving new omniploidy likelihood calculation classes to their final package (as far as this pull-request is concerned) in org.broadinstitute.gatk.tools.walkers.genotyper	2014-08-19 11:54:29 -04:00
Valentin Ruano-Rubio	611b7f25ea	Adds unit-test and integration test for new omniploidy likelihood calculation components Added md5 to HaplotypeCallerIntegrationTest.testHaplotypeCallerSingleSampleWithDbsnp	2014-08-19 11:53:19 -04:00
Valentin Ruano-Rubio	9ee9da36bb	Generalize the calculation of the genotype likelihoods in HC to cope with haploid and multiploidy Changes in several walker to use new sample, allele closed lists and new GenotypingEngine constructors signatures Rebase adoption of new calculation system in walkers	2014-08-19 11:53:06 -04:00
Valentin Ruano-Rubio	4f993e8dbe	Added read-likelihoods array base structure to substitute existing Map-of-Map-of-Maps.	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	242cd0e58f	Added genotype allele counts and likelihood calculator utilities for arbitrary ploidy and number of alleles	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	b0a4cb9f0c	Added close sample and allele list data-structures and utility classes	2014-08-19 11:50:12 -04:00
Eric Banks	eb84091702	Update the --keepOriginalAC functionality in SelectVariants to work for sites that lose alleles in the selection.	2014-08-14 15:34:09 -04:00
Ryan Poplin	3a9a78c785	Removing an assumption that ADs were in the same order if the number of alleles matched. This happens for example when one sample is C->T and another sample is C->G.	2014-08-13 13:26:40 -04:00
Eric Banks	27193c5048	Merge pull request #700 from broadinstitute/eb_phase_HC_variants_PT74816060 Initial implementation of functionality to add physical phasing informat...	2014-08-13 12:30:32 -04:00
Eric Banks	4512940e87	Initial implementation of functionality to add physical phasing information to the output of the HaplotypeCaller. If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF. Can be enabled with the --tryPhysicalPhasing argument.	2014-08-13 12:25:31 -04:00
Geraldine Van der Auwera	49702dc695	Clarified Phone Home system details re: privacy	2014-08-12 17:23:35 -04:00
jmthibault79	6d7201a7f8	Merge pull request #698 from broadinstitute/pd_printreads_subset Improvements to read-group filtering in PrintReads	2014-08-12 14:13:07 -04:00
Phillip Dexheimer	7e77875c81	Improvements to read-group filtering in PrintReads - Read groups that are excluded by sample_name, platform, or read_group arguments no longer appear in the header - The performance penalty associated with filtering by read group has been essentially eliminated - Partial fulfillment of PT 73075482	2014-08-11 20:08:16 -04:00
Valentin Ruano-Rubio	9a9a68409e	ReadLikelihoods class introduction final changes before merging Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: Done some changes that I missed in relation with making sure that all PairHMM implementations use the same interface; as a consequence we were running always the standard PairHMM. Fixed some additional bugs detected when running it on full wgs single sample and exome multi sample data set. Updated some integration test md5s. Fixing GraphBased bugs with new master code Fixed ReadLikelihoods.changeReads difficult to spot bug. Changed PairHMM interface to fix a bug Fixed missing changes for various PairHMM implementations to get them to use the new structure. Fixed various bugs only detectable when running with full sample(s). Believe to have fixed the lack of annotations in UG runs Fixed integration test MD5s Updating some md5s Fixed yet another md5 probably left out by mistake	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	0b472f6bff	Added new test to verify the functionality of ReadLikelihoods.java and its use in HC. Updated existing integration test md5s. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	2914ecb585	Change the Map-of-maps-of-maps for an array based implementation ReadLikelihoods to hold read likelihoods. The array structure should be faster to populate and query (not properly benchmarked) and reduce memory footprint considerably. Nevertheless removing PairHMM factor (using likelihoodEngine Random) it only achieves a speed up of 15% in some example WGS dataset i.e. there are other bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: - ReadLikelihoods added to substitute Map<String,PerSampleReadLikelihoods> - Operations that involve changes in full sets of ReadLikelihoods have been moved into that class. - Simplified a bit the code that handles the downsampling of reads based on contamination Caveats: - Still we keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators..., didn't feel like change the interface of so many public classes in this pull-request.	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	09ac3779d6	Added ReadLikelihoods component to substitute Map<String,PerReadAlleleLikelihoodMap>. It uses a more efficient java array[] based implementation and encapsulates operations performed with such a read-likelihood collection such as marginalization, filtering by position, poor modeling or capping worst likelihoods and so forth. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Eric Banks	5f31e54d67	Merge pull request #696 from broadinstitute/pd_DoC_sorting Fix sample sort order bug in DepthOfCoverage	2014-08-06 08:35:35 -04:00
Phillip Dexheimer	b0c026e671	Fix sample sort order bug in DepthOfCoverage Rare bug triggered by hash collision between sample names PT 66183936	2014-08-05 21:55:34 -04:00
Phillip Dexheimer	593663d9b6	Improved detection of missing argument values In particular, it was possible to specify arguments for Files or Compound types without values Added a special "none" value for annotations, since a bare "-A" is no longer allowed Delivers PT 71792842 and 59360374	2014-08-05 20:31:31 -04:00
Phillip Dexheimer	359fe150c9	Documentation fix (closed HTML tag)	2014-08-04 23:19:16 -04:00
Laura Gauthier	4373922ee6	Update GATK to work with latest htsjdk ValidationStringency was moved from htsjdk.samtools.SAMFileReader to htsjdk.samtools samtools find BAM index file method was also moved (and made public!)	2014-07-30 12:05:14 -04:00
Eric Banks	84af1fc75f	The copy constructor for a GATKSAMRecord (used for testing only) should use the actual read's contig index, not its mate's.	2014-07-23 15:31:03 -04:00
David Roazen	0798a4b768	Update pom versions to mark the start of GATK 3.3 development	2014-07-17 12:09:33 -04:00
David Roazen	323f22f852	Update pom versions for the 3.2 release	2014-07-17 12:06:22 -04:00
Geraldine Van der Auwera	a6f632874b	Various documentation improvements - Edited intervals merging docs for correctness & clarity - Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests) - Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs - Moved GenotypeGVCFs to Variant Discovery category and clarified a few points - Clarified that the -resource argument depends on using the -V:tag format - Clarified how the pcr indel model works - Added caveat for -U ALLOW_N_CIGAR_READS - Added MathJax support for displaying equations in GATKDocs - Updated HC example commands and caveats	2014-07-14 12:03:03 -04:00
Eric Banks	ecefcb383d	Disable the complex variant merging for now, as requested by ATGU	2014-07-11 17:27:40 -04:00
droazen	b8751ad598	Merge pull request #680 from broadinstitute/ldg_VQSRscript Update VQSR Rnd BQSR script generation code for compatibility with late...	2014-07-11 10:16:37 -04:00
Khalid Shakir	18f6d56b4c	Revert "Using the base directory for each test run when outputting MD5DB mismatches." This reverts commit f192f032a153755a84b1d682f6e652a7c6787fb9.	2014-07-11 01:11:25 +08:00
Khalid Shakir	cc09ef9190	Revert "Appending to md5db in the gatkdir, with additional logging." This reverts commit 0aa2884f7b006f5d48c325bf942b92c183e45074.	2014-07-11 01:11:20 +08:00
kshakir	aecd34d274	Merge pull request #677 from broadinstitute/ks_md5_db_per_test_type Appending to md5db in the gatkdir, with additional logging.	2014-07-10 17:53:24 +08:00
Khalid Shakir	a7d1904c63	Appending to md5db in the gatkdir, with additional logging.	2014-07-10 03:58:47 +08:00
Laura Gauthier	99026eb51b	Update VQSR Rnd BQSR script generation code for compatibility with latest ggplot version. Update queueJobReport.R and public/gsalib/src/R/R/gsa.variantqc.utils.R also	2014-07-09 15:36:58 -04:00
David Roazen	719e685759	Remove junit imports in the test suite	2014-07-09 12:09:27 -04:00
Khalid Shakir	2129aa05d8	Bug fix for poms missing package test artifacts.	2014-07-08 06:34:26 +08:00
Khalid Shakir	e5be9c7073	Using the base directory for each test run when outputting MD5DB mismatches.	2014-07-08 06:34:25 +08:00
Eric Banks	bad7865078	When converting a haplotype to a set of variants we now check for cases that are overly complex. In these cases, where the alignment contains multiple indels, we output a single complex variant instead of the multiple partial indels. We also re-enable dangling tail recovery by default.	2014-07-01 14:18:59 -04:00
Ryan Poplin	0127799cba	Reads are now realigned to the most likely haplotype before being used by the annotations. -- AD,DP will now correspond directly to the reads that were used to construct the PLs -- RankSumTests, etc. will use the bases from the realigned reads instead of the original alignments -- There is now no additional runtime cost to realign the reads when using bamout or GVCF mode -- bamout mode no longer sets the mapping quality to zero for uninformative reads, instead the read will not be given an HC tag	2014-06-30 10:35:50 -04:00
Khalid Shakir	7b5f88a49c	Refactored DoC custom Queue wrappers to a non-package object. Now, "mvn verify && mvn verify" should work again.	2014-06-26 00:59:18 +08:00
droazen	b935ed0df1	Merge pull request #665 from broadinstitute/ks_force_delete_bad_symlinks Executing a version of the delete_maven_links.sh	2014-06-25 00:13:05 -04:00
Phillip Dexheimer	06d619e9aa	Removed redundant SelectVariantsIntegrationTest, merged its only test into protected version	2014-06-24 18:59:59 -04:00
Khalid Shakir	45d819a00e	For now, executing the delete_maven_links.sh just ahead of creating the symbolic links during the process-test-resources phase. Better than running it during the "clean" phase, since these users may not run "mvn clean" before attempting to build.	2014-06-25 02:32:15 +08:00
Phillip Dexheimer	65eeb4a7ab	Recast the "Invalid JEXL expression detected" error in SelectVariants from a RuntimeException to a UserException - PT 68931448	2014-06-20 00:05:23 -04:00
Phillip Dexheimer	da5e567b73	Added functionality to CatVariants to process .list files with -V - Pivotal 70305712	2014-06-19 21:46:13 -04:00
Ryan Poplin	da1dab6c32	Merge pull request #661 from broadinstitute/jw_allele_balance_gvcf Enable AB annotation in reference model pipeline. Incorporates patches f...	2014-06-19 13:10:41 -04:00
Eric Banks	1092dd6e25	From Carlos Barroto: switch outputRoot in SplitSamFile to an empty string instead of null.	2014-06-19 11:06:55 -04:00
Eric Banks	9212edba41	From Carlos Barroto: made 'level' in Picard's CalculateHsMetrics Scala Queue extension an argument.	2014-06-19 11:06:50 -04:00
Ryan Poplin	8b75428a90	Enable AB annotation in reference model pipeline. Incorporates patches from John Wallace to public github account	2014-06-19 09:35:04 -04:00
Nigel Delaney	7570666f2a	Merge pull request #655 from broadinstitute/nfd_mathutil_opts Optimization of function to calculate the logged sum of exponentiated values	2014-06-17 17:07:42 -04:00


4398 Commits (fc5ce4b66274e1a1c06a8f06ca4d7ffd8fcec4a2)