gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Ryan Poplin	ac1a397024	This warning message actually happens all the time in AssessNA12878 when we subset down to biallelic events but I've verified that it is working as intended. Moving the logging level up to debug.	2014-09-29 11:40:38 -04:00
Phillip Dexheimer	1482a53aba	Added -writeFullFormat engine-level argument * This argument forces GATK to always write every record in the VCF format field, even if some records at the end are missing and could be removed * Revved htsjdk and picard * PT 70993484	2014-09-17 08:25:27 -04:00
Valentin Ruano-Rubio	95b45443ae	Updated test according to changes in the AF calculator framework. Changes: ------- * Updated current unit and integration test to use the new API components. * Added unit tests for new classes AFPriorProvider and AFCalculatorProviders. * Added integration test for mixed ploidy GenotypeGVCFs and CombineGVCFs	2014-09-12 14:59:47 -04:00
Valentin Ruano-Rubio	3cdeab6e9e	GenotypingEngines and walkers now use AFCalc(ulator) providers rather than instantiate their own (fixed) calculators directly. Changes: ------- * GenotypingEngine now uses an AFCalc provider instead of its own thread-local with one-time initialized and fixed AF calculator. * All walkers that use a GenotypingEngine now pass the appropriate AF calculator provider. For now most just use a fixed calculator (FixedAFCalculatorProvider) except GenotypeGVCFs, as this one can now cope with a mixture of ploidies, failing over to a general-ploidy calculator when the preferred implementation is not capable of handling a site's analysis.	2014-09-12 14:25:09 -04:00
Phillip Dexheimer	a35f5b8685	Moved arguments controlling options in output files into the engine * Arguments involved are --no_cmdline_in_header, --sites_only, and --bcf for VCF files and --bam_compression, --simplifyBAM, --disable_bam_indexing, and --generate_md5 for BAM files * PT 52740563 * Removed ReadUtils.createSAMFileWriterWithCompression(), replaced with ReadUtils.createSAMFileWriter(), which applies all appropriate engine-level arguments * Replaced hard-coded field names in ArgumentDefinitionField (Queue extension generator) with a Reflections-based lookup that will fail noisily during extension generation if there's an error	2014-09-05 21:18:11 -04:00
Khalid Shakir	376592f423	Various fixes for package tests. Explicitly including gatk/queue test-jar artifacts in package test classpaths. SelectVariantsIntegrationTest#testInvalidJexl now resets the JexlEngine silent flag that VariantFiltration.initialize() toggles. External example no longer tries to unpack nonexistent gatk artifact jars during package tests.	2014-09-04 15:30:31 -04:00
droazen	5c087a6e1f	Merge pull request #724 from broadinstitute/ks_remove_test_qscript_symbolic_links Removed symlink creation for tests and qscripts	2014-09-04 09:10:54 -04:00
Valentin Ruano Rubio	c7925f6e5c	Merge pull request #719 from broadinstitute/vrr_generalize_ploidy_in_genotype_gvcfs Adds support for omniploidy to GenotypeGVCFs and CombineGVCFs.	2014-09-02 16:51:02 -04:00
Valentin Ruano-Rubio	d363725b4b	Adds support for omniploidy to GenotypeGVCFs and CombineGVCFs. Same changes fixed the problem for GenotypeGVCFs and CombineGVCFs. Stories: - https://www.pivotaltracker.com/story/show/77626044 - https://www.pivotaltracker.com/story/show/77626854 Changes: - Generalized the code for the merging in GATKVariantContextUtils to cope with ploidy != 2. - GenotypeGVCFs now checks that the input's ploidy conforms to the '-ploidy' argument. - Moved out Reference Confidence VC merging code from GATKVariantContextUtils so that we can keep new code in protected. Caveats: - GenotypeGVCFs can only deal with input files that have the same ploidy in all positions; the one that the user MUST indicate in the -ploidy argument (if different from the default 2). - CombineGVCFs won't necessarily complain if it's passed mixed ploidy inputs but you won't be able to genotype it with GenotypeGVCFs. Test: - Removed deprecated unit tests for GATKVariantContextUtils. - Moved unit-tests regarding GVCF merging from GATKVariantContextUtilsUnitTest to ReferenceConfidenceVariantContextUtilsUnitTest. - Added unit test for new code for mapping genotype indices between allele index encoding in GenotypeLikelihoodCalculator. - GenotypeGVCFs and CombineGVCFs original integration tests are unaffected by the change. - Added tetraploid run integration tests to check on non-diploid execution of GenotypeGVCFs and CombineGVCFs.	2014-09-02 15:06:47 -04:00
Eric Banks	fe86dafc41	Merge pull request #705 from broadinstitute/gg_simplify_gatkdocs_templates Changed the GATKDocs format to PHP	2014-09-02 06:28:26 -04:00
Khalid Shakir	fcb0eca203	Now passing in the path to the GATK directory to tests. Changed tests and scripts to use gatkdir full path instead of relative testdata/qscripts symbolic links. Although symlinks not created, left the symlink deletion script execution with a comment about future removal. Re-enabled example UG pipeline queue test. Replaced all hardcoded strings of {public,private}/testdata with BaseTest variables. Refactored temp list creation method from ListFileUtilsUnitTest to BaseTest.createTempListFile. Removed list files with hardcoded paths, now using createTempListFile instead with private test dir variable.	2014-09-02 01:40:59 +08:00
Michael Linderman	380cd67146	Update extension generator to recognize RodBindingCollection as 'taggable' Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2014-09-01 12:19:44 +08:00
Valentin Ruano-Rubio	6695aeafd9	Disable physical phasing for non-diploid HC calling. Story: https://www.pivotaltracker.com/story/show/77452256 Changes: If ploidy != 2, disable physical phasing and log an info message to let the user know. Tests: Change md5s affected by this change.	2014-08-23 10:52:07 -04:00
Valentin Ruano-Rubio	fc5ce4b662	Created the stand-alone AC and AF annotation AlleleCountBySample Story: https://www.pivotaltracker.com/story/show/77250524 Changes: - Remove the annotating code in GeneralPloidyExactAFCalc (GPEAFC) class. - Added the asAlleleList to GenotypeAlleleCounts class and get (GPEAFC) to use that instead of implementing its own (nicer and more reusable code). - Removed the explicit addition of AlleleCountBySample fields to the VCF header by the walker initialize - Added utility methods in Utils to wrap an int[] array into a List<Integer>, and a double[] array into a List<Double> efficiently. Test: - Added unit-testing for asAlleleList in GenotypeAlleleCountsUnitTest (within testFirst and testNext). - Added unit-testing for new methods in Utils : asList(int[]) and asList(double[]) - Changed UG General Ploidy test to add explicitly those annotations. - Non-trivial changes in integration tests involving non-diploid runs (namely haploid and tetraploid) as they are not showing those annotations any longer, so the MD5s have been changed accordingly.	2014-08-22 20:33:25 -04:00
Valentin Ruano-Rubio	8d9a55ae60	Moving new omniploidy likelihood calculation classes to their final package (as far as this pull-request is concerned) in org.broadinstitute.gatk.tools.walkers.genotyper	2014-08-19 11:54:29 -04:00
Valentin Ruano-Rubio	611b7f25ea	Adds unit-test and integration test for new omniploidy likelihood calculation components Added md5 to HaplotypeCallerIntegrationTest.testHaplotypeCallerSingleSampleWithDbsnp	2014-08-19 11:53:19 -04:00
Valentin Ruano-Rubio	9ee9da36bb	Generalize the calculation of the genotype likelihoods in HC to cope with haploid and multiploidy Changes in several walker to use new sample, allele closed lists and new GenotypingEngine constructors signatures Rebase adoption of new calculation system in walkers	2014-08-19 11:53:06 -04:00
Valentin Ruano-Rubio	4f993e8dbe	Added read-likelihoods array base structure to substitute existing Map-of-Map-of-Maps.	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	242cd0e58f	Added genotype allele counts and likelihood calculator utilities for arbitrary ploidy and number of alleles	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	b0a4cb9f0c	Added close sample and allele list data-structures and utility classes	2014-08-19 11:50:12 -04:00
Geraldine Van der Auwera	cdba069b02	changed the GATKDocs format to PHP	2014-08-18 18:04:07 -04:00
Eric Banks	eb84091702	Update the --keepOriginalAC functionality in SelectVariants to work for sites that lose alleles in the selection.	2014-08-14 15:34:09 -04:00
Ryan Poplin	3a9a78c785	Removing an assumption that ADs were in the same order if the number of alleles matched. This happens for example when one sample is C->T and another sample is C->G.	2014-08-13 13:26:40 -04:00
Eric Banks	27193c5048	Merge pull request #700 from broadinstitute/eb_phase_HC_variants_PT74816060 Initial implementation of functionality to add physical phasing informat...	2014-08-13 12:30:32 -04:00
Eric Banks	4512940e87	Initial implementation of functionality to add physical phasing information to the output of the HaplotypeCaller. If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF. Can be enabled with the --tryPhysicalPhasing argument.	2014-08-13 12:25:31 -04:00
Geraldine Van der Auwera	49702dc695	Clarified Phone Home system details re: privacy	2014-08-12 17:23:35 -04:00
jmthibault79	6d7201a7f8	Merge pull request #698 from broadinstitute/pd_printreads_subset Improvements to read-group filtering in PrintReads	2014-08-12 14:13:07 -04:00
Phillip Dexheimer	7e77875c81	Improvements to read-group filtering in PrintReads - Read groups that are excluded by sample_name, platform, or read_group arguments no longer appear in the header - The performance penalty associated with filtering by read group has been essentially eliminated - Partial fulfillment of PT 73075482	2014-08-11 20:08:16 -04:00
Valentin Ruano-Rubio	9a9a68409e	ReadLikelihoods class introduction final changes before merging Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: Done some changes that I missed in relation with making sure that all PairHMM implementations use the same interface; as a consequence we were running always the standard PairHMM. Fixed some additional bugs detected when running it on full wgs single sample and exome multi sample data set. Updated some integration test md5s. Fixing GraphBased bugs with new master code Fixed ReadLikelihoods.changeReads difficult to spot bug. Changed PairHMM interface to fix a bug Fixed missing changes for various PairHMM implementations to get them to use the new structure. Fixed various bugs only detectable when running with full sample(s). Believe to have fixed the lack of annotations in UG runs Fixed integration test MD5s Updating some md5s Fixed yet another md5 probably left out by mistake	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	0b472f6bff	Added new test to verify the functionality of ReadLikelihoods.java and its use in HC. Updated existing integration test md5s. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	2914ecb585	Change the Map-of-maps-of-maps for an array based implementation ReadLikelihoods to hold read likelihoods. The array structure should be faster to populate and query (not properly benchmarked) and reduce memory footprint considerably. Nevertheless removing PairHMM factor (using likelihoodEngine Random) it only achieves a speed up of 15% in some example WGS dataset i.e. there are other bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: - ReadLikelihoods added to substitute Map<String,PerSampleReadLikelihoods> - Operations that involve changes in full sets of ReadLikelihoods have been moved into that class. - Simplified a bit the code that handles the downsampling of reads based on contamination Caveats: - Still we keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators..., didn't feel like changing the interface of so many public classes in this pull-request.	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	09ac3779d6	Added ReadLikelihoods component to substitute Map<String,PerReadAlleleLikelihoodMap>. It uses a more efficient java array[] based implementation and encapsulates operations performed with such a read-likelihood collection such as marginalization, filtering by position, poor modeling or capping worst likelihoods and so forth. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Eric Banks	5f31e54d67	Merge pull request #696 from broadinstitute/pd_DoC_sorting Fix sample sort order bug in DepthOfCoverage	2014-08-06 08:35:35 -04:00
Phillip Dexheimer	b0c026e671	Fix sample sort order bug in DepthOfCoverage Rare bug triggered by hash collision between sample names PT 66183936	2014-08-05 21:55:34 -04:00
Phillip Dexheimer	593663d9b6	Improved detection of missing argument values In particular, it was possible to specify arguments for Files or Compound types without values Added a special "none" value for annotations, since a bare "-A" is no longer allowed Delivers PT 71792842 and 59360374	2014-08-05 20:31:31 -04:00
Phillip Dexheimer	359fe150c9	Documentation fix (closed HTML tag)	2014-08-04 23:19:16 -04:00
Laura Gauthier	4373922ee6	Update GATK to work with latest htsjdk ValidationStringency was moved from htsjdk.samtools.SAMFileReader to htsjdk.samtools samtools find BAM index file method was also moved (and made public!)	2014-07-30 12:05:14 -04:00
Eric Banks	84af1fc75f	The copy constructor for a GATKSAMRecord (used for testing only) should use the actual read's contig index, not its mate's.	2014-07-23 15:31:03 -04:00
David Roazen	0798a4b768	Update pom versions to mark the start of GATK 3.3 development	2014-07-17 12:09:33 -04:00
David Roazen	323f22f852	Update pom versions for the 3.2 release	2014-07-17 12:06:22 -04:00
Geraldine Van der Auwera	a6f632874b	Various documentation improvements - Edited intervals merging docs for correctness & clarity - Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests) - Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs - Moved GenotypeGVCFs to Variant Discovery category and clarified a few points - Clarified that the -resource argument depends on using the -V:tag format - Clarified how the pcr indel model works - Added caveat for -U ALLOW_N_CIGAR_READS - Added MathJax support for displaying equations in GATKDocs - Updated HC example commands and caveats	2014-07-14 12:03:03 -04:00
Eric Banks	ecefcb383d	Disable the complex variant merging for now, as requested by ATGU	2014-07-11 17:27:40 -04:00
droazen	b8751ad598	Merge pull request #680 from broadinstitute/ldg_VQSRscript Update VQSR Rnd BQSR script generation code for compatibility with late...	2014-07-11 10:16:37 -04:00
Khalid Shakir	18f6d56b4c	Revert "Using the base directory for each test run when outputting MD5DB mismatches." This reverts commit f192f032a153755a84b1d682f6e652a7c6787fb9.	2014-07-11 01:11:25 +08:00
Khalid Shakir	cc09ef9190	Revert "Appending to md5db in the gatkdir, with additional logging." This reverts commit 0aa2884f7b006f5d48c325bf942b92c183e45074.	2014-07-11 01:11:20 +08:00
kshakir	aecd34d274	Merge pull request #677 from broadinstitute/ks_md5_db_per_test_type Appending to md5db in the gatkdir, with additional logging.	2014-07-10 17:53:24 +08:00
Khalid Shakir	a7d1904c63	Appending to md5db in the gatkdir, with additional logging.	2014-07-10 03:58:47 +08:00
Laura Gauthier	99026eb51b	Update VQSR Rnd BQSR script generation code for compatibility with latest ggplot version. Update queueJobReport.R and public/gsalib/src/R/R/gsa.variantqc.utils.R also	2014-07-09 15:36:58 -04:00
David Roazen	719e685759	Remove junit imports in the test suite	2014-07-09 12:09:27 -04:00
Khalid Shakir	2129aa05d8	Bug fix for poms missing package test artifacts.	2014-07-08 06:34:26 +08:00


4412 Commits (d609b2cdbb7d77fd3fc5ecbde46fbc3b0a036dac)