gatk-3.8

Commit Graph

Author	SHA1	Message	Date
jmthibault79	6d7201a7f8	Merge pull request #698 from broadinstitute/pd_printreads_subset Improvements to read-group filtering in PrintReads	2014-08-12 14:13:07 -04:00
Phillip Dexheimer	7e77875c81	Improvements to read-group filtering in PrintReads - Read groups that are excluded by sample_name, platform, or read_group arguments no longer appear in the header - The performance penalty associated with filtering by read group has been essentially eliminated - Partial fulfillment of PT 73075482	2014-08-11 20:08:16 -04:00
Valentin Ruano-Rubio	9a9a68409e	Final changes to the ReadLikelihoods class introduction before merging. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: Made some changes I had missed to ensure that all PairHMM implementations use the same interface; as a consequence, we were always running the standard PairHMM. Fixed some additional bugs detected when running on a full WGS single-sample dataset and an exome multi-sample dataset. Updated some integration test md5s. Fixing GraphBased bugs with new master code. Fixed a hard-to-spot bug in ReadLikelihoods.changeReads. Changed PairHMM interface to fix a bug. Fixed missing changes for various PairHMM implementations to get them to use the new structure. Fixed various bugs only detectable when running with full sample(s). I believe I have fixed the lack of annotations in UG runs. Fixed integration test MD5s. Updating some md5s. Fixed yet another md5 probably left out by mistake.	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	0b472f6bff	Added new test to verify the functionality of ReadLikelihoods.java and its use in HC. Updated existing integration test md5s. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	2914ecb585	Replace the map-of-maps-of-maps with an array-based implementation, ReadLikelihoods, to hold read likelihoods. The array structure should be faster to populate and query (not properly benchmarked) and should reduce the memory footprint considerably. Nevertheless, after removing the PairHMM factor (using likelihoodEngine Random), it only achieves a speed-up of 15% on an example WGS dataset, i.e. there are other, bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: - ReadLikelihoods added to replace Map<String,PerSampleReadLikelihoods> - Operations that involve changes to full sets of ReadLikelihoods have been moved into that class. - Slightly simplified the code that handles the downsampling of reads based on contamination Caveats: - We still keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators; I didn't want to change the interface of so many public classes in this pull request.	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	09ac3779d6	Added ReadLikelihoods component to replace Map<String,PerReadAlleleLikelihoodMap>. It uses a more efficient Java array-based implementation and encapsulates operations performed on such a read-likelihood collection, such as marginalization, filtering by position or poor modeling, capping the worst likelihoods, and so forth. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Eric Banks	5f31e54d67	Merge pull request #696 from broadinstitute/pd_DoC_sorting Fix sample sort order bug in DepthOfCoverage	2014-08-06 08:35:35 -04:00
Phillip Dexheimer	b0c026e671	Fix sample sort order bug in DepthOfCoverage Rare bug triggered by hash collision between sample names PT 66183936	2014-08-05 21:55:34 -04:00
Phillip Dexheimer	593663d9b6	Improved detection of missing argument values. In particular, it was possible to specify arguments for Files or Compound types without values. Added a special "none" value for annotations, since a bare "-A" is no longer allowed. Delivers PT 71792842 and 59360374	2014-08-05 20:31:31 -04:00
Phillip Dexheimer	359fe150c9	Documentation fix (closed HTML tag)	2014-08-04 23:19:16 -04:00
Laura Gauthier	4373922ee6	Update GATK to work with the latest htsjdk: ValidationStringency was moved from htsjdk.samtools.SAMFileReader to htsjdk.samtools. The samtools method for finding the BAM index file was also moved (and made public!)	2014-07-30 12:05:14 -04:00
Eric Banks	84af1fc75f	The copy constructor for a GATKSAMRecord (used for testing only) should use the actual read's contig index, not its mate's.	2014-07-23 15:31:03 -04:00
David Roazen	0798a4b768	Update pom versions to mark the start of GATK 3.3 development	2014-07-17 12:09:33 -04:00
David Roazen	323f22f852	Update pom versions for the 3.2 release	2014-07-17 12:06:22 -04:00
Geraldine Van der Auwera	a6f632874b	Various documentation improvements - Edited intervals merging docs for correctness & clarity - Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests) - Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs - Moved GenotypeGVCFs to Variant Discovery category and clarified a few points - Clarified that the -resource argument depends on using the -V:tag format - Clarified how the pcr indel model works - Added caveat for -U ALLOW_N_CIGAR_READS - Added MathJax support for displaying equations in GATKDocs - Updated HC example commands and caveats	2014-07-14 12:03:03 -04:00
Eric Banks	ecefcb383d	Disable the complex variant merging for now, as requested by ATGU	2014-07-11 17:27:40 -04:00
droazen	b8751ad598	Merge pull request #680 from broadinstitute/ldg_VQSRscript Update VQSR and BQSR script generation code for compatibility with late...	2014-07-11 10:16:37 -04:00
Khalid Shakir	18f6d56b4c	Revert "Using the base directory for each test run when outputting MD5DB mismatches." This reverts commit f192f032a153755a84b1d682f6e652a7c6787fb9.	2014-07-11 01:11:25 +08:00
Khalid Shakir	cc09ef9190	Revert "Appending to md5db in the gatkdir, with additional logging." This reverts commit 0aa2884f7b006f5d48c325bf942b92c183e45074.	2014-07-11 01:11:20 +08:00
kshakir	aecd34d274	Merge pull request #677 from broadinstitute/ks_md5_db_per_test_type Appending to md5db in the gatkdir, with additional logging.	2014-07-10 17:53:24 +08:00
Khalid Shakir	a7d1904c63	Appending to md5db in the gatkdir, with additional logging.	2014-07-10 03:58:47 +08:00
Laura Gauthier	99026eb51b	Update VQSR and BQSR script generation code for compatibility with the latest ggplot version. Also update queueJobReport.R and public/gsalib/src/R/R/gsa.variantqc.utils.R.	2014-07-09 15:36:58 -04:00
David Roazen	719e685759	Remove junit imports in the test suite	2014-07-09 12:09:27 -04:00
Khalid Shakir	2129aa05d8	Bug fix for poms missing package test artifacts.	2014-07-08 06:34:26 +08:00
Khalid Shakir	e5be9c7073	Using the base directory for each test run when outputting MD5DB mismatches.	2014-07-08 06:34:25 +08:00
Eric Banks	bad7865078	When converting a haplotype to a set of variants we now check for cases that are overly complex. In these cases, where the alignment contains multiple indels, we output a single complex variant instead of the multiple partial indels. We also re-enable dangling tail recovery by default.	2014-07-01 14:18:59 -04:00
Ryan Poplin	0127799cba	Reads are now realigned to the most likely haplotype before being used by the annotations. -- AD,DP will now correspond directly to the reads that were used to construct the PLs -- RankSumTests, etc. will use the bases from the realigned reads instead of the original alignments -- There is now no additional runtime cost to realign the reads when using bamout or GVCF mode -- bamout mode no longer sets the mapping quality to zero for uninformative reads, instead the read will not be given an HC tag	2014-06-30 10:35:50 -04:00
Khalid Shakir	7b5f88a49c	Refactored DoC custom Queue wrappers to a non-package object. Now, "mvn verify && mvn verify" should work again.	2014-06-26 00:59:18 +08:00
droazen	b935ed0df1	Merge pull request #665 from broadinstitute/ks_force_delete_bad_symlinks Executing a version of the delete_maven_links.sh	2014-06-25 00:13:05 -04:00
Phillip Dexheimer	06d619e9aa	Removed redundant SelectVariantsIntegrationTest, merged its only test into the protected version	2014-06-24 18:59:59 -04:00
Khalid Shakir	45d819a00e	For now, executing the delete_maven_links.sh just ahead of creating the symbolic links during the process-test-resources phase. Better than running it during the "clean" phase, since these users may not run "mvn clean" before attempting to build.	2014-06-25 02:32:15 +08:00
Phillip Dexheimer	65eeb4a7ab	Recast the "Invalid JEXL expression detected" error in SelectVariants from a RuntimeException to a UserException - PT 68931448	2014-06-20 00:05:23 -04:00
Phillip Dexheimer	da5e567b73	Added functionality to CatVariants to process .list files with -V - Pivotal 70305712	2014-06-19 21:46:13 -04:00
Ryan Poplin	da1dab6c32	Merge pull request #661 from broadinstitute/jw_allele_balance_gvcf Enable AB annotation in reference model pipeline. Incorporates patches f...	2014-06-19 13:10:41 -04:00
Eric Banks	1092dd6e25	From Carlos Barroto: switch outputRoot in SplitSamFile to an empty string instead of null.	2014-06-19 11:06:55 -04:00
Eric Banks	9212edba41	From Carlos Barroto: made 'level' in Picard's CalculateHsMetrics Scala Queue extension an argument.	2014-06-19 11:06:50 -04:00
Ryan Poplin	8b75428a90	Enable AB annotation in reference model pipeline. Incorporates patches from John Wallace to public github account	2014-06-19 09:35:04 -04:00
Nigel Delaney	7570666f2a	Merge pull request #655 from broadinstitute/nfd_mathutil_opts Optimization of function to calculate the logged sum of exponentiated values	2014-06-17 17:07:42 -04:00
Nigel Delaney	5e258bfeff	Minor optimization to the function that calculates the log of a sum of exponentials. * Avoids calling Math.pow whenever possible (skips -Inf and 0 values), which improves performance.	2014-06-17 15:26:10 -04:00
Chris Whelan	ba1d23e535	Created a new tool, SiblingIBD, which finds Identical-By-Descent regions in two siblings. -When parental genotypes are available, implements an HMM on genotype observations in the quartet. -Outputs IBD regions as well as per-site posterior probabilities of being in each IBD state. -Includes an experimental heuristic based mode for when parental genotypes are not available. -Made a method in MendelianViolation public static to reuse code. -Added the mockito library to private/gatk-tools-private/pom.xml	2014-06-13 09:41:37 -04:00
Menachem Fromer	a1868e8b82	For XHMM and Depth-of-Coverage Qscripts, add ability for user to input sample renaming file at the GATK level using existing GATK flag (--sample_rename_mapping_file) and custom pre-processing code. For XHMM Qscript, add scatter-gather for Discovery and Genotype stages.	2014-06-09 23:49:54 -04:00
Phillip Dexheimer	4eb9858461	Ensure that output files are specified in a writeable location -PT 69579780	2014-06-02 21:13:59 -04:00
Valentin Ruano Rubio	db96891d4b	Merge pull request #638 from broadinstitute/vrr_createTempFile_testfix Changed File.createTempFile to BaseTest.createTempFile calls Test	2014-05-29 10:15:05 -04:00
Valentin Ruano-Rubio	938172d7f0	Removed redundant override createTempFileFromBase (same code as superclass) and added some finals to DepthOfCoverageB36IntegrationTest	2014-05-28 19:02:04 -04:00
Valentin Ruano-Rubio	e0c221470c	Changed File.createTempFile to BaseTest.createTempFile	2014-05-28 18:59:48 -04:00
EvolvedMicrobe	ef7531d4a5	Merge pull request #640 from broadinstitute/IntegerSWImplementation Change SmithWaterman to use integers instead of doubles.	2014-05-28 15:10:05 -04:00
Nigel Delaney	cc45e62e8e	Change SmithWaterman to use integers instead of doubles.	2014-05-28 13:13:14 -04:00
Eric Banks	ff43b1f298	Merge pull request #636 from broadinstitute/pd_log10_refactor Replaced the static, fixed MathUtils.log10Cache array with a dynamic Log...	2014-05-28 08:46:49 -04:00
Phillip Dexheimer	6122b2805d	Legibility improvements to ProgressMeter - Fields in the header are delimited with the pipe character - Header is now split into two lines to improve spacing - Field width in header and progress lines auto-adjusts to length of "processing units" label (sites, active regions, etc) - Addresses PT 69725930	2014-05-27 23:52:42 -04:00
Phillip Dexheimer	c15e6fcc0e	Refactored the static lookup arrays in MathUtils (log10Cache, log10FactorialCache, jacobianLogTable) -They are now only computed when necessary -Log10Cache is dynamically resizable, either by calling get() on an out-of-range value or by calling ensureCacheContains -Log10FactorialCache and JacobianLogTable are initialized to a fixed size on first access and are not resizable -Addresses PT 69124396	2014-05-27 22:27:57 -04:00

4386 Commits (6d7201a7f8a15ef3368d3e72f8d5d2d30c00f1ef)