gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Valentin Ruano-Rubio	9ee9da36bb	Generalize the calculation of the genotype likelihoods in HC to cope with haploidy and multiploidy. Changes in several walkers to use the new closed sample and allele lists and the new GenotypingEngine constructor signatures. Rebase adoption of the new calculation system in walkers.	2014-08-19 11:53:06 -04:00
Valentin Ruano-Rubio	4f993e8dbe	Added read-likelihoods array-based structure to replace the existing Map-of-Maps-of-Maps.	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	242cd0e58f	Added genotype allele counts and likelihood calculator utilities for arbitrary ploidy and number of alleles	2014-08-19 11:50:12 -04:00
Valentin Ruano-Rubio	b0a4cb9f0c	Added closed sample and allele list data structures and utility classes	2014-08-19 11:50:12 -04:00
Geraldine Van der Auwera	cdba069b02	changed the GATKDocs format to PHP	2014-08-18 18:04:07 -04:00
Eric Banks	eb84091702	Update the --keepOriginalAC functionality in SelectVariants to work for sites that lose alleles in the selection.	2014-08-14 15:34:09 -04:00
Ryan Poplin	3a9a78c785	Removing an assumption that ADs were in the same order if the number of alleles matched. The assumption fails, for example, when one sample is C->T and another sample is C->G.	2014-08-13 13:26:40 -04:00
Eric Banks	27193c5048	Merge pull request #700 from broadinstitute/eb_phase_HC_variants_PT74816060 Initial implementation of functionality to add physical phasing informat...	2014-08-13 12:30:32 -04:00
Eric Banks	4512940e87	Initial implementation of functionality to add physical phasing information to the output of the HaplotypeCaller. If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF. Can be enabled with the --tryPhysicalPhasing argument.	2014-08-13 12:25:31 -04:00
Geraldine Van der Auwera	49702dc695	Clarified Phone Home system details re: privacy	2014-08-12 17:23:35 -04:00
jmthibault79	6d7201a7f8	Merge pull request #698 from broadinstitute/pd_printreads_subset Improvements to read-group filtering in PrintReads	2014-08-12 14:13:07 -04:00
Phillip Dexheimer	7e77875c81	Improvements to read-group filtering in PrintReads - Read groups that are excluded by sample_name, platform, or read_group arguments no longer appear in the header - The performance penalty associated with filtering by read group has been essentially eliminated - Partial fulfillment of PT 73075482	2014-08-11 20:08:16 -04:00
Valentin Ruano-Rubio	9a9a68409e	ReadLikelihoods class introduction: final changes before merging. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: Made some previously missed changes to ensure that all PairHMM implementations use the same interface; without them, we were always running the standard PairHMM. Fixed some additional bugs detected when running on full WGS single-sample and exome multi-sample data sets. Updated some integration test md5s. Fixed GraphBased bugs with new master code. Fixed a hard-to-spot bug in ReadLikelihoods.changeReads. Changed the PairHMM interface to fix a bug. Fixed missing changes for various PairHMM implementations to get them to use the new structure. Fixed various bugs only detectable when running with full sample(s). Believe I have fixed the lack of annotations in UG runs. Fixed integration test MD5s. Updated some md5s. Fixed yet another md5 probably left out by mistake.	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	0b472f6bff	Added new test to verify the functionality of ReadLikelihoods.java and its use in HC. Updated existing integration test md5s. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	2914ecb585	Replace the Map-of-Maps-of-Maps with an array-based implementation, ReadLikelihoods, to hold read likelihoods. The array structure should be faster to populate and query (not properly benchmarked) and should reduce the memory footprint considerably. Nevertheless, even with the PairHMM factor removed (using likelihoodEngine Random), it only achieves a 15% speed-up on an example WGS dataset, i.e. there are other, bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652 Changes: - ReadLikelihoods added to replace Map<String,PerSampleReadLikelihoods> - Operations that involve changes to full sets of ReadLikelihoods have been moved into that class. - Somewhat simplified the code that handles contamination-based downsampling of reads. Caveats: - We still keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators; I didn't want to change the interface of so many public classes in this pull request.	2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio	09ac3779d6	Added ReadLikelihoods component to replace Map<String,PerReadAlleleLikelihoodMap>. It uses a more efficient Java array[]-based implementation and encapsulates operations performed on such a read-likelihood collection, such as marginalization, filtering by position or poor modeling, capping the worst likelihoods and so forth. Stories: https://www.pivotaltracker.com/story/show/70222086 https://www.pivotaltracker.com/story/show/67961652	2014-08-11 17:46:28 -04:00
Eric Banks	5f31e54d67	Merge pull request #696 from broadinstitute/pd_DoC_sorting Fix sample sort order bug in DepthOfCoverage	2014-08-06 08:35:35 -04:00
Phillip Dexheimer	b0c026e671	Fix sample sort order bug in DepthOfCoverage. Rare bug triggered by a hash collision between sample names. PT 66183936	2014-08-05 21:55:34 -04:00
Phillip Dexheimer	593663d9b6	Improved detection of missing argument values. In particular, it was possible to specify arguments for Files or Compound types without values. Added a special "none" value for annotations, since a bare "-A" is no longer allowed. Delivers PT 71792842 and 59360374	2014-08-05 20:31:31 -04:00
Phillip Dexheimer	359fe150c9	Documentation fix (closed HTML tag)	2014-08-04 23:19:16 -04:00
Laura Gauthier	4373922ee6	Update GATK to work with the latest htsjdk. ValidationStringency was moved from htsjdk.samtools.SAMFileReader to htsjdk.samtools. The samtools find BAM index file method was also moved (and made public!)	2014-07-30 12:05:14 -04:00
Eric Banks	84af1fc75f	The copy constructor for a GATKSAMRecord (used for testing only) should use the actual read's contig index, not its mate's.	2014-07-23 15:31:03 -04:00
David Roazen	0798a4b768	Update pom versions to mark the start of GATK 3.3 development	2014-07-17 12:09:33 -04:00
David Roazen	323f22f852	Update pom versions for the 3.2 release	2014-07-17 12:06:22 -04:00
Geraldine Van der Auwera	a6f632874b	Various documentation improvements - Edited intervals merging docs for correctness & clarity - Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests) - Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs - Moved GenotypeGVCFs to Variant Discovery category and clarified a few points - Clarified that the -resource argument depends on using the -V:tag format - Clarified how the pcr indel model works - Added caveat for -U ALLOW_N_CIGAR_READS - Added MathJax support for displaying equations in GATKDocs - Updated HC example commands and caveats	2014-07-14 12:03:03 -04:00
Eric Banks	ecefcb383d	Disable the complex variant merging for now, as requested by ATGU	2014-07-11 17:27:40 -04:00
droazen	b8751ad598	Merge pull request #680 from broadinstitute/ldg_VQSRscript Update VQSR and BQSR script generation code for compatibility with late...	2014-07-11 10:16:37 -04:00
Khalid Shakir	18f6d56b4c	Revert "Using the base directory for each test run when outputting MD5DB mismatches." This reverts commit f192f032a153755a84b1d682f6e652a7c6787fb9.	2014-07-11 01:11:25 +08:00
Khalid Shakir	cc09ef9190	Revert "Appending to md5db in the gatkdir, with additional logging." This reverts commit 0aa2884f7b006f5d48c325bf942b92c183e45074.	2014-07-11 01:11:20 +08:00
kshakir	aecd34d274	Merge pull request #677 from broadinstitute/ks_md5_db_per_test_type Appending to md5db in the gatkdir, with additional logging.	2014-07-10 17:53:24 +08:00
Khalid Shakir	a7d1904c63	Appending to md5db in the gatkdir, with additional logging.	2014-07-10 03:58:47 +08:00
Laura Gauthier	99026eb51b	Update VQSR and BQSR script generation code for compatibility with the latest ggplot version. Also update queueJobReport.R and public/gsalib/src/R/R/gsa.variantqc.utils.R	2014-07-09 15:36:58 -04:00
David Roazen	719e685759	Remove junit imports in the test suite	2014-07-09 12:09:27 -04:00
Khalid Shakir	2129aa05d8	Bug fix for poms missing package test artifacts.	2014-07-08 06:34:26 +08:00
Khalid Shakir	e5be9c7073	Using the base directory for each test run when outputting MD5DB mismatches.	2014-07-08 06:34:25 +08:00
Eric Banks	bad7865078	When converting a haplotype to a set of variants we now check for cases that are overly complex. In these cases, where the alignment contains multiple indels, we output a single complex variant instead of the multiple partial indels. We also re-enable dangling tail recovery by default.	2014-07-01 14:18:59 -04:00
Ryan Poplin	0127799cba	Reads are now realigned to the most likely haplotype before being used by the annotations. -- AD,DP will now correspond directly to the reads that were used to construct the PLs -- RankSumTests, etc. will use the bases from the realigned reads instead of the original alignments -- There is now no additional runtime cost to realign the reads when using bamout or GVCF mode -- bamout mode no longer sets the mapping quality to zero for uninformative reads, instead the read will not be given an HC tag	2014-06-30 10:35:50 -04:00
Khalid Shakir	7b5f88a49c	Refactored DoC custom Queue wrappers to a non-package object. Now, "mvn verify && mvn verify" should work again.	2014-06-26 00:59:18 +08:00
droazen	b935ed0df1	Merge pull request #665 from broadinstitute/ks_force_delete_bad_symlinks Executing a version of the delete_maven_links.sh	2014-06-25 00:13:05 -04:00
Phillip Dexheimer	06d619e9aa	Removed redundant SelectVariantsIntegrationTest, merged its only test into protected version	2014-06-24 18:59:59 -04:00
Khalid Shakir	45d819a00e	For now, executing the delete_maven_links.sh just ahead of creating the symbolic links during the process-test-resources phase. Better than running it during the "clean" phase, since these users may not run "mvn clean" before attempting to build.	2014-06-25 02:32:15 +08:00
Phillip Dexheimer	65eeb4a7ab	Recast the "Invalid JEXL expression detected" error in SelectVariants from a RuntimeException to a UserException - PT 68931448	2014-06-20 00:05:23 -04:00
Phillip Dexheimer	da5e567b73	Added functionality to CatVariants to process .list files with -V - Pivotal 70305712	2014-06-19 21:46:13 -04:00
Ryan Poplin	da1dab6c32	Merge pull request #661 from broadinstitute/jw_allele_balance_gvcf Enable AB annotation in reference model pipeline. Incorporates patches f...	2014-06-19 13:10:41 -04:00
Eric Banks	1092dd6e25	From Carlos Borroto: switch outputRoot in SplitSamFile to an empty string instead of null.	2014-06-19 11:06:55 -04:00
Eric Banks	9212edba41	From Carlos Borroto: made 'level' in Picard's CalculateHsMetrics Scala Queue extension an argument.	2014-06-19 11:06:50 -04:00
Ryan Poplin	8b75428a90	Enable AB annotation in reference model pipeline. Incorporates patches from John Wallace to public github account	2014-06-19 09:35:04 -04:00
Nigel Delaney	7570666f2a	Merge pull request #655 from broadinstitute/nfd_mathutil_opts Optimization of function to calculate the logged sum of exponentiated values	2014-06-17 17:07:42 -04:00
Nigel Delaney	5e258bfeff	Minor optimization to the function that calculates the log of a sum of exponentials. * Avoids calling Math.pow whenever possible (skips -Inf and 0 values), which leads to better performance.	2014-06-17 15:26:10 -04:00
Chris Whelan	ba1d23e535	Created a new tool, SiblingIBD, which finds Identical-By-Descent regions in two siblings. -When parental genotypes are available, implements an HMM on genotype observations in the quartet. -Outputs IBD regions as well as per-site posterior probabilities of being in each IBD state. -Includes an experimental heuristic based mode for when parental genotypes are not available. -Made a method in MendelianViolation public static to reuse code. -Added the mockito library to private/gatk-tools-private/pom.xml	2014-06-13 09:41:37 -04:00
Menachem Fromer	a1868e8b82	For XHMM and Depth-of-Coverage Qscripts, add ability for user to input sample renaming file at the GATK level using existing GATK flag (--sample_rename_mapping_file) and custom pre-processing code. For XHMM Qscript, add scatter-gather for Discovery and Genotype stages.	2014-06-09 23:49:54 -04:00
Phillip Dexheimer	4eb9858461	Ensure that output files are specified in a writeable location -PT 69579780	2014-06-02 21:13:59 -04:00
Valentin Ruano Rubio	db96891d4b	Merge pull request #638 from broadinstitute/vrr_createTempFile_testfix Changed File.createTempFile to BaseTest.createTempFile calls Test	2014-05-29 10:15:05 -04:00
Valentin Ruano-Rubio	938172d7f0	Removed redundant override createTempFileFromBase (same code as super class) and added some finals to DepthOfCoverageB36IntegrationTest	2014-05-28 19:02:04 -04:00
Valentin Ruano-Rubio	e0c221470c	Changed File.createTempFile to BaseTest.createTempFile	2014-05-28 18:59:48 -04:00
EvolvedMicrobe	ef7531d4a5	Merge pull request #640 from broadinstitute/IntegerSWImplementation Change SmithWaterman to use integers instead of doubles.	2014-05-28 15:10:05 -04:00
Nigel Delaney	cc45e62e8e	Change SmithWaterman to use integers instead of doubles.	2014-05-28 13:13:14 -04:00
Eric Banks	ff43b1f298	Merge pull request #636 from broadinstitute/pd_log10_refactor Replaced the static, fixed MathUtils.log10Cache array with a dynamic Log...	2014-05-28 08:46:49 -04:00
Phillip Dexheimer	6122b2805d	Legibility improvements to ProgressMeter - Fields in the header are delimited with the pipe character - Header is now split into two lines to improve spacing - Field width in header and progress lines auto-adjusts to length of "processing units" label (sites, active regions, etc) - Addresses PT 69725930	2014-05-27 23:52:42 -04:00
Phillip Dexheimer	c15e6fcc0e	Refactored the static lookup arrays in MathUtils (log10Cache, log10FactorialCache, jacobianLogTable) -They are now only computed when necessary -Log10Cache is dynamically resizable, either by calling get() on an out-of-range value or by calling ensureCacheContains -Log10FactorialCache and JacobianLogTable are initialized to a fixed size on first access and are not resizable -Addresses PT 69124396	2014-05-27 22:27:57 -04:00
David Roazen	74b51c5c7a	Improve test suite tmp file cleanup -Make BaseTest.createTempFile() mark any possible corresponding index files for deletion on exit -Make WalkerTest mark shadow BCF files and auxiliary files for deletion on exit -Make VariantRecalibrationWalkersIntegrationTest mark PDF files for deletion on exit	2014-05-27 13:41:44 -04:00
Valentin Ruano-Rubio	7c8a1ae892	Fix for SW to make double comparisons with a tolerance Stories: - https://www.pivotaltracker.com/story/show/69577868 Changes: - Added an epsilon difference tolerance in weight comparisons. Tests: - Added HaplotypeCallerIntegrationTest#testDifferentIndelLocationsDueToSWExactDoubleComparisonsFix - Updated md5 due to minor likelihood changes. - Disabled a test for PathUtils.calculateCigar since it does not work and it is unclear what is causing the error (needs original author input)	2014-05-23 01:48:48 -04:00
Khalid Shakir	b7e98bdae9	Fixed GATK docs artifact, moved protected ExampleUG tests.	2014-05-22 21:03:55 -04:00
Karthik Gururaj	972a82d386	Changed 'sting' to 'gatk' in the VectorLoglessPairHMM classes and the C++ code	2014-05-19 17:36:41 -04:00
Khalid Shakir	3939971d78	After renaming the packages, instead of updating the JNI library used for testing bwa, moving the classes to the archive. NOTE: The migrated README.md has been added so that others can possibly resurrect this code as needed.	2014-05-19 17:36:41 -04:00
Khalid Shakir	2c854e554a	Refactored maven directories and java packages replacing "sting" with "gatk". To reduce merge conflicts, this commit modifies contents of files, while file renamings are in previous commit. See previous commit message for list of changes.	2014-05-19 17:36:39 -04:00
Khalid Shakir	4e6d43d003	Refactored maven directories and java packages replacing "sting" with "gatk". To reduce merge conflicts, this commit only renames files, while file modifications are in the next commit. Some updates/fixes here are actually included in the next commit. = Maven updates Moved artifacts to new package names: * private/queue-private -> private/gatk-queue-private * private/gatk-private -> private/gatk-tools-private * public/gatk-package -> protected/gatk-package-distribution * public/queue-package -> protected/gatk-queue-package-distribution * protected/gatk-protected -> protected/gatk-tools-protected * public/queue-framework -> public/gatk-queue * public/gatk-framework -> public/gatk-tools-public New poms for new artifacts and packages: * private/gatk-package-internal * private/gatk-queue-package-internal * private/gatk-queue-extensions-internal * protected/gatk-queue-extensions-distribution * public/gatk-engine Updated references to StingText.properties to GATKText.properties. Updated ant-bridge.sh to use gatk.* properties instead of sting.*. = Engine updates Renaming files containing engine parts from o.b.gatk.tools to o.b.gatk.engine. Changed package references from tools to engine for CommandLineGATK, GenomeAnalysisEngine, ReadMetrics, ReadProperties, and WalkerManager. Changed package reference tools.phonehome to engine.phonehome. Renamed classes Sting* to GATK*, such as ReviewedGATKException. = Test updates Moved gatk example resources. Moved test engine files from tools to engine packages. Moved resources for phonehome to proper package. Moved test classes under o.b.gatk into packages: * o.b.g.utils.{BaseTest,ExampleToCopyUnitTest,GATKTextReporter,MD5DB,MD5Mismatch,TestNGTestTransformer} * o.b.g.engine.walkers.WalkerTest Updated package names in DependencyAnalyzerOutputLoaderUnitTest's data. = Queue updates Moving queue scripts to location where generated extensions can be used. Renamed .q to .scala, updating licenses previously missed by git hooks. Moved queue extensions to new artifact gatk-queue-extensions. Fixed import statements frequently merge-conflicting on FullProcessingPipeline.scala. = BWA Added README on how to obtain and include bwa as a library. Updated libbwa build. Fixed package names under bwa/java implementation. Updated contents of BWACAligner native implementation. = Other fixes Don't duplicate the resource bundle entries by both unpacking and appending. (partial fix) Staged engine and utils poms to build GATKText.properties, once Utils random generator dependency on GATK engine is fixed. Re-enabled custom testng listeners/reporters and moved testng dependencies to the gatk-root. Updated comments referencing Sting with GATK. Moved a couple of untangled classes from gatk-tools-public to gatk-utils and gatk-engine.	2014-05-19 16:43:47 -04:00
Phillip Dexheimer	a5abc079dc	Revised final Queue status line to display number of jobs in each state when the script fails * Addresses PT 61552466 * Included a simple scala script in private/testdata that will always fail	2014-05-15 21:30:44 -04:00
jmthibault79	78560212d0	Merge pull request #630 from broadinstitute/pd_blank_lines_in_listfile Allow blank lines in a (non-BAM) list file	2014-05-14 11:32:44 -04:00
droazen	8297cd1a1a	Merge pull request #619 from broadinstitute/pd_intervalmerge_doc Made IntervalSharder respect the IntervalMergingRule specified on the co...	2014-05-14 11:22:18 -04:00
Phillip Dexheimer	77449961ab	Allow blank lines in a (non-BAM) list file * Addresses PT Bug 67841052 * Added Unit Test	2014-05-13 23:14:15 -04:00
Khalid Shakir	67e44985b1	Java/Scala imports updated for new package names. Fourth of four commits for picard/htsjdk package rename.	2014-05-08 19:13:31 +08:00
Khalid Shakir	cc3f1f2b96	Revved picard libraries. Third of four commits for picard/htsjdk package rename.	2014-05-08 19:13:27 +08:00
Khalid Shakir	a894a2dddb	Updates to GATK classes and POMs that need updating, plus RodSystemValidation md5 updates. GATK classes accessing package protected htsjdk classes changed to new package names. POMs updated to support merging of sam/tribble/variant -> htsjdk and changes to picard artifact. RodSystemValidation outputs changed due to variant codec packages changes, requiring test md5 updates. Second of four commits for picard/htsjdk package rename.	2014-05-08 19:13:27 +08:00
Khalid Shakir	3ce3e27aa1	Moved GATK classes and POMs that will need updating. GATK classes accessing package protected htsjdk classes will need new package names. POMs will merge sam/tribble/variant into htsjdk. Move only, contents updated in next commit. First of four commits for picard/htsjdk package rename.	2014-05-08 19:13:27 +08:00
Laura Gauthier	bf7b97393e	Add ability to output to a file discordant loci and their respective genotypes for each sample	2014-05-07 10:12:45 -04:00
Karthik Gururaj	d9c489f928	Removed scary warning messages for VectorPairHMM	2014-05-06 10:59:24 -07:00
Karthik Gururaj	f6ea25b4d1	Parallel version of the JNI for the PairHMM. The JNI treats shared memory as critical memory and doesn't allow any parallel reads or writes to it until the native code finishes. This is not a problem per se (it is the right thing to do), but we need to enable -nct when running the HaplotypeCaller and, with it, have multiple native PairHMMs running for each map call. Move to copy-based memory sharing, where the JNI simply copies the memory over to C++ and then holds no blocked critical memory while running, allowing -nct to work. This version is slightly (almost unnoticeably) slower with -nct 1, but scales better with -nct 2-4 (we haven't tested anything beyond that because we know the GATK falls apart with higher levels of parallelism). * Make VECTOR_LOGLESS_CACHING the default implementation for PairHMM. * Changed version number in pom.xml under public/VectorPairHMM * VectorPairHMM can now be compiled using gcc 4.8.x * Modified define-* to get rid of gcc warnings for extra tokens after #undefs * Added a Linux kernel version check for AVX - gcc's __builtin_cpu_supports function does not check whether the kernel supports AVX or not. * Updated PairHMM profiling code to update and print numbers only in single-thread mode * Edited README.md, pom.xml and Makefile for users to pass path to gcc 4.8.x if necessary * Moved all cpuid inline assembly to a single function; changed info message to clog from cinfo * Modified version in pom.xml in VectorPairHMM from 3.1 to 3.2 * Deleted some unnecessary code * Modified C++ sandbox to print per-interval timing	2014-05-02 19:12:48 -04:00
Valentin Ruano-Rubio	d563072282	Fix for a recurring CombineGVCFs and GenotypeGVCFs exception about missing PLs. Story: https://www.pivotaltracker.com/story/show/68220438 Changes: - PL-less input genotypes are now treated as uncalled, and hence as non-variant sites, when combining GVCFs. - HC GVCF/BP_RESOLUTION mode now outputs non-variant sites at positions covered by deletions. - Fixed existing tests Test: - HaplotypeCallerGVCFIntegrationTest - ReferenceConfidenceModelUnitTest - CombineGVCFsIntegrationTest	2014-05-02 09:21:06 -04:00
Phillip Dexheimer	7a2b70a10f	Made IntervalSharder respect the IntervalMergingRule specified on the command line * This addresses PT Bug 69741902 * Added a required IMR argument to FilePointer, BAMScheduler, IntervalSharder, and SAMDataSource * This rule is used by FilePointer.combine and FilePointer.union * Added unit and integration tests	2014-04-30 22:07:22 -04:00
Michael McCowan	fe3c68cb2d	Java 8 compatibility fix: `Reflections` NPE bugfix.	2014-04-29 13:34:03 -04:00
Ryan Poplin	41d3069213	When we subset PLs because Alleles are removed during genotyping we also need to subset AD.	2014-04-28 15:52:26 -04:00
kshakir	10ee35eafa	Merge pull request #616 from broadinstitute/ks_cjav_pbsengine_no_default_queue Removed setting of a default queue in PbsEngineJobRunner.	2014-04-28 14:24:51 -04:00
Ryan Poplin	06dbe74a23	Merge pull request #609 from kcibul/kc_cancersimreads extended SimulateReadsForVariants to optionally use the AF field to indi...	2014-04-28 13:31:56 -04:00
Carlos Borroto	b7a59e01aa	Removed setting of a default queue in PbsEngineJobRunner. Discussed here: http://gatkforums.broadinstitute.org/discussion/3959/would-it-be-possible-for-pbsengine-jobrunner-not-to-set-a-default-queue Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2014-04-29 00:44:12 +08:00
Ami Levy-Moonshine	13dd755468	Create a new read transformer that refactors NDN cigar elements into one N element. Story: https://www.pivotaltracker.com/story/show/69648104 Description: This read transformer refactors cigar strings that contain N-D-N elements into one N element (whose length is the total length of the three refactored elements). This is intended primarily for users of RNA-Seq data handling programs such as TopHat2. Currently we consider the internal N-D-N motif illegal and error out when we encounter it. By refactoring the cigar strings of those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset. Edit: addressed review comments - changed the tool's name and made the tool a readTransformer instead of a read filter	2014-04-28 11:29:00 -04:00
Michael McCowan	8290d3c8ac	Allow for non-tab whitespace in sample names when performing on-the-fly sample-renaming.	2014-04-22 11:07:13 -04:00
MauricioCarneiro	f03e5ffeb1	Merge pull request #604 from broadinstitute/vrr_hc_omniploidy_general_api Disentangle UG and HC Genotyper engines.	2014-04-20 07:43:23 -04:00
Valentin Ruano-Rubio	7455ac9796	Addressed revisions	2014-04-19 16:48:48 -04:00
Ryan Poplin	a9a48f2459	Merge pull request #607 from broadinstitute/mm_bugfix_raise_mathutils_n_ceiling Support more samples in math utilities.	2014-04-17 13:32:34 -04:00
Joel Thibault	1ab50f4ba8	CatVariants now handles BCF and Block-Compressed VCF [Delivers #67461500]	2014-04-17 12:31:38 -04:00
Kristian Cibulskis	7115cadbd8	extended SimulateReadsForVariants to optionally use the AF field to indicate allele fraction of the simulated event, useful in cancer and other variable ploidy use cases	2014-04-16 16:20:02 -04:00
Joel Thibault	4c74319578	Update for Picard refactoring which improves block-compressed VCF reading [Delivers #69215404]	2014-04-16 14:39:23 -04:00
Joel Thibault	fd09cb7143	Rev Picard 1.111.1920	2014-04-16 14:39:19 -04:00
Joel Thibault	f98df5c071	Integration test for the file extensions CatVariants should handle	2014-04-16 13:25:47 -04:00
Joel Thibault	bdd7024d00	Integration test for block-compressed VCF reading	2014-04-16 13:09:40 -04:00
Joel Thibault	ce770b032a	Move execAndCheck() to ProcessController	2014-04-16 13:09:40 -04:00
Joel Thibault	b197618d13	This comment is no longer true	2014-04-15 15:42:39 -04:00
Mike	f0732d386c	Support more samples in math utilities. - Amend `MathUtils`' constants so that they support calling in excess of 70,000 samples (now up to 100,000).	2014-04-14 12:05:38 -04:00
Valentin Ruano-Rubio	08203b516e	Disentangle UG and HC Genotyper engines. Description: Transforms the delegation dependency from the HC to the UG genotyping engine into reuse through inheritance, where the HC and UG engines inherit from a common superclass, GenotypingEngine, that implements the common parts. As a side effect, some of the code is now clearer and redundant code has been removed. The changes have a few consequences for the end user. HC now has a few more user arguments, those that control the functionality that HC was borrowing directly from UGE. Added -ploidy argument, although it is constrained to be 2 for now. Added -out_mode EMIT_ALL_SITES|EMIT_VARIANTS_ONLY ... Added -allSitePLs flag. Stories: https://www.pivotaltracker.com/story/show/68017394 Changes: - Moved (HC's) GenotyperEngine to HaplotypeCallerGenotyperEngine (HCGE). Then created an engine superclass GenotypingEngine (GE) that contains the parts common to HCGE and its UG counterpart 'UnifiedGenotypingEngine' (UGE). Simplified the code and applied the template pattern to accommodate small differences in behaviour between the two caller engines. (There is still room for improvement, though.) - Moved inner classes and enums to top-level components for various reasons, including giving them shorter, simpler names to refer to them by. - Created a HomoSapiens class for human-specific constants; even if they are good defaults for most users, we need to clearly identify the human assumption across the code if we want to make GATK work with any species in general; i.e. any reference to HomoSapiens, except as a default value for a user argument, should smell. - Fixed a bug deep in the genotyping calculation where fixed human default values for SNP and indel heterozygosity were used, ignoring user arguments. - Changed GenotypingLikehooldCalculationCModel.Model to Gen.Like.Calc.*Model.Name; not a definitive solution, though, as names are often used in conditionals that perhaps should be member methods of the GenLikeCalc classes. - Renamed LikelihoodCalculationEngine to ReadLikelihoodCalculationEngine to clearly distinguish it from genotype likelihood calculation engines. - Replaced copying by explicit argument listing with a clone/reflection solution for casting between genotyper argument collection classes. - Created GenotypeGivenAllelesUtils to collect methods needed nearly exclusively by the GGA mode. Tests: - StandardCallerArgumentCollectionUnitTest (checks copying by cloning/reflection). - All existing integration and unit tests for modified classes.	2014-04-13 03:09:55 -04:00
Joel Thibault	c84126205b	Test that stdout redirects and log files do not affect output	2014-04-09 13:52:42 -04:00
Joel Thibault	1103fd231a	Better exception message	2014-04-09 10:51:45 -04:00
Eric Banks	b07c0a6b4c	Merge pull request #594 from broadinstitute/dr_vcf_sample_renaming Extend on-the-fly sample renaming feature to vcfs	2014-04-08 11:47:45 -04:00
David Roazen	af6a897479	Extend on-the-fly sample renaming feature to vcfs -Only works with single-sample vcfs -As with bams, the user must provide a file mapping the absolute path to each vcf whose samples are to be renamed to the new sample name for that vcf. The argument is the same as for bams: --sample_rename_mapping_file, and the mapping file may contain a mix of bam and vcf files should the user wish. -It's an error to attempt to remap the sample names of a multi-sample or sites-only vcf -Implemented at the codec level at the instant the vcf header is first read in to minimize the chances of downstream code examining vcf headers/records before renaming occurs. -Integration tests are in sting, unit tests are in picard -Rev picard et al. to 1.111.1902	2014-04-08 11:07:00 -04:00
Eric Banks	e690ed1a67	The contig is named MT not M in b36. Delivers PT68890442.	2014-04-08 10:03:47 -04:00
Eric Banks	ad336375dc	Merge pull request #590 from broadinstitute/vrr_validate_variants_unused_alleles_fix Addresses issue with strict validation on GVCF files.	2014-04-07 22:10:49 -04:00
Valentin Ruano-Rubio	5afcc8e05f	Change in the command line interface of ValidateVariants. Following reviewers' comments, the command line interface has been simplified. All extra strict validations are performed by default (as before), and the user has to indicate which ones they do not want to use with --validationTypeToExclude. Previously, the user could instead indicate the only ones to apply with --validationType, but that option has been removed. Stories: - https://www.pivotaltracker.com/story/show/68725164 Changes: - Removed validateType argument. - Improved documentation. - Added some warning log messages on suspicious argument combinations. Tests: - ValidateVariantsIntegrationTest#*	2014-04-07 16:27:11 -04:00
Ryan Poplin	7d11b4d5f1	Balancing training classes between SNP/Indel and TP/FP. -- This results in much more consistent distribution of LOD scores for SNPs and Indels. -- Removing genotype summary stats since they are now produced by default. -- Added functionality to specify certain subsets of the training data to be used in Tranche file generation, -good:tranche=true set.vcf	2014-04-07 15:23:53 -04:00
MauricioCarneiro	84861fa10a	Merge pull request #587 from broadinstitute/eb_actually_fail_on_reduced_bams Make sure to fail in all cases where the BAM being used was created by ReduceReads.	2014-04-04 17:27:57 -04:00
Laura Gauthier	ff25b656e1	Added check to make sure file passed in with sample IDs is valid (used in SelectVariants) -- throws UserException. Corresponding test checks for UserException.	2014-04-04 15:38:50 -04:00
Valentin Ruano-Rubio	18deeec6b0	Addresses issue with strict validation on GVCF files. More concretely, Picard's strict VCF validation does not accept alternative alleles that do not participate in any genotype call across samples. This is an issue with GVCFs in the single-sample pipeline, where this is certainly expected with <NON_REF> and other relatively unlikely alleles. To solve this issue, we allow the user to exclude some of the strict validations using a new argument, --validationTypeToExclude. To avoid the validation issue with GVCFs, the user needs to add the following to the command line: '--validationTypeToExclude ALLELES' Story: https://www.pivotaltracker.com/story/show/68725164 Changes: - Added validateTypeToExclude argument to ValidateVariants walker. - Implemented the selective exclusion of validation types. - Added new info and improved existing documentation of the ValidateVariants walker. Tests: - ValidateVariantsIntegrationTest#testUnusedAlleleError - ValidateVariantsIntegrationTest#testUnusedAlleleFix	2014-04-04 14:37:10 -04:00
Laura Gauthier	06d78ba068	Expanded documentation to include description of which callsets are being compared in what order and more definitions	2014-04-04 10:35:53 -04:00
Eric Banks	a3d55b3341	Make sure to fail in all cases where the BAM being used was created by ReduceReads. In some cases, the program records were being removed from the BAM headers by the GATK engine before we applied the check for reduced reads (so we did not fail appropriately). Pushed up the check to happen before the PG tags are modified and added a unit test to ensure it stays that way. It turns out that some UG tests still used reduced bams so I switched to use different ones. Based on reviewer feedback, made it more generic so that it's easy to add new unsupported tools.	2014-04-03 16:52:41 -04:00
Eric Banks	0b73573abc	Slightly modifying the way to use the IUPAC ambiguity codes in the FastaAlternateReferenceMaker. Previously it required you to create a single sample VCF and then to pass that in to the tool, but Geraldine convinced me that this was a pain for users (because they usually have multi-sample VCFs). Instead now you can pass in a multi-sample VCF and specify which sample's genotypes should be used for the IUPAC encoding. Therefore the argument changed from '--useIUPAC' to '--use_IUPAC_sample NA12878'.	2014-04-02 21:34:25 -04:00
Valentin Ruano-Rubio	84711b8e90	Fixed bug using GraphBased due to infinite likelihoods resulting from the calculation of the alignment cost of very long insertions or deletions (done in linear scale). Stories: https://www.pivotaltracker.com/story/show/66263868 Bug: The problem was due to the way we were calculating the fixed penalty of a large deletion or insertion. In this case we calculate the alignment likelihood of the deleted portion of the read or haplotype as the penalty of that deletion/insertion, without going through the full pair-hmm process. For large events this resulted in a 0 in linear-scale computations, which is transformed into an infinity in log scale. Changes: - Changed to use log10 scale to calculate those penalties. - Minor addition of .gitignore to hide ./public/external-example/target, which is generated by the build process.	2014-04-01 16:14:52 -04:00
Joel Thibault	70fe7f72f1	Return a TabixIndexCreator for appropriate file types [Fixes #68291082]	2014-03-31 16:15:34 -04:00
Joel Thibault	ab5634cbac	Test that a Tabix index is created for block-compressed output formats - Replace .idx and .tbi with appropriate constants	2014-03-31 14:36:48 -04:00
Joel Thibault	a2d40c84ba	Keep the list of zipped suffixes in sync with Variant	2014-03-31 14:36:41 -04:00
Joel Thibault	a2cd9703fa	Rev Picard 1.110.1773	2014-03-31 14:15:06 -04:00
Joel Thibault	2049eb1658	Rev Picard 1.110.1763 - SamPairUtils migrated in Picard r1737 - Revert IndelRealigner changes made in commit 4f4b85 -- Those changes were based on Picard revision 1722 to net/sf/picard/sam/SamPairUtil.java -- Picard revision 1723 reverts these changes, so we also revert to match	2014-03-30 09:33:57 -04:00
Ryan Poplin	6566dd6ca9	Fix for dropping of reference sample depth in the DP annotation. -- In the case of hierarchical merge we can't assume that we have only one genotype. -- Removed use of deprecated VC annotation access functions.	2014-03-24 14:01:50 -04:00
Ryan Poplin	69eaf7c82d	Merge pull request #577 from broadinstitute/eb_minor_fixes_for_fragment_utils Fixed docs for method and fixed the edge case optimization to properly u...	2014-03-21 14:01:44 -04:00
Eric Banks	0d82a70633	Fixed docs for method and fixed the edge case optimization to properly use equals() on Integers. Shouldn't affect actual results at all.	2014-03-20 15:55:09 -04:00
Eric Banks	3b1c337401	Have CombineVariants throw a UserError when trying to combine GVCFs from the HaplotypeCaller. Was previously throwing an IllegalArgumentException (in the wrong place in the code). Error message tells users to use CombineGVCFs.	2014-03-19 19:11:40 -04:00
David Roazen	e549f4a9d2	Fix typo in UtilsUnitTest data provider name This is currently my leading suspect for the cause of the intermittent NoSuchElementException errors on master, since the maven surefire plugin seems unable to handle errors in TestNG DataProviders without blowing up.	2014-03-18 11:52:29 -04:00
David Roazen	4ba72d43cf	Re-enable GATKRunReportUnitTest This test is not, as I had initially thought, the cause of the maven errors. Our master branch is failing intermittently regardless of whether this test is enabled or disabled. This reverts commit 45fc9ff515eec8d676b64a04fb34fb357492ff84.	2014-03-18 09:53:41 -04:00
David Roazen	afa6abe554	Temporarily disable GATKRunReportUnitTest in unstable while maven issues are worked out This test passes when run individually, as part of the commit tests, or as part of the package tests. However, when running the unit tests in isolation it causes maven/surefire to throw a NoSuchElementException. This is clearly a maven/surefire bug or configuration issue. I will re-enable this test on a branch as Khalid and I try to work through it.	2014-03-18 01:28:28 -04:00
David Roazen	2d8653f493	Update pom versions to mark the start of GATK 3.2 development	2014-03-18 01:18:59 -04:00
David Roazen	a6a41c777c	Update pom versions for 3.1	2014-03-18 01:09:29 -04:00
David Roazen	d5e38ec39b	Move GATKRunReport tests from private to public -Hide AWS downloader credentials in a private properties file -Remove references to private ActiveRegion walker Allows phone home functionality to be tested at release time when we are running tests on the release jar.	2014-03-17 18:29:40 -04:00
droazen	6b3320f067	Merge pull request #561 from broadinstitute/ks_package_classpath Updated package-tests classpath, and allowing javac -cp <package>.jar.	2014-03-17 17:38:24 -04:00
Eric Banks	2e34ff7692	Merge pull request #563 from broadinstitute/aw_refactor_tribble GATK changes to conform to Tribble refactoring as part of improving Tabix s...	2014-03-17 13:35:46 -04:00
Eric Banks	dabdd0a0fd	Remove unused and unnecessary argument	2014-03-17 12:28:27 -04:00
Alec Wysoker	0369f93b24	GATK changes to conform to Tribble refactoring as part of improving Tabix support in Tribble (among other things). 1. Enable on-the-fly indexing for vcf.gz. 2. Handle on-the-fly indexing where the file to be indexed is not a regular file, in which case no index should be created. 3. Add method setProgressLogger to all SAMFileWriter implementations. 4. Revved picard to 1.109.1722. 5. IndelRealigner md5s change because the MC tag is added to records now. Fixed up and signed off by ebanks.	2014-03-17 11:56:22 -04:00
Khalid Shakir	639247ab48	Updated package-tests classpath, and allowing javac -cp <package>.jar. Package tests now hard coding just the gatk-framework tests jar, to include ONLY BaseTest, until the exclusions may be debugged. Removing cofoja's annotation service from the package jars, to allow javac -cp <package>.jar.	2014-03-17 05:47:59 -04:00
Valentin Ruano-Rubio	2e964c59b4	Improved criteria to select the best haplotypes from the assembly graph. Currently the best haplotypes are those that accumulate the largest ABSOLUTE edge multiplicity sum across their path in the assembly graph. The edge multiplicity is equal to the number of reads that extend through that edge, i.e. that have a kmer that uniquely maps to some vertex upstream of the edge and whose following base calls extend across that edge to vertices downstream of it. Although higher multiplicities obviously correlate with haplotype probability, this criterion falls short in some regards, the most relevant being: as it is evaluated on the condensed seq-graph (as opposed to the uncompressed read-threading graph), it is biased toward haplotypes that have more short-sequence vertices ( -> ATGC -> CA -> has a worse score than -> A -> T -> G -> C -> C -> A ->). This is partly a result of how we modify the edge multiplicities when we merge vertices from a linear chain. This pull request addresses the problem by changing to a new scoring schema based on likelihood estimates: each haplotype's likelihood is calculated as the product of the likelihoods of "taking" its edges in the assembly graph. The likelihood of "taking" an edge in the assembly graph is calculated as its multiplicity divided by the sum of the multiplicities of the edges that share the same source vertex. This pull request addresses the following stories: https://www.pivotaltracker.com/story/show/66691418 https://www.pivotaltracker.com/story/show/64319760 Change Summary: 1. Changed to the new scoring schema. 2. Added graph DOT printing code to KBestHaplotypeFinder in order to diagnose scoring. 3. Graph transformations have been modified so that they generate no 0-multiplicity edges. (Nevertheless, the schema above should work with 0-multiplicity edges, assuming that they are in fact 0.5.)	2014-03-14 18:37:01 -04:00
David Roazen	1324120c17	Unconditionally include all of commons-httpclient in the GATK/Queue jars The maven shade plugin was eliminating a necessary class (IgnoreCookiesSpec) when packaging the GATK/Queue. Work around this by telling maven to always package all of commons-httpclient.	2014-03-14 10:50:15 -04:00
Eric Banks	ffaf92f871	Added new functionality to the FastaAlternateReferenceMaker to have it output IUPAC codes for het sites. Enable it with the new --useIUPAC argument. Added both unit and integration tests for the new functionality, and fixed up the existing tests once I was in there.	2014-03-12 14:31:57 -04:00
droazen	8b53567dc7	Merge pull request #553 from broadinstitute/dr_rename_pipeline_tests Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests	2014-03-10 21:36:45 -04:00
David Roazen	78562c14bb	Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests -These tests are really integration tests for Queue rather than generalized pipeline tests, so it makes sense to call them QueueTests. -Rename test classes and maven build targets, and update shell scripts to reflect new naming.	2014-03-10 21:24:03 -04:00
David Roazen	7c34f05082	Merge remote-tracking branch 'origin/master' into intel	2014-03-10 14:07:36 -04:00
Ami Levy-Moonshine	2a6f05a8a1	Add an option to randomly (uniformly) split one or more vcf files into more than 2 files. The old code that allows splitting into two files (given in the input) is kept to allow uneven splitting between files.	2014-03-10 10:58:44 -04:00
Karthik Gururaj	6e98e9e589	Removed g_haplotype* global variables in native code so that it works with multi-threading in Java. Modified VectorLoglessPairHMM.java so that jniInitializeRegion and jniFinalizeRegion are empty	2014-03-06 22:08:35 -08:00
Karthik Gururaj	3999677c93	Changed to delete[] where applicable	2014-03-06 12:23:08 -08:00
Karthik Gururaj	a29777765d	Binary library	2014-03-06 11:14:46 -08:00
Karthik Gururaj	7844d956ac	Modified delete to delete[]	2014-03-06 11:13:34 -08:00
Karthik Gururaj	27e640d640	Modified SSE4.1 and 4.2 checks with _may_i_use_cpu_feature()	2014-03-06 08:51:11 -08:00
Karthik Gururaj	37f107cb3a	Using Mustafa's function _may_i_use_cpu_feature() for AVX check	2014-03-06 08:37:48 -08:00
David Roazen	9df59bd8cc	Update pom versions to mark the start of GATK 3.1 development	2014-03-06 00:05:58 -05:00
David Roazen	34edcb8ddf	Update pom versions for the 3.0 release	2014-03-05 23:37:21 -05:00
David Roazen	a9ddfdb7c0	Remove external-example module from public pom.xml This module was causing failures during the release packaging tests. After discussing with Khalid, we've decided to disable it for now until a fix can be developed.	2014-03-05 20:25:38 -05:00
Karthik Gururaj	ec54528605	Fixed error in Sandbox.java	2014-03-05 09:36:55 -08:00
Karthik Gururaj	8fcbf9272c	Merge branch 'intel_pairhmm' of /data/broad/gsa-unstable into intel_pairhmm Conflicts: protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java public/VectorPairHMM/src/main/c++/Sandbox.java	2014-03-05 09:35:50 -08:00
Intel Repocontact	d81116eb1d	Added vectorized PairHMM implementation by Mohammad and Mustafa into the Maven build of GATK. C++ code has PAPI calls for reading hardware counters Followed Khalid's suggestion for packing libVectorLoglessCaching into the jar file with Maven Native library part of git repo 1. Renamed directory structure from public/c++/VectorPairHMM to public/VectorPairHMM/src/main/c++ as per Khalid's suggestion 2. Use java.home in public/VectorPairHMM/pom.xml to pass environment variable JRE_HOME to the make process. This is needed because the Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among others). Assuming that the Maven build process uses a JDK (and not just a JRE), the variable java.home points to the JRE inside maven. 3. Dropped all pretense at cross-platform compatibility. Removed Mac profile from pom.xml for VectorPairHMM Moved JNI_README 1. Added the catch UnsatisfiedLinkError exception in PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING in case the native library could not be loaded. Made VECTOR_LOGLESS_CACHING as the default implementation. 2. Updated the README with Mauricio's comments 3. baseline.cc is used within the library - if the machine supports neither AVX nor SSE4.1, the native library falls back to un-vectorized C++ in baseline.cc. 4. pairhmm-1-base.cc: This is not part of the library, but is being heavily used for debugging/profiling. Can I request that we keep it there for now? In the next release, we can delete it from the repository. 5. I agree with Mauricio about the ifdefs. I am sure you already know, but just to reassure you the debug code is not compiled into the library (because of the ifdefs) and will not affect performance. 1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java 2. 
Committing the right set of files after rebase Added public license text to all C++ files Added license to Makefile Add package info to Sandbox.java Conflicts: protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/DebugJNILoglessPairHMM.java protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/JNILoglessPairHMM.java protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/VectorLoglessPairHMM.java public/VectorPairHMM/src/main/c++/.gitignore public/VectorPairHMM/src/main/c++/LoadTimeInitializer.cc public/VectorPairHMM/src/main/c++/LoadTimeInitializer.h public/VectorPairHMM/src/main/c++/Makefile public/VectorPairHMM/src/main/c++/Sandbox.cc public/VectorPairHMM/src/main/c++/Sandbox.h public/VectorPairHMM/src/main/c++/Sandbox.java public/VectorPairHMM/src/main/c++/Sandbox_JNIHaplotypeDataHolderClass.h public/VectorPairHMM/src/main/c++/Sandbox_JNIReadDataHolderClass.h public/VectorPairHMM/src/main/c++/baseline.cc public/VectorPairHMM/src/main/c++/define-double.h public/VectorPairHMM/src/main/c++/define-float.h public/VectorPairHMM/src/main/c++/define-sse-double.h public/VectorPairHMM/src/main/c++/define-sse-float.h public/VectorPairHMM/src/main/c++/headers.h public/VectorPairHMM/src/main/c++/jnidebug.h public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.cc public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.h public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.cc public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.h public/VectorPairHMM/src/main/c++/pairhmm-template-kernel.cc 
public/VectorPairHMM/src/main/c++/pairhmm-template-main.cc public/VectorPairHMM/src/main/c++/run.sh public/VectorPairHMM/src/main/c++/shift_template.c public/VectorPairHMM/src/main/c++/utils.cc public/VectorPairHMM/src/main/c++/utils.h public/VectorPairHMM/src/main/c++/vector_function_prototypes.h	2014-03-05 09:30:29 -08:00
Laura Gauthier	43fdd38342	Add error handling to CalculateGenotypePosteriors to catch multiallelic variants with wrong number of ACs -- throws UserException; added tests in PosteriorLikelihoodsUtilsUnitTests Add error handling to CalculateGenotypePosteriors for cases where MLEAC>AN; add tests in PosteriorLikelihoodsUtilsUnitTests Add unit tests to confirm that CalculateGenotypePosteriors has the ability to switch genotypes for four cases	2014-03-05 12:03:18 -05:00
Laura Gauthier	7f9f58dbd1	Added hidden flag to GenotypeConcordance to output sites of discordant genotypes (to System.out) Revised ConcondanceMetrics tests to adapt to change Added comments to PosteriorLikelihoodsUtils	2014-03-05 12:03:18 -05:00
Joel Thibault	57747ad35e	Logger output should go to STDERR instead of STDOUT	2014-03-05 10:01:06 -05:00
Joel Thibault	b4dde6a78c	Add WARN to the valid log types error message - order if statements and error message in increasing severity	2014-03-05 10:01:06 -05:00
Valentin Ruano Rubio	243d1bc07a	Merge pull request #542 from broadinstitute/vrr_efficient_find_best_haplotypes Added a more efficient implementation of the KBest haplotype finder code...	2014-03-05 09:44:50 -05:00
David Roazen	58905e8fe0	Disable the intermittently-failing and flawed ProgressMeterDaemonUnitTest -created a Pivotal ticket to eventually redesign this test	2014-03-05 09:15:26 -05:00
Valentin Ruano-Rubio	69bf2b3247	Added a more efficient implementation of the KBest haplotype finder code (CONT.) Changes: 1. Addressed review comments on new K-best haplotype assembly graph finder. 2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices. 3. Updated test to use KBestHaplotypeFinder instead of KBestPaths 4. Retired KBestPaths to the archive. 5. Small improvements to the code and documentation.	2014-03-04 23:22:27 -05:00
Valentin Ruano-Rubio	7acf2eb0e7	Added a more efficient implementation of the KBest haplotype finder code. Story: https://www.pivotaltracker.com/story/show/66238286 Changes: 1. Created a new k-best haplotype search implementation in class KBestHaplotypeFinder. 2. Changed HC code to use the new implementation. This seems to fix the original problem without causing significant changes in outputs using some empirical data test cases 3. Moved haplotype's cigar calculation code from Path to CigarUtils; need that in order to gain independence from Path in some parts of the code. In any case that seems like a more natural location for that functionality.	2014-03-04 12:22:14 -05:00
Karthik Gururaj	a893765ae2	Added license to Makefile	2014-03-03 09:11:02 -08:00
Karthik Gururaj	7cd23543a1	Added public license text to all C++ files	2014-03-03 09:04:00 -08:00
Eric Banks	22ad18b919	Moving Reduce Reads to the archive. The GATK now fails with a user error if you try to run with a reduced bam. (I added a unit test for that; everything else here is just the removal of all traces of RR)	2014-03-02 02:03:14 -05:00
Khalid Shakir	387188e5bb	Attempting to limit gc during Maven tests, using defaults found in JavaCommandLineFunction	2014-03-01 15:24:45 +08:00
Karthik Gururaj	1b395a871a	1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java 2. Committing the right set of files after rebase	2014-02-28 16:08:28 -08:00
Karthik Gururaj	37526dfad5	1. Added the catch UnsatisfiedLinkError exception in PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING in case the native library could not be loaded. Made VECTOR_LOGLESS_CACHING as the default implementation. 2. Updated the README with Mauricio's comments 3. baseline.cc is used within the library - if the machine supports neither AVX nor SSE4.1, the native library falls back to un-vectorized C++ in baseline.cc. 4. pairhmm-1-base.cc: This is not part of the library, but is being heavily used for debugging/profiling. Can I request that we keep it there for now? In the next release, we can delete it from the repository. 5. I agree with Mauricio about the ifdefs. I am sure you already know, but just to reassure you the debug code is not compiled into the library (because of the ifdefs) and will not affect performance.	2014-02-28 08:59:55 -08:00
Chris Whelan	e61ba8b340	Added command line checks for duplicate files in ROD lists -- Keep a list of processed files in ArgumentTypeDescriptor.getRodBindingsCollection -- Throw user exception if a file name duplicates one that was previously parsed -- Throw user exception if the ROD list is empty -- Added two unit tests to RodBindingCollectionUnitTest	2014-02-27 13:32:18 -05:00
Karthik Gururaj	2d0ce45bb0	Moved JNI_README	2014-02-27 10:12:23 -08:00
Karthik Gururaj	c645725fc3	1. Renamed directory structure from public/c++/VectorPairHMM to public/VectorPairHMM/src/main/c++ as per Khalid's suggestion 2. Use java.home in public/VectorPairHMM/pom.xml to pass environment variable JRE_HOME to the make process. This is needed because the Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among others). Assuming that the Maven build process uses a JDK (and not just a JRE), the variable java.home points to the JRE inside maven. 3. Dropped all pretense at cross-platform compatibility. Removed Mac profile from pom.xml for VectorPairHMM	2014-02-26 15:17:15 -08:00
Karthik Gururaj	bd71ba35e5	Moved pom.xml to VectorPairHMM and updated artifactId	2014-02-26 14:01:46 -08:00
Khalid Shakir	da587d48ed	Using absolute paths in generated diff commands, to ease running them from any directory.	2014-02-27 04:43:39 +08:00
Khalid Shakir	c163e6d0d2	Separate failsafe directories for each of the integration test types [#66515572 ]	2014-02-27 04:43:39 +08:00
Karthik Gururaj	b81e2c2948	Native library part of git repo	2014-02-26 11:47:42 -08:00
Karthik Gururaj	0fe843bfd9	Followed Khalid's suggestion for packing libVectorLoglessCaching into the jar file with Maven	2014-02-26 11:47:42 -08:00
Karthik Gururaj	15fe244e4b	Now has PAPI values	2014-02-26 11:47:42 -08:00
Intel Repocontact	e32e9e6af6	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2014-02-26 11:47:01 -08:00
Intel Repocontact	ff2a972ab5	Merge branch 'master' of github.com:broadinstitute/gsa-unstable Conflicts: .gitignore	2014-02-25 20:56:28 -08:00
Khalid Shakir	f02ce6eca7	Added tests for cleaning up scattered .bai files, and using the log directory. Re-added import java.io.File for BamGatherFunction. Other cleanup to resolve scala syntax warnings from intellij. Moved Example UG script from protected to public.	2014-02-26 02:11:28 +08:00
pdexheimer	0405afeab2	Inherit BamGatherFunction from MergeSamFiles rather than PicardBamFunction - This change means that BamGatherFunction will now have an @Output field for the BAM index, which will allow the bai to be deleted for intermediate functions Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2014-02-26 02:11:28 +08:00
pdexheimer	504c125c26	Ensure .out files are saved into logDirectory Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2014-02-26 02:11:28 +08:00
pdexheimer	51dcd364a5	Added logDirectory argument Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2014-02-26 02:11:28 +08:00
Khalid Shakir	7e516b294f	Replaced local drmaa and Jama artifacts with versions from maven central. Removed unused caliper binary from local repo.	2014-02-22 01:21:35 +08:00
Khalid Shakir	a75043b207	When git describe fails use "exported" instead of "unknown".	2014-02-22 01:21:35 +08:00
Khalid Shakir	4670c87313	Fixed mvn run for packagetests over external-example.	2014-02-22 01:21:34 +08:00
Khalid Shakir	70ecce2a0f	Fixed scope for test-jar dependencies.	2014-02-22 01:21:34 +08:00
Eric Banks	235f0c6fa0	Merge pull request #528 from broadinstitute/eb_fix_cat_variants_usage_message Fix the usage message for CatVariants to make it accurate.	2014-02-19 22:45:22 -05:00
Eric Banks	341d1bf2dd	Fix the usage message for CatVariants to make it accurate. It just hit a user on our forum...	2014-02-19 20:42:08 -05:00
Valentin Ruano-Rubio	c167fb5fdf	Fixing GenotypesGVCF. Bug uncovered by some untrimmed alleles in the single sample pipeline output. Note, however, that this does not fix the untrimmed alleles in general. Story: https://www.pivotaltracker.com/story/show/65481104 Changes: 1. Fixed the bug itself. 2. Fixed non-working tests (silently skipped due to an exception in dataProvider).	2014-02-19 14:20:39 -05:00
Ryan Poplin	43c20264b0	Initial commit of the random forest classifier.	2014-02-17 13:07:27 -05:00
Khalid Shakir	a505db79f5	Fixed build bug in ./ant-bridge.sh unittest -Dsingle=..., due to external-example. pipeline.run property no longer required to be passed by test executor.	2014-02-15 13:52:20 +08:00
droazen	1e82f117ad	Merge pull request #518 from broadinstitute/ks_skashin_gatkdocs_arguments Ks skashin gatkdocs arguments	2014-02-14 13:57:19 -05:00
Eric Banks	f6022a944b	Merge pull request #513 from broadinstitute/eb_clean_up_genotype_posteriors Various small fixes for CalculateGenotypePosteriors based on feedback fr...	2014-02-14 13:50:46 -05:00
Eric Banks	3724d4e5f3	Various small fixes for CalculateGenotypePosteriors based on feedback from guys in Ben Neale's group. Note that this tool is still a work in progress and very experimental, so isn't 100% stable. Most of the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented it right before he left, and we're going to need to add tests at some point soon. I added a first integration test in this commit, but it's just a start. The fixes include: 1. Stop having the genotyping code strip out AD values. It doesn't make sense that it should do this so I don't know why it was doing that at all. Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore. This also helps CalculateGenotypePosteriors which was losing the AD values. Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before. Updated the integration test for that walker to include such data. 2. Chris was calling Math.pow directly on the normalized posteriors which isn't safe. Instead, the normalization routine itself can revert back to log scale in a safe manner so let's use it. Also, renamed the variable to posteriorProbabilities (and not likelihoods). 3. Have CGP update the AC/AF/AN counts after fixing GTs.	2014-02-14 13:48:14 -05:00
kshakir	8b136d53b9	Merge pull request #524 from broadinstitute/ks_symlink_bin_jar Create symlinks target/GenomeAnalysisTK.jar and target/Queue.jar	2014-02-15 02:32:59 +08:00
Khalid Shakir	bc9ac93b6c	Adding the external example to the build.	2014-02-15 01:26:07 +08:00
Khalid Shakir	2e99a6ecf8	Create symlinks target/GenomeAnalysisTK.jar and target/Queue.jar during package phase.	2014-02-15 01:12:32 +08:00
Nicholas Clarke	7ae19953f5	Squashed commit of the following: commit 5e73b94eed3d1fc75c88863c2cf07d5972eb348b Merge: e12593a d04a585 Author: Nicholas Clarke <nc6@sanger.ac.uk> Date: Fri Feb 14 09:25:22 2014 +0000 Merge pull request #1 from broadinstitute/checkpoint SimpleTimer passes tests, with formatting commit d04a58533f1bf5e39b0b43018c9db3302943d985 Author: kshakir <github@kshakir.org> Date: Fri Feb 14 14:46:01 2014 +0800 SimpleTimer passes tests, with formatting Fixed getNanoOffset() to offset nano to nano, instead of nano to seconds. Updated warning message with comma separated numbers, and exact values of offsets. commit e12593ae66a5e6f0819316f2a580dbc7ae5896ad Author: Nicholas Clarke <nc6@sanger.ac.uk> Date: Wed Feb 12 13:27:07 2014 +0000 Remove instance of 'Timer'. commit 47a73e0b123d4257b57cfc926a5bdd75d709fcf9 Author: Nicholas Clarke <nc6@sanger.ac.uk> Date: Wed Feb 12 12:19:00 2014 +0000 Revert a couple of changes that survived somehow. - CheckpointableTimer,Timer -> SimpleTimer commit d86d9888ae93400514a8119dc2024e0a101f7170 Author: Nicholas Clarke <nc6@sanger.ac.uk> Date: Mon Jan 20 14:13:09 2014 +0000 Revised commits following comments. - All utility merged into `SimpleTimer`. - All tests merged into `SimpleTimerUnitTest`. - Behaviour of `getElapsedTime` should now be consistent with `stop`. - Use 'TimeUnit' class for all unit conversions. - A bit more tidying. commit 354ee49b7fc880e944ff9df4343a86e9a5d477c7 Author: Nicholas Clarke <nc6@sanger.ac.uk> Date: Fri Jan 17 17:04:39 2014 +0000 Add a new CheckpointableTimerUnitTest. Revert SimpleTimerUnitTest to the version before any changes were made. commit 2ad1b6c87c158399ededd706525c776372bbaf6e Author: Nicholas Clarke <nc6@sanger.ac.uk> Date: Tue Jan 14 16:11:18 2014 +0000 Add test specifically checking behaviour under checkpoint/restart. 
Slight alteration to the checkpointable timer based on observations during the testing - it seems that there's a fair amount of drift between the sources anyway, so each time we stop we resynchronise the offset. Hopefully this should avoid gradual drift building up and presenting as checkpoint/restart drift. commit 1c98881594dc51e4e2365ac95b31d410326d8b53 Author: Nicholas Clarke <nc6@sanger.ac.uk> Date: Tue Jan 14 14:11:31 2014 +0000 Should use consistent time units commit 6f70d42d660b31eee4c2e9d918e74c4129f46036 Author: Nicholas Clarke <nc6@sanger.ac.uk> Date: Tue Jan 14 14:01:10 2014 +0000 Add a new timer supporting checkpoint mechanisms. The issue with this is that the current timer is locked to JVM nanoTime. This can be reset after a checkpoint/restart and result in negative elapsed times, which causes an error. This patch addresses the issue in two ways: - Moves the check on timer information in GenomeAnalysisEngine.java to only occur if a time limit has been set. - Create a new timer (CheckpointableTimer) which keeps track of the relation between system and nano time. If this changes drastically, then the assumption is that there has been a JVM restart owing to checkpoint/restart. Any time straddling a checkpoint/restart event will not be counted towards total running time. Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2014-02-14 21:45:47 +08:00
Laura Gauthier	29bb3d4dc1	Check for empty BAM lists in command line input	2014-02-14 08:09:47 -05:00
Khalid Shakir	225ee4880b	Using new parameters via skashin to run gatkdocs in the maven conventional subdirectory. Updated path for output gatkdocs in nightly build script. Removed patch in plugin manager that contained a workaround for gatkdocs running in the top level directory.	2014-02-14 15:57:21 +08:00
skashin	1b3ac95798	Added the following arguments: -settings-dir -destination-dir -forum-key-path Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>	2014-02-14 14:28:35 +08:00
Eric Banks	7095a60c8e	Merge pull request #516 from broadinstitute/dr_reenable_tests_failing_due_to_java_update Re-enable tests that were failing post-maven due to changes in Java's Math.pow() implementation	2014-02-13 21:05:18 -05:00
David Roazen	4b4b93ad1b	Re-enable tests that were failing post-maven due to changes in Java's Math.pow() implementation After extensive detective work, Joel determined that these tests were failing due to changes in the implementation of Math.pow() in newer versions of Java 1.7. All GSA members should ensure that they're using a JDK that is at least as current as the one in the Java-1.7 dotkit on the Broad servers (build 1.7.0_51-b13).	2014-02-12 16:08:16 -05:00
Eric Banks	5bde7fbf37	Merge pull request #511 from broadinstitute/dr_enable_exclusions_in_maven_package_tests Exclude all transitive dependencies in maven package-tests	2014-02-12 15:38:39 -05:00
Joel Thibault	ef87b051b0	Rev Picard to 1.107.1683 (4 jars)	2014-02-12 15:25:50 -05:00
David Roazen	6f12c8b0dc	Exclude all transitive dependencies in maven package-tests This change should allow us to test that the GATK jar has been correctly packaged at release time, by ensuring that only the packaged jar + a few test-related dependencies are on the classpath when tests are run. Note that we still need to actually test that this works as intended before we can make this live in the Bamboo release plan.	2014-02-12 14:59:05 -05:00
David Roazen	95e1402d21	Add ability to run *KnowledgeBaseTests to maven Run with: mvn verify -Dsting.knowledgebasetests.skipped=false	2014-02-11 14:08:24 -05:00
Khalid Shakir	1666bb7e3a	Patched PluginManager to ignore null classes, which will allow gatkdocs to build successfully when running from the source root directory, due to its hardcoded paths.	2014-02-12 00:48:58 +08:00
Karthik Gururaj	316501b32e	Fixed denominator in profiling	2014-02-10 10:11:03 -08:00
Karthik Gururaj	d081c19178	Minor: added support in C++ sandbox to choose implementation and check from command line	2014-02-09 18:05:35 -08:00
Ryan Poplin	b81494b704	Merge pull request #499 from broadinstitute/eb_fix_ad_updates Fixed bug in generating AD values when new alleles are present for genot...	2014-02-09 17:55:00 -05:00
Eric Banks	abb67cfa5e	Fixed bug in generating AD values when new alleles are present for genotyping GVCFs. This was a dumb mistake that wasn't well tested (but is now).	2014-02-09 15:15:19 -05:00
Khalid Shakir	12bb6fd361	Removed use of picard private. Updated picard-maven script to tag locally modified builds with -SNAPSHOT. Removed old picard jars.	2014-02-09 17:08:52 +08:00
Khalid Shakir	4e0f7521f2	Made scala.maxmemory an argument, and defaulted it to 1g.	2014-02-09 09:24:44 +08:00
Karthik Gururaj	a03d83579b	Matrices in baseline C++ (no vector) implementation of PairHMM are now allocated on heap using "new". Stack allocation led to program crashes for large matrix sizes.	2014-02-07 23:22:05 -08:00
Karthik Gururaj	20a46e4098	Check only for SSE 4.1 (rather than SSE 4.2) when trying to use the SSE implementation of PairHMM	2014-02-07 15:19:55 -08:00
Eric Banks	d689f61005	Fixed up some of the genotype-level annotations being propagated in the single sample HC pipeline. 1. AD values now propagate up (they weren't before). 2. MIN_DP gets transferred over to DP and removed. 3. SB gets removed after FS is calculated. Also, added a bunch of new integration tests for GenotypeGVCFs.	2014-02-07 12:47:54 -05:00
Eric Banks	db68d3fa10	Fixing failing unit tests	2014-02-07 12:24:14 -05:00
Eric Banks	2648219c42	Implementation of a hierarchical merger for gVCFs, called CombineGVCFs. This tool will take any number of gVCFs and create a merged gVCF (as opposed to GenotypeGVCFs which produces a standard VCF). Added unit/integration tests and fixed up GATK docs.	2014-02-07 08:49:18 -05:00
mghodrat	7815c30df8	Adding comments to pairhmm-template-kernel	2014-02-06 20:13:06 -08:00
Karthik Gururaj	b729fc0136	1. Split main JNI function into initializeTestcases, compute_testcases and releaseReads 2. FTZ enabled 3. Cleaner profiling code	2014-02-06 14:35:32 -08:00
Karthik Gururaj	166f91d698	Merge branch 'test_branch' Conflicts: public/c++/VectorPairHMM/LoadTimeInitializer.cc public/c++/VectorPairHMM/pairhmm-1-base.cc public/c++/VectorPairHMM/utils.cc public/c++/VectorPairHMM/utils.h Merged test_branch with intel_pairhmm	2014-02-06 11:18:18 -08:00
Karthik Gururaj	fab6f57e97	1. Enabled FTZ in LoadTimeInitializer.cc 2. Added Sandbox.java for testing 3. Moved compute to utils.cc (inside library) 4. Added flag for disabling FTZ in Makefile	2014-02-06 11:01:33 -08:00
Karthik Gururaj	78642944c0	1. Moved break statement in utils.cc to correct position 2. Tested sandbox with regions 3. Lots of profiling code from previous commit exists	2014-02-06 09:32:56 -08:00
Khalid Shakir	b21c35482e	Packages link private/testdata, so that mvn test -Dsting.serialunittests.skipped=false works.	2014-02-06 08:25:38 -05:00
Khalid Shakir	3848159086	Added a set of serial tests to gatk/queue packages, which runs all tests under their package in one TestNG execution. New properties to disable regenerating example resources artifact when each parallel test runs under packagetest. Moved collection of packagetest parameters from shell scripts into maven profiles. Fixed necessity of test-utils jar by removing incorrect dependenciesToScan element during packagetests. When building picard libraries, run clean first. Fixed tools jar dependency in picard pom. Integration tests properly use the ant-bridge.sh test.debug.port variable, like unit tests.	2014-02-06 08:25:38 -05:00
Valentin Ruano Rubio	988e3b4890	Merge pull request #487 from broadinstitute/vrr_reference_model_with_trimming Get gVCF to work without --dontTrimActiveRegions	2014-02-05 22:52:17 -05:00
Valentin Ruano-Rubio	98ffcf6833	Get gVCF to work without --dontTrimActiveRegions Story: https://www.pivotaltracker.com/story/show/65048706 https://www.pivotaltracker.com/story/show/65116908 Changes: ActiveRegionTrimmer is now an argument collection and it returns not only the trimmed-down active region but also the non-variant-containing flanking regions. HaplotypeCaller code has been simplified significantly by pushing some functionality to other classes such as ActiveRegion and AssemblyResultSet. Fixed a problem with the way the trimming was done that caused some gVCF non-variant records to have conservative 0,0,0 PLs	2014-02-05 22:50:45 -05:00
Karthik Gururaj	acda6ca27b	1. Whew, finally debugged the source of performance issues with PairHMM JNI. See copied text from email below. 2. This commit contains all the code used in profiling, detecting FP exceptions, dumping intermediate results. All flagged off using ifdefs, but it's there. --------------Text from email As we discussed before, it's the denormal numbers that are causing the slowdown - the core executes some microcode uops (called FP assists) when denormal numbers are detected for FP operations (even un-vectorized code). The C++ compiler by default enables flush to zero (FTZ) - when set, the hardware simply converts denormal numbers to 0. The Java binary (executable provided by Oracle, not the native library) seems to be compiled without FTZ (sensible choice, they want to be conservative). Hence, the JNI invocation sees a large slowdown. Disabling FTZ in C++ slows down the C++ sandbox performance to the JNI version (fortunately, the reverse also holds :)). Not sure how to show the overhead for these FP assists easily - measured a couple of counters. FP_ASSISTS:ANY - shows number of uops executed as part of the FP assists. When FTZ is enabled, this is 0 (both C++ and JNI), when FTZ is disabled this value is around 203540557 (both C++ and JNI) IDQ:MS_UOPS_CYCLES - shows the number of cycles the decoder was issuing uops when the microcode sequencing engine was busy. When FTZ is enabled, this is around 1.77M cycles (both C++ and JNI), when FTZ is disabled this value is around 4.31B cycles (both C++ and JNI). This number is still small with respect to total cycles (~40B), but it only reflects the cycles in the decode stage. The total overhead of the microcode assist ops could be larger. As suggested by Mustafa, I compared intermediate values (matrices M,X,Y) and final output of compute_full_prob. The values produced by C++ and Java are identical to the last bit (as long as both use FTZ or no-FTZ). 
Comparing the outputs of compute_full_prob for the cases no-FTZ and FTZ, there are differences for very small values (denormal numbers). Examples: Diff values 1.952970E-33 1.952967E-33 Diff values 1.135071E-32 1.135070E-32 Diff values 1.135071E-32 1.135070E-32 Diff values 1.135071E-32 1.135070E-32 For this test case (low coverage NA12878), all these values would be recomputed using the double precision version. Enabling FTZ should be fine. -------------------End text from email	2014-02-05 17:09:57 -08:00
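To make the denormal/FTZ discussion above concrete, here is a small Java illustration. Standard Java offers no API for setting the CPU's FTZ flag, so this hypothetical helper only emulates, value by value, what FTZ hardware does: a subnormal value (nonzero, with magnitude below Float.MIN_NORMAL or Double.MIN_NORMAL) is replaced by a zero of the same sign. The class and method names are illustrative only.

```java
/** Hypothetical illustration of flush-to-zero (FTZ) semantics, applied per value. */
final class FlushToZeroSketch {
    /** True for nonzero floats smaller in magnitude than the smallest normal float. */
    static boolean isSubnormal(float x) {
        return x != 0.0f && Math.abs(x) < Float.MIN_NORMAL;
    }

    /** What FTZ hardware would produce: subnormals become a signed zero, everything else is unchanged. */
    static float flushToZero(float x) {
        return isSubnormal(x) ? Math.copySign(0.0f, x) : x;
    }

    static boolean isSubnormal(double x) {
        return x != 0.0 && Math.abs(x) < Double.MIN_NORMAL;
    }

    static double flushToZero(double x) {
        return isSubnormal(x) ? Math.copySign(0.0, x) : x;
    }
}
```

Normal values pass through untouched, so with FTZ enabled only results that have already fallen into the subnormal range are affected.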
Ryan Poplin	693bfac341	Bug fix for missing annotations in CombineReferenceCalculationVariants. They were being dropped in the handoff between engines in a couple of places. -- Updated single sample pipeline test data using Valentin's files and re-enabled CRCV tests	2014-02-05 12:58:48 -05:00
Eric Banks	740b33acbb	We were never validating the sequence dictionary of tabix indexed VCFs for some reason. Fixed. These changes happened in Tribble, but Joel clobbered them with his commit. We can now change the logging priority on failures to validate the sequence dictionary to WARN. Thanks to Tim F for indirectly pointing this out.	2014-02-05 10:12:38 -05:00
Eric Banks	9cac24d1e6	Moving logging status of VCF indexing to DEBUG instead of INFO, otherwise it's painful when reading in lots of files	2014-02-05 10:12:37 -05:00
Eric Banks	91bdf069d3	Some updates to CRCV. 1. Throw a user error when the input data for a given genotype does not contain PLs. 2. Add VCF header line for --dbsnp input 3. Need to check that the UG result is not null 4. Don't error out at positions with no gVCFs (which is possible when using a dbSNP rod)	2014-02-05 10:12:37 -05:00
Joel Thibault	7923e786e9	Rev Picard (public) to 1.107.1676 - Rename snappy to snappy-java - Add maven-metadata-local.xml to .gitignore	2014-02-04 22:04:28 -05:00
Joel Thibault	0025fe190d	Exclude sam's older TestNG	2014-02-04 22:04:27 -05:00
Karthik Gururaj	24f8aef344	Contains profiling, exception tracking, PAPI code Contains Sandbox Java	2014-02-04 16:27:29 -08:00
David Roazen	76086f30b7	Temporarily disable tests that started failing post-maven Joel is working on these failures in a separate branch. Since maven (currently! we're working on this..) won't run the whole test suite to completion if there's a failure early on, we need to temporarily disable these tests in order to allow group members to run tests on their branches again.	2014-02-04 15:31:24 -05:00
David Roazen	3b2f07990d	Re-break the MWUnitTest for Joel to debug	2014-02-04 15:19:09 -05:00
David Roazen	c9032f0b5c	Fix failing unit tests	2014-02-04 03:05:30 -05:00
Khalid Shakir	a4289711e2	Distinct failsafe summary reports, just like invoker report directories.	2014-02-03 13:50:47 -05:00
Khalid Shakir	857e6e0d6f	Bumped version to 2.8-SNAPSHOT, using new update_pom_versions.sh script.	2014-02-03 13:50:46 -05:00
Khalid Shakir	9ca3004fc3	Setting the test-utils' type to test-jar, such that the multi-module build uses testClasses instead of classes as a directory dependency.	2014-02-03 13:50:46 -05:00
Khalid Shakir	de13f41fc3	One step closer to a proper test-utils artifact. Using the maven-jar-plugin to create a test classifier, excluding actual tests, until we can properly separate the classes into separate artifacts/modules.	2014-02-03 13:50:46 -05:00
Khalid Shakir	25aee7164e	Fixed missing "mvn" command execution in ant-bridge. Added pom.xml workarounds for duplicate classpath error, due to gatk-framework dependency containing required BaseTest, and jarred UnitTest/IntegrationTest classes that also exist as files under target/test-classes.	2014-02-03 13:50:46 -05:00
Khalid Shakir	caa76cdac4	Added maven pom.xmls for various artifacts.	2014-02-03 13:50:46 -05:00
Khalid Shakir	d1a689af33	Added new utility files used by maven build, including the ant-bridge script.	2014-02-03 13:50:46 -05:00
Khalid Shakir	88150e0166	Switched committed dependency repository from ivy to maven.	2014-02-03 13:50:46 -05:00
Khalid Shakir	1e25a758f5	Moved files to maven directories. Here are the git moved directories in case other files need to be moved during a merge: git-mv private/java/src/ private/gatk-private/src/main/java/ git-mv private/R/scripts/ private/gatk-private/src/main/resources/ git-mv private/java/test/ private/gatk-private/src/test/java/ git-mv private/testdata/ private/gatk-private/src/test/resources/ git-mv private/scala/qscript/ private/queue-private/src/main/qscripts/ git-mv private/scala/src/ private/queue-private/src/main/scala/ git-mv protected/java/src/ protected/gatk-protected/src/main/java/ git-mv protected/java/test/ protected/gatk-protected/src/test/java/ git-mv public/java/src/ public/gatk-framework/src/main/java/ git-mv public/java/test/ public/gatk-framework/src/test/java/ git-mv public/testdata/ public/gatk-framework/src/test/resources/ git-mv public/scala/qscript/ public/queue-framework/src/main/qscripts/ git-mv public/scala/src/ public/queue-framework/src/main/scala/ git-mv public/scala/test/ public/queue-framework/src/test/scala/	2014-02-03 13:50:44 -05:00
Khalid Shakir	faaef236ea	Moved gsalib, R and other resources, Queue GATK extensions generator, Queue version java files.	2014-02-03 13:49:21 -05:00
Khalid Shakir	eb52dc6a9b	Moved build.xml, ivy.xml, ivysettings.xml, ivy properties, public/packages/*.xml into private/archive/ant	2014-02-03 13:49:20 -05:00
Karthik Gururaj	6d4d776633	Includes all debug code for obtaining profiling info	2014-01-30 12:08:06 -08:00
Valentin Ruano-Rubio	89c4e57478	gVCF <NON_REF> in all vcf lines including variant ones when -ERC gVCF is requested. Changes: ------- <NON_REF> likelihood in variant sites is calculated as the maximum possible likelihood for an unseen alternative allele: for each read it is calculated as the second-best likelihood among the reported alleles. When -ERC gVCF is requested, stand_conf_emit and stand_conf_call are forcibly set to 0. Also dontGenotype is set to false for consistency's sake. Integration test MD5s have been changed accordingly. Additional fix: -------------- Especially after adding the <NON_REF> allele, but also without it, QUAL values tended to go to 0 (a very large integer number in log 10) due to underflow when combining GLs (GenotypingEngine.combineGLs). To fix that, combineGLs has been replaced by combineGLsPrecise, which uses the log-sum-exp trick. In just a few cases this change results in genotype changes in integration tests, but after double-checking using unit tests and the difference between combineGLs and combineGLsPrecise in the affected integration tests, the previous GT calls were either borderline cases and/or due to the underflow.	2014-01-30 11:23:33 -05:00
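The underflow fix described in this commit relies on the log-sum-exp trick. The Java sketch below contrasts a naive sum of log10 likelihoods with the stable version, and also shows the per-read <NON_REF> rule stated in the commit (take the second-best likelihood among the reported alleles). The class and method names are hypothetical and are not GATK's GenotypingEngine API.

```java
/** Hypothetical sketch: stable log10 summation and the per-read <NON_REF> likelihood rule. */
final class Log10MathSketch {
    /** Naive version: 10^v underflows to 0 for very negative v, so the result can become -Infinity. */
    static double naiveLog10SumLog10(double[] log10Values) {
        double sum = 0.0;
        for (double v : log10Values) sum += Math.pow(10.0, v);
        return Math.log10(sum);
    }

    /** log10(sum_i 10^v_i) computed as max + log10(sum_i 10^(v_i - max)). */
    static double log10SumLog10(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) max = Math.max(max, v);
        if (max == Double.NEGATIVE_INFINITY) return max;  // every term has zero probability
        double sum = 0.0;
        for (double v : log10Values) sum += Math.pow(10.0, v - max);  // each term lies in (0, 1]
        return max + Math.log10(sum);
    }

    /** Second-largest per-allele log10 likelihood for a single read. */
    static double nonRefLog10Likelihood(double[] perAlleleLog10Likelihoods) {
        if (perAlleleLog10Likelihoods.length < 2) {
            throw new IllegalArgumentException("need at least two alleles");
        }
        double best = Double.NEGATIVE_INFINITY;
        double second = Double.NEGATIVE_INFINITY;
        for (double v : perAlleleLog10Likelihoods) {
            if (v > best) {
                second = best;
                best = v;
            } else if (v > second) {
                second = v;
            }
        }
        return second;
    }
}
```

Subtracting the maximum before exponentiating keeps at least one term equal to 1, so the sum can no longer collapse to zero even when every input is far below the smallest representable double.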
Karthik Gururaj	5c7427e48c	Temporary commit containing debug profiling code - commented out	2014-01-29 12:10:29 -08:00
Karthik Gururaj	0c63d6264f	1. Added synchronization block around loadLibrary in VectorLoglessPairHMM 2. Edited Makefile to use static libraries where possible	2014-01-27 15:34:58 -08:00
Karthik Gururaj	a15137a667	Modified run.sh	2014-01-27 14:56:46 -08:00
Karthik Gururaj	2c0d70c863	Moved vector JNI code to public/c++/VectorPairHMM	2014-01-27 14:52:59 -08:00
Karthik Gururaj	85a748860e	1. Added more profiling code 2. Modified JNI_README	2014-01-27 14:32:44 -08:00
Valentin Ruano-Rubio	748d2fdf92	Added integration test to verify that the bugs reported in Pivotal Tracker are no longer present	2014-01-26 23:29:31 -05:00
Karthik Gururaj	018e9e2c5f	1. Cleaned up code 2. Split into DebugJNILoglessPairHMM and VectorLoglessPairHMM with base class JNILoglessPairHMM. DebugJNILoglessPairHMM can, in principle, invoke any other child class of JNILoglessPairHMM. 3. Added more profiling code for Java parts of LoglessPairHMM	2014-01-26 19:18:12 -08:00
Valentin Ruano-Rubio	9e7bf75e89	Fix for the PairHMM transition probability miscalculation. Problem: the matchToMatch transition calculation was wrong, resulting in transition probabilities out of the Match state that summed to more than 1. Reports: https://www.pivotaltracker.com/s/projects/793457/stories/62471780 https://www.pivotaltracker.com/s/projects/793457/stories/61082450 Changes: The transition matrix update code has been moved to a common place in PairHMMModel to DRY up its multiple copies. The MatchToMatch transition calculation has been fixed and implemented in PairHMMModel. Affected integration test MD5s have been updated; there were no differences in GT fields, and the example differences always involved small changes in likelihoods, which is what is expected.	2014-01-26 16:30:36 -05:00
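The constraint behind this fix is that the three transitions out of the Match state (to Match, Insertion and Deletion) must sum to 1. A minimal Java sketch, assuming Phred-scaled insertion and deletion gap-open penalties; the class and method names are hypothetical, and this is not necessarily how PairHMMModel computes the values.

```java
/** Hypothetical sketch of the transitions leaving the Match state of a pair-HMM. */
final class MatchTransitionSketch {
    /** Phred quality Q -> probability 10^(-Q/10). */
    static double qualToErrorProb(int qual) {
        return Math.pow(10.0, -qual / 10.0);
    }

    /** Probability of opening an insertion from the Match state. */
    static double matchToInsertion(int insertionGOP) {
        return qualToErrorProb(insertionGOP);
    }

    /** Probability of opening a deletion from the Match state. */
    static double matchToDeletion(int deletionGOP) {
        return qualToErrorProb(deletionGOP);
    }

    /** Whatever probability mass is not spent opening a gap stays in the Match state. */
    static double matchToMatch(int insertionGOP, int deletionGOP) {
        return 1.0 - (matchToInsertion(insertionGOP) + matchToDeletion(deletionGOP));
    }
}
```

Defining M->M as the complement of the two gap-open probabilities guarantees, by construction, that the outgoing probabilities sum to 1 (up to floating-point rounding).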
Karthik Gururaj	81bdfbd00d	Temporary commit before moving to new native library	2014-01-24 16:29:35 -08:00
Karthik Gururaj	733a84e4f9	Added support to transfer haplotypes once per region to the JNI Re-use transferred haplotypes (stored in GlobalRef) across calls to computeLikelihoods	2014-01-22 10:52:41 -08:00
Karthik Gururaj	88c08e78e7	1. Inserted #define in sandbox pairhmm-template-main.cc 2. Wrapped _mm_empty() with ifdef SIMD_TYPE_SSE 3. OpenMP disabled 4. Added code for initializing PairHMM's data inside initializePairHMM - not used yet	2014-01-21 09:57:14 -08:00
Karthik Gururaj	7180c392af	1. Integrated Mohammad's SSE4.2 code, Mustafa's bug fix and code to fix the SSE compilation warning. 2. Added code to dynamically select between AVX, SSE4.2 and normal C++ (in that order) 3. Created multiple files to compile with different compilation flags: avx_function_prototypes.cc is compiled with -xAVX while sse_function_instantiations.cc is compiled with -xSSE4.2 flag. 4. Added jniClose() and support in Java (HaplotypeCaller, PairHMMLikelihoodCalculationEngine) to call this function at the end of the program. 5. Removed debug code, kept assertions and profiling in C++ 6. Disabled OpenMP for now.	2014-01-20 08:03:42 -08:00
Yossi Farjoun	c79e8ca53e	Added an info log containing the SAM/BAM files that were eventually found from the command line (useful when there are files hiding inside bam.lists which may or may not have been constructed correctly...) Added a @hidden option controlling the appearance of the full BamList in the log	2014-01-17 11:25:21 -05:00
Karthik Gururaj	f1c772ceea	Same log message as before - forgot -a option 1. Moved computeLikelihoods from PairHMM to native implementation 2. Disabled debug - debug code still left (hopefully, not part of bytecode) 3. Added directory PairHMM_JNI in the root which holds the C++ library that contains the PairHMM AVX implementation. See PairHMM_JNI/JNI_README first	2014-01-16 21:40:04 -08:00
Eric Banks	de56134579	Fixed up and refactored what seems to be a useful private tool to create simulated reads around a VCF. It didn't completely work before (it was hard-coded for a particular long-lost data set) but it should work now. Since I thought that it might prove useful to others, I moved it to protected and added integration tests. GERALDINE: NEW TOOL ALERT!	2014-01-15 13:49:31 -05:00
Geraldine Van der Auwera	edf5880022	Updated SAMPileup codec and pileup-related docs Problem: the codec was written to take in consensus pileups produced with pileup -c option (which consists of 10 or 13 fields per line depending on the variant type) but errored out on the basic pileup format (which only has 6 fields per line). This was inconsistent and confusing to users. Solution: I added a switch in the parsing to recognize and handle both cases more appropriately, and updated related docs. While I was at it I also improved error messages in CheckPileup, which now emits User Error: Bad Input exceptions when reporting mismatches. Which may not be the best thing to do (ultimately they're not really errors, they're just reporting unwelcome results) but it beats emitting Runtime Exceptions. Tested by CheckPileupIntegrationTest which tests both format cases.	2014-01-14 09:14:16 -05:00
Eric Banks	16ecc53749	Merge pull request #469 from broadinstitute/gg_gatkdoc_fixes Assorted fixes and improvements to gatkdocs	2014-01-14 05:56:07 -08:00
droazen	347fab4717	Merge pull request #471 from broadinstitute/eb_output_log_info_for_tim Adding more meta information about the user to the GATK logging output, per Tim F's request.	2014-01-13 17:48:40 -08:00
Geraldine Van der Auwera	bdb3954eb3	removed maxRuntime minValue	2014-01-13 20:45:43 -05:00
Geraldine Van der Auwera	8fcad6680b	Assorted fixes and improvements to gatkdocs -Added docs for ERC mode in HC -Moved RecalibrationPerformance walker to private since it is experimental and unsupported -Updated VR docs and restored percentBad/numBad (but @Hidden) to enable deprecation alert if users try to use them -Improved error msg for conflict between per-interval aggregation and -nt -Minor clean up in exception docs -Added Toy Walkers category for devs and dev supercat (to build out docs for developers) -Added more detailed info to GenotypeConcordance doc based on Chris's forum post -Added system to include min/max argument values in gatkdocs (build gatkdocs with 'ant gatkdocs' to test it, see engine and DoC args for in situ examples) -Added tentative min/max argument annotations to DepthOfCoverage and CommandLineGATK arguments (and improved docs while at it) -Added gotoDev annotation to GATKDocumentedFeature to track who is the go-to person in GSA for questions & issues about specific walkers/tools (now discreetly indicated in each gatkdoc)	2014-01-13 17:46:22 -05:00
Eric Banks	851ec67bdc	Adding more meta information about the user to the GATK logging output, per Tim F's request.	2014-01-13 14:36:02 -05:00
droazen	7cd304fb41	Merge pull request #470 from broadinstitute/mf_new_RBP Mf new rbp	2014-01-13 08:46:27 -08:00
Eric Banks	0323caefc8	Added some bug fixes to the gVCF merging code after finally getting some real data to play with. Still under construction, awaiting more test data from Valentin.	2014-01-08 08:34:35 -05:00
Eric Banks	f172c349f6	Adding the functionality to enable users to input a file of VCFs for -V. To do this I have added a RodBindingCollection which can represent either a VCF or a file of VCFs. Note that e.g. SelectVariants allows a list of RodBindingCollections so that one can intermix VCFs and VCF lists. For VariantContext tags with a list, by default the tags for the -V argument are applied unless overridden by the individual line. In other words, any given line can have either one token (the file path) or two tokens (the new tags and the file path). For example: foo.vcf VCF,name=bar bar.vcf Note that a VCF list file name must end with '.list'. Added this functionality to CombineVariants, CombineReferenceCalculationVariants, and VariantRecalibrator.	2014-01-08 00:45:00 -05:00
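A hypothetical Java sketch of parsing the VCF list format this commit describes: a line with one token is a path that inherits the -V tags, and a line with two tokens is per-line tags followed by a path. The class and method names are illustrative, and skipping blank lines is an assumption that the commit does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a VCF list file: "path" or "tags path" on each line. */
final class VcfListSketch {
    static final class Entry {
        final String tags;
        final String path;

        Entry(String tags, String path) {
            this.tags = tags;
            this.path = path;
        }
    }

    /** Per the commit, a VCF list file name must end with ".list". */
    static boolean isVcfList(String fileName) {
        return fileName.endsWith(".list");
    }

    static List<Entry> parse(List<String> lines, String defaultTags) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;  // assumption: blank lines are ignored
            String[] tokens = line.split("\\s+");
            if (tokens.length == 1) {
                entries.add(new Entry(defaultTags, tokens[0]));  // inherit the tags given to -V
            } else if (tokens.length == 2) {
                entries.add(new Entry(tokens[0], tokens[1]));   // per-line tags override them
            } else {
                throw new IllegalArgumentException("Expected 1 or 2 tokens but found " + tokens.length + ": " + raw);
            }
        }
        return entries;
    }
}
```

With default tags "VCF", the commit's example lines "foo.vcf" and "VCF,name=bar bar.vcf" yield one entry tagged "VCF" and one tagged "VCF,name=bar".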
Menachem Fromer	d1275651ae	Merge remote-tracking branch 'origin/master' into mf_new_RBP	2014-01-03 01:13:40 -05:00
Ami Levy-Moonshine	6da53aea09	Write a new tool for splitting reads whose cigar string contains N elements. For example, this tool can be used for processing bowtie RNA-seq data. Each read with k N cigar elements is split into k+1 reads. The split is done by hard-clipping the rest of the bases. In order to do this, a few changes were introduced to some other clipping methods: - make a significant change in ClippingOp.hardClip(), which prevented the splitting of reads with cigar 1M2I1N1M3I. - change getReadCoordinateForReferenceCoordinate in ReadUtils to recognize Ns. Create unit tests for that walker: - change ReadClipperTestUtils to be more general in order to use its code and avoid code duplication - move some useful methods from ReadClipperTestUtils to CigarUtils. Create integration test for that class. Small change in a comment in FullProcessingPipeline. Last commit: Address review comments: - move to protected under walkers/rnaseq - change the read splitting methods to be more readable and more efficient - change (minor changes) some methods in ReadClipper to allow the changes in split reads - add (minor change) one method to CigarUtils to allow the changes in split reads - change ReadUtils.getReadCoordinateForReferenceCoordinate to include possible N in the cigar - address the rest of the review comments (minor changes) - fix ReadUtilsUnitTest.testReadWithNs according to the default behaviour of getReadCoordinateForReferenceCoordinate (in case of a reference index that falls into a deletion, return the read index of the base before the deletion). - add another test to ReadUtilsUnitTest.testReadWithNs - Allow the user to print the split positions (not working properly currently)	2014-01-01 22:21:36 -05:00
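The core splitting rule (k N elements yield k+1 reads, with the bases belonging to the other pieces hard-clipped) can be sketched in Java on the CIGAR string alone. This is a hypothetical illustration, not the tool's code: it ignores read bases, base qualities and any clipping already present in the input, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: split a CIGAR with k N operators into k+1 hard-clipped pieces. */
final class NCigarSplitSketch {
    static final class Piece {
        final String cigar;   // CIGAR of this piece, including leading/trailing hard clips
        final int refStart;   // 1-based reference position where the piece starts
        Piece(String cigar, int refStart) { this.cigar = cigar; this.refStart = refStart; }
    }

    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    private static boolean consumesRead(char op) {
        return op == 'M' || op == 'I' || op == 'S' || op == '=' || op == 'X';
    }

    private static boolean consumesReference(char op) {  // N is handled separately below
        return op == 'M' || op == 'D' || op == '=' || op == 'X';
    }

    static List<Piece> split(String cigar, int alignmentStart) {
        List<String> bodies = new ArrayList<>();
        List<Integer> readLengths = new ArrayList<>();
        List<Integer> starts = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        int readLength = 0;
        int refPos = alignmentStart;
        int pieceStart = alignmentStart;

        Matcher m = ELEMENT.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            if (op == 'N') {  // close the current piece and skip the gap on the reference
                bodies.add(body.toString());
                readLengths.add(readLength);
                starts.add(pieceStart);
                body.setLength(0);
                readLength = 0;
                refPos += len;
                pieceStart = refPos;
            } else {
                body.append(len).append(op);
                if (consumesRead(op)) readLength += len;
                if (consumesReference(op)) refPos += len;
            }
        }
        bodies.add(body.toString());
        readLengths.add(readLength);
        starts.add(pieceStart);

        int totalReadLength = 0;
        for (int l : readLengths) totalReadLength += l;

        // Hard-clip, on each side of every piece, the read bases owned by the other pieces.
        List<Piece> pieces = new ArrayList<>();
        int basesBefore = 0;
        for (int i = 0; i < bodies.size(); i++) {
            int basesAfter = totalReadLength - basesBefore - readLengths.get(i);
            String clipped = (basesBefore > 0 ? basesBefore + "H" : "") + bodies.get(i)
                    + (basesAfter > 0 ? basesAfter + "H" : "");
            pieces.add(new Piece(clipped, starts.get(i)));
            basesBefore += readLengths.get(i);
        }
        return pieces;
    }
}
```

For example, a read aligned at 1000 with CIGAR 10M100N5M would become two pieces, 10M5H at 1000 and 10H5M at 1110.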
Mauricio Carneiro	d1febb89c8	Better documentation for ReadClippingStats walker * add overall walker GATKDocs * add explanation for skip parameter and make it advanced * reverse the logic on excluding unmapped reads for clarity * fix read length calculation to no longer include indels ps: I am not sure how useful this walker is (I didn't write it) but the skip logic is poor and calculates the entire statistic for the reads it is eventually going to skip. This would be an easy fix, but only worth our time if people actually use this.	2014-01-01 14:26:26 -05:00
Eric Banks	f82a7c3f4c	Updating variant jar. The update contains: 1. documentation changes for VariantContext and Allele (which used to discuss the now obsolete null allele) 2. better error messages for VCFs containing complex rearrangements with breakends 3. instead of failing badly on format field lists with '.'s, just ignore them Also, there is a trivial change to use a more efficient method to remove a bunch of attributes from a VC. Delivers PT#s 59675378, 59496612, and 60524016.	2013-12-31 22:48:29 -05:00
Eric Banks	5a1564d1f2	Merge pull request #456 from broadinstitute/eb_unify_hc_combination_steps Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline.	2013-12-31 18:57:27 -08:00
Eric Banks	83e09b1f64	Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline. Basically, it does 3 things (as opposed to having to call into 3 separate walkers): 1. merge the records at any given position into a single one with all alleles and appropriate PLs 2. re-genotype the record using the exact AF calculation model 3. re-annotate the record using the VariantAnnotatorEngine In the course of this work it became clear that we couldn't just use the simpleMerge() method used by CombineVariants; combining HC-based gVCFs is really a complicated process. So I added a new utility method to handle this merging and pulled any related code out of CombineVariants. I tried to clean up a lot of that code, but ultimately that's out of the scope of this project. Added unit tests for correctness testing. Integration tests cannot be used yet because the HC doesn't output correct gVCFs.	2013-12-31 12:07:56 -05:00
Menachem Fromer	48ef7a1a2f	Merge remote-tracking branch 'origin/master' into mf_new_RBP	2013-12-19 10:42:20 -05:00
David Roazen	4a79831adc	Add ability to specify min/max required/recommended values for numeric arguments in the @Argument annotation -You can now add "minValue", "maxValue", "minRecommendedValue", and "maxRecommendedValue" attributes to @Argument annotations for command-line arguments -"minValue" and "maxValue" specify hard limits that generate an exception if violated -"minRecommendedValue" and "maxRecommendedValue" specify soft limits that generate a warning if violated -Works only for numeric arguments (int, double, etc.) with @Argument annotations -Only considers values actually specified by the user on the command line, not default values assigned in the code As requested by Geraldine	2013-12-18 18:09:08 -05:00
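A minimal Java sketch of the hard-versus-soft limit semantics described above. It uses a plain static method rather than the @Argument annotation machinery, represents unset limits as infinities, and its names are hypothetical. Per the commit, a caller would run such a check only on values the user actually supplied on the command line, never on defaults.

```java
import java.util.Optional;

/** Hypothetical check for a numeric argument: hard limits throw, recommended limits warn. */
final class ArgumentRangeSketch {
    static Optional<String> check(String name, double value,
                                  double minValue, double maxValue,
                                  double minRecommendedValue, double maxRecommendedValue) {
        // Hard limits (minValue/maxValue): violating them is an error.
        if (value < minValue || value > maxValue) {
            throw new IllegalArgumentException(String.format(
                    "Argument %s has value %s, outside the allowed range [%s, %s]",
                    name, value, minValue, maxValue));
        }
        // Soft limits (min/maxRecommendedValue): violating them only yields a warning message.
        if (value < minRecommendedValue || value > maxRecommendedValue) {
            return Optional.of(String.format(
                    "Argument %s has value %s, outside the recommended range [%s, %s]",
                    name, value, minRecommendedValue, maxRecommendedValue));
        }
        return Optional.empty();
    }
}
```

Returning the warning instead of logging it keeps the sketch free of any logging dependency; the caller decides how to report it.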
Eric Banks	400e7c1404	Fixed bug in the filtering of lifted over variants where a deletion at the end of a contig could cause it to error out. Added a unit test.	2013-12-11 14:07:18 -05:00
Eric Banks	418fbdfbab	Added HC trio calls and NA12878 KB snapshot to resource bundle. Also, don't touch the current link until the resources are finished being produced.	2013-12-07 22:08:34 -05:00
David Roazen	932cd3ada7	Fix 3rd-party library dependency issues in the HC/PairHMM tests In general, test classes cannot use 3rd-party libraries that are not also dependencies of the GATK proper without causing problems when, at release time, we test that the GATK jar has been packaged correctly with all required dependencies. If a test class needs to use a 3rd-party library that is not a GATK dependency, write wrapper methods in the GATK utils/* classes, and invoke those wrapper methods from the test class.	2013-12-06 13:16:55 -05:00
David Roazen	0e65296efb	Rev picard, sam-jdk, tribble, and variant jars to 1.104.1628 -update VariantFiltration to work with new Lazy wrapper around the JexlEngine in VariantContextUtils	2013-12-05 12:45:32 -05:00
Joel Thibault	5fe0531b4d	Throw a GVCFIndexException when the user doesn't specify the optimal indexing strategy	2013-12-03 23:12:14 -05:00
Joel Thibault	8571a641bf	Add @Advanced to variant_index_type and variant_index_parameter	2013-12-03 23:12:14 -05:00
Joel Thibault	fd0a02e52e	New VCF engine arguments to specify an alternate IndexCreator - CatVariants updates to use custom VCF indices - Scala scripts for VCF index testing	2013-12-03 13:31:02 -05:00
Joel Thibault	42f78bdb3a	Add a class-based DataProvider	2013-12-03 13:31:01 -05:00
Joel Thibault	cd3ee2ae7e	whitespace	2013-12-03 13:31:01 -05:00
Eric Banks	6bee6a1b53	Change the behavior of SelectVariants for PL/AD when it encounters a record that has lost one or more alternate alleles. Previously, we would strip out the PLs and AD values since they were no longer accurate. However, this is not ideal because then that information is just lost and 1) users complain on the forum and post it as a bug and 2) it gives us problems in both the current and future (single sample) calling pipelines because we subset samples/alleles all the time and lose info. Now the PLs and AD get correctly selected down. While I was in there I also refactored some related code in subsetDiploidAlleles(). There were no real changes there - I just broke it out into smaller chunks as per our best practices. Added unit tests and updated integration tests. Addressed reviews.	2013-12-03 09:23:03 -05:00
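Selecting PLs and ADs down when alternate alleles are lost amounts to re-indexing diploid genotypes. A hypothetical Java sketch, assuming the VCF genotype ordering (genotype j/k with j <= k stored at index k*(k+1)/2 + j), ascending kept allele indices that include the reference, re-normalisation so the best remaining genotype has PL 0, and ADs that simply keep the depths of the retained alleles. It is not the actual subsetDiploidAlleles() code.

```java
/** Hypothetical sketch of subsetting diploid PL and AD values to a smaller allele set. */
final class DiploidSubsetSketch {
    /** VCF ordering: genotype (j, k) with j <= k is stored at index k*(k+1)/2 + j. */
    static int plIndex(int j, int k) {
        return k * (k + 1) / 2 + j;
    }

    /** keptAlleles: ascending old allele indices to keep, starting with the reference (0). */
    static int[] subsetPLs(int[] oldPLs, int[] keptAlleles) {
        int n = keptAlleles.length;
        int[] newPLs = new int[n * (n + 1) / 2];
        int min = Integer.MAX_VALUE;
        for (int k = 0; k < n; k++) {
            for (int j = 0; j <= k; j++) {
                // keptAlleles is ascending, so keptAlleles[j] <= keptAlleles[k]
                int value = oldPLs[plIndex(keptAlleles[j], keptAlleles[k])];
                newPLs[plIndex(j, k)] = value;
                min = Math.min(min, value);
            }
        }
        // Assumption of this sketch: re-normalise so the best remaining genotype has PL 0.
        for (int i = 0; i < newPLs.length; i++) newPLs[i] -= min;
        return newPLs;
    }

    /** Keep the allele depths of the retained alleles, in the same order. */
    static int[] subsetAD(int[] oldAD, int[] keptAlleles) {
        int[] newAD = new int[keptAlleles.length];
        for (int i = 0; i < keptAlleles.length; i++) newAD[i] = oldAD[keptAlleles[i]];
        return newAD;
    }
}
```

For alleles A (ref), B, C with PLs for AA, AB, BB, AC, BC, CC, keeping {A, C} picks the AA, AC and CC entries (old indices 0, 3 and 5).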
Valentin Ruano-Rubio	0f99778a59	Adding Graph-based likelihood ratio calculation to HC To activate this feature, add '--likelihoodCalculationEngine GraphBased' to the HC command line. New HC Options (both Advanced and Hidden): --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM) Specifies which engine should be used to generate read vs. haplotype likelihoods. PairHMM: standard full-PairHMM approach. GraphBased: uses the assembly graph to accelerate the process. Random: generates random likelihoods; used for benchmarking purposes only. --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN) Indicates how to merge haplotypes produced using different kmerSizes. Only has an effect when used in combination with --likelihoodCalculationEngine GraphBased. COMBO_MIN: use the smallest kmerSize with all haplotypes. COMBO_MAX: use the largest kmerSize with all haplotypes. MIN_ONLY: use the smallest kmerSize with haplotypes assembled using it. MAX_ONLY: use the largest kmerSize with haplotypes assembled using it. Major code changes: * Introduced multiple likelihood calculation engines (previously there was just one). * Assembly results from different kmerSizes are now packed together using the AssemblyResultSet class. * Added yet another PairHMM implementation with a different API in order to support local PairHMM calculations (e.g. a segment of the read vs. a segment of the haplotype). Major components: * FastLoglessPairHMM: new PairHMM implementation that uses some heuristics to speed up partial PairHMM calculations. * GraphBasedLikelihoodCalculationEngine: delegates execution of the graph-based likelihood approach to GraphBasedLikelihoodCalculationEngineInstance. * GraphBasedLikelihoodCalculationEngineInstance: one instance per active region; implements the graph traversals that calculate the likelihoods using the graph as a scaffold. * HaplotypeGraph: haplotype threading graph built from the assembled haplotypes. This is the structure GraphBasedLikelihoodCalculationEngineInstance uses to do its work. * ReadAnchoring and KmerSequenceGraphMap: contain information about how a read maps onto the HaplotypeGraph, which GraphBasedLikelihoodCalculationEngineInstance uses to do its work. Removed mergeCommonChains from HaplotypeGraph creation. Fixed bamboo issues with HaplotypeGraphUnitTest. Fixed problems with HaplotypeCallerIntegrationTest. Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest. Fixed ReadThreadingLikelihoodCalculationEngine issues. Moved event-block iteration outside GraphBasedEngineInstance. Removed unnecessary parameter from ReadAnchoring constructor. Fixed test problem. Added a bit more documentation to EventBlockSearchEngine. Fixed some private - protected dependency issues. Further refactoring to make GraphBasedInstance and HaplotypeGraph slimmer. Addressed last pull request commit comments. Fixed FastLoglessPairHMM public -> protected dependency. Fixed problem with HaplotypeGraph unit test.	2013-12-02 19:37:19 -05:00
Chris Hartl	1f777c4898	Introducing the latest-and-greatest in genotyping: CalculatePosteriors. CalculatePosteriors enables the user to calculate genotype likelihood posteriors (and set genotypes accordingly) given one or more panels containing allele counts (for instance, calculating NA12878 genotypes based on 1000G EUR frequencies). The uncertainty in allele frequency is modeled by a Dirichlet distribution (parameters being the observed allele counts across each allele), and the genotype state is modeled by assuming independent draws (Hardy-Weinberg Equilibrium). This leads to the Dirichlet-Multinomial distribution. Currently this is implemented only for ploidy=2. It should be straightforward to generalize. In addition, there's a parameter for "EM" that currently does nothing but throw an exception; another extension of this method is to run an EM over the Maximum A-Posteriori (MAP) allele count in the input sample as follows: while not converged: * AC = [external AC] + [sample AC] * Prior = DirichletMultinomial[AC] * Posteriors = [sample GL + Prior] * sample AC = MLEAC(Posteriors) This is more useful for large callsets with small panels than for small callsets with large panels, the latter being the more common use case. Fully unit tested. Reviewer (Eric) jumped in to address many of his own comments plus removed public->protected dependencies.	2013-11-27 13:00:45 -05:00
Geraldine Van der Auwera	429582589f	Set SAMFileWriter to create index in ReadUtils to fix SplitSamFile issue	2013-11-26 15:54:47 -05:00
Geraldine Van der Auwera	25bc6e64ae	Patched Queue extensions lacking a main class definition	2013-11-22 14:57:09 -05:00
Ami Levy-Moonshine	6ad841cec5	Rewrite ReadLengthDistribution to count the read lengths into a hash table first and only produce a GATK report table at the end. Before this fix, the tool couldn't work with more than one RG. - Address all review comments	2013-11-18 17:29:31 -05:00
Ami Levy-Moonshine	9c1023c933	Fix an ugly, weird error from the last commit that changed all the Scala files to end with MoleculoPipeline.scala	2013-11-18 11:44:24 -05:00
MauricioCarneiro	7f08250870	Merge pull request #417 from broadinstitute/bt_pairhmm_api_cleanup2 Improve the PairHMM API for better FPGA integration	2013-11-14 10:47:07 -08:00
bradtaylor	e40a07bb58	Improve the PairHMM API for better FPGA integration Motivation: The API was different between the regular PairHMM and the FPGA implementation via CnyPairHMM. As a result, the LikelihoodCalculationEngine had to account for this. The goal is to change the API to be the same for all implementations, and make it easier to access. PairHMM: PairHMM now accepts a list of reads and a map of alleles/haplotypes and returns a PerReadAlleleLikelihoodMap. Added a new primary method that loops over the reads and haplotypes, extracts qualities, and passes them to the computeReadLikelihoodGivenHaplotypeLog10 method. Did not alter that method, or its subcompute method, at all. PairHMM also now handles its own (re)initialization, so users don't have to worry about that. CnyPairHMM: Added the same new primary access method to this FPGA class. The method overrides the default implementation in PairHMM and walks through a list of reads. Individual-read quals and the full haplotype list are fed to batchAdd(), as before. However, instead of waiting for every read to get added and then walking through the reads again to extract results, we get the haplotype-results array for each read as soon as it is generated, and pack it into a perReadAlleleLikelihoodMap for return. The main access method is now the same whether or not the FPGA CnyPairHMM is used. LikelihoodCalculationEngine: The functionality to loop through the reads and haplotypes and get individual log10-likelihoods was moved to the PairHMM, and so removed from here. However, this class does need to retain the ability to pre-process the reads and post-process the resulting likelihoods map. Those features were separated from running the HMM and refactored into their own methods. Commented out the (unused) system for finding the best N haplotypes for genotyping. PairHMMIndelErrorModel: Similar changes were made as in the LCE. 
However, in this case the haplotypes are modified based on each individual read, so the read-list we feed into the HMM only has one read.	2013-11-14 09:45:33 -05:00
Geraldine Van der Auwera	f22ab033f6	Merge pull request #424 from broadinstitute/gg_yetanothergatkdocfix Yet another gatkdoc fix	2013-11-13 11:35:59 -08:00
Geraldine Van der Auwera	dac3dbc997	Improved gatkdocs for InbreedingCoefficient, ReduceReads, ErrorRatePerCycle. Clarified caveat for InbreedingCoefficient. Cleaned up docstrings for ReduceReads. Brushed up doc for ErrorRatePerCycle.	2013-11-13 14:33:04 -05:00
Phillip Dexheimer	296bcc7fb1	Changed name of jobs submitted to cluster job runners -- Added 'jobRunnerJobName' definition to QFunction, defaults to value of shortDescription -- Edited Lsf and Drmaa JobRunners to use this string instead of description for naming jobs in the scheduler Signed-off-by: Joel Thibault <thibault@broadinstitute.org>	2013-11-12 14:34:56 -05:00
Mauricio Carneiro	725656ae7e	Generalizing the FullProcessingPipeline Qscript We have generalized the processing script to be able to handle multiple scenarios. Originally it was designed for PCR-free data only; we added all the steps necessary to start from fastq and to process RNA-seq as well as non-human data. This is our go-to script in TechDev. * add optional "starting from fastq" path to the pipeline * add mark duplicates (optionally) to the pipeline * add an option to run with the mouse data (without dbsnp and with single-ended fastq) * add option to process RNA-seq data from TopHat (add RG and reassign mapping quality if necessary) * add option to filter or include reads with N in the cigar string * add parameter to allow keeping the intermediate files	2013-11-07 16:34:29 -05:00
Eric Banks	f15355856a	Merge pull request #418 from broadinstitute/eb_fix_liftover_script Fixing the liftover script to not require strict VCF header validation.	2013-11-07 06:04:56 -08:00
Eric Banks	2fc40a0aed	Fixing the liftover script to not require strict VCF header validation. Apparently no one has used the liftover script for a while (which I guess is a good thing)...	2013-11-07 09:02:17 -05:00
Eric Banks	0e3d83d1ef	Merge pull request #413 from broadinstitute/rp_qd_and_qual_updates_in_ref_model_pipeline Improvements to the reference model pipeline.	2013-11-05 06:33:17 -08:00
Eric Banks	09dfaf1a68	Merge pull request #416 from broadinstitute/mc_quick_fixes_to_cser_pipeline Add interpretation to QualifyMissingIntervals	2013-11-05 06:08:13 -08:00
Eric Banks	96024403bf	Update the dbsnp version in the bundle from 137 to 138; resolves PT #59771004 .	2013-11-04 10:01:22 -05:00
Ryan Poplin	b22c9c2cb4	Improvements to the reference model pipeline. -- We use the RegenotypeVariants walker to recompute the qual field. (instead of the discussed idea of adding this functionality to CombineVariants) -- QualByDepth will now be recomputed even if the stratified contexts are missing. This greatly improves the QD estimate for this pipeline. Doesn't work for multi-allelics since the qual can't be recomputed.	2013-11-01 17:58:25 -04:00
Eric Banks	cafcb34855	Merge pull request #411 from broadinstitute/eb_add_exome_intervals_to_bundle_script Updated the GATK bundle script to:	2013-10-29 07:38:44 -07:00
Eric Banks	209f2a61aa	Updated the GATK bundle script to: 1. Include exome target list for b37 2. Not delete the 'current' link unless -run is applied to the command line! (sorry, Ryan)	2013-10-29 10:33:51 -04:00
Louis Bergelson	9498950b1c	Adding more specific error message when one of the scripts doesn't exist. --Previously it gave a cryptic message: ----IO error while decoding blarg.script with UTF-8 ----Please try specifying another one using the -encoding option	2013-10-21 14:57:42 -04:00
David Roazen	5a2ef37ead	Tweak dcov documentation to help prevent user confusion Geraldine-approved!	2013-10-16 15:24:33 -04:00
Mauricio Carneiro	efbfdb64fe	Qscript to downsample and analyze an exome BAM. This script downsamples an exome BAM several times and performs a coverage distribution analysis (of bases that pass filters), as well as HaplotypeCaller calls with a NA12878 Knowledge Base assessment compared against multi-sample calling with the UG. This script was used for the "downsampling the exome" presentation	2013-10-10 14:37:33 -04:00
Chris Hartl	55bab9fa87	Merged bug fix from Stable into Unstable	2013-10-10 13:01:12 -04:00
Chris Hartl	06d28c7f8b	VariantsToBinaryPed: Move .fam file writing to initialize to ensure ordering matches the ordering of the VCF. Change the documentation to clarify that the fam files are not directly copied, but subset and re-ordered.	2013-10-10 12:53:15 -04:00
Mauricio Carneiro	5d6421494b	Fix mismatching number of columns in report Quick fix for the missing column header in the QualifyMissingIntervals report. Adding a QScript for the tool as well as a few minor updates to the GATKReportGatherer.	2013-10-09 14:38:15 -04:00
Ryan Poplin	f3a67edc24	Merge pull request #402 from broadinstitute/gg_dcov_docs Improvements to gatkdocs related to downsampling	2013-09-27 07:07:21 -07:00
kshakir	a29f1f84bf	Merge pull request #397 from lbergelson/lb_scala_2.10.2 Update scala from 2.9 to 2.10.2	2013-09-26 21:51:43 -07:00
Geraldine Van der Auwera	66d0235efc	Minor clarifications & formatting tweaks for dcov docs	2013-09-26 14:28:22 -04:00
Michael McCowan	5113e21437	Bug fix: annotation values are parsed as Doubles when they should be parsed as Integers due to implicit conversion. * Updated expected test data in which an integer annotation (MQ0) was formatted as a double.	2013-09-25 13:12:02 -04:00
Louis Bergelson	c05208ecec	Resolving warnings --specifying exception types in cases where none was already specified ----mostly changed to catch Exception instead of Throwable ----EmailMessage has a point where it should only be expecting a RetryException but was catching everything --changing build.xml so that it prints scala feature warning details --added necessary imports needed to remove feature warnings --updating a newly deprecated enum declaration to match the new syntax	2013-09-23 12:42:22 -04:00
Louis Bergelson	b32ad99d3f	Changing from scala 2.9.2 to 2.10.2. --modified ivy dependencies --modified scala classpath in build.xml to include scala-reflect --changed imports to point to the new Scala package scala.reflect.internal.util --set the bootclasspath in QScriptManager as well as the classpath variable. --removing Set[File] <-> Set[String] conversions ----Set is invariant now and the conversions broke --removing unit tests for Set[File] <-> Set[String] conversions	2013-09-23 12:42:22 -04:00
chapmanb	2f5064dd1d	Provide close methods to clean up resources used while creating AlignmentContexts from BAM file regions. Allows utilization of CoveredLocusView via the API Signed-off-by: David Roazen <droazen@broadinstitute.org>	2013-09-10 15:32:54 -04:00
Geraldine Van der Auwera	292426b504	Merge pull request #390 from broadinstitute/mc_update_clipreads Added REVERT SOFTCLIPPED bases to ClipReads	2013-09-09 16:43:03 -07:00
Geraldine Van der Auwera	8b829255e7	Clarified docs on using clipping options	2013-09-09 19:40:03 -04:00
MauricioCarneiro	014bc4269e	Merge pull request #361 from broadinstitute/bt_pairhmm_array_implementation Add Array Logless PairHMM	2013-09-08 20:16:53 -07:00
Ryan Poplin	3503050a39	Created a single sample calling pipeline which leverages the reference model calculation mode of the HaplotypeCaller -- Adding changes to CombineVariants to work with the Reference Model mode of the HaplotypeCaller. -- Added -combineAnnotations mode to CombineVariants to merge the info field annotations by taking the median -- Added new StrandBiasBySample genotype annotation for use in computing strand bias from single sample input vcfs -- Bug fixes to calcGenotypeLikelihoodsOfRefVsAny, used in isActive() as well as the reference model -- Added active region trimming capabilities to the reference model mode, not perfect yet, turn off with --dontTrimActiveRegions -- We only realign reads in the reference model if there are non-reference haplotypes, a big time savings -- We only realign reads in the reference model if the read is informative for a particular haplotype over another -- GVCF blocks will now track and output the minimum PLs over the block -- MD5 changes! -- HC tests: from bug fixes in calcGenotypeLikelihoodsOfRefVsAny -- GVCF tests: from HC changes above and adding in active region trimming	2013-09-06 16:56:34 -04:00
Mauricio Carneiro	b6c3ed0295	Added REVERT SOFTCLIPPED bases to ClipReads	2013-09-06 09:30:01 -04:00
Louis Bergelson	4473b0065e	adding a check for the UNAVAILABLE case of GenotypeType in CountVariants	2013-08-29 17:27:00 -04:00
bradtaylor	3671e41b0c	Add Array Logless PairHMM A new PairHMM implementation for read/haplotype likelihood calculations. Output is the same as the LOGLESS_CACHING version. Instead of allocating an entire (read x haplotype) matrix for each HMM state, this version stores sub-computations in 1D arrays. It also accesses intersections of the (read x haplotype) alignment in a different order, proceeding over "diagonals" if we think of the alignment as a matrix. This implementation makes use of haplotype caching. Because arrays are overwritten, it has to explicitly store mid-process information. Knowing where to capture this info requires us to look ahead at the subsequent haplotype to be analyzed. This necessitated a signature change in the primary method for all pairHMM implementations. We also had to adjust the classes that employ the pairHMM: LikelihoodCalculationEngine (used by HaplotypeCaller) and PairHMMIndelErrorModel (used by indel genotyping classes). Made the array version the default in the HaplotypeCaller and the UnifiedArgumentCollection. The latter affects the classes ErrorModel, GeneralPloidyIndelGenotypeLikelihoodsCalculationModel and IndelGenotypeLikelihoodsCalculationModel, all of which use the pairHMM via PairHMMIndelErrorModel	2013-08-28 17:21:23 -04:00
David Roazen	42d771f748	Remove org.apache.commons.collections.IteratorUtils dependency from the test suite -This was a dependency of the test suite, but not the GATK proper, which caused problems when running the test suite on the packaged GATK jar at release time -Use GATKVCFUtils.readVCF() instead	2013-08-21 19:44:02 -04:00
Eric Banks	9424008055	Merge pull request #383 from broadinstitute/dr_change_phone_home_aws_settings Update GATK AWS phone-home configuration	2013-08-21 14:08:21 -07:00
David Roazen	9fbb4920d0	Update GATK AWS phone-home configuration -Switch to using new GSA AWS account for storage of phone home data -Use DNS-compliant bucket names, as per Amazon's best practices -Encrypt publicly-distributed version of credentials. Grant only PutObject permission, and only for the relevant buckets. -Store non-distributed credentials in private/GATKLogs/newAWSAccountCredentials for now -- need to integrate with existing python/shell scripts later to get the log downloading working with the new account	2013-08-21 14:31:46 -04:00
Ami Levy-Moonshine	0f5bb706ff	- update picard, sam, variants and tribble after fixing bug in BCF2Utils.makeDictionary as reported in ticket 52571227 - update call for VCFSimpleHeaderLine constructor in GATKVCFUtils	2013-08-21 12:06:42 -04:00
Eric Banks	e1174a582d	Merge pull request #379 from broadinstitute/mc_dpp_updates_part2 Including SplitByRG in the FullProcessingPipeline	2013-08-19 18:42:12 -07:00
Eric Banks	6663d48ffe	Merge pull request #381 from broadinstitute/mm_rev_picard_to_get_tribble_updates Adaptations to accommodate Tribble API changes.	2013-08-19 18:31:02 -07:00
Michael McCowan	c3a933ce84	Adaptations to accommodate Tribble API changes, consisting mostly of the following. * Refactoring implementations of readHeader(LineReader) -> readActualHeader(LineIterator), including nullary implementations where applicable. * Galvanizing of generic types. * Test fixups, mostly to pass around LineIterators instead of LineReaders. * New rev of tribble, which incorporates a fix that addresses a problem with TribbleIndexedFeatureReader reading a header twice in some instances. * New rev of sam, to make AbstractIterator visible (was moved from picard -> sam in Tribble API refactor).	2013-08-19 15:52:47 -04:00
Mauricio Carneiro	e991307eb5	Including SplitByRG in the FullProcessingPipeline Why wasn't it there before, you ask? Before, I was running it separately (by hand), but now it's integrated in the FullProcessingPipeline. Integration was a pain because of Queue's limitation of only allowing one @Output file. This forced me to write the ugliest piece of code of my life, but it's working and it's processing the YRI from scratch using it right now. So I'm happy... somewhat. Other changes to the pipeline: * Add --filter_bases_not_stored to the IndelRealigner step; sometimes BAM files have reads with no bases stored in the unmapped section (no idea why), and this disrupts the pipeline. * Change adaptor marking parameter to "dual indexed" instead of "pair-ended" for PCR-free data.	2013-08-18 00:51:32 -04:00
droazen	ee5de8510d	Merge pull request #380 from broadinstitute/gg_gatkdocs_arglabels More detailed labels for arguments in the gatkdocs	2013-08-16 15:34:56 -07:00
Geraldine Van der Auwera	80ed186971	More detailed labels for arguments in the gatkdocs (requested by David)	2013-08-16 14:25:53 -04:00
Geraldine Van der Auwera	9bb0aac7bf	Disabled the help system's printout of cmdline options when GATK errors out. Now the user has to explicitly ask for it using -h.	2013-08-16 13:09:52 -04:00
Geraldine Van der Auwera	3841635fcb	Changed 'depreciated' to the more correct 'deprecated'	2013-08-16 13:06:41 -04:00
Eric Banks	08be871309	Removing unused code in VariantsToTable: GQ is not an INFO field and is taken care of by -GF and not -F.	2013-08-16 01:57:24 -04:00
Eric Banks	1a5e4cc4e7	Merge pull request #375 from broadinstitute/rp_queue_jobreport_rscript Something changed with the ggtitle syntax in the latest version of ggplo...	2013-08-14 12:48:42 -07:00
Geraldine Van der Auwera	19a4bf9ff0	made AR an Advanced argument to discourage basic users from fiddling with it	2013-08-14 14:46:56 -04:00

4696 Commits (bc3b3ac0ec4b4fd72a9e856470edaeb4c7566a06)