Bug uncovered by some untrimmed alleles in the single-sample pipeline output.
Note, however, that this does not fix the untrimmed alleles in general.
Story:
https://www.pivotaltracker.com/story/show/65481104
Changes:
1. Fixed the bug itself.
2. Fixed non-working tests (silently skipped due to an exception in the dataProvider).
Note that this tool is still a work in progress and very experimental, so isn't 100% stable. Most of
the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented
it right before he left, and we're going to need to add tests at some point soon. I added a first
integration test in this commit, but it's just a start.
The fixes include:
1. Stop having the genotyping code strip out AD values. It doesn't make sense for it to do this,
so I don't know why it was doing that at all.
Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore.
This also helps CalculateGenotypePosteriors which was losing the AD values.
Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before.
Updated the integration test for that walker to include such data.
2. Chris was calling Math.pow directly on the normalized posteriors, which isn't safe.
Instead, the normalization routine itself can revert back to log scale in a safe manner, so let's use it.
Also, renamed the variable to posteriorProbabilities (not likelihoods).
3. Have CGP update the AC/AF/AN counts after fixing GTs.
commit 5e73b94eed3d1fc75c88863c2cf07d5972eb348b
Merge: e12593a d04a585
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Fri Feb 14 09:25:22 2014 +0000
Merge pull request #1 from broadinstitute/checkpoint
SimpleTimer passes tests, with formatting
commit d04a58533f1bf5e39b0b43018c9db3302943d985
Author: kshakir <github@kshakir.org>
Date: Fri Feb 14 14:46:01 2014 +0800
SimpleTimer passes tests, with formatting
Fixed getNanoOffset() to offset nano to nano, instead of nano to seconds.
Updated warning message with comma separated numbers, and exact values of offsets.
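The unit mistake being fixed here can be sketched as follows (the method name and signature are assumptions for illustration; the real SimpleTimer code may differ): both operands must be in the same unit, nanoseconds, before subtracting.

```java
import java.util.concurrent.TimeUnit;

public class NanoOffsetSketch {
    // Sketch of the getNanoOffset() fix: convert wall-clock millis to nanos
    // so both terms share a unit. The buggy version effectively compared
    // nanoseconds against seconds.
    public static long getNanoOffset(long systemMillis, long nanoTime) {
        long systemNanos = TimeUnit.MILLISECONDS.toNanos(systemMillis);
        return systemNanos - nanoTime;
    }

    public static void main(String[] args) {
        long offset = getNanoOffset(System.currentTimeMillis(), System.nanoTime());
        System.out.println("offset (ns): " + offset);
    }
}
```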
commit e12593ae66a5e6f0819316f2a580dbc7ae5896ad
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Wed Feb 12 13:27:07 2014 +0000
Remove instance of 'Timer'.
commit 47a73e0b123d4257b57cfc926a5bdd75d709fcf9
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Wed Feb 12 12:19:00 2014 +0000
Revert a couple of changes that survived somehow.
- CheckpointableTimer,Timer -> SimpleTimer
commit d86d9888ae93400514a8119dc2024e0a101f7170
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Mon Jan 20 14:13:09 2014 +0000
Revised commits following comments.
- All utility merged into `SimpleTimer`.
- All tests merged into `SimpleTimerUnitTest`.
- Behaviour of `getElapsedTime` should now be consistent with `stop`.
- Use 'TimeUnit' class for all unit conversions.
- A bit more tidying.
commit 354ee49b7fc880e944ff9df4343a86e9a5d477c7
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Fri Jan 17 17:04:39 2014 +0000
Add a new CheckpointableTimerUnitTest.
Revert SimpleTimerUnitTest to the version before any changes were made.
commit 2ad1b6c87c158399ededd706525c776372bbaf6e
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Tue Jan 14 16:11:18 2014 +0000
Add test specifically checking behaviour under checkpoint/restart.
Slight alteration to the checkpointable timer based on observations
during the testing - it seems that there's a fair amount of drift
between the sources anyway, so each time we stop we resynchronise the
offset. Hopefully this should avoid gradual drift building up and
presenting as checkpoint/restart drift.
commit 1c98881594dc51e4e2365ac95b31d410326d8b53
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Tue Jan 14 14:11:31 2014 +0000
Should use consistent time units
commit 6f70d42d660b31eee4c2e9d918e74c4129f46036
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date: Tue Jan 14 14:01:10 2014 +0000
Add a new timer supporting checkpoint mechanisms.
The issue with this is that the current timer is locked to JVM nanoTime. This can be reset after
a checkpoint/restart and result in negative elapsed times, which causes an error.
This patch addresses the issue in two ways:
- Moves the check on timer information in GenomeAnalysisEngine.java to only occur if a time limit has been
set.
- Create a new timer (CheckpointableTimer) which keeps track of the relation between system and nano time. If
this changes drastically, then the assumption is that there has been a JVM restart owing to checkpoint/restart.
Any time straddling a checkpoint/restart event will not be counted towards total running time.
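The checkpoint-detection idea above can be sketched roughly as below (class name, threshold, and method names are assumptions, not the actual CheckpointableTimer code): track the relation between wall-clock time and the JVM nano clock, treat a drastic divergence as evidence of a checkpoint/restart, and don't count the straddling interval.

```java
import java.util.concurrent.TimeUnit;

public class CheckpointAwareTimer {
    // If the two clocks disagree by more than this, assume a JVM restart
    // (threshold value is an illustrative assumption).
    private static final long DRIFT_THRESHOLD_NANOS = TimeUnit.SECONDS.toNanos(5);

    private long lastSystemNanos;
    private long lastNanoTime;
    private long elapsedNanos;

    public void start(long systemMillis, long nanoTime) {
        lastSystemNanos = TimeUnit.MILLISECONDS.toNanos(systemMillis);
        lastNanoTime = nanoTime;
    }

    /** Accumulates elapsed time, skipping intervals that straddle a restart. */
    public void tick(long systemMillis, long nanoTime) {
        long systemNanos = TimeUnit.MILLISECONDS.toNanos(systemMillis);
        long systemDelta = systemNanos - lastSystemNanos;
        long nanoDelta = nanoTime - lastNanoTime;
        // Count the interval only if the two clocks roughly agree and the
        // nano clock moved forward; otherwise resynchronise silently.
        if (Math.abs(systemDelta - nanoDelta) <= DRIFT_THRESHOLD_NANOS && nanoDelta >= 0) {
            elapsedNanos += nanoDelta;
        }
        lastSystemNanos = systemNanos;
        lastNanoTime = nanoTime;
    }

    public long getElapsedNanos() { return elapsedNanos; }
}
```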
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
Updated path for output gatkdocs in nightly build script.
Removed patch in plugin manager that contained a workaround for gatkdocs running in the top level directory.
After extensive detective work, Joel determined that these tests were failing
due to changes in the implementation of Math.pow() in newer versions of
Java 1.7.
All GSA members should ensure that they're using a JDK that is at least
as current as the one in the Java-1.7 dotkit on the Broad servers
(build 1.7.0_51-b13).
This change should allow us to test that the GATK jar has been
correctly packaged at release time, by ensuring that only the
packaged jar + a few test-related dependencies are on the classpath
when tests are run.
Note that we still need to actually test that this works as intended
before we can make this live in the Bamboo release plan.
1. AD values now propagate up (they weren't before).
2. MIN_DP gets transferred over to DP and removed.
3. SB gets removed after FS is calculated.
Also, added a bunch of new integration tests for GenotypeGVCFs.
This tool will take any number of gVCFs and create a merged gVCF (as opposed to
GenotypeGVCFs which produces a standard VCF).
Added unit/integration tests and fixed up GATK docs.
New properties to disable regenerating example resources artifact when each parallel test runs under packagetest.
Moved collection of packagetest parameters from shell scripts into maven profiles.
Fixed necessity of test-utils jar by removing incorrect dependenciesToScan element during packagetests.
When building picard libraries, run clean first.
Fixed tools jar dependency in picard pom.
Integration tests properly use the ant-bridge.sh test.debug.port variable, like unit tests.
Story:
https://www.pivotaltracker.com/story/show/65048706
https://www.pivotaltracker.com/story/show/65116908
Changes:
ActiveRegionTrimmer is now an argument collection, and it returns not only the trimmed-down active region but also the non-variant-containing flanking regions.
HaplotypeCaller code has been simplified significantly, pushing some functionality to other classes like ActiveRegion and AssemblyResultSet.
Fixed a problem with the way the trimming was done that caused some gVCF non-variant records to have conservative 0,0,0 PLs.
JNI. See copied text from email below.
2. This commit contains all the code used in profiling, detecting FP
exceptions, dumping intermediate results. All flagged off using ifdefs,
but it's there.
--------------Text from email
As we discussed before, it's the denormal numbers that are causing the
slowdown - the core executes some microcode uops (called FP assists)
when denormal numbers are detected for FP operations (even un-vectorized
code).
The C++ compiler by default enables flush to zero (FTZ) - when set, the
hardware simply converts denormal numbers to 0. The Java binary
(executable provided by Oracle, not the native library) seems to be
compiled without FTZ (sensible choice, they want to be conservative).
Hence, the JNI invocation sees a large slowdown. Disabling FTZ in C++
slows down the C++ sandbox performance to the JNI version (fortunately,
the reverse also holds :)).
Not sure how to show the overhead for these FP assists easily - measured
a couple of counters.
FP_ASSISTS:ANY - shows number of uops executed as part of the FP
assists. When FTZ is enabled, this is 0 (both C++ and JNI), when FTZ is
disabled this value is around 203540557 (both C++ and JNI)
IDQ:MS_UOPS_CYCLES - shows the number of cycles the decoder was issuing
uops when the microcode sequencing engine was busy. When FTZ is enabled,
this is around 1.77M cycles (both C++ and JNI), when FTZ is disabled
this value is around 4.31B cycles (both C++ and JNI). This number is
still small with respect to total cycles (~40B), but it only reflects
the cycles in the decode stage. The total overhead of the microcode
assist ops could be larger.
As suggested by Mustafa, I compared intermediate values (matrices M,X,Y)
and final output of compute_full_prob. The values produced by C++ and
Java are identical to the last bit (as long as both use FTZ or no-FTZ).
Comparing the outputs of compute_full_prob for the cases no-FTZ and FTZ,
there are differences for very small values (denormal numbers).
Examples:
Diff values 1.952970E-33 1.952967E-33
Diff values 1.135071E-32 1.135070E-32
Diff values 1.135071E-32 1.135070E-32
Diff values 1.135071E-32 1.135070E-32
For this test case (low coverage NA12878), all these values would be
recomputed using the double precision version. Enabling FTZ should be
fine.
-------------------End text from email
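Java cannot enable the hardware FTZ/DAZ flags the email discusses (which is precisely why the JNI path sees the slowdown), but the numeric effect of flush-to-zero can be emulated in software, as in this sketch (class and method names are illustrative assumptions):

```java
public class FlushToZeroSketch {
    // Software emulation of flush-to-zero: any subnormal (denormal) double
    // is replaced by 0.0, mirroring what the hardware FTZ mode does.
    public static double flushToZero(double x) {
        return (x != 0.0 && Math.abs(x) < Double.MIN_NORMAL) ? 0.0 : x;
    }

    public static void main(String[] args) {
        double denormal = Double.MIN_NORMAL / 2.0;  // a subnormal value
        System.out.println(flushToZero(denormal));  // flushed to 0.0
        System.out.println(flushToZero(1.0e-300));  // normal value, unchanged
    }
}
```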
These changes happened in Tribble, but Joel clobbered them with his commit.
We can now change the logging priority on failures to validate the sequence dictionary to WARN.
Thanks to Tim F for indirectly pointing this out.
1. Throw a user error when the input data for a given genotype does not contain PLs.
2. Add VCF header line for --dbsnp input
3. Need to check that the UG result is not null
4. Don't error out at positions with no gVCFs (which is possible when using a dbSNP rod)
Joel is working on these failures in a separate branch. Since
maven (currently! we're working on this..) won't run the whole
test suite to completion if there's a failure early on, we need
to temporarily disable these tests in order to allow group members
to run tests on their branches again.
Added pom.xml workarounds for duplicate classpath error, due to gatk-framework dependency containing required BaseTest, and jarred *UnitTest/*IntegrationTest classes that also exist as files under target/test-classes.
Here are the git moved directories in case other files need to be moved during a merge:
git-mv private/java/src/ private/gatk-private/src/main/java/
git-mv private/R/scripts/ private/gatk-private/src/main/resources/
git-mv private/java/test/ private/gatk-private/src/test/java/
git-mv private/testdata/ private/gatk-private/src/test/resources/
git-mv private/scala/qscript/ private/queue-private/src/main/qscripts/
git-mv private/scala/src/ private/queue-private/src/main/scala/
git-mv protected/java/src/ protected/gatk-protected/src/main/java/
git-mv protected/java/test/ protected/gatk-protected/src/test/java/
git-mv public/java/src/ public/gatk-framework/src/main/java/
git-mv public/java/test/ public/gatk-framework/src/test/java/
git-mv public/testdata/ public/gatk-framework/src/test/resources/
git-mv public/scala/qscript/ public/queue-framework/src/main/qscripts/
git-mv public/scala/src/ public/queue-framework/src/main/scala/
git-mv public/scala/test/ public/queue-framework/src/test/scala/
Changes:
-------
<NON_REF> likelihood in variant sites is calculated as the maximum possible likelihood for an unseen alternative allele: for each read it is calculated as the second-best likelihood amongst the reported alleles.
When -ERC gVCF is used, stand_conf_emit and stand_conf_call are forcefully set to 0. Also, dontGenotype is set to false for consistency's sake.
Integration test MD5s have been changed accordingly.
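The per-read <NON_REF> rule described above can be sketched as follows (a minimal illustration, not the actual GATK code; the method name is an assumption):

```java
import java.util.Arrays;

public class NonRefLikelihoodSketch {
    // For each read, the likelihood assigned to the unseen alternative
    // allele is the second-best likelihood amongst the reported alleles
    // (likelihoods here are log-scaled, so "best" means largest).
    public static double nonRefLikelihood(double[] alleleLikelihoods) {
        double[] sorted = alleleLikelihoods.clone();
        Arrays.sort(sorted);  // ascending: best is last, second-best next-to-last
        return sorted.length > 1 ? sorted[sorted.length - 2]
                                 : sorted[sorted.length - 1];
    }
}
```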
Additional fix:
--------------
Especially after adding the <NON_REF> allele, but it also happened without it: QUAL values tend to go to 0 (a very large integer in log10) due to underflow when combining GLs (GenotypingEngine.combineGLs). To fix that, combineGLs has been substituted by combineGLsPrecise, which uses the log-sum-exp trick.
In just a few cases this change results in genotype changes in integration tests, but after double-checking using unit tests and the difference between combineGLs and combineGLsPrecise in the affected integration tests, the previous GT calls were either borderline cases or due to the underflow.
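The log-sum-exp trick in log10 space looks roughly like this (a generic sketch of the technique, not the actual combineGLsPrecise code): factor out the larger exponent so the remaining power of ten never underflows.

```java
public class Log10SumSketch {
    // Computes log10(10^a + 10^b) without underflowing when a and b are
    // very negative: 10^min(a,b) would round to 0 in double precision,
    // but 10^(min - max) is always in [0, 1] and safe to evaluate.
    public static double log10SumLog10(double a, double b) {
        double max = Math.max(a, b);
        double min = Math.min(a, b);
        return max + Math.log10(1.0 + Math.pow(10.0, min - max));
    }

    public static void main(String[] args) {
        // Naive 10^a + 10^b underflows to 0 here (log10 of that is -Inf);
        // the trick returns a finite, accurate value.
        System.out.println(log10SumLog10(-400.0, -400.5));
    }
}
```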
2. Split into DebugJNILoglessPairHMM and VectorLoglessPairHMM with base
class JNILoglessPairHMM. DebugJNILoglessPairHMM can, in principle,
invoke any other child class of JNILoglessPairHMM.
3. Added more profiling code for Java parts of LoglessPairHMM
Problem:
The matchToMatch transition calculation was wrong, resulting in transition probabilities coming out of the Match state that summed to more than 1.
Reports:
https://www.pivotaltracker.com/s/projects/793457/stories/62471780
https://www.pivotaltracker.com/s/projects/793457/stories/61082450
Changes:
The transition matrix update code has been moved to a common place in PairHMMModel to DRY out its multiple copies.
The matchToMatch transition calculation has been fixed and implemented in PairHMMModel.
Affected integration test MD5s have been updated; there were no differences in GT fields, and the sampled differences always
amounted to small changes in likelihoods, which is what is expected.
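The invariant being restored by this fix can be sketched as below (a simplified illustration; the real PairHMMModel works with quality scores in log10 space, and these names are assumptions): the probabilities leaving the Match state — gap-open to Insertion, gap-open to Deletion, and staying in Match — must sum to exactly 1.

```java
public class PairHmmTransitionSketch {
    // The match-to-match probability is whatever mass is left after the
    // two gap-open transitions, so the three always sum to 1.
    public static double matchToMatch(double insOpenProb, double delOpenProb) {
        return 1.0 - (insOpenProb + delOpenProb);
    }
}
```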
2. Wrapped _mm_empty() with ifdef SIMD_TYPE_SSE
3. OpenMP disabled
4. Added code for initializing PairHMM's data inside initializePairHMM -
not used yet
SSE compilation warning.
2. Added code to dynamically select between AVX, SSE4.2 and normal C++ (in
that order)
3. Created multiple files to compile with different compilation flags:
avx_function_prototypes.cc is compiled with -xAVX while
sse_function_instantiations.cc is compiled with -xSSE4.2 flag.
4. Added jniClose() and support in Java (HaplotypeCaller,
PairHMMLikelihoodCalculationEngine) to call this function at the end of
the program.
5. Removed debug code, kept assertions and profiling in C++
6. Disabled OpenMP for now.
1. Moved computeLikelihoods from PairHMM to native implementation
2. Disabled debug - debug code still left (hopefully, not part of
bytecode)
3. Added directory PairHMM_JNI in the root which holds the C++
library that contains the PairHMM AVX implementation. See
PairHMM_JNI/JNI_README first
It didn't completely work before (it was hard-coded for a particular long-lost data set) but it should work now.
Since I thought that it might prove useful to others, I moved it to protected and added integration tests.
GERALDINE: NEW TOOL ALERT!
Problem: the codec was written to take in consensus pileups produced with pileup -c option (which consists of 10 or 13 fields per line depending on the variant type) but errored out on the basic pileup format (which only has 6 fields per line). This was inconsistent and confusing to users.
Solution: I added a switch in the parsing to recognize and handle both cases more appropriately, and updated related docs. While I was at it I also improved error messages in CheckPileup, which now emits User Error: Bad Input exceptions when reporting mismatches. Which may not be the best thing to do (ultimately they're not really errors, they're just reporting unwelcome results) but it beats emitting Runtime Exceptions.
Tested by CheckPileupIntegrationTest which tests both format cases.
-Added docs for ERC mode in HC
-Move RecalibrationPerformance walker since to private since it is experimental and unsupported
-Updated VR docs and restored percentBad/numBad (but @Hidden) to enable deprecation alert if users try to use them
-Improved error msg for conflict between per-interval aggregation and -nt
-Minor clean up in exception docs
-Added Toy Walkers category for devs and dev supercat (to build out docs for developers)
-Added more detailed info to GenotypeConcordance doc based on Chris forum post
-Added system to include min/max argument values in gatkdocs (build gatkdocs with 'ant gatkdocs' to test it, see engine and DoC args for in situ examples)
-Added tentative min/max argument annotations to DepthOfCoverage and CommandLineGATK arguments (and improved docs while at it)
-Added gotoDev annotation to GATKDocumentedFeature to track who is the go-to person in GSA for questions & issues about specific walkers/tools (now discreetly indicated in each gatkdoc)
To do this I have added a RodBindingCollection which can represent either a VCF or a
file of VCFs. Note that e.g. SelectVariants allows a list of RodBindingCollections so
that one can intermix VCFs and VCF lists.
For VariantContext tags with a list, by default the tags for the -V argument are applied
unless overridden by the individual line. In other words, any given line can have either
one token (the file path) or two tokens (the new tags and the file path). For example:
foo.vcf
VCF,name=bar bar.vcf
Note that a VCF list file name must end with '.list'.
Added this functionality to CombineVariants, CombineReferenceCalculationVariants, and VariantRecalibrator.
For example, this tool can be used for processing bowtie RNA-seq data.
Each read with k N-cigar elements is split into k+1 reads. The split is done by hard clipping the rest of the bases.
In order to do this, a few changes were introduced to some other clipping methods:
- made a significant change in ClippingOp.hardClip() that prevents the splitting of a read with cigar 1M2I1N1M3I.
- changed getReadCoordinateForReferenceCoordinate in ReadUtil to recognize Ns
Created unit tests for that walker:
- changed ReadClipperTestUtils to be more general, in order to reuse its code and avoid code duplication
- moved some useful methods from ReadClipperTestUtils to CigarUtils
Created an integration test for that class.
Small change in a comment in FullProcessingPipeline.
Last commit:
Address review comments:
- move to protected under walkers/rnaseq
- change the read-splitting methods to be more readable and more efficient
- change (minor changes) some methods in ReadClipper to allow the changes in split reads
- add (minor change) one method to CigarUtils to allow the changes in split reads
- change ReadUtils.getReadCoordinateForReferenceCoordinate to include a possible N in the cigar
- address the rest of the review comments (minor changes)
- fix ReadUtilsUnitTest.testReadWithNs according to the default behaviour of getReadCoordinateForReferenceCoordinate (in case of a reference index that falls into a deletion, return the read index of the base before the deletion).
- add another test to ReadUtilsUnitTest.testReadWithNs
- allow the user to print the split positions (not working properly currently)
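The default behaviour described above — returning the read index of the base before a gap when the reference coordinate falls into a deletion (or, after this change, an N) — can be sketched by walking the CIGAR (a simplified illustration; the real ReadUtils method has a richer signature and uses htsjdk Cigar objects):

```java
public class ReadCoordinateSketch {
    // CIGAR encoded as parallel arrays of operators ('M','I','D','N') and
    // lengths. Returns the read coordinate for a reference coordinate, or
    // the read base before the gap when the target falls inside a D or N.
    public static int readCoordinateFor(int refTarget, char[] ops, int[] lens) {
        int readPos = 0;
        int refPos = 0;
        for (int i = 0; i < ops.length; i++) {
            char op = ops[i];
            int len = lens[i];
            boolean consumesRead = (op == 'M' || op == 'I');
            boolean consumesRef  = (op == 'M' || op == 'D' || op == 'N');
            if (consumesRef && refTarget < refPos + len) {
                if (op == 'M') {
                    return readPos + (refTarget - refPos);
                }
                // Deletion or skipped region (N): base before the gap.
                return readPos - 1;
            }
            if (consumesRead) readPos += len;
            if (consumesRef)  refPos += len;
        }
        return -1; // reference coordinate lies beyond the read
    }
}
```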
* add overall walker GATKDocs
* add explanation for skip parameter and make it advanced
* reverse the logic on excluding unmapped reads for clarity
* fix read length calculation to no longer include indels
ps: I am not sure how useful this walker is (I didn't write it) but the skip logic is poor and
calculates the entire statistic for the reads it is eventually going to skip. This would be an easy
fix, but only worth our time if people actually use this.
The update contains:
1. documentation changes for VariantContext and Allele (which used to discuss the now obsolete null allele)
2. better error messages for VCFs containing complex rearrangements with breakends
3. instead of failing badly on format field lists with '.'s, just ignore them
Also, there is a trivial change to use a more efficient method to remove a bunch of attributes from a VC.
Delivers PT#s 59675378, 59496612, and 60524016.
Basically, it does 3 things (as opposed to having to call into 3 separate walkers):
1. merge the records at any given position into a single one with all alleles and appropriate PLs
2. re-genotype the record using the exact AF calculation model
3. re-annotate the record using the VariantAnnotatorEngine
In the course of this work it became clear that we couldn't just use the simpleMerge() method used
by CombineVariants; combining HC-based gVCFs is really a complicated process. So I added a new
utility method to handle this merging and pulled any related code out of CombineVariants. I tried
to clean up a lot of that code, but ultimately that's out of the scope of this project.
Added unit tests for correctness testing.
Integration tests cannot be used yet because the HC doesn't output correct gVCFs.
-You can now add "minValue", "maxValue", "minRecommendedValue", and "maxRecommendedValue" attributes
to @Argument annotations for command-line arguments
-"minValue" and "maxValue" specify hard limits that generate an exception if violated
-"minRecommendedValue" and "maxRecommendedValue" specify soft limits that generate a warning if violated
-Works only for numeric arguments (int, double, etc.) with @Argument annotations
-Only considers values actually specified by the user on the command line, not default values
assigned in the code
As requested by Geraldine
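The min/max mechanism described above can be sketched with a small annotation plus a reflective check (the annotation and method names here are illustrative assumptions; the real attributes live on the GATK @Argument annotation in the command-line engine):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class ArgumentRangeSketch {
    // Stand-in for the minValue/maxValue attributes on @Argument.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Bounded {
        double minValue() default Double.NEGATIVE_INFINITY;
        double maxValue() default Double.POSITIVE_INFINITY;
    }

    public static class Args {
        @Bounded(minValue = 0, maxValue = 500)
        public int maxDepth = 250;
    }

    /** Returns true iff every @Bounded numeric field is within its hard limits. */
    public static boolean validate(Object args) throws IllegalAccessException {
        for (Field f : args.getClass().getFields()) {
            Bounded b = f.getAnnotation(Bounded.class);
            if (b == null) continue;
            double v = ((Number) f.get(args)).doubleValue();
            if (v < b.minValue() || v > b.maxValue()) return false;
        }
        return true;
    }
}
```

In the real system a violation of the hard limits raises a user exception rather than returning false, and the soft (recommended) limits only log a warning.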
In general, test classes cannot use 3rd-party libraries that are not
also dependencies of the GATK proper without causing problems when,
at release time, we test that the GATK jar has been packaged correctly
with all required dependencies.
If a test class needs to use a 3rd-party library that is not a GATK
dependency, write wrapper methods in the GATK utils/* classes, and
invoke those wrapper methods from the test class.