gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Karthik Gururaj	acda6ca27b	1. Whew, finally debugged the source of performance issues with PairHMM JNI. See copied text from email below. 2. This commit contains all the code used in profiling, detecting FP exceptions, dumping intermediate results. All flagged off using ifdefs, but it's there. --------------Text from email As we discussed before, it's the denormal numbers that are causing the slowdown - the core executes some microcode uops (called FP assists) when denormal numbers are detected for FP operations (even un-vectorized code). The C++ compiler by default enables flush to zero (FTZ) - when set, the hardware simply converts denormal numbers to 0. The Java binary (executable provided by Oracle, not the native library) seems to be compiled without FTZ (sensible choice, they want to be conservative). Hence, the JNI invocation sees a large slowdown. Disabling FTZ in C++ slows down the C++ sandbox performance to the JNI version (fortunately, the reverse also holds :)). Not sure how to show the overhead for these FP assists easily - measured a couple of counters. FP_ASSISTS:ANY - shows number of uops executed as part of the FP assists. When FTZ is enabled, this is 0 (both C++ and JNI), when FTZ is disabled this value is around 203540557 (both C++ and JNI) IDQ:MS_UOPS_CYCLES - shows the number of cycles the decoder was issuing uops when the microcode sequencing engine was busy. When FTZ is enabled, this is around 1.77M cycles (both C++ and JNI), when FTZ is disabled this value is around 4.31B cycles (both C++ and JNI). This number is still small with respect to total cycles (~40B), but it only reflects the cycles in the decode stage. The total overhead of the microcode assist ops could be larger. As suggested by Mustafa, I compared intermediate values (matrices M,X,Y) and final output of compute_full_prob. The values produced by C++ and Java are identical to the last bit (as long as both use FTZ or no-FTZ). Comparing the outputs of compute_full_prob for the cases no-FTZ and FTZ, there are differences for very small values (denormal numbers). Examples: Diff values 1.952970E-33 1.952967E-33 Diff values 1.135071E-32 1.135070E-32 Diff values 1.135071E-32 1.135070E-32 Diff values 1.135071E-32 1.135070E-32 For this test case (low coverage NA12878), all these values would be recomputed using the double precision version. Enabling FTZ should be fine. -------------------End text from email	2014-02-05 17:09:57 -08:00
Ryan Poplin	6a7a197362	Merge pull request #486 from broadinstitute/rp_fix_missing_annotations_CombineReferenceCalculationVariants Bug fix for missing annotations in CombineReferenceCalculationVariants. ...	2014-02-05 14:22:59 -05:00
Ryan Poplin	693bfac341	Bug fix for missing annotations in CombineReferenceCalculationVariants. They were being dropped in the handoff between engines in a couple of places. -- Updated single sample pipeline test data using Valentin's files and re-enabled CRCV tests	2014-02-05 12:58:48 -05:00
Eric Banks	8aa8acf81d	Merge pull request #485 from broadinstitute/eb_more_combine_rc_variants_iterations Eb more combine rc variants iterations	2014-02-05 11:32:30 -05:00
Eric Banks	740b33acbb	We were never validating the sequence dictionary of tabix indexed VCFs for some reason. Fixed. These changes happened in Tribble, but Joel clobbered them with his commit. We can now change the logging priority on failures to validate the sequence dictionary to WARN. Thanks to Tim F for indirectly pointing this out.	2014-02-05 10:12:38 -05:00
Eric Banks	9cac24d1e6	Moving logging status of VCF indexing to DEBUG instead of INFO, otherwise it's painful when reading in lots of files	2014-02-05 10:12:37 -05:00
Eric Banks	91bdf069d3	Some updates to CRCV. 1. Throw a user error when the input data for a given genotype does not contain PLs. 2. Add VCF header line for --dbsnp input 3. Need to check that the UG result is not null 4. Don't error out at positions with no gVCFs (which is possible when using a dbSNP rod)	2014-02-05 10:12:37 -05:00
droazen	22bcd10372	Merge pull request #484 from broadinstitute/jt_select_variants_nt_maven Fix for the SelectVariants -nt race condition corruption of the AD and PL fields	2014-02-05 08:15:02 -05:00
Joel Thibault	7923e786e9	Rev Picard (public) to 1.107.1676 - Rename snappy to snappy-java - Add maven-metadata-local.xml to .gitignore	2014-02-04 22:04:28 -05:00
Joel Thibault	0025fe190d	Exclude sam's older TestNG	2014-02-04 22:04:27 -05:00
Joel Thibault	9eaee8c73c	Integration test for the -nt race condition corrupting AD and PL fields	2014-02-04 22:04:27 -05:00
Karthik Gururaj	24f8aef344	Contains profiling, exception tracking, PAPI code Contains Sandbox Java	2014-02-04 16:27:29 -08:00
David Roazen	1de7a27471	Disable an additional test that is runtime dependent on one of the temporarily-disabled tests	2014-02-04 16:07:58 -05:00
David Roazen	76086f30b7	Temporarily disable tests that started failing post-maven Joel is working on these failures in a separate branch. Since maven (currently! we're working on this..) won't run the whole test suite to completion if there's a failure early on, we need to temporarily disable these tests in order to allow group members to run tests on their branches again.	2014-02-04 15:31:24 -05:00
David Roazen	3b2f07990d	Re-break the MWUnitTest for Joel to debug	2014-02-04 15:19:09 -05:00
David Roazen	c9032f0b5c	Fix failing unit tests	2014-02-04 03:05:30 -05:00
droazen	4eaa724be6	Merge pull request #483 from broadinstitute/ks_new_maven_build_system New maven build system	2014-02-03 10:53:18 -08:00
David Roazen	60567c8d7e	Minor ant-bridge.sh changes -add "gatk" target to mimic old "ant gatk" target -comment out release targets to prevent accidental releases	2014-02-03 13:50:47 -05:00
Khalid Shakir	a4289711e2	Distinct failsafe summary reports, just like invoker report directories.	2014-02-03 13:50:47 -05:00
Khalid Shakir	5ab0b117d2	Re-split the invoked integration tests and verify into separate phases.	2014-02-03 13:50:47 -05:00
Khalid Shakir	b10b42c10a	Added a "dry" argument to ant-bridge.sh, that just prints the command that would run.	2014-02-03 13:50:47 -05:00
Khalid Shakir	857e6e0d6f	Bumped version to 2.8-SNAPSHOT, using new update_pom_versions.sh script.	2014-02-03 13:50:46 -05:00
Khalid Shakir	20b471ef7b	ant-bridge dist -> verify, test.compile -> test-compile Added a utility script for running a single test, usually in parallel.	2014-02-03 13:50:46 -05:00
Khalid Shakir	9ca3004fc3	Setting the test-utils' type to test-jar, such that the multi-module build uses testClasses instead of classes as a directory dependency.	2014-02-03 13:50:46 -05:00
Khalid Shakir	f968b8a58b	Crash when integration tests fail by running install and verify at the same time, instead of trying to do them separately.	2014-02-03 13:50:46 -05:00
Khalid Shakir	de13f41fc3	One step closer to a proper test-utils artifact. Using the maven-jar-plugin to create a test classifier, excluding actual tests, until we can properly separate the classes into separate artifacts/modules.	2014-02-03 13:50:46 -05:00
Khalid Shakir	25aee7164e	Fixed missing "mvn" command execution in ant-bridge. Added pom.xml workarounds for duplicate classpath error, due to gatk-framework dependency containing required BaseTest, and jarred UnitTest/IntegrationTest classes that also exist as files under target/test-classes.	2014-02-03 13:50:46 -05:00
Khalid Shakir	caa76cdac4	Added maven pom.xmls for various artifacts.	2014-02-03 13:50:46 -05:00
Khalid Shakir	d1a689af33	Added new utility files used by maven build, including the ant-bridge script.	2014-02-03 13:50:46 -05:00
Khalid Shakir	88150e0166	Switched committed dependency repository from ivy to maven.	2014-02-03 13:50:46 -05:00
Khalid Shakir	1e25a758f5	Moved files to maven directories. Here are the git moved directories in case other files need to be moved during a merge: git-mv private/java/src/ private/gatk-private/src/main/java/ git-mv private/R/scripts/ private/gatk-private/src/main/resources/ git-mv private/java/test/ private/gatk-private/src/test/java/ git-mv private/testdata/ private/gatk-private/src/test/resources/ git-mv private/scala/qscript/ private/queue-private/src/main/qscripts/ git-mv private/scala/src/ private/queue-private/src/main/scala/ git-mv protected/java/src/ protected/gatk-protected/src/main/java/ git-mv protected/java/test/ protected/gatk-protected/src/test/java/ git-mv public/java/src/ public/gatk-framework/src/main/java/ git-mv public/java/test/ public/gatk-framework/src/test/java/ git-mv public/testdata/ public/gatk-framework/src/test/resources/ git-mv public/scala/qscript/ public/queue-framework/src/main/qscripts/ git-mv public/scala/src/ public/queue-framework/src/main/scala/ git-mv public/scala/test/ public/queue-framework/src/test/scala/	2014-02-03 13:50:44 -05:00
Khalid Shakir	faaef236ea	Moved gsalib, R and other resources, Queue GATK extensions generator, Queue version java files.	2014-02-03 13:49:21 -05:00
Khalid Shakir	eb52dc6a9b	Moved build.xml, ivy.xml, ivysettings.xml, ivy properties, public/packages/*.xml into private/archive/ant	2014-02-03 13:49:20 -05:00
Karthik Gururaj	6d4d776633	Includes code for all debug code for obtaining profiling info	2014-01-30 12:08:06 -08:00
Eric Banks	83d07280ef	Merge pull request #482 from broadinstitute/vrr_reference_model_alt_allele gVCF <NON_REF> in all vcf lines including variant ones when -ERC gVCF is...	2014-01-30 08:25:43 -08:00
Valentin Ruano-Rubio	89c4e57478	gVCF <NON_REF> in all vcf lines including variant ones when -ERC gVCF is requested. Changes: ------- <NON_REF> likelihood in variant sites is calculated as the maximum possible likelihood for an unseen alternative allele: for each read it is calculated as the second best likelihood amongst the reported alleles. When -ERC gVCF, stand_conf_emit and stand_conf_call are forcefully set to 0. Also dontGenotype is set to false for consistency's sake. Integration test MD5 have been changed accordingly. Additional fix: -------------- Especially after adding the <NON_REF> allele, but also happened without that, QUAL values tend to go to 0 (very large integer number in log 10) due to underflow when combining GLs (GenotypingEngine.combineGLs). To fix that combineGLs has been substituted by combineGLsPrecise that uses the log-sum-exp trick. In just a few cases this change results in genotype changes in integration tests but after double-checking using unit-test and difference between combineGLs and combineGLsPrecise in the affected integration test, the previous GT calls were either borderline cases and/or due to the underflow.	2014-01-30 11:23:33 -05:00
Karthik Gururaj	5c7427e48c	Temporary commit containing debug profiling code - commented out	2014-01-29 12:10:29 -08:00
Karthik Gururaj	0c63d6264f	1. Added synchronization block around loadLibrary in VectorLoglessPairHMM 2. Edited Makefile to use static libraries where possible	2014-01-27 15:34:58 -08:00
Karthik Gururaj	a15137a667	Modified run.sh	2014-01-27 14:56:46 -08:00
Karthik Gururaj	2c0d70c863	Moved vector JNI code to public/c++/VectorPairHMM	2014-01-27 14:52:59 -08:00
Karthik Gururaj	85a748860e	1. Added more profiling code 2. Modified JNI_README	2014-01-27 14:32:44 -08:00
Valentin Ruano Rubio	383a4f4a70	Merge pull request #481 from broadinstitute/vrr_pairhmm_log_probability_fix Fix for the PairHMM transition probability miscalculation.	2014-01-27 10:59:08 -08:00
Valentin Ruano-Rubio	748d2fdf92	Added Integration test to verify the bugs are not there anymore as reported in pivotracker	2014-01-26 23:29:31 -05:00
Karthik Gururaj	a14a11c0cf	Pulled Mohammad's changes for creating variable sized arrays Merge branch 'master' of /home/mghodrat/PairHMM/shared-repository into intel_pairhmm Conflicts: PairHMM_JNI/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.cc	2014-01-26 19:40:43 -08:00
Karthik Gururaj	018e9e2c5f	1. Cleaned up code 2. Split into DebugJNILoglessPairHMM and VectorLoglessPairHMM with base class JNILoglessPairHMM. DebugJNILoglessPairHMM can, in principle, invoke any other child class of JNILoglessPairHMM. 3. Added more profiling code for Java parts of LoglessPairHMM	2014-01-26 19:18:12 -08:00
Valentin Ruano-Rubio	9e7bf75e89	Fix for the PairHMM transition probability miscalculation. Problem: matchToMatch transition calculation was wrong resulting in transition probabilities coming out of the Match state that added up to more than 1. Reports: https://www.pivotaltracker.com/s/projects/793457/stories/62471780 https://www.pivotaltracker.com/s/projects/793457/stories/61082450 Changes: The transition matrix update code has been moved to a common place in PairHMMModel to dry out its multiple copies. MatchToMatch transition calculation has been fixed and implemented in PairHMMModel. Affected integration test md5 have been updated, there were no differences in GT fields and example differences always implied small changes in likelihoods that is what is expected.	2014-01-26 16:30:36 -05:00
mghodrat	e7598dde8b	Clean up	2014-01-26 11:36:06 -08:00
Karthik Gururaj	81bdfbd00d	Temporary commit before moving to new native library	2014-01-24 16:29:35 -08:00
Intel Repocontact	f7fa79e561	Merge branch 'intel_pairhmm' of /home/karthikg/broad/gsa-unstable into intel_pairhmm Committing into central_repo	2014-01-22 23:08:32 -08:00
Karthik Gururaj	936e9e175e	1. Converted q,i,d,c in C++ from int* to char* 2. Use clock_gettime to measure performance 3. Disabled OpenMP 4. Moved LoadTimeInitializer to different file	2014-01-22 22:57:32 -08:00
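
Commit 89c4e57478 above replaces combineGLs with combineGLsPrecise, which is described as using the log-sum-exp trick to avoid underflow. The sketch below illustrates that trick for log10-scaled values. It is not the GATK implementation: the class and method names (LogSumExpSketch, log10SumNaive, log10SumPrecise) are hypothetical, and the assumption that the inputs are log10 likelihoods being summed in linear space is ours.

```java
// Hypothetical illustration of the log-sum-exp trick in log10 space.
// Not taken from GATK; names and input conventions are assumptions.
final class LogSumExpSketch {

    // Naive version: converts each value back to linear space first.
    // 10^x underflows to 0.0 for x below roughly -324, so the sum can
    // collapse to 0 and the result becomes -Infinity.
    static double log10SumNaive(double[] log10Values) {
        double sum = 0.0;
        for (double v : log10Values) {
            sum += Math.pow(10.0, v);
        }
        return Math.log10(sum);
    }

    // Log-sum-exp version: factor out the largest term so that every
    // exponent is <= 0 and at least one term equals exactly 1.0.
    static double log10SumPrecise(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // empty input, or every term has zero probability
        }
        double sum = 0.0;
        for (double v : log10Values) {
            sum += Math.pow(10.0, v - max);
        }
        return max + Math.log10(sum);
    }

    public static void main(String[] args) {
        double[] gls = {-400.0, -401.0, -402.0};
        System.out.println("naive:   " + log10SumNaive(gls));
        System.out.println("precise: " + log10SumPrecise(gls));
    }
}
```

For moderate inputs the two methods agree up to rounding. They diverge only when the linear-space terms are too small to represent as doubles, which is the underflow situation the commit message describes.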

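Commit acda6ca27b above attributes the PairHMM JNI slowdown to denormal (subnormal) floating-point values and notes that flush-to-zero (FTZ) changes results only for very small values. The sketch below shows, in plain Java, what a subnormal float is and what flushing it to zero does to the value. It is a software emulation for illustration only: it does not set any hardware FTZ mode, and the class and method names (DenormalSketch, isSubnormal, flushToZero) are hypothetical.

```java
// Hypothetical illustration of subnormal floats and a software
// emulation of flush-to-zero. This does not touch any CPU control
// register; it only mimics the numerical effect on a single value.
final class DenormalSketch {

    // A float is subnormal when it is nonzero but smaller in magnitude
    // than the smallest normal float (Float.MIN_NORMAL, about 1.18e-38).
    static boolean isSubnormal(float x) {
        return x != 0.0f && Math.abs(x) < Float.MIN_NORMAL;
    }

    // Emulated FTZ: subnormal values are replaced by zero, everything
    // else (including NaN and infinities) passes through unchanged.
    static float flushToZero(float x) {
        return isSubnormal(x) ? 0.0f : x;
    }

    public static void main(String[] args) {
        // Java follows IEEE 754 gradual underflow, so this division
        // yields a small nonzero subnormal rather than 0.
        float tiny = Float.MIN_NORMAL / 4.0f;
        System.out.println("tiny = " + tiny);
        System.out.println("isSubnormal(tiny) = " + isSubnormal(tiny));
        System.out.println("flushToZero(tiny) = " + flushToZero(tiny));
    }
}
```

Because Java keeps subnormal results, a value like `tiny` stays nonzero, whereas an FTZ-enabled computation would treat it as 0. That difference in the smallest values is the kind of discrepancy the commit message reports between the FTZ and no-FTZ runs.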

13678 Commits (bcf6be0b08f8cd87dfdde05aa4e7fb6a370663cb)
