Commit Graph

4243 Commits (51b8ea5d59b7bb856713d2b6048e7b758ad3bf0f)

Author SHA1 Message Date
Karthik Gururaj ec54528605 Fixed error in Sandbox.java 2014-03-05 09:36:55 -08:00
Karthik Gururaj 8fcbf9272c Merge branch 'intel_pairhmm' of /data/broad/gsa-unstable into intel_pairhmm
Conflicts:
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java
	public/VectorPairHMM/src/main/c++/Sandbox.java
2014-03-05 09:35:50 -08:00
Intel Repocontact d81116eb1d Added vectorized PairHMM implementation by Mohammad and Mustafa into the Maven build of GATK.
C++ code has PAPI calls for reading hardware counters

Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven

Native library part of git repo

1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM

Moved JNI_README

1. Added a catch for UnsatisfiedLinkError in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.

1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase

Added public license text to all C++ files

Added license to Makefile

Add package info to Sandbox.java

Conflicts:
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/DebugJNILoglessPairHMM.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/JNILoglessPairHMM.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/VectorLoglessPairHMM.java
	public/VectorPairHMM/src/main/c++/.gitignore
	public/VectorPairHMM/src/main/c++/LoadTimeInitializer.cc
	public/VectorPairHMM/src/main/c++/LoadTimeInitializer.h
	public/VectorPairHMM/src/main/c++/Makefile
	public/VectorPairHMM/src/main/c++/Sandbox.cc
	public/VectorPairHMM/src/main/c++/Sandbox.h
	public/VectorPairHMM/src/main/c++/Sandbox.java
	public/VectorPairHMM/src/main/c++/Sandbox_JNIHaplotypeDataHolderClass.h
	public/VectorPairHMM/src/main/c++/Sandbox_JNIReadDataHolderClass.h
	public/VectorPairHMM/src/main/c++/baseline.cc
	public/VectorPairHMM/src/main/c++/define-double.h
	public/VectorPairHMM/src/main/c++/define-float.h
	public/VectorPairHMM/src/main/c++/define-sse-double.h
	public/VectorPairHMM/src/main/c++/define-sse-float.h
	public/VectorPairHMM/src/main/c++/headers.h
	public/VectorPairHMM/src/main/c++/jnidebug.h
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.cc
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.h
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.cc
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.h
	public/VectorPairHMM/src/main/c++/pairhmm-template-kernel.cc
	public/VectorPairHMM/src/main/c++/pairhmm-template-main.cc
	public/VectorPairHMM/src/main/c++/run.sh
	public/VectorPairHMM/src/main/c++/shift_template.c
	public/VectorPairHMM/src/main/c++/utils.cc
	public/VectorPairHMM/src/main/c++/utils.h
	public/VectorPairHMM/src/main/c++/vector_function_prototypes.h
2014-03-05 09:30:29 -08:00
Joel Thibault 57747ad35e Logger output should go to STDERR instead of STDOUT 2014-03-05 10:01:06 -05:00
Joel Thibault b4dde6a78c Add WARN to the valid log types error message
- order if statements and error message in increasing severity
2014-03-05 10:01:06 -05:00
Valentin Ruano Rubio 243d1bc07a Merge pull request #542 from broadinstitute/vrr_efficient_find_best_haplotypes
Added a more efficient implementation of the KBest haplotype finder code...
2014-03-05 09:44:50 -05:00
David Roazen 58905e8fe0 Disable the intermittently-failing and flawed ProgressMeterDaemonUnitTest
-created a Pivotal ticket to eventually redesign this test
2014-03-05 09:15:26 -05:00
Valentin Ruano-Rubio 69bf2b3247 Added a more efficient implementation of the KBest haplotype finder code (CONT.)
Changes:

  1. Addressed review comments on new K-best haplotype assembly graph finder.
  2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices.
  3. Updated test to use KBestHaplotypeFinder instead of KBestPaths
  4. Retired KBestPaths to the archive.
  5. Small improvements to the code and documentation.
2014-03-04 23:22:27 -05:00
Valentin Ruano-Rubio 7acf2eb0e7 Added a more efficient implementation of the KBest haplotype finder code.
Story:

  https://www.pivotaltracker.com/story/show/66238286

Changes:

  1. Created a new k-best haplotype search implementation in class KBestHaplotypeFinder.
  2. Changed HC code to use the new implementation.
  This seems to fix the original problem without causing significant changes in outputs using some empirical data test cases
  3. Moved haplotype's cigar calculation code from Path to CigarUtils; need that in order to gain independence from Path in some parts of the code.
     In any case that seems like a more natural location for that functionality.
2014-03-04 12:22:14 -05:00
Karthik Gururaj a893765ae2 Added license to Makefile 2014-03-03 09:11:02 -08:00
Karthik Gururaj 7cd23543a1 Added public license text to all C++ files 2014-03-03 09:04:00 -08:00
Eric Banks 22ad18b919 Moving Reduce Reads to the archive.
The GATK now fails with a user error if you try to run with a reduced bam.
(I added a unit test for that; everything else here is just the removal of all traces of RR)
2014-03-02 02:03:14 -05:00
Khalid Shakir 387188e5bb Attempting to limit gc during Maven tests, using defaults found in JavaCommandLineFunction 2014-03-01 15:24:45 +08:00
Karthik Gururaj 1b395a871a 1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase
2014-02-28 16:08:28 -08:00
Karthik Gururaj 37526dfad5 1. Added a catch for UnsatisfiedLinkError in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.
2014-02-28 08:59:55 -08:00
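
Note on commit 37526dfad5: the fallback it describes (try the native library, drop back to the pure-Java implementation if loading fails) can be sketched roughly as below. This is a minimal illustration and not the code in PairHMMLikelihoodCalculationEngine.java. The class name, the `choose` helper, and the library name `hypothetical_missing_lib` are assumptions made for the example; only the two implementation names come from the commit message.

```java
// Hypothetical sketch of a native-library fallback; not the GATK code.
class PairHmmFallbackSketch {
    // Implementation names taken from the commit message.
    enum Implementation { VECTOR_LOGLESS_CACHING, LOGLESS_CACHING }

    // Runs the supplied loader; if the JVM cannot link the native library,
    // warn on STDERR and fall back to the pure-Java implementation.
    static Implementation choose(Runnable nativeLoader) {
        try {
            nativeLoader.run();
            return Implementation.VECTOR_LOGLESS_CACHING;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("WARN: native PairHMM library could not be loaded ("
                    + e.getMessage() + "); falling back to LOGLESS_CACHING");
            return Implementation.LOGLESS_CACHING;
        }
    }

    public static void main(String[] args) {
        // System.loadLibrary throws UnsatisfiedLinkError when the library
        // is not found; the name below is deliberately one that should not exist.
        Implementation chosen = choose(() -> System.loadLibrary("hypothetical_missing_lib"));
        System.out.println("Using " + chosen);
    }
}
```

Catching UnsatisfiedLinkError specifically, rather than Throwable, keeps unrelated failures visible while still letting the tool run on machines without the native library.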
Chris Whelan e61ba8b340 Added command line checks for duplicate files in ROD lists
-- Keep a list of processed files in ArgumentTypeDescriptor.getRodBindingsCollection
  -- Throw user exception if a file name duplicates one that was previously parsed
  -- Throw user exception if the ROD list is empty
  -- Added two unit tests to RodBindingCollectionUnitTest
2014-02-27 13:32:18 -05:00
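
Note on commit e61ba8b340: the two checks it lists (reject an empty list, reject a file name that was already parsed) might look something like the sketch below. The class and method names are hypothetical, and IllegalArgumentException stands in for the user exception mentioned in the commit; the real logic lives in ArgumentTypeDescriptor.getRodBindingsCollection and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of duplicate/empty checks on a list of file names.
class RodListCheckSketch {
    // Returns the file names in their original order, or throws if the list
    // is empty or contains a name that was already seen.
    static List<String> validate(List<String> fileNames) {
        if (fileNames.isEmpty()) {
            throw new IllegalArgumentException("The ROD list is empty");
        }
        Set<String> seen = new LinkedHashSet<>();
        for (String name : fileNames) {
            // Set.add returns false when the element is already present.
            if (!seen.add(name)) {
                throw new IllegalArgumentException("Duplicate file in ROD list: " + name);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(validate(Arrays.asList("a.vcf", "b.vcf")));
        try {
            validate(Arrays.asList("a.vcf", "b.vcf", "a.vcf"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```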
Karthik Gururaj 2d0ce45bb0 Moved JNI_README 2014-02-27 10:12:23 -08:00
Karthik Gururaj c645725fc3 1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM
2014-02-26 15:17:15 -08:00
Karthik Gururaj bd71ba35e5 Moved pom.xml to VectorPairHMM and updated artifactId 2014-02-26 14:01:46 -08:00
Khalid Shakir da587d48ed Using absolute paths in generated diff commands, to ease running them from any directory. 2014-02-27 04:43:39 +08:00
Khalid Shakir c163e6d0d2 Separate failsafe directories for each of the integration test types [#66515572] 2014-02-27 04:43:39 +08:00
Karthik Gururaj b81e2c2948 Native library part of git repo 2014-02-26 11:47:42 -08:00
Karthik Gururaj 0fe843bfd9 Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven
2014-02-26 11:47:42 -08:00
Karthik Gururaj 15fe244e4b Now has PAPI values 2014-02-26 11:47:42 -08:00
Intel Repocontact e32e9e6af6 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2014-02-26 11:47:01 -08:00
Intel Repocontact ff2a972ab5 Merge branch 'master' of github.com:broadinstitute/gsa-unstable
Conflicts:
	.gitignore
2014-02-25 20:56:28 -08:00
Khalid Shakir f02ce6eca7 Added tests for cleaning up scattered .bai files, and using the log directory.
Re-added import java.io.File for BamGatherFunction.
Other cleanup to resolve scala syntax warnings from intellij.
Moved Example UG script from protected to public.
2014-02-26 02:11:28 +08:00
pdexheimer 0405afeab2 Inherit BamGatherFunction from MergeSamFiles rather than PicardBamFunction
- This change means that BamGatherFunction will now have an @Output field for the BAM index, which will allow the bai to be deleted for intermediate functions

Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-26 02:11:28 +08:00
pdexheimer 504c125c26 Ensure .out files are saved into logDirectory
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-26 02:11:28 +08:00
pdexheimer 51dcd364a5 Added logDirectory argument
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-26 02:11:28 +08:00
Khalid Shakir 7e516b294f Replaced local drmaa and Jama artifacts with versions from maven central.
Removed unused caliper binary from local repo.
2014-02-22 01:21:35 +08:00
Khalid Shakir a75043b207 When git describe fails use "exported" instead of "unknown". 2014-02-22 01:21:35 +08:00
Khalid Shakir 4670c87313 Fixed mvn run for packagetests over external-example. 2014-02-22 01:21:34 +08:00
Khalid Shakir 70ecce2a0f Fixed scope for test-jar dependencies. 2014-02-22 01:21:34 +08:00
Eric Banks 235f0c6fa0 Merge pull request #528 from broadinstitute/eb_fix_cat_variants_usage_message
Fix the usage message for CatVariants to make it accurate.
2014-02-19 22:45:22 -05:00
Eric Banks 341d1bf2dd Fix the usage message for CatVariants to make it accurate.
It just hit a user on our forum...
2014-02-19 20:42:08 -05:00
Valentin Ruano-Rubio c167fb5fdf Fixing GenotypesGVCF.
Bug uncovered by some untrimmed alleles in the single sample pipeline output.

Note, however, that this does not fix the untrimmed alleles in general.

Story:

https://www.pivotaltracker.com/story/show/65481104

Changes:

1. Fixed the bug itself.
2. Fixed non-working tests (silently skipped due to exception in dataProvider).
2014-02-19 14:20:39 -05:00
Ryan Poplin 43c20264b0 Initial commit of the random forest classifier. 2014-02-17 13:07:27 -05:00
Khalid Shakir a505db79f5 Fixed build bug in ./ant-bridge.sh unittest -Dsingle=..., due to external-example.
pipeline.run property no longer required to be passed by test executor.
2014-02-15 13:52:20 +08:00
droazen 1e82f117ad Merge pull request #518 from broadinstitute/ks_skashin_gatkdocs_arguments
Ks skashin gatkdocs arguments
2014-02-14 13:57:19 -05:00
Eric Banks f6022a944b Merge pull request #513 from broadinstitute/eb_clean_up_genotype_posteriors
Various small fixes for CalculateGenotypePosteriors based on feedback fr...
2014-02-14 13:50:46 -05:00
Eric Banks 3724d4e5f3 Various small fixes for CalculateGenotypePosteriors based on feedback from guys in Ben Neale's group.
Note that this tool is still a work in progress and very experimental, so isn't 100% stable.  Most of
the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented
it right before he left, and we're going to need to add tests at some point soon.  I added a first
integration test in this commit, but it's just a start.

The fixes include:

1. Stop having the genotyping code strip out AD values.  It doesn't make sense that it should do this so
I don't know why it was doing that at all.
Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore.
This also helps CalculateGenotypePosteriors which was losing the AD values.
Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before.
Updated the integration test for that walker to include such data.

2. Chris was calling Math.pow directly on the normalized posteriors which isn't safe.
Instead, the normalization routine itself can revert back to log scale in a safe manner so let's use it.
Also, renamed the variable to posteriorProbabilities (and not likelihoods).

3. Have CGP update the AC/AF/AN counts after fixing GTs.
2014-02-14 13:48:14 -05:00
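
Note on commit 3724d4e5f3, item 2: one common reason a direct Math.pow on log10-scaled values is unsafe is underflow, since very negative exponents all collapse to 0.0. The sketch below shows the usual remedy of subtracting the maximum before leaving log space. It assumes the values are log10-scaled and that at least one is finite; the class and method names are hypothetical, and this is not the normalization routine the commit refers to.

```java
// Hypothetical sketch: converting log10-scaled values to normalized
// probabilities without underflow.
class Log10NormalizeSketch {
    // Assumes at least one finite entry in log10Values.
    static double[] normalizeFromLog10(double[] log10Values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Values) {
            max = Math.max(max, v);
        }
        double[] result = new double[log10Values.length];
        double sum = 0.0;
        for (int i = 0; i < log10Values.length; i++) {
            // Shifting by the maximum makes the largest term exactly 1.0,
            // so the sum can never be zero.
            result[i] = Math.pow(10.0, log10Values[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] log10Posteriors = {-1000.0, -1001.0, -1002.0};
        // The naive conversion underflows: every term becomes 0.0.
        System.out.println(Math.pow(10.0, log10Posteriors[0]));
        // The shifted conversion keeps the relative weights.
        for (double p : normalizeFromLog10(log10Posteriors)) {
            System.out.println(p);
        }
    }
}
```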
kshakir 8b136d53b9 Merge pull request #524 from broadinstitute/ks_symlink_bin_jar
Create symlinks target/GenomeAnalysisTK.jar and target/Queue.jar
2014-02-15 02:32:59 +08:00
Khalid Shakir bc9ac93b6c Adding the external example to the build. 2014-02-15 01:26:07 +08:00
Khalid Shakir 2e99a6ecf8 Create symlinks target/GenomeAnalysisTK.jar and target/Queue.jar during package phase. 2014-02-15 01:12:32 +08:00
Nicholas Clarke 7ae19953f5 Squashed commit of the following:
commit 5e73b94eed3d1fc75c88863c2cf07d5972eb348b
Merge: e12593a d04a585
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Fri Feb 14 09:25:22 2014 +0000

    Merge pull request #1 from broadinstitute/checkpoint

    SimpleTimer passes tests, with formatting

commit d04a58533f1bf5e39b0b43018c9db3302943d985
Author: kshakir <github@kshakir.org>
Date:   Fri Feb 14 14:46:01 2014 +0800

    SimpleTimer passes tests, with formatting

    Fixed getNanoOffset() to offset nano to nano, instead of nano to seconds.
    Updated warning message with comma separated numbers, and exact values of offsets.

commit e12593ae66a5e6f0819316f2a580dbc7ae5896ad
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Wed Feb 12 13:27:07 2014 +0000

    Remove instance of 'Timer'.

commit 47a73e0b123d4257b57cfc926a5bdd75d709fcf9
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Wed Feb 12 12:19:00 2014 +0000

    Revert a couple of changes that survived somehow.

    - CheckpointableTimer,Timer -> SimpleTimer

commit d86d9888ae93400514a8119dc2024e0a101f7170
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Mon Jan 20 14:13:09 2014 +0000

    Revised commits following comments.

    - All utility merged into `SimpleTimer`.
    - All tests merged into `SimpleTimerUnitTest`.
    - Behaviour of `getElapsedTime` should now be consistent with `stop`.
    - Use 'TimeUnit' class for all unit conversions.
    - A bit more tidying.

commit 354ee49b7fc880e944ff9df4343a86e9a5d477c7
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Fri Jan 17 17:04:39 2014 +0000

    Add a new CheckpointableTimerUnitTest.

    Revert SimpleTimerUnitTest to the version before any changes were made.

commit 2ad1b6c87c158399ededd706525c776372bbaf6e
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Tue Jan 14 16:11:18 2014 +0000

    Add test specifically checking behaviour under checkpoint/restart.

    Slight alteration to the checkpointable timer based on observations
    during the testing - it seems that there's a fair amount of drift
    between the sources anyway, so each time we stop we resynchronise the
    offset. Hopefully this should avoid gradual drift building up and
    presenting as checkpoint/restart drift.

commit 1c98881594dc51e4e2365ac95b31d410326d8b53
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Tue Jan 14 14:11:31 2014 +0000

    Should use consistent time units

commit 6f70d42d660b31eee4c2e9d918e74c4129f46036
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Tue Jan 14 14:01:10 2014 +0000

    Add a new timer supporting checkpoint mechanisms.

    The issue with this is that the current timer is locked to JVM nanoTime. This can be reset after
    a checkpoint/restart and result in negative elapsed times, which causes an error.

    This patch addresses the issue in two ways:
     - Moves the check on timer information in GenomeAnalysisEngine.java to only occur if a time limit has been
    set.
     - Create a new timer (CheckpointableTimer) which keeps track of the relation between system and nano time. If
    this changes drastically, then the assumption is that there has been a JVM restart owing to checkpoint/restart.
    Any time straddling a checkpoint/restart event will not be counted towards total running time.

Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-14 21:45:47 +08:00
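
Note on commit 7ae19953f5: the idea in the squashed messages (track the relation between system time and nano time, and discard any interval across which that relation changes drastically) can be sketched as follows. This is an illustration under stated assumptions and not the SimpleTimer code: the class name, the injected clock suppliers, and the `maxDriftNanos` threshold are all hypothetical. TimeUnit is used for the unit conversion, as the commit messages suggest.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch of a timer that ignores intervals straddling a
// checkpoint/restart, detected as a jump in (wall clock - nano clock).
class CheckpointAwareTimerSketch {
    private final LongSupplier nanoClock;        // e.g. System::nanoTime
    private final LongSupplier wallClockMillis;  // e.g. System::currentTimeMillis
    private final long maxDriftNanos;            // hypothetical tolerance
    private long startNanos;
    private long startOffsetNanos;
    private long elapsedNanos;
    private boolean running;

    CheckpointAwareTimerSketch(LongSupplier nanoClock, LongSupplier wallClockMillis,
                               long maxDriftNanos) {
        this.nanoClock = nanoClock;
        this.wallClockMillis = wallClockMillis;
        this.maxDriftNanos = maxDriftNanos;
    }

    // Offset between the two clocks, with both sides expressed in nanoseconds.
    private long currentOffsetNanos() {
        return TimeUnit.MILLISECONDS.toNanos(wallClockMillis.getAsLong())
                - nanoClock.getAsLong();
    }

    void start() {
        startNanos = nanoClock.getAsLong();
        startOffsetNanos = currentOffsetNanos(); // resynchronise on every start
        running = true;
    }

    void stop() {
        if (!running) {
            return;
        }
        long drift = Math.abs(currentOffsetNanos() - startOffsetNanos);
        // Count the interval only if the two clocks still agree.
        if (drift <= maxDriftNanos) {
            elapsedNanos += nanoClock.getAsLong() - startNanos;
        }
        running = false;
    }

    long getElapsedNanos() {
        return elapsedNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        CheckpointAwareTimerSketch timer = new CheckpointAwareTimerSketch(
                System::nanoTime, System::currentTimeMillis,
                TimeUnit.SECONDS.toNanos(1));
        timer.start();
        Thread.sleep(20);
        timer.stop();
        System.out.println("Elapsed ms: "
                + TimeUnit.NANOSECONDS.toMillis(timer.getElapsedNanos()));
    }
}
```

Injecting the clocks as suppliers is a choice made for the sketch so the restart case can be simulated in a test; a dropped interval simply does not count towards the total, which matches the behaviour described in the last commit message of the squash.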
Laura Gauthier 29bb3d4dc1 Check for empty BAM lists in command line input 2014-02-14 08:09:47 -05:00
Khalid Shakir 225ee4880b Using new parameters via skashin to run gatkdocs in the maven conventional subdirectory.
Updated path for output gatkdocs in nightly build script.
Removed patch in plugin manager that contained a workaround for gatkdocs running in the top level directory.
2014-02-14 15:57:21 +08:00
skashin 1b3ac95798 Added the following arguments: -settings-dir -destination-dir -forum-key-path
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-14 14:28:35 +08:00
Eric Banks 7095a60c8e Merge pull request #516 from broadinstitute/dr_reenable_tests_failing_due_to_java_update
Re-enable tests that were failing post-maven due to changes in Java's Math.pow() implementation
2014-02-13 21:05:18 -05:00