- SamPairUtils migrated in Picard r1737
- Revert IndelRealigner changes made in commit 4f4b85
-- Those changes were based on Picard revision 1722 to net/sf/picard/sam/SamPairUtil.java
-- Picard revision 1723 reverts these changes, so we also revert to match
This is currently my leading suspect for the cause of the
intermittent NoSuchElementException errors on master, since
the maven surefire plugin seems unable to handle errors in
TestNG DataProviders without blowing up.
This test is not, as I had initially thought, the cause of the
maven errors. Our master branch is failing intermittently
regardless of whether this test is enabled or disabled.
This reverts commit 45fc9ff515eec8d676b64a04fb34fb357492ff84.
This test passes when run individually, as part of the commit tests, or as
part of the package tests. However, when running the unit tests in isolation
it causes maven/surefire to throw a NoSuchElementException.
This is clearly a maven/surefire bug or configuration issue. I will re-enable
this test on a branch as Khalid and I try to work through it.
-Hide AWS downloader credentials in a private properties file
-Remove references to private ActiveRegion walker
Allows phone home functionality to be tested at release time
when we are running tests on the release jar.
1. Enable on-the-fly indexing for vcf.gz.
2. Handle on-the-fly indexing when the file to be indexed is not a regular file, in which case no index should be created.
3. Add method setProgressLogger to all SAMFileWriter implementations.
4. Revved Picard to 1.109.1722
5. IndelRealigner md5s change because the MC tag is added to records now.
Fixed up and signed off by ebanks.
Package tests now hard-code just the gatk-framework tests jar, to include ONLY BaseTest, until the exclusions can be debugged.
Removing cofoja's annotation service from the package jars, to allow javac -cp <package>.jar.
Currently the best haplotypes are those that accumulate the largest ABSOLUTE edge *multiplicity* sum across their path in the assembly graph.
The edge *multiplicity* is equal to the number of reads that expand through that edge, i.e. reads that have a kmer that uniquely maps to some vertex upstream from the edge and whose following base calls extend across that edge to vertices downstream from it.
Although it is obvious that higher multiplicities correlate with haplotype probability, this criterion falls short in some regards, of which the most relevant is:
As it is evaluated in the condensed seq-graph (as opposed to the uncompressed read-threading graph), it is biased toward haplotypes that have more short-sequence vertices
( -> ATGC -> CA -> has a worse score than -> A -> T -> G -> C -> C -> A ->). This is partly a result of how we modify the edge multiplicities when we merge vertices from a linear chain.
This pull-request addresses the problem by changing to a new scoring schema based on likelihood estimates:
Each haplotype's likelihood can be calculated as the product of the likelihoods of "taking" its edges in the assembly graph. The likelihood of "taking" an edge in the assembly
graph is calculated as its multiplicity divided by the sum of the multiplicities of the edges that share the same source vertex.
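The scoring rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in KBestHaplotypeFinder: the class and method names are hypothetical, the path is passed as plain arrays rather than graph objects, and the product is accumulated in log10 space (an assumption made here for numerical stability; the description above only states a product).

```java
/** Illustrative only: these names are hypothetical, not GATK classes. */
final class HaplotypeScoreSketch {

    /**
     * log10 likelihood of taking one edge: its multiplicity divided by the
     * summed multiplicity of every edge leaving the same source vertex.
     */
    static double edgeLog10Likelihood(int multiplicity, int[] outgoingMultiplicities) {
        long total = 0;
        for (int m : outgoingMultiplicities) {
            total += m;
        }
        // Zero-multiplicity edges are simply rejected in this sketch.
        if (multiplicity <= 0 || total < multiplicity) {
            throw new IllegalArgumentException(
                    "expected 0 < multiplicity <= sum of outgoing multiplicities");
        }
        return Math.log10((double) multiplicity / total);
    }

    /**
     * log10 likelihood of a path: the sum of its edge log10 likelihoods,
     * i.e. the log of the product of the edge likelihoods.
     * taken[i] is the multiplicity of the i-th edge on the path; outgoing[i]
     * lists the multiplicities of all edges leaving that edge's source
     * vertex, including taken[i] itself.
     */
    static double pathLog10Likelihood(int[] taken, int[][] outgoing) {
        if (taken.length != outgoing.length) {
            throw new IllegalArgumentException("taken and outgoing must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < taken.length; i++) {
            sum += edgeLog10Likelihood(taken[i], outgoing[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two branch points: 3 of 4 reads, then 2 of 4 reads (0.75 * 0.5).
        double branched = pathLog10Likelihood(new int[]{3, 2}, new int[][]{{3, 1}, {2, 2}});
        // A linear chain has one outgoing edge per vertex, so every step has likelihood 1.
        double merged = pathLog10Likelihood(new int[]{5}, new int[][]{{5}});
        double unmerged = pathLog10Likelihood(new int[]{5, 5, 5}, new int[][]{{5}, {5}, {5}});
        System.out.println(Math.pow(10.0, branched));
        System.out.println(merged == unmerged);
    }
}
```

Under this rule an unbranched chain contributes a factor of 1 however many vertices it is split into, which is why the merged and unmerged chains in main score the same.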
This pull-request addresses the following stories:
https://www.pivotaltracker.com/story/show/66691418
https://www.pivotaltracker.com/story/show/64319760
Change Summary:
1. Change to the new scoring schema.
2. Added graph DOT printing code to KBestHaplotypeFinder in order to diagnose scoring.
3. Graph transformations have been modified in order to generate no 0-multiplicity edges. (Nevertheless, the schema above should work with 0-multiplicity edges, assuming that they are in fact 0.5.)
The maven shade plugin was eliminating a necessary class (IgnoreCookiesSpec)
when packaging the GATK/Queue. Work around this by telling maven to
always package all of commons-httpclient.
Enable it with the new --useIUPAC argument.
Added both unit and integration tests for the new functionality, and fixed up the
existing tests while I was in there.
-These tests are really integration tests for Queue rather than generalized
pipeline tests, so it makes sense to call them QueueTests.
-Rename test classes and maven build targets, and update shell scripts
to reflect new naming.
This module was causing failures during the release
packaging tests. After discussing with Khalid, we've
decided to disable it for now until a fix can be
developed.
C++ code has PAPI calls for reading hardware counters
Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven
Native library part of git repo
1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM
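As a rough illustration of item 2, the include directory the Makefile needs can be derived from java.home as below. The class and method names are hypothetical, and the sketch assumes the JDK layout in which java.home is the jre directory nested inside the JDK (as on JDK 8 and earlier), so the JNI headers sit in a sibling include directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: a hypothetical helper, not part of the VectorPairHMM build. */
final class JniIncludeSketch {

    /** Resolves <JRE_HOME>/../include, the directory passed to the compiler with -I. */
    static Path jniIncludeDir(String jreHome) {
        return Paths.get(jreHome).resolve("..").resolve("include").normalize();
    }

    public static void main(String[] args) {
        // java.home is a standard JVM system property.
        String jreHome = System.getProperty("java.home");
        System.out.println("-I" + jniIncludeDir(jreHome));
    }
}
```

If the build runs on a bare JRE, or on a JDK whose java.home is the JDK root itself, the resolved directory will not contain jni.h and the make step would be expected to fail.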
Moved JNI_README
1. Added a catch for UnsatisfiedLinkError in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library cannot be loaded. Made
VECTOR_LOGLESS_CACHING the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you, the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.
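The fallback described in item 1 might look roughly like the sketch below. It is not the code in PairHMMLikelihoodCalculationEngine.java: the class and method names are hypothetical, System.loadLibrary stands in for however the library packed into the jar is actually loaded, and System.err stands in for the logger.

```java
/** Illustrative only: hypothetical names, not the GATK implementation. */
final class PairHmmSelectionSketch {

    /** The two implementation names mentioned in the commit message. */
    enum Implementation { VECTOR_LOGLESS_CACHING, LOGLESS_CACHING }

    /**
     * Tries the vectorized native implementation first and falls back to the
     * pure-Java one when the native library cannot be loaded.
     */
    static Implementation select(String nativeLibraryName) {
        try {
            // Throws UnsatisfiedLinkError if the library is not on java.library.path.
            System.loadLibrary(nativeLibraryName);
            return Implementation.VECTOR_LOGLESS_CACHING;
        } catch (UnsatisfiedLinkError e) {
            // Stand-in for a logger warning.
            System.err.println("WARN: could not load native library '" + nativeLibraryName
                    + "', falling back to LOGLESS_CACHING: " + e.getMessage());
            return Implementation.LOGLESS_CACHING;
        }
    }

    public static void main(String[] args) {
        // "VectorLoglessCaching" corresponds to libVectorLoglessCaching on Linux.
        System.out.println(select("VectorLoglessCaching"));
    }
}
```

UnsatisfiedLinkError is an Error rather than an Exception, so a catch (Exception e) block would not intercept it; it has to be caught by name as above.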
1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase
Added public license text to all C++ files
Added license to Makefile
Add package info to Sandbox.java
Conflicts:
protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/DebugJNILoglessPairHMM.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/JNILoglessPairHMM.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/VectorLoglessPairHMM.java
public/VectorPairHMM/src/main/c++/.gitignore
public/VectorPairHMM/src/main/c++/LoadTimeInitializer.cc
public/VectorPairHMM/src/main/c++/LoadTimeInitializer.h
public/VectorPairHMM/src/main/c++/Makefile
public/VectorPairHMM/src/main/c++/Sandbox.cc
public/VectorPairHMM/src/main/c++/Sandbox.h
public/VectorPairHMM/src/main/c++/Sandbox.java
public/VectorPairHMM/src/main/c++/Sandbox_JNIHaplotypeDataHolderClass.h
public/VectorPairHMM/src/main/c++/Sandbox_JNIReadDataHolderClass.h
public/VectorPairHMM/src/main/c++/baseline.cc
public/VectorPairHMM/src/main/c++/define-double.h
public/VectorPairHMM/src/main/c++/define-float.h
public/VectorPairHMM/src/main/c++/define-sse-double.h
public/VectorPairHMM/src/main/c++/define-sse-float.h
public/VectorPairHMM/src/main/c++/headers.h
public/VectorPairHMM/src/main/c++/jnidebug.h
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.cc
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.h
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.cc
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.h
public/VectorPairHMM/src/main/c++/pairhmm-template-kernel.cc
public/VectorPairHMM/src/main/c++/pairhmm-template-main.cc
public/VectorPairHMM/src/main/c++/run.sh
public/VectorPairHMM/src/main/c++/shift_template.c
public/VectorPairHMM/src/main/c++/utils.cc
public/VectorPairHMM/src/main/c++/utils.h
public/VectorPairHMM/src/main/c++/vector_function_prototypes.h
-- throws UserException; added tests in PosteriorLikelihoodsUtilsUnitTests
Add error handling to CalculateGenotypePosteriors for cases where MLEAC>AN; add tests in PosteriorLikelihoodsUtilsUnitTests
Add unit tests to confirm that CalculateGenotypePosteriors has the ability to switch genotypes for four cases
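A minimal sketch of the MLEAC>AN check, for illustration only. The names are hypothetical, and IllegalArgumentException stands in for the GATK UserException mentioned above so that the example is self-contained.

```java
/** Illustrative only: hypothetical names, not the CalculateGenotypePosteriors code. */
final class AlleleCountCheckSketch {

    /**
     * Rejects a site whose MLEAC (maximum-likelihood allele count) exceeds
     * AN (total number of called alleles), which cannot happen in a consistent VCF record.
     */
    static void requireMleacWithinAn(int mleac, int an) {
        if (mleac > an) {
            // The real tool throws a UserException here; this is a stand-in.
            throw new IllegalArgumentException(
                    "MLEAC (" + mleac + ") is greater than AN (" + an + ")");
        }
    }

    public static void main(String[] args) {
        requireMleacWithinAn(2, 4); // consistent record, passes silently
        try {
            requireMleacWithinAn(5, 4);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```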
Changes:
1. Addressed review comments on new K-best haplotype assembly graph finder.
2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices.
3. Updated test to use KBestHaplotypeFinder instead of KBestPaths
4. Retired KBestPaths to the archive.
5. Small improvements to the code and documentation.
Story:
https://www.pivotaltracker.com/story/show/66238286
Changes:
1. Created a new k-best haplotype search implementation in class KBestHaplotypeFinder.
2. Changed HC code to use the new implementation.
This seems to fix the original problem without causing significant changes in outputs on some empirical test cases.
3. Moved the haplotype's cigar calculation code from Path to CigarUtils; this is needed to gain independence from Path in some parts of the code.
In any case that seems like a more natural location for that functionality.
The GATK now fails with a user error if you try to run with a reduced bam.
(I added a unit test for that; everything else here is just the removal of all traces of RR)