C++ code has PAPI calls for reading hardware counters
Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven
Native library part of git repo
1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM
Moved JNI_README
1. Added the catch UnsatisfiedLinkError exception in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING as the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.
1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase
Added public license text to all C++ files
Added license to Makefile
Add package info to Sandbox.java
Conflicts:
protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/DebugJNILoglessPairHMM.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/JNILoglessPairHMM.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/VectorLoglessPairHMM.java
public/VectorPairHMM/src/main/c++/.gitignore
public/VectorPairHMM/src/main/c++/LoadTimeInitializer.cc
public/VectorPairHMM/src/main/c++/LoadTimeInitializer.h
public/VectorPairHMM/src/main/c++/Makefile
public/VectorPairHMM/src/main/c++/Sandbox.cc
public/VectorPairHMM/src/main/c++/Sandbox.h
public/VectorPairHMM/src/main/c++/Sandbox.java
public/VectorPairHMM/src/main/c++/Sandbox_JNIHaplotypeDataHolderClass.h
public/VectorPairHMM/src/main/c++/Sandbox_JNIReadDataHolderClass.h
public/VectorPairHMM/src/main/c++/baseline.cc
public/VectorPairHMM/src/main/c++/define-double.h
public/VectorPairHMM/src/main/c++/define-float.h
public/VectorPairHMM/src/main/c++/define-sse-double.h
public/VectorPairHMM/src/main/c++/define-sse-float.h
public/VectorPairHMM/src/main/c++/headers.h
public/VectorPairHMM/src/main/c++/jnidebug.h
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.cc
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.h
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.cc
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.h
public/VectorPairHMM/src/main/c++/pairhmm-template-kernel.cc
public/VectorPairHMM/src/main/c++/pairhmm-template-main.cc
public/VectorPairHMM/src/main/c++/run.sh
public/VectorPairHMM/src/main/c++/shift_template.c
public/VectorPairHMM/src/main/c++/utils.cc
public/VectorPairHMM/src/main/c++/utils.h
public/VectorPairHMM/src/main/c++/vector_function_prototypes.h
C++ code has PAPI calls for reading hardware counters
Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven
Native library part of git repo
1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM
Moved JNI_README
1. Added the catch UnsatisfiedLinkError exception in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING as the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.
1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase
Added public license text to all C++ files
Added license to Makefile
Add package info to Sandbox.java
Changes:
1. Addressed review comments on new K-best haplotype assembly graph finder.
2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices.
3. Updated test to use KBestHaplotypeFinder instead of KBestPaths
4. Retired KBestPaths to the archive.
5. Small improvements to the code and documentation.
Story:
https://www.pivotaltracker.com/story/show/66238286
Changes:
1. Created a new k-best haplotype search implementation in class KBestHaplotypeFinder.
2. Changed HC code to use the new implementation.
This seems to fix the original problem without causing significant changes in outputs using some empirical data test cases
3. Moved haplotype's cigar calculation code from Path to CigarUtils; need that in order to gain independence from Path in some parts of the code.
In any case that seems like a more natural location for that functionality.
The purpose of this is to be able to call SNPs that fall at the beginning of a capture region (or exon).
Before, the read threading code would only start threading from the first kmer that matched the reference. But
that means that, in the case of a SNP at the beginning of an exome, it wouldn't start threading the read until
after the SNP position - so we'd lose the SNP.
For now, this is still very experimental. It works well for RNAseq data, but does introduce FPs in normal exomes.
I know why this is and how to fix it, but it requires a much larger fix to the HC: the HC needs to pass all reads
and bases to the annotation engine (like UG does) instead of just the high quality ones. So for now, the head
merging is disabled by default.
As per reviewer comments, I moved the head and tail merging code out into their own class.
We use a "manager" to keep track of observed splits and previous reads. This can be extended/modified in the
future to try to salvage those overhangs instead of hard-clipping them and/or try other possible strategies.
Added unit tests and more integration tests.
The GATK now fails with a user error if you try to run with a reduced bam.
(I added a unit test for that; everything else here is just the removal of all traces of RR)
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING as the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.
Re-added import java.io.File for BamGatherFunction.
Other cleanup to resolve scala syntax warnings from intellij.
Moved Example UG script to from protected to public.
This commit consists of 2 main changes:
1. When the strand table gets too large, we normalize it down to values that are more reasonable.
2. We don't include a particular sample's contribution unless the total ref and alt counts are at least 2 each;
this is a heuristic method for dealing only with hets.
MD5s change as expected.
Hopefully we'll have a more robust implementation for GATK 3.1.
The slicePrefix method functionality was broken.
Story:
https://www.pivotaltracker.com/story/show/64595624
Changes:
1. Fixed the bug.
2. Added unit test to check on the method functionality.
3. Added a integration test to verify the bug has been fixed in a empirical data reprudible case.
Story:
https://www.pivotaltracker.com/s/projects/1007536
Changes:
1. HC's GenotypingEngine now invokes reverseAlleleTrimming on GVCF variant output lines.
2. GenotypeGVCFs also reverse trim after regenotyping as some alt. alleles are dropped (observed in real-data).
The writer was never resetting the pointer to the end of the last non-ref VariantContext that it saw.
This was fine except when it jumped to a new contig - and a lower position on that contig - where it
thought that it was still part of that previous non-ref VariantContext so wouldn't emit a reference
block. Therefore, ref blocks were missing from the beginnings of all chromosomes (except chr1).
Added unit test to cover this case.
Bug uncovered by some untrimmed alleles in the single sample pipeline output.
Notice however does not fix the untrimmed alleles in general.
Story:
https://www.pivotaltracker.com/story/show/65481104
Changes:
1. Fixed the bug itself.
2. Fixed non-working tests (sliently skipped due to exception in dataProvider).
Note that this tool is still a work in progress and very experimental, so isn't 100% stable. Most of
the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented
it right before he left, and we're going to need to add tests at some point soon. I added a first
integration test in this commit, but it's just a start.
The fixes include:
1. Stop having the genotyping code strip out AD values. It doesn't make sense that it should do this so
I don't know why it was doing that at all.
Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore.
This also helps CalculateGenotypePosteriors which was losing the AD values.
Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before.
Updated the integration test for that walker to include such data.
2. Chris was calling Math.pow directly on the normalized posteriors which isn't safe.
Instead, the normalization routine itself can revert back to log scale in a safe manner so let's use it.
Also, renamed the variable to posteriorProbabilities (and not likelihoods).
3. Have CGP update the AC/AF/AN counts after fixing GTs.
After extensive detective work, Joel determined that these tests were failing
due to changes in the implementation of Math.pow() in newer versions of
Java 1.7.
All GSA members should ensure that they're using a JDK that is at least
as current as the one in the Java-1.7 dotkit on the Broad servers
(build 1.7.0_51-b13).
1. updated QualByDepth not to use AD-restricted depth if it is zero.
Added unit test this change.
2. Fixed small bug in CombineGVCFs where spanning deletions were not being treated consistently throughout.
Added test for this situation.
3. Make sure GenotypeGVCFs puts in the required headers.
Updated test files to make sure this is covered.
4. Have GenotypeGVCFs propagate up the MLEAC/AF (which were getting clobbered out).
Tests updated to account for this.
when the AD annotation is present for a given genotype then we only use its depth for QD if the variant depth > 1.
Added new unit tests for QualByDepth.
Creating new VariantContexts each time we broke up a block was very expensive because we break up
blocks so often. Also, calling into GATKVariantContextUtils.simpleMerge was really hurting performance.
MD5 changes because we no longer propogate any INFO fields (except for END) for reference blocks; the tests
have the now unused BLOCK_SIZE field that now get dropped.
Story:
https://www.pivotaltracker.com/story/show/65388246
Additional changes and notes:
1. The fix consist in forcing the output of all PLs by setting the standard flag for that '-allSitePLs'.
2. BP_RESOLUTION was handled differently to GVCF in some aspect that should be common. That has been fixed.