Commit Graph

4413 Commits (36c27155afef2d7d8b69db00a6686dfc647ff98f)

Author SHA1 Message Date
Alec Wysoker 0369f93b24 GATK changes to conform to Tribble refactoring as part improving Tabix support in Tribble (among other things).
1. Enable on-the-fly indexing for vcf.gz.
2. Handle on-the-fly indexing where file to be indexed is not a regular file, thus index should not be created.
3. Add method setProgressLogger to all SAMFileWriter implementations.
4. Revved picard to 1.109.1722
5. IndelRealigner md5s change because the MC tag is added to records now.

Fixed up and signed off by ebanks.
2014-03-17 11:56:22 -04:00
Khalid Shakir 639247ab48 Updated package-tests classpath, and allowing javac -cp <package>.jar.
Package tests now hard coding just the gatk-framework tests jar, to include ONLY BaseTest, until the exclusions may be debugged.
Removing cofoja's annotation service from the package jars, to allow javac -cp <package>.jar.
2014-03-17 05:47:59 -04:00
Valentin Ruano-Rubio 2e964c59b4 Improved criteria to select best haplotypes out from the assembly graph.
Currently the best haplotypes are those that accumulate the largest ABSOLUTE edge *multiplicity* sum across their path in the assembly graph.

The edge *mulitplicity* is equal to the number of reads that expand through that edge, i.e. have a kmer that uniquely map to some vertex up-stream from the edge and the following base calls extend across that edge to vertices downstream from it.

Despite that it is obvious that higher multiplicties correlated with haplotype probability this criterion fails short in some regards of which the most relevant is:

As it is evaluated in condensed seq-graph (as supposed to uncompressed read-threading-graphs) it is bias to haplotypes that have more short-sequence vetices
  ( -> ATGC -> CA -> has worse score than -> A -> T -> G -> C -> C -> A ->). This is partly result of how we modify the edge multiplicities when we merge vertices from a linear chain.

This pull-request addresses the problem by changing to a new scoring schema based in likelihood estimates:

Each haplotype's likelihood can be calculated as the multiplication of the likelihood of "taking" its edges in the assembly graph. The likelihood of "taking" an edge in the assembly
graph is calculated as its multiplicity divide by the sum of multiplicity of edges that share the same source vertex.

This pull-request addresses the following stories:

https://www.pivotaltracker.com/story/show/66691418
https://www.pivotaltracker.com/story/show/64319760

Change Summary:

1. Change to the new scoring schema.
2. Added a graph DOT printing code to KBestHaplotypeFinder in order to diagnose scoring.
3. Graph transformation have been modified in order to generate no 0-multiplicity edges. (Nevertheless the schema above should work with 0 edges assuming that they are in fact 0.5)
2014-03-14 18:37:01 -04:00
David Roazen 1324120c17 Unconditionally include all of commons-httpclient in the GATK/Queue jars
The maven shade plugin was eliminating a necessary class (IgnoreCookiesSpec)
when packaging the GATK/Queue. Work around this by telling maven to
always package all of commons-httpclient.
2014-03-14 10:50:15 -04:00
Eric Banks ffaf92f871 Added new functionality to the FastaAlternateReferenceMaker to have it output IUPAC codes for het sites.
Enable it with the new --useIUPAC argument.
Added both unit and integration tests for the new functionality - and fixed up the
exising tests once I was in there.
2014-03-12 14:31:57 -04:00
droazen 8b53567dc7 Merge pull request #553 from broadinstitute/dr_rename_pipeline_tests
Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests
2014-03-10 21:36:45 -04:00
David Roazen 78562c14bb Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests
-These tests are really integration tests for Queue rather than generalized
 pipeline tests, so it makes sense to call them QueueTests.

-Rename test classes and maven build targets, and update shell scripts
 to reflect new naming.
2014-03-10 21:24:03 -04:00
David Roazen 7c34f05082 Merge remote-tracking branch 'origin/master' into intel 2014-03-10 14:07:36 -04:00
Ami Levy-Moonshine 2a6f05a8a1 add an option to randomly (uniformly) split a vcf file/s to more than 2 files.
The old code that allow split to two files (given in the input) is kept to allow uneven splitting between files.
2014-03-10 10:58:44 -04:00
Karthik Gururaj 6e98e9e589 Removed g_haplotype* global variables in native code so that it works
with multi-threading in Java.
Modified VectorLoglessPairHMM.java so that jniInitializeRegion and
jniFinalizeRegion are empty
2014-03-06 22:08:35 -08:00
Karthik Gururaj 3999677c93 Changed to delete[] where applicable 2014-03-06 12:23:08 -08:00
Karthik Gururaj a29777765d Binary library 2014-03-06 11:14:46 -08:00
Karthik Gururaj 7844d956ac Modified delete to delete[] 2014-03-06 11:13:34 -08:00
Karthik Gururaj 27e640d640 Modified SSE4.1 and 4.2 checks with _may_i_use_cpu_feature() 2014-03-06 08:51:11 -08:00
Karthik Gururaj 37f107cb3a Using Mustafa's function _may_i_use_cpu_feature() for AVX check 2014-03-06 08:37:48 -08:00
David Roazen 9df59bd8cc Update pom versions to mark the start of GATK 3.1 development 2014-03-06 00:05:58 -05:00
David Roazen 34edcb8ddf Update pom versions for the 3.0 release 2014-03-05 23:37:21 -05:00
David Roazen a9ddfdb7c0 Remove external-example module from public pom.xml
This module was causing failures during the release
packaging tests. After discussing with Khalid, we've
decided to disable it for now until a fix can be
developed.
2014-03-05 20:25:38 -05:00
Karthik Gururaj ec54528605 Fixed error in Sandbox.java 2014-03-05 09:36:55 -08:00
Karthik Gururaj 8fcbf9272c Merge branch 'intel_pairhmm' of /data/broad/gsa-unstable into intel_pairhmm
Conflicts:
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java
	public/VectorPairHMM/src/main/c++/Sandbox.java
2014-03-05 09:35:50 -08:00
Intel Repocontact d81116eb1d Added vectorized PairHMM implementation by Mohammad and Mustafa into the Maven build of GATK.
C++ code has PAPI calls for reading hardware counters

Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven

Native library part of git repo

1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM

Moved JNI_README

1. Added the catch UnsatisfiedLinkError exception in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING as the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.

1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase

Added public license text to all C++ files

Added license to Makefile

Add package info to Sandbox.java

Conflicts:
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/DebugJNILoglessPairHMM.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/JNILoglessPairHMM.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/VectorLoglessPairHMM.java
	public/VectorPairHMM/src/main/c++/.gitignore
	public/VectorPairHMM/src/main/c++/LoadTimeInitializer.cc
	public/VectorPairHMM/src/main/c++/LoadTimeInitializer.h
	public/VectorPairHMM/src/main/c++/Makefile
	public/VectorPairHMM/src/main/c++/Sandbox.cc
	public/VectorPairHMM/src/main/c++/Sandbox.h
	public/VectorPairHMM/src/main/c++/Sandbox.java
	public/VectorPairHMM/src/main/c++/Sandbox_JNIHaplotypeDataHolderClass.h
	public/VectorPairHMM/src/main/c++/Sandbox_JNIReadDataHolderClass.h
	public/VectorPairHMM/src/main/c++/baseline.cc
	public/VectorPairHMM/src/main/c++/define-double.h
	public/VectorPairHMM/src/main/c++/define-float.h
	public/VectorPairHMM/src/main/c++/define-sse-double.h
	public/VectorPairHMM/src/main/c++/define-sse-float.h
	public/VectorPairHMM/src/main/c++/headers.h
	public/VectorPairHMM/src/main/c++/jnidebug.h
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.cc
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.h
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.cc
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.h
	public/VectorPairHMM/src/main/c++/pairhmm-template-kernel.cc
	public/VectorPairHMM/src/main/c++/pairhmm-template-main.cc
	public/VectorPairHMM/src/main/c++/run.sh
	public/VectorPairHMM/src/main/c++/shift_template.c
	public/VectorPairHMM/src/main/c++/utils.cc
	public/VectorPairHMM/src/main/c++/utils.h
	public/VectorPairHMM/src/main/c++/vector_function_prototypes.h
2014-03-05 09:30:29 -08:00
Laura Gauthier 43fdd38342 Add error handling to CalculateGenotypePosteriors to catch multiallelic variants with wrong number of ACs
-- throws UserException; added tests in PosteriorLikelihoodsUtilsUnitTests
Add error handling to CalculateGenotypePosteriors for cases where MLEAC>AN; add tests in PosteriorLikelihoodsUtilsUnitTests
Add unit tests to confirm that CalculateGenotypePosteriors has the ability to switch genotypes for four cases
2014-03-05 12:03:18 -05:00
Laura Gauthier 7f9f58dbd1 Added hidden flag to GenotypeConcordance to output sites of discordant genotypes (to System.out)
Revised ConcondanceMetrics tests to adapt to change
Added comments to PosteriorLikelihoodsUtils
2014-03-05 12:03:18 -05:00
Joel Thibault 57747ad35e Logger output should go to STDERR instead of STDOUT 2014-03-05 10:01:06 -05:00
Joel Thibault b4dde6a78c Add WARN to the valid log types error message
- order if statements and error message in increasing severity
2014-03-05 10:01:06 -05:00
Valentin Ruano Rubio 243d1bc07a Merge pull request #542 from broadinstitute/vrr_efficient_find_best_haplotypes
Added a more efficient implementation of the KBest haplotype finder code...
2014-03-05 09:44:50 -05:00
David Roazen 58905e8fe0 Disable the intermittently-failing and flawed ProgressMeterDaemonUnitTest
-created a Pivotal ticket to eventually redesign this test
2014-03-05 09:15:26 -05:00
Valentin Ruano-Rubio 69bf2b3247 Added a more efficient implementation of the KBest haplotype finder code (CONT.)
Changes:

  1. Addressed review comments on new K-best haplotype assembly graph finder.
  2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices.
  3. Updated test to use KBestHaplotypeFinder instead of KBestPaths
  4. Retired KBestPaths to the archive.
  5. Small improvements to the code and documentation.
2014-03-04 23:22:27 -05:00
Valentin Ruano-Rubio 7acf2eb0e7 Added a more efficient implementation of the KBest haplotype finder code.
Story:

  https://www.pivotaltracker.com/story/show/66238286

Changes:

  1. Created a new k-best haplotype search implementation in class KBestHaplotypeFinder.
  2. Changed HC code to use the new implementation.
  This seems to fix the original problem without causing significant changes in outputs using some empirical data test cases
  3. Moved haplotype's cigar calculation code from Path to CigarUtils; need that in order to gain independence from Path in some parts of the code.
     In any case that seems like a more natural location for that functionality.
2014-03-04 12:22:14 -05:00
Karthik Gururaj a893765ae2 Added license to Makefile 2014-03-03 09:11:02 -08:00
Karthik Gururaj 7cd23543a1 Added public license text to all C++ files 2014-03-03 09:04:00 -08:00
Eric Banks 22ad18b919 Moving Reduce Reads to the archive.
The GATK now fails with a user error if you try to run with a reduced bam.
(I added a unit test for that; everything else here is just the removal of all traces of RR)
2014-03-02 02:03:14 -05:00
Khalid Shakir 387188e5bb Attempting to limit gc during Maven tests, using defaults found in JavaCommandLineFunction 2014-03-01 15:24:45 +08:00
Karthik Gururaj 1b395a871a 1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase
2014-02-28 16:08:28 -08:00
Karthik Gururaj 37526dfad5 1. Added the catch UnsatisfiedLinkError exception in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING as the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.
2014-02-28 08:59:55 -08:00
Chris Whelan e61ba8b340 Added command line checks for duplicate files in ROD lists
-- Keep a list of processed files in ArgumentTypeDescriptor.getRodBindingsCollection
  -- Throw user exception if a file name duplicates one that was previously parsed
  -- Throw user exception if the ROD list is empty
  -- Added two unit tests to RodBindingCollectionUnitTest
2014-02-27 13:32:18 -05:00
Karthik Gururaj 2d0ce45bb0 Moved JNI_README 2014-02-27 10:12:23 -08:00
Karthik Gururaj c645725fc3 1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM
2014-02-26 15:17:15 -08:00
Karthik Gururaj bd71ba35e5 Moved pom.xml to VectorPairHMM and updated artifactId 2014-02-26 14:01:46 -08:00
Khalid Shakir da587d48ed Using absolute paths in generated diff commands, to ease running them from any directory. 2014-02-27 04:43:39 +08:00
Khalid Shakir c163e6d0d2 Separate failsafe directories for each of the integration test types [#66515572] 2014-02-27 04:43:39 +08:00
Karthik Gururaj b81e2c2948 Native library part of git repo 2014-02-26 11:47:42 -08:00
Karthik Gururaj 0fe843bfd9 Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven
2014-02-26 11:47:42 -08:00
Karthik Gururaj 15fe244e4b Now has PAPI values 2014-02-26 11:47:42 -08:00
Intel Repocontact e32e9e6af6 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2014-02-26 11:47:01 -08:00
Intel Repocontact ff2a972ab5 Merge branch 'master' of github.com:broadinstitute/gsa-unstable
Conflicts:
	.gitignore
2014-02-25 20:56:28 -08:00
Khalid Shakir f02ce6eca7 Added tests for cleaning up scattered .bai files, and using the log directory.
Re-added import java.io.File for BamGatherFunction.
Other cleanup to resolve scala syntax warnings from intellij.
Moved Example UG script to from protected to public.
2014-02-26 02:11:28 +08:00
pdexheimer 0405afeab2 Inherit BamGatherFunction from MergeSamFiles rather than PicardBamFunction
- This change means that BamGatherFunction will now have an @Output field for the BAM index, which will allow the bai to be deleted for intermediate functions

Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-26 02:11:28 +08:00
pdexheimer 504c125c26 Ensure .out files are saved into logDirectory
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-26 02:11:28 +08:00
pdexheimer 51dcd364a5 Added logDirectory argument
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-26 02:11:28 +08:00
Khalid Shakir 7e516b294f Replaced local drmaa and Jama artifacts with versions from maven central.
Removed unused caliper binary from local repo.
2014-02-22 01:21:35 +08:00
Khalid Shakir a75043b207 When git describe fails use "exported" instead of "unknown". 2014-02-22 01:21:35 +08:00
Khalid Shakir 4670c87313 Fixed mvn run for packagetests over external-example. 2014-02-22 01:21:34 +08:00
Khalid Shakir 70ecce2a0f Fixed scope for test-jar depedencies. 2014-02-22 01:21:34 +08:00
Eric Banks 235f0c6fa0 Merge pull request #528 from broadinstitute/eb_fix_cat_variants_usage_message
Fix the usage message for CatVariants to make it accurate.
2014-02-19 22:45:22 -05:00
Eric Banks 341d1bf2dd Fix the usage message for CatVariants to make it accurate.
It just hit a user on our forum...
2014-02-19 20:42:08 -05:00
Valentin Ruano-Rubio c167fb5fdf Fixing GenotypesGVCF.
Bug uncovered by some untrimmed alleles in the single sample pipeline output.

Notice however does not fix the untrimmed alleles in general.

Story:

https://www.pivotaltracker.com/story/show/65481104

Changes:

1. Fixed the bug itself.
2. Fixed non-working tests (sliently skipped due to exception in dataProvider).
2014-02-19 14:20:39 -05:00
Ryan Poplin 43c20264b0 Initial commit of the random forest classifier. 2014-02-17 13:07:27 -05:00
Khalid Shakir a505db79f5 Fixed build bug in ./ant-bridge.sh unittest -Dsingle=..., due to external-example.
pipeline.run property no longer required to be passed by test executor.
2014-02-15 13:52:20 +08:00
droazen 1e82f117ad Merge pull request #518 from broadinstitute/ks_skashin_gatkdocs_arguments
Ks skashin gatkdocs arguments
2014-02-14 13:57:19 -05:00
Eric Banks f6022a944b Merge pull request #513 from broadinstitute/eb_clean_up_genotype_posteriors
Various small fixes for CalculateGenotypePosteriors based on feedback fr...
2014-02-14 13:50:46 -05:00
Eric Banks 3724d4e5f3 Various small fixes for CalculateGenotypePosteriors based on feedback from guys in Ben Neale's group.
Note that this tool is still a work in progress and very experimental, so isn't 100% stable.  Most of
the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented
it right before he left, and we're going to need to add tests at some point soon.  I added a first
integration test in this commit, but it's just a start.

The fixes include:

1. Stop having the genotyping code strip out AD values.  It doesn't make sense that it should do this so
I don't know why it was doing that at all.
Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore.
This also helps CalculateGenotypePosteriors which was losing the AD values.
Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before.
Updated the integration test for that walker to include such data.

2. Chris was calling Math.pow directly on the normalized posteriors which isn't safe.
Instead, the normalization routine itself can revert back to log scale in a safe manner so let's use it.
Also, renamed the variable to posteriorProbabilities (and not likelihoods).

3. Have CGP update the AC/AF/AN counts after fixing GTs.
2014-02-14 13:48:14 -05:00
kshakir 8b136d53b9 Merge pull request #524 from broadinstitute/ks_symlink_bin_jar
Create symlinks target/GenomeAnalysisTK.jar and target/Queue.jar
2014-02-15 02:32:59 +08:00
Khalid Shakir bc9ac93b6c Adding the external example to the build. 2014-02-15 01:26:07 +08:00
Khalid Shakir 2e99a6ecf8 Create symlinks target/GenomeAnalysisTK.jar and target/Queue.jar during package phase. 2014-02-15 01:12:32 +08:00
Nicholas Clarke 7ae19953f5 Squashed commit of the following:
commit 5e73b94eed3d1fc75c88863c2cf07d5972eb348b
Merge: e12593a d04a585
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Fri Feb 14 09:25:22 2014 +0000

    Merge pull request #1 from broadinstitute/checkpoint

    SimpleTimer passes tests, with formatting

commit d04a58533f1bf5e39b0b43018c9db3302943d985
Author: kshakir <github@kshakir.org>
Date:   Fri Feb 14 14:46:01 2014 +0800

    SimpleTimer passes tests, with formatting

    Fixed getNanoOffset() to offset nano to nano, instead of nano to seconds.
    Updated warning message with comma separated numbers, and exact values of offsets.

commit e12593ae66a5e6f0819316f2a580dbc7ae5896ad
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Wed Feb 12 13:27:07 2014 +0000

    Remove instance of 'Timer'.

commit 47a73e0b123d4257b57cfc926a5bdd75d709fcf9
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Wed Feb 12 12:19:00 2014 +0000

    Revert a couple of changes that survived somehow.

    - CheckpointableTimer,Timer -> SimpleTimer

commit d86d9888ae93400514a8119dc2024e0a101f7170
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Mon Jan 20 14:13:09 2014 +0000

    Revised commits following comments.

    - All utility merged into `SimpleTimer`.
    - All tests merged into `SimpleTimerUnitTest`.
    - Behaviour of `getElapsedTime` should now be consistent with `stop`.
    - Use 'TimeUnit' class for all unit conversions.
    - A bit more tidying.

commit 354ee49b7fc880e944ff9df4343a86e9a5d477c7
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Fri Jan 17 17:04:39 2014 +0000

    Add a new CheckpointableTimerUnitTest.

    Revert SimpleTimerUnitTest to the version before any changes were made.

commit 2ad1b6c87c158399ededd706525c776372bbaf6e
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Tue Jan 14 16:11:18 2014 +0000

    Add test specifically checking behaviour under checkpoint/restart.

    Slight alteration to the checkpointable timer based on observations
    during the testing - it seems that there's a fair amount of drift
    between the sources anyway, so each time we stop we resynchronise the
    offset. Hopefully this should avoid gradual drift building up and
    presenting as checkpoint/restart drift.

commit 1c98881594dc51e4e2365ac95b31d410326d8b53
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Tue Jan 14 14:11:31 2014 +0000

    Should use consistent time units

commit 6f70d42d660b31eee4c2e9d918e74c4129f46036
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Tue Jan 14 14:01:10 2014 +0000

    Add a new timer supporting checkpoint mechanisms.

    The issue with this is that the current timer is locked to JVM nanoTime. This can be reset after
    a checkpoint/restart and result in negative elapsed times, which causes an error.

    This patch addresses the issue in two ways:
     - Moves the check on timer information in GenomeAnalysisEngine.java to only occur if a time limit has been
    set.
     - Create a new timer (CheckpointableTimer) which keeps track of the relation between system and nano time. If
    this changes drastically, then the assumption is that there has been a JVM restart owing to checkpoint/restart.
    Any time straddling a checkpoint/restart event will not be counted towards total running time.

Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-14 21:45:47 +08:00
Laura Gauthier 29bb3d4dc1 Check for empty BAM lists in command line input 2014-02-14 08:09:47 -05:00
Khalid Shakir 225ee4880b Using new parameters via skashin to run gatkdocs in the maven conventional subdirectory.
Updated path for output gatkdocs in nightly build script.
Removed patch in plugin manager that contained a workaround for gatkdocs running in the top level directory.
2014-02-14 15:57:21 +08:00
skashin 1b3ac95798 Added the following arguments: -settings-dir -destination-dir -forum-key-path
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-14 14:28:35 +08:00
Eric Banks 7095a60c8e Merge pull request #516 from broadinstitute/dr_reenable_tests_failing_due_to_java_update
Re-enable tests that were failing post-maven due to changes in Java's Math.pow() implementation
2014-02-13 21:05:18 -05:00
David Roazen 4b4b93ad1b Re-enable tests that were failing post-maven due to changes in Java's Math.pow() implementation
After extensive detective work, Joel determined that these tests were failing
due to changes in the implementation of Math.pow() in newer versions of
Java 1.7.

All GSA members should ensure that they're using a JDK that is at least
as current as the one in the Java-1.7 dotkit on the Broad servers
(build 1.7.0_51-b13).
2014-02-12 16:08:16 -05:00
Eric Banks 5bde7fbf37 Merge pull request #511 from broadinstitute/dr_enable_exclusions_in_maven_package_tests
Exclude all transitive dependencies in maven package-tests
2014-02-12 15:38:39 -05:00
Joel Thibault ef87b051b0 Rev Picard to 1.107.1683 (4 jars) 2014-02-12 15:25:50 -05:00
David Roazen 6f12c8b0dc Exclude all transitive dependencies in maven package-tests
This change should allow us to test that the GATK jar has been
correctly packaged at release time, by ensuring that only the
packaged jar + a few test-related dependencies are on the classpath
when tests are run.

Note that we still need to actually test that this works as intended
before we can make this live in the Bamboo release plan.
2014-02-12 14:59:05 -05:00
David Roazen 95e1402d21 Add ability to run *KnowledgeBaseTests to maven
Run with: mvn verify -Dsting.knowledgebasetests.skipped=false
2014-02-11 14:08:24 -05:00
Khalid Shakir 1666bb7e3a Patched PluginManager to ignore null classes, that will allow gatkdocs to build successfully when running from the source root directory, due to its hardcoded paths. 2014-02-12 00:48:58 +08:00
Karthik Gururaj 316501b32e Fixed denominator in profiling 2014-02-10 10:11:03 -08:00
Karthik Gururaj d081c19178 Minor: added support in C++ sandbox to choose implementation and check
from command line
2014-02-09 18:05:35 -08:00
Ryan Poplin b81494b704 Merge pull request #499 from broadinstitute/eb_fix_ad_updates
Fixed bug in generating AD values when new alleles are present for genot...
2014-02-09 17:55:00 -05:00
Eric Banks abb67cfa5e Fixed bug in generating AD values when new alleles are present for genotpying GVCFs.
This was a dumb mistake that wasn't well tested (but is now).
2014-02-09 15:15:19 -05:00
Khalid Shakir 12bb6fd361 Removed use of picard private.
Updated picard-maven script to tag locally modified builds with -SNAPSHOT.
Removed old picard jars.
2014-02-09 17:08:52 +08:00
Khalid Shakir 4e0f7521f2 Made scala.maxmemory an argument, and defaulted it to 1g. 2014-02-09 09:24:44 +08:00
Karthik Gururaj a03d83579b Matrices in baseline C++ (no vector) implementation of PairHMM are now
allocated on heap using "new". Stack allocation led to program crashes
for large matrix sizes.
2014-02-07 23:22:05 -08:00
Karthik Gururaj 20a46e4098 Check only for SSE 4.1 (rather than SSE 4.2) when trying to use the SSE
implementation of PairHMM
2014-02-07 15:19:55 -08:00
Eric Banks d689f61005 Fixed up some of the genotype-level annotations being propogated in the single sample HC pipeline.
1. AD values now propogate up (they weren't before).
2. MIN_DP gets transferred over to DP and removed.
3. SB gets removed after FS is calculated.

Also, added a bunch of new integration tests for GenotypeGVCFs.
2014-02-07 12:47:54 -05:00
Eric Banks db68d3fa10 Fixing failing unit tests 2014-02-07 12:24:14 -05:00
Eric Banks 2648219c42 Implementation of a hierarchical merger for gVCFs, called CombineGVCFs.
This tool will take any number of gVCFs and create a merged gVCF (as opposed to
GenotypeGVCFs which produces a standard VCF).

Added unit/integration tests and fixed up GATK docs.
2014-02-07 08:49:18 -05:00
mghodrat 7815c30df8 Adding comments to pairhmm-template-kernel 2014-02-06 20:13:06 -08:00
Karthik Gururaj b729fc0136 1. Split main JNI function into initializeTestcases, compute_testcases
and releaseReads
2. FTZ enabled
3. Cleaner profiling code
2014-02-06 14:35:32 -08:00
Karthik Gururaj 166f91d698 Merge branch 'test_branch'
Conflicts:
	public/c++/VectorPairHMM/LoadTimeInitializer.cc
	public/c++/VectorPairHMM/pairhmm-1-base.cc
	public/c++/VectorPairHMM/utils.cc
	public/c++/VectorPairHMM/utils.h

Merged test_branch with intel_pairhmm
2014-02-06 11:18:18 -08:00
Karthik Gururaj fab6f57e97 1. Enabled FTZ in LoadTimeInitializer.cc
2. Added Sandbox.java for testing
3. Moved compute to utils.cc (inside library)
4. Added flag for disabling FTZ in Makefile
2014-02-06 11:01:33 -08:00
Karthik Gururaj 78642944c0 1. Moved break statement in utils.cc to correct position
2. Tested sandbox with regions
3. Lots of profiling code from previous commit exists
2014-02-06 09:32:56 -08:00
Khalid Shakir b21c35482e Packages link private/testdata, so that mvn test -Dsting.serialunittests.skipped=false works. 2014-02-06 08:25:38 -05:00
Khalid Shakir 3848159086 Added a set of serial tests to gatk/queue packages, which runs all tests under their package in one TestNG execution.
New properties to disable regenerating example resources artifact when each parallel test runs under packagetest.
Moved collection of packagetest parameters from shell scripts into maven profiles.
Fixed necessity of test-utils jar by removing incorrect dependenciesToScan element during packagetests.
When building picard libraries, run clean first.
Fixed tools jar dependency in picard pom.
Integration tests properly use the ant-bridge.sh test.debug.port variable, like unit tests.
2014-02-06 08:25:38 -05:00
Valentin Ruano Rubio 988e3b4890 Merge pull request #487 from broadinstitute/vrr_reference_model_with_trimming
Get gVCF to work without --dontTrimActiveRegions
2014-02-05 22:52:17 -05:00
Valentin Ruano-Rubio 98ffcf6833 Get gVCF to work without --dontTrimActiveRegions
Story:

https://www.pivotaltracker.com/story/show/65048706
https://www.pivotaltracker.com/story/show/65116908

Changes:

ActiveRegionTrimmer in now an argument collection and it returns not only the trimmed down active region but also the non-variant containing flanking regions
HaplotypeCaller code has been simplified significantly pushing some functionality two other classes like ActiveRegion and AssemblyResultSet.

Fixed a problem with the way the trimming was done causing some gVCF non-variant records no have conservative 0,0,0 PLs
2014-02-05 22:50:45 -05:00
Karthik Gururaj acda6ca27b 1. Whew, finally debugged the source of performance issues with PairHMM
JNI. See copied text from email below.
2. This commit contains all the code used in profiling, detecting FP
exceptions, dumping intermediate results. All flagged off using ifdefs,
but it's there.
--------------Text from email
As we discussed before, it's the denormal numbers that are causing the
slowdown - the core executes some microcode uops (called FP assists)
when denormal numbers are detected for FP operations (even un-vectorized
code).
The C++ compiler by default enables flush to zero (FTZ) - when set, the
hardware simply converts denormal numbers to 0. The Java binary
(executable provided by Oracle, not the native library) seems to be
compiled without FTZ (sensible choice, they want to be conservative).
Hence, the JNI invocation sees a large slowdown. Disabling FTZ in C++
slows down the C++ sandbox performance to the JNI version (fortunately,
the reverse also holds :)).
Not sure how to show the overhead for these FP assists easily - measured
a couple of counters.
FP_ASSISTS:ANY - shows number of uops executed as part of the FP
assists. When FTZ is enabled, this is 0 (both C++ and JNI), when FTZ is
disabled this value is around 203540557 (both C++ and JNI)
IDQ:MS_UOPS_CYCLES - shows the number of cycles the decoder was issuing
uops when the microcode sequencing engine was busy. When FTZ is enabled,
this is around 1.77M cycles (both C++ and JNI), when FTZ is disabled
this value is around 4.31B cycles (both C++ and JNI). This number is
still small with respect to total cycles (~40B), but it only reflects
the cycles in the decode stage. The total overhead of the microcode
assist ops could be larger.
As suggested by Mustafa, I compared intermediate values (matrices M,X,Y)
and final output of compute_full_prob. The values produced by C++ and
Java are identical to the last bit (as long as both use FTZ or no-FTZ).
Comparing the outputs of compute_full_prob for the cases no-FTZ and FTZ,
there are differences for very small values (denormal numbers).
Examples:
Diff values 1.952970E-33 1.952967E-33
Diff values 1.135071E-32 1.135070E-32
Diff values 1.135071E-32 1.135070E-32
Diff values 1.135071E-32 1.135070E-32
For this test case (low coverage NA12878), all these values would be
recomputed using the double precision version. Enabling FTZ should be
fine.
-------------------End text from email
2014-02-05 17:09:57 -08:00
Ryan Poplin 693bfac341 Bug fix for missing annotations in CombineReferenceCalculationVariants. They were being dropped in the handoff between engines in a couple of places.
-- Updated single sample pipeline test data using Valentin's files and re-enabled CRCV tests
2014-02-05 12:58:48 -05:00
Eric Banks 740b33acbb We were never validating the sequence dictionary of tabix indexed VCFs for some reason. Fixed.
These changes happened in Tribble, but Joel clobbered them with his commit.
We can now change the logging priority on failures to validate the sequence dictionary to WARN.
Thanks to Tim F for indirectly pointing this out.
2014-02-05 10:12:38 -05:00
Eric Banks 9cac24d1e6 Moving logging status of VCF indexing to DEBUG instead of INFO, otherwise it's painful when reading in lots of files 2014-02-05 10:12:37 -05:00
Eric Banks 91bdf069d3 Some updates to CRCV.
1. Throw a user error when the input data for a given genotype does not contain PLs.
2. Add VCF header line for --dbsnp input
3. Need to check that the UG result is not null
4. Don't error out at positions with no gVCFs (which is possible when using a dbSNP rod)
2014-02-05 10:12:37 -05:00
Joel Thibault 7923e786e9 Rev Picard (public) to 1.107.1676
- Rename snappy to snappy-java
- Add maven-metadata-local.xml to .gitignore
2014-02-04 22:04:28 -05:00
Joel Thibault 0025fe190d Exclude sam's older TestNG 2014-02-04 22:04:27 -05:00
Karthik Gururaj 24f8aef344 Contains profiling, exception tracking, PAPI code
Contains Sandbox Java
2014-02-04 16:27:29 -08:00
David Roazen 76086f30b7 Temporarily disable tests that started failing post-maven
Joel is working on these failures in a separate branch. Since
maven (currently! we're working on this..) won't run the whole
test suite to completion if there's a failure early on, we need
to temporarily disable these tests in order to allow group members
to run tests on their branches again.
2014-02-04 15:31:24 -05:00
David Roazen 3b2f07990d Re-break the MWUnitTest for Joel to debug 2014-02-04 15:19:09 -05:00
David Roazen c9032f0b5c Fix failing unit tests 2014-02-04 03:05:30 -05:00
Khalid Shakir a4289711e2 Distinct failsafe summary reports, just like invoker report directories. 2014-02-03 13:50:47 -05:00
Khalid Shakir 857e6e0d6f Bumped version to 2.8-SNAPSHOT, using new update_pom_versions.sh script. 2014-02-03 13:50:46 -05:00
Khalid Shakir 9ca3004fc3 Setting the test-utils' type to test-jar, such that the multi-module build uses testClasses instead of classes as a directory dependency. 2014-02-03 13:50:46 -05:00
Khalid Shakir de13f41fc3 One step closer to a proper test-utils artifact. Using the maven-jar-plugin to create a test classifer, excluding actual tests, until we can properly separate the classes into separate artifacts/modules. 2014-02-03 13:50:46 -05:00
Khalid Shakir 25aee7164e Fixed missing "mvn" command execution in ant-bridge.
Added pom.xml workarounds for duplicate classpath error, due to gatk-framework dependency containing required BaseTest, and jarred *UnitTest/*IntegrationTest classes that also exist as files under target/test-classes.
2014-02-03 13:50:46 -05:00
Khalid Shakir caa76cdac4 Added maven pom.xmls for various artifacts. 2014-02-03 13:50:46 -05:00
Khalid Shakir d1a689af33 Added new utility files used by maven build, including the ant-bridge script. 2014-02-03 13:50:46 -05:00
Khalid Shakir 88150e0166 Switched commited dependency repository from ivy to maven. 2014-02-03 13:50:46 -05:00
Khalid Shakir 1e25a758f5 Moved files to maven directories.
Here are the git moved directories in case other files need to be moved during a merge:
  git-mv private/java/src/        private/gatk-private/src/main/java/
  git-mv private/R/scripts/       private/gatk-private/src/main/resources/
  git-mv private/java/test/       private/gatk-private/src/test/java/
  git-mv private/testdata/        private/gatk-private/src/test/resources/
  git-mv private/scala/qscript/   private/queue-private/src/main/qscripts/
  git-mv private/scala/src/       private/queue-private/src/main/scala/
  git-mv protected/java/src/      protected/gatk-protected/src/main/java/
  git-mv protected/java/test/     protected/gatk-protected/src/test/java/
  git-mv public/java/src/         public/gatk-framework/src/main/java/
  git-mv public/java/test/        public/gatk-framework/src/test/java/
  git-mv public/testdata/         public/gatk-framework/src/test/resources/
  git-mv public/scala/qscript/    public/queue-framework/src/main/qscripts/
  git-mv public/scala/src/        public/queue-framework/src/main/scala/
  git-mv public/scala/test/       public/queue-framework/src/test/scala/
2014-02-03 13:50:44 -05:00
Khalid Shakir faaef236ea Moved gsalib, R and other resources, Queue GATK extensions generator, Queue version java files. 2014-02-03 13:49:21 -05:00
Khalid Shakir eb52dc6a9b Moved build.xml, ivy.xml, ivysettings.xml, ivy properties, public/packages/*.xml into private/archive/ant 2014-02-03 13:49:20 -05:00
Karthik Gururaj 6d4d776633 Includes code for all debug code for obtaining profiling info 2014-01-30 12:08:06 -08:00
Valentin Ruano-Rubio 89c4e57478 gVCF <NON_REF> in all vcf lines including variant ones when –ERC gVCF is requested.
Changes:
-------

  <NON_REF> likelihood in variant sites is calculated as the maximum possible likelihood for an unseen alternative allele: for reach read is calculated as the second best likelihood amongst the reported alleles.

  When –ERC gVCF, stand_conf_emit and stand_conf_call are forcefully set to 0. Also dontGenotype is set to false for consistency sake.

  Integration test MD5 have been changed accordingly.

Additional fix:
--------------

  Specially after adding the <NON_REF> allele, but also happened without that, QUAL values tend to go to 0 (very large integer number in log 10) due to underflow when combining GLs (GenotypingEngine.combineGLs). To fix that combineGLs has been substituted by combineGLsPrecise that uses the log-sum-exp trick.

  In just a few cases this change results in genotype changes in integration tests but after double-checking using unit-test and difference between combineGLs and combineGLsPrecise in the affected integration test, the previous GT calls were either border-line cases and or due to the underflow.
2014-01-30 11:23:33 -05:00
Karthik Gururaj 5c7427e48c Temporary commit containing debug profiling code - commented out 2014-01-29 12:10:29 -08:00
Karthik Gururaj 0c63d6264f 1. Added synchronization block around loadLibrary in
VectorLoglessPairHMM
2. Edited Makefile to use static libraries where possible
2014-01-27 15:34:58 -08:00
Karthik Gururaj a15137a667 Modified run.sh 2014-01-27 14:56:46 -08:00
Karthik Gururaj 2c0d70c863 Moved vector JNI code to public/c++/VectorPairHMM 2014-01-27 14:52:59 -08:00
Karthik Gururaj 85a748860e 1. Added more profiling code
2. Modified JNI_README
2014-01-27 14:32:44 -08:00
Valentin Ruano-Rubio 748d2fdf92 Added Integration test to verify the bugs are not there anymore as reported in pivotracker 2014-01-26 23:29:31 -05:00
Karthik Gururaj 018e9e2c5f 1. Cleaned up code
2. Split into DebugJNILoglessPairHMM and VectorLoglessPairHMM with base
class JNILoglessPairHMM. DebugJNILoglessPairHMM can, in principle,
invoke any other child class of JNILoglessPairHMM.
3. Added more profiling code for Java parts of LoglessPairHMM
2014-01-26 19:18:12 -08:00
Valentin Ruano-Rubio 9e7bf75e89 Fix for the PairHMM transition probability miscalculation.
Problem:

matchToMatch transition calculation was wrong resulting in transition probabilites coming out of the Match state that added more than 1.

Reports:

https://www.pivotaltracker.com/s/projects/793457/stories/62471780
https://www.pivotaltracker.com/s/projects/793457/stories/61082450

Changes:

The transition matrix update code has been moved to a common place in PairHMMModel to dry out its multiple copies.

MatchToMatch transtion calculation has been fixed and implemented in PairHMMModel.

Affected integration test md5 have been updated, there were no differences in GT fields and example differences always implied
small changes in likelihoods that is what is expected.
2014-01-26 16:30:36 -05:00
Karthik Gururaj 81bdfbd00d Temporary commit before moving to new native library 2014-01-24 16:29:35 -08:00
Karthik Gururaj 733a84e4f9 Added support to transfer haplotypes once per region to the JNI
Re-use transferred haplotypes (stored in GlobalRef) across calls to
computeLikelihoods
2014-01-22 10:52:41 -08:00
Karthik Gururaj 88c08e78e7 1. Inserted #define in sandbox pairhmm-template-main.cc
2. Wrapped _mm_empty() with ifdef SIMD_TYPE_SSE
3. OpenMP disabled
4. Added code for initializing PairHMM's data inside initializePairHMM -
not used yet
2014-01-21 09:57:14 -08:00
Karthik Gururaj 7180c392af 1. Integrated Mohammad's SSE4.2 code, Mustafa's bug fix and code to fix the
SSE compilation warning.
2. Added code to dynamically select between AVX, SSE4.2 and normal C++ (in
that order)
3. Created multiple files to compile with different compilation flags:
avx_function_prototypes.cc is compiled with -xAVX while
sse_function_instantiations.cc is compiled with -xSSE4.2 flag.
4. Added jniClose() and support in Java (HaplotypeCaller,
PairHMMLikelihoodCalculationEngine) to call this function at the end of
the program.
5. Removed debug code, kept assertions and profiling in C++
6. Disabled OpenMP for now.
2014-01-20 08:03:42 -08:00
Yossi Farjoun c79e8ca53e Added an info log containing the SAM/BAM files that were eventually found from the commandline (useful for when there are files hiding inside bam.lists which may or may not have been constructed correctly...)
Added a @hidden option controling the appearance of the full BamList in the log
2014-01-17 11:25:21 -05:00
Karthik Gururaj f1c772ceea Same log message as before - forgot -a option
1. Moved computeLikelihoods from PairHMM to native implementation
2. Disabled debug - debug code still left (hopefully, not part of
    bytecode)
3. Added directory PairHMM_JNI in the root which holds the C++
library that contains the PairHMM AVX implementation. See
PairHMM_JNI/JNI_README first
2014-01-16 21:40:04 -08:00
Eric Banks de56134579 Fixed up and refactored what seems to be a useful private tool to create simulated reads around a VCF.
It didn't completely work before (it was hard-coded for a particular long-lost data set) but it should work now.
Since I thought that it might prove useful to others, I moved it to protected and added integration tests.

GERALDINE: NEW TOOL ALERT!
2014-01-15 13:49:31 -05:00
Geraldine Van der Auwera edf5880022 Updated SAMPileup codec and pileup-related docs
Problem: the codec was written to take in consensus pileups produced with pileup -c option (which consists of 10 or 13 fields per line depending on the variant type) but errored out on the basic pileup format (which only has 6 fields per line). This was inconsistent and confusing to users.

	Solution: I added a switch in the parsing to recognize and handle both cases more appropriately, and updated related docs. While I was at it I also improved error messages in CheckPileup, which now emits User Error: Bad Input exceptions when reporting mismatches. Which may not be the best thing to do (ultimately they're not really errors, they're just reporting unwelcome results) but it beats emitting Runtime Exceptions.

	Tested by CheckPileupIntegrationTest which tests both format cases.
2014-01-14 09:14:16 -05:00
Eric Banks 16ecc53749 Merge pull request #469 from broadinstitute/gg_gatkdoc_fixes
Assorted fixes and improvements to gatkdocs
2014-01-14 05:56:07 -08:00
droazen 347fab4717 Merge pull request #471 from broadinstitute/eb_output_log_info_for_tim
Adding more meta information about the user to the GATK logging output, per Tim F's request.
2014-01-13 17:48:40 -08:00
Geraldine Van der Auwera bdb3954eb3 removed maxRuntime minValue 2014-01-13 20:45:43 -05:00
Geraldine Van der Auwera 8fcad6680b Assorted fixes and improvements to gatkdocs
-Added docs for ERC mode in HC
 -Move RecalibrationPerformance walker since to private since it is experimental and unsupported
 -Updated VR docs and restored percentBad/numBad (but @Hidden) to enable deprecation alert if users try to use them
 -Improved error msg for conflict between per-interval aggregation and -nt
 -Minor clean up in exception docs
 -Added Toy Walkers category for devs and dev supercat (to build out docs for developers)
 -Added more detailed info to GenotypeConcordance doc based on Chris forum post
 -Added system to include min/max argument values in gatkdocs (build gatkdocs with 'ant gatkdocs' to test it, see engine and DoC args for in situ examples)
 -Added tentative min/max argument annotations to DepthOfCoverage and CommandLineGATK arguments (and improved docs while at it)
 -Added gotoDev annotation to GATKDocumentedFeature to track who is the go-to person in GSA for questions & issues about specific walkers/tools (now discreetly indicated in each gatkdoc)
2014-01-13 17:46:22 -05:00
Eric Banks 851ec67bdc Adding more meta information about the user to the GATK logging output, per Tim F's request. 2014-01-13 14:36:02 -05:00
droazen 7cd304fb41 Merge pull request #470 from broadinstitute/mf_new_RBP
Mf new rbp
2014-01-13 08:46:27 -08:00
Eric Banks 0323caefc8 Added some bug fixes to the gVCF merging code after finally getting some real data to play with.
Still under construction, awaiting more test data from Valentin.
2014-01-08 08:34:35 -05:00
Eric Banks f172c349f6 Adding the functionality to enable users to input a file of VCFs for -V.
To do this I have added a RodBindingCollection which can represent either a VCF or a
file of VCFs.  Note that e.g. SelectVariants allows a list of RodBindingCollections so
that one can intermix VCFs and VCF lists.

For VariantContext tags with a list, by default the tags for the -V argument are applied
unless overridden by the individual line.  In other words, any given line can have either
one token (the file path) or two tokens (the new tags and the file path).  For example:
foo.vcf
VCF,name=bar bar.vcf

Note that a VCF list file name must end with '.list'.

Added this functionality to CombineVariants, CombineReferenceCalculationVariants, and VariantRecalibrator.
2014-01-08 00:45:00 -05:00
Menachem Fromer d1275651ae Merge remote-tracking branch 'origin/master' into mf_new_RBP 2014-01-03 01:13:40 -05:00
Ami Levy-Moonshine 6da53aea09 Write a new tool for spliting reads that have N cigar string.
For example, this tool can be used for processing bowtie RNA-seq data.
Each read with k N-cigar elemments is plit to k+1 reads. The split is done by hard clipping the bases rest of the bases.

In order to do it, few changes were introduced to some other clipping methods:
- make a segnificant change in ClippingOp.hardClip() that prevent the spliting of read with cigar: 1M2I1N1M3I.
- change getReadCoordinateForReferenceCoordinate in ReadUtil to recognize Ns

create unitTests for that walker:
- change ReadClipperTestUtils to be more general in order to use its code and avoid code duplication
- move some useful methods from ReadClipperTestUtils to CigarUtils

create integration test for that class

small change in a comment in FullProcessingPipeline

last commit:

Address review comments:
- move to protected under walkers/rnaseq
- change the read splitting methods to be more readable and more efficiant
- change (minor changes) some methods in ReadClipper to allow the changes in split reads
- add (minor change) one method to CigarUtils to allow the changes in split reads
- change ReadUtils.getReadCoordinateForReferenceCoordinate to include possible N in the cigar
- address the rest of the review comments (minor changes)

- fix ReadUtilsUnitTest.testReadWithNs acoording to the defult behaviour of getReadCoordinateForReferenceCoordinate (in case of refernce index that fall into deletion, return the read index of the base before the deletion).
- add another test to ReadUtilsUnitTest.testReadWithNs

- Allow the user to print the split positions (not working proparly currently)
2014-01-01 22:21:36 -05:00
Mauricio Carneiro d1febb89c8 Better documentation for ReadClippingStats walker
* add overall walker GATKDocs
* add explanation for skip parameter and make it advanced
* reverse the logic on exculding unmapped reads for clarity
* fix read length  calculation to no longer include indels

ps: I am not sure how useful this walker is (I didn't write it) but the skip logic is poor and
calculates the entire statistic for the reads it is eventually going to skip. This would be an easy
fix, but only worth our time if people actually use this.
2014-01-01 14:26:26 -05:00
Eric Banks f82a7c3f4c Updating variant jar.
The update contains:
1. documentation changes for VariantContext and Allele (which used to discuss the now obsolete null allele)
2. better error messages for VCFs containing complex rearrangements with breakends
3. instead of failing badly on format field lists with '.'s, just ignore them
Also, there is a trivial change to use a more efficient method to remove a bunch of attributes from a VC.

Delivers PT#s 59675378, 59496612, and 60524016.
2013-12-31 22:48:29 -05:00
Eric Banks 5a1564d1f2 Merge pull request #456 from broadinstitute/eb_unify_hc_combination_steps
Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline.
2013-12-31 18:57:27 -08:00
Eric Banks 83e09b1f64 Created a new walker to do the full combination of N gVCFs from the HC single-sample ref calc pipeline.
Basically, it does 3 things (as opposed to having to call into 3 separate walkers):
1. merge the records at any given position into a single one with all alleles and appropriate PLs
2. re-genotype the record using the exact AF calculation model
3. re-annotate the record using the VariantAnnotatorEngine

In the course of this work it became clear that we couldn't just use the simpleMerge() method used
by CombineVariants; combining HC-based gVCFs is really a complicated process.  So I added a new
utility method to handle this merging and pulled any related code out of CombineVariants.  I tried
to clean up a lot of that code, but ultimately that's out of the scope of this project.

Added unit tests for correctness testing.
Integration tests cannot be used yet because the HC doesn't output correct gVCFs.
2013-12-31 12:07:56 -05:00
Menachem Fromer 48ef7a1a2f Merge remote-tracking branch 'origin/master' into mf_new_RBP 2013-12-19 10:42:20 -05:00
David Roazen 4a79831adc Add ability to specify min/max required/recommended values for numeric arguments in the @Argument annotation
-You can now add "minValue", "maxValue", "minRecommendedValue", and "maxRecommendedValue" attributes
 to @Argument annotations for command-line arguments

-"minValue" and "maxValue" specify hard limits that generate an exception if violated

-"minRecommendedValue" and "maxRecommendedValue" specify soft limits that generate a warning if violated

-Works only for numeric arguments (int, double, etc.) with @Argument annotations

-Only considers values actually specified by the user on the command line, not default values
 assigned in the code

As requested by Geraldine
2013-12-18 18:09:08 -05:00
Eric Banks 400e7c1404 Fixed bug in the filtering of lifted over variants where a deletion at the end of a contig could cause it to error out.
Added a unit test.
2013-12-11 14:07:18 -05:00
Eric Banks 418fbdfbab Added HC trio calls and NA12878 KB snapshot to resource bundle.
Also, don't touch the current link until the resources are finished being produced.
2013-12-07 22:08:34 -05:00
David Roazen 932cd3ada7 Fix 3rd-party library dependency issues in the HC/PairHMM tests
In general, test classes cannot use 3rd-party libraries that are not
also dependencies of the GATK proper without causing problems when,
at release time, we test that the GATK jar has been packaged correctly
with all required dependencies.

If a test class needs to use a 3rd-party library that is not a GATK
dependency, write wrapper methods in the GATK utils/* classes, and
invoke those wrapper methods from the test class.
2013-12-06 13:16:55 -05:00
David Roazen 0e65296efb Rev picard, sam-jdk, tribble, and variant jars to 1.104.1628
-update VariantFiltration to work with new Lazy wrapper around the
 JexlEngine in VariantContextUtils
2013-12-05 12:45:32 -05:00
Joel Thibault 5fe0531b4d Throw a GVCFIndexException when the user doesn't specify the optimal indexing strategy 2013-12-03 23:12:14 -05:00
Joel Thibault 8571a641bf Add @Advanced to variant_index_type and variant_index_parameter 2013-12-03 23:12:14 -05:00
Joel Thibault fd0a02e52e New VCF engine arguments to specify an alternate IndexCreator
- CatVariants updates to use custom VCF indices
- Scala scripts for VCF index testing
2013-12-03 13:31:02 -05:00
Joel Thibault 42f78bdb3a Add a class-based DataProvider 2013-12-03 13:31:01 -05:00
Joel Thibault cd3ee2ae7e whitespace 2013-12-03 13:31:01 -05:00
Eric Banks 6bee6a1b53 Change the behavior of SelectVariants for PL/AD when it encounters a record that has lost one or more alternate alleles.
Previously, we would strip out the PLs and AD values since they were no longer accurate.  However, this is not ideal because
then that information is just lost and 1) users complain on the forum and post it as a bug and 2) it gives us problems in both
the current and future (single sample) calling pipelines because we subset samples/alleles all the time and lose info.

Now the PLs and AD get correctly selected down.

While I was in there I also refactored some related code in subsetDiploidAlleles().  There were no real changes there - I just
broke it out into smaller chunks as per our best practices.

Added unit tests and updated integration tests.
Addressed reviews.
2013-12-03 09:23:03 -05:00
Valentin Ruano-Rubio 0f99778a59 Adding Graph-based likelihood ratio calculation to HC
To active this feature add '--likelihoodCalculationEngine GraphBased' to the HC command line.

New HC Options (both Advanced and Hidden):
==========================================

  --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM)

Specifies what engine should be used to generate read vs haplotype likelihoods.

  PairHMM : standard full-PairHMM approach.
  GraphBased : using the assembly graph to accelarate the process.
  Random : generate random likelihoods - used for benchmarking purposes only.

  --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN)

It idicates how to merge haplotypes produced using different kmerSizes.
Only has effect when used in combination with (--likelihooCalculationEngine GraphBased)

  COMBO_MIN : use the smallest kmerSize with all haplotypes.
  COMBO_MAX : use the larger kmerSize with all haplotypes.
  MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it.
  MAX_ONLY : use the larger kmerSize with haplotypes asembled using it.

Major code changes:
===================

 * Introduce multiple likelihood calculation engines (before there was just one).

 * Assembly results from different kmerSies are now packed together using the AssemblyResultSet class.

 * Added yet another PairHMM implementation with a different API in order to spport
   local PairHMM calculations, (e.g. a segment of the read vs a segment of the haplotype).

Major components:
================

 * FastLoglessPairHMM: New pair-hmm implemtation using some heuristic to speed up partial PairHMM calculations

 * GraphBasedLikelihoodCalculationEngine: delegates onto GraphBasedLikelihoodCalculationEngineInstance the exectution
     of the graph-based likelihood approach.

 * GraphBasedLikelihoodCalculationEngineInstance: one instance per active-region, implements the graph traversals
     to calcualte the likelihoods using the graph as an scafold.

 * HaplotypeGraph: haplotype threading graph where build from the assembly haplotypes. This structure is the one
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

 * ReadAnchoring and KmerSequenceGraphMap: contain information as how a read map on the HaplotypeGraph that is
     used by GraphBasedLikelihoodCalcuationEngineInstance to do its work.

Remove mergeCommonChains from HaplotypeGraph creation

Fixed bamboo issues with HaplotypeGraphUnitTest

Fixed probrems with HaplotypeCallerIntegrationTest

Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest

Fixed ReadThreadingLikelihoodCalculationEngine issues

Moved event-block iteration outside GraphBased*EngineInstance

Removed unecessary parameter from ReadAnchoring constructor.
Fixed test problem

Added a bit more documentation to EventBlockSearchEngine

Fixing some private - protected dependency issues

Further refactoring making GraphBased*Instance and HaplotypeGraph slimmer. Addressed last pull request commit comments

Fixed FastLoglessPairHMM public -> protected dependency

Fixed probrem with HaplotypeGraph unit test

Adding Graph-based likelihood ratio calculation to HC

  To active this feature add '--likelihoodCalculationEngine GraphBased' to the HC command line.

New HC Options (both Advanced and Hidden):
==========================================

  --likelihoodCalculationEngine PairHMM/GraphBased/Random (default PairHMM)

Specifies what engine should be used to generate read vs haplotype likelihoods.

  PairHMM : standard full-PairHMM approach.
  GraphBased : using the assembly graph to accelarate the process.
  Random : generate random likelihoods - used for benchmarking purposes only.

  --heterogeneousKmerSizeResolution COMBO_MIN/COMBO_MAX/MAX_ONLY/MIN_ONLY (default COMBO_MIN)

It idicates how to merge haplotypes produced using different kmerSizes.
Only has effect when used in combination with (--likelihooCalculationEngine GraphBased)

  COMBO_MIN : use the smallest kmerSize with all haplotypes.
  COMBO_MAX : use the larger kmerSize with all haplotypes.
  MIN_ONLY : use the smallest kmerSize with haplotypes assembled using it.
  MAX_ONLY : use the larger kmerSize with haplotypes asembled using it.

Major code changes:
===================

 * Introduce multiple likelihood calculation engines (before there was just one).

 * Assembly results from different kmerSies are now packed together using the AssemblyResultSet class.

 * Added yet another PairHMM implementation with a different API in order to spport
   local PairHMM calculations, (e.g. a segment of the read vs a segment of the haplotype).

Major components:
================

 * FastLoglessPairHMM: New pair-hmm implemtation using some heuristic to speed up partial PairHMM calculations

 * GraphBasedLikelihoodCalculationEngine: delegates onto GraphBasedLikelihoodCalculationEngineInstance the exectution
     of the graph-based likelihood approach.

 * GraphBasedLikelihoodCalculationEngineInstance: one instance per active-region, implements the graph traversals
     to calcualte the likelihoods using the graph as an scafold.

 * HaplotypeGraph: haplotype threading graph where build from the assembly haplotypes. This structure is the one
     used by GraphBasedLikelihoodCalculationEngineInstance to do its work.

 * ReadAnchoring and KmerSequenceGraphMap: contain information as how a read map on the HaplotypeGraph that is
     used by GraphBasedLikelihoodCalcuationEngineInstance to do its work.

Remove mergeCommonChains from HaplotypeGraph creation

Fixed bamboo issues with HaplotypeGraphUnitTest

Fixed probrems with HaplotypeCallerIntegrationTest

Fixed issue with GraphLikelihoodVsLoglessAccuracyIntegrationTest

Fixed ReadThreadingLikelihoodCalculationEngine issues

Moved event-block iteration outside GraphBased*EngineInstance

Removed unecessary parameter from ReadAnchoring constructor.
Fixed test problem

Added a bit more documentation to EventBlockSearchEngine

Fixing some private - protected dependency issues

Further refactoring making GraphBased*Instance and HaplotypeGraph slimmer. Addressed last pull request commit comments

Fixed FastLoglessPairHMM public -> protected dependency

Fixed probrem with HaplotypeGraph unit test
2013-12-02 19:37:19 -05:00
Chris Hartl 1f777c4898 Introducing the latest-and-greatest in genotyping: CalculatePosteriors.
CalculatePosteriors enables the user to calculate genotype likelihood posteriors (and set genotypes accordingly) given one or more panels containing allele counts (for instance, calculating NA12878 genotypes based on 1000G EUR frequencies). The uncertainty in allele frequency is modeled by a Dirichlet distribution (parameters being the observed allele counts across each allele), and the genotype state is modeled by assuming independent draws (Hardy-Weinberg Equilibrium). This leads to the Dirichlet-Multinomial distribution.

Currently this is implemented only for ploidy=2. It should be straightforward to generalize. In addition there's a parameter for "EM" that currently does nothing but throw an exception -- another extension of this method is to run an EM over the Maximum A-Posteriori (MAP) allele count in the input sample as follows:
 while not converged:
  * AC = [external AC] + [sample AC]
  * Prior = DirichletMultinomial[AC]
  * Posteriors = [sample GL + Prior]
  * sample AC = MLEAC(Posteriors)

This is more useful for large callsets with small panels than for small callsets with large panels -- the latter of these being the more common usecase.

Fully unit tested.

Reviewer (Eric) jumped in to address many of his own comments plus removed public->protected dependencies.
2013-11-27 13:00:45 -05:00
Geraldine Van der Auwera 429582589f Set SAMFileWriter to create index in ReadUtils to fix SplitSamFile issue 2013-11-26 15:54:47 -05:00
Geraldine Van der Auwera 25bc6e64ae Patched Queue extensions lacking a main class definition 2013-11-22 14:57:09 -05:00
Ami Levy-Moonshine 6ad841cec5 Rewrite ReadLengthDistribution to count the read lengths into a hash table first and only at the end to produce a GATK report table.
Before that fix, the tool was couldn't work with more then one RG before.
- Address all review comments
2013-11-18 17:29:31 -05:00
Ami Levy-Moonshine 9c1023c933 fix a (ugly) weird error from last commit that changed all the scala files to end with MoleculoPipeline.scala 2013-11-18 11:44:24 -05:00
MauricioCarneiro 7f08250870 Merge pull request #417 from broadinstitute/bt_pairhmm_api_cleanup2
Improve the PairHMM API for better FPGA integration
2013-11-14 10:47:07 -08:00
bradtaylor e40a07bb58 Improve the PairHMM API for better FPGA integration
Motivation:
The API was different between the regular PairHMM and the FPGA-implementation
via CnyPairHMM. As a result, the LikelihoodCalculationEngine had
to use account for this. The goal is to change the API to be the same
for all implementations, and make it easier to access.

PairHMM
PairHMM now accepts a list of reads and a map of alleles/haplotpes and returns a PerReadAlleleLikelihoodMap.
Added a new primary method that loops the reads and haplotypes, extracts qualities,
and passes them to the computeReadLikelihoodGivenHaplotypeLog10 method.
Did not alter that method, or its subcompute method, at all.
PairHMM also now handles its own (re)initialization, so users don't have to worry about that.

CnyPairHMM
Added that same new primary access method to this FPGA class.
Method overrides the default implementation in PairHMM. Walks through a list of reads.
Individual-read quals and the full haplotype list are fed to batchAdd(), as before.
However, instead of waiting for every read to get added, and then walking through the reads
again to extract results, we just get the haplotype-results array for each read as soon as it
is generated, and pack it into a perReadAlleleLikelihoodMap for return.
The main access method is now the same no matter whether the FPGA CnyPairHMM is used or not.

LikelihoodCalculationEngine
The functionality to loop through the reads and haplotypes and get individual log10-likelihoods
was moved to the PairHMM, and so removed from here. However, this class does need to retain
the ability to pre-process the reads, and post-process the resulting likelihoods map.
Those features were separated from running the HMM and refactored into their own methods
Commented out the (unused) system for finding best N haplotypes for genotyping.

PairHMMIndelErrorModel
Similar changes were made as to the LCE. However, in this case the haplotypes are modified
based on each individual read, so the read-list we feed into the HMM only has one read.
2013-11-14 09:45:33 -05:00
Geraldine Van der Auwera f22ab033f6 Merge pull request #424 from broadinstitute/gg_yetanothergatkdocfix
Yet another gatkdoc fix
2013-11-13 11:35:59 -08:00
Geraldine Van der Auwera dac3dbc997 Improved gatkdocs for InbreedingCoefficient, ReduceReads, ErrorRatePerCycle
Clarified caveat for InbreedingCoefficient
Cleaned up docstrings for ReduceReads
Brushed up doc for ErrorRatePerCycle
2013-11-13 14:33:04 -05:00
Phillip Dexheimer 296bcc7fb1 Changed name of jobs submitted to cluster job runners
-- Added 'jobRunnerJobName' definition to QFunction, defaults to value of shortDescription
-- Edited Lsf and Drmaa JobRunners to use this string instead of description for naming jobs in the scheduler

Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2013-11-12 14:34:56 -05:00
Mauricio Carneiro 725656ae7e Generalizing the FullProcessingPipeline Qscript
We have generalized the processing script to be able to handle multiple scenarios. Originally it was
designed for PCR free data only, we added all the steps necessary to start from fastq and process
RNA-seq as well as non-human data. This is our go to script in TechDev.

   * add optional "starting from fastq" path to the pipeline
   * add mark duplicates (optionally) to the pipeline
   * add an option to run with the mouse data (without dbsnp and with single ended fastq)
   * add option to process RNA-seq data from topHat (add RG and reassign mapping quality if necessary)
   * add option to filter or include reads with N in the cigar string
   * add parameter to allow keeping the intermediate files
2013-11-07 16:34:29 -05:00
Eric Banks f15355856a Merge pull request #418 from broadinstitute/eb_fix_liftover_script
Fixing the liftover script to not require strict VCF header validation.
2013-11-07 06:04:56 -08:00
Eric Banks 2fc40a0aed Fixing the liftover script to not require strict VCF header validation.
Apparently no one has used the liftover script for a while (which I guess is a good thing)...
2013-11-07 09:02:17 -05:00
Eric Banks 0e3d83d1ef Merge pull request #413 from broadinstitute/rp_qd_and_qual_updates_in_ref_model_pipeline
Improvements to the reference model pipeline.
2013-11-05 06:33:17 -08:00
Eric Banks 09dfaf1a68 Merge pull request #416 from broadinstitute/mc_quick_fixes_to_cser_pipeline
Add interpretation to QualifyMissingIntervals
2013-11-05 06:08:13 -08:00
Eric Banks 96024403bf Update the dbsnp version in the bundle from 137 to 138; resolves PT #59771004. 2013-11-04 10:01:22 -05:00
Ryan Poplin b22c9c2cb4 Improvements to the reference model pipeline.
-- We use the RegenotypeVariants walker to recompute the qual field. (instead of the discussed idea of adding this functionality to CombineVariants)
-- QualByDepth will now be recomputed even if the stratified contexts are missing. This greatly improves the QD estimate for this pipeline. Doesn't work for multi-allelics since the qual can't be recomputed.
2013-11-01 17:58:25 -04:00
Eric Banks cafcb34855 Merge pull request #411 from broadinstitute/eb_add_exome_intervals_to_bundle_script
Updated the GATK bundle script to:
2013-10-29 07:38:44 -07:00
Eric Banks 209f2a61aa Updated the GATK bundle script to:
1. Include exome target list for b37
2. Not delete the 'current' link unless -run is applied to the command line!  (sorry, Ryan)
2013-10-29 10:33:51 -04:00
Louis Bergelson 9498950b1c Adding more specific error message when one of the scripts doesn't exist.
--Previously it gave a cryptic message:
----IO error while decoding blarg.script with UTF-8
----Please try specifying another one using the -encoding option
2013-10-21 14:57:42 -04:00
David Roazen 5a2ef37ead Tweak dcov documentation to help prevent user confusion
Geraldine-approved!
2013-10-16 15:24:33 -04:00
Mauricio Carneiro efbfdb64fe Qscript to Downsample and analyze an exome BAM
this script downsamples an exome BAM several times and makes a coverage distribution
analysis (of bases that pass filters) as well as haplotype caller calls with a NA12878
Knowledge Base assessment with comparison against multi-sample calling
with the UG.

This script was used for the "downsampling the exome" presentation
2013-10-10 14:37:33 -04:00
Chris Hartl 55bab9fa87 Merged bug fix from Stable into Unstable 2013-10-10 13:01:12 -04:00
Chris Hartl 06d28c7f8b VariantsToBinaryPed: Move .fam file writing to initialize to ensure ordering matches the ordering of the VCF. Change the documentation to clarify that the fam files are not directly copied, but subset and re-ordered. 2013-10-10 12:53:15 -04:00
Mauricio Carneiro 5d6421494b Fix mismatching number of columns in report
Quick fix the missing column header in the QualifyMissingIntervals
report.

Adding a QScript for the tool as well as a few minor updates to the
GATKReportGatherer.
2013-10-09 14:38:15 -04:00
Ryan Poplin f3a67edc24 Merge pull request #402 from broadinstitute/gg_dcov_docs
Improvements to gatkdocs related to downsampling
2013-09-27 07:07:21 -07:00
kshakir a29f1f84bf Merge pull request #397 from lbergelson/lb_scala_2.10.2
Update scala from 2.9 to 2.10.2
2013-09-26 21:51:43 -07:00
Geraldine Van der Auwera 66d0235efc Minor clarifications & formatting tweaks for dcov docs 2013-09-26 14:28:22 -04:00
Michael McCowan 5113e21437 Bug fix: annotation values ar parsed as Doubles when they should be parsed as Integers due to implicit conversion.
* Updated expected test data in which an integer annotation (MQ0) was formatted as a double.
2013-09-25 13:12:02 -04:00
Louis Bergelson c05208ecec Resolving warnings
--specifying exception types in cases where none was already specified
----mostly changed to catch Exception instead of Throwable
----EmailMessage has a point where it should only be expecting a RetryException but was catching everything

--changing build.xml so that it prints scala feature warning details

--added necessary imports needed to remove feature warnings

--updating a newly deprecated enum declaration to match the new syntax
2013-09-23 12:42:22 -04:00
Louis Bergelson b32ad99d3f Changing from scala 2.9.2 to 2.10.2.
--modified ivy dependencies
--modified scala classpath in build.xml to include scala-reflect

--changed imports to point to the new scala scala.reflect.internal.util

--set the bootclasspath in QScriptManager as well as the classpath variable.

--removing Set[File] <-> Set[String] conversions
----Set is invariant now and the conversions broke
--removing unit tests for Set[File] <-> Set[String] conversions
2013-09-23 12:42:22 -04:00
chapmanb 2f5064dd1d Provide close methods to clean up resources used while creating AlignmentContexts from BAM file regions. Allows utilization of CoveredLocusView via the API
Signed-off-by: David Roazen <droazen@broadinstitute.org>
2013-09-10 15:32:54 -04:00
Geraldine Van der Auwera 292426b504 Merge pull request #390 from broadinstitute/mc_update_clipreads
Added REVERT SOFTCLIPPED bases to ClipReads
2013-09-09 16:43:03 -07:00
Geraldine Van der Auwera 8b829255e7 Clarified docs on using clipping options 2013-09-09 19:40:03 -04:00
MauricioCarneiro 014bc4269e Merge pull request #361 from broadinstitute/bt_pairhmm_array_implementation
Add Array Logless PairHMM
2013-09-08 20:16:53 -07:00
Ryan Poplin 3503050a39 Created a single sample calling pipeline which leverages the reference model calculation mode of the HaplotypeCaller
-- Adding changes to CombineVariants to work with the Reference Model mode of the HaplotypeCaller.
-- Added -combineAnnotations mode to CombineVariants to merge the info field annotations by taking the median
-- Added new StrandBiasBySample genotype annotation for use in computing strand bias from single sample input vcfs
-- Bug fixes to calcGenotypeLikelihoodsOfRefVsAny, used in isActive() as well as the reference model
-- Added active region trimming capabilities to the reference model mode, not perfect yet, turn off with --dontTrimActiveRegions
-- We only realign reads in the reference model if there are non-reference haplotypes, a big time savings
-- We only realign reads in the reference model if the read is informative for a particular haplotype over another
-- GVCF blocks will now track and output the minimum PLs over the block

-- MD5 changes!
-- HC tests: from bug fixes in calcGenotypeLikelihoodsOfRefVsAny
-- GVCF tests: from HC changes above and adding in active region trimming
2013-09-06 16:56:34 -04:00
Mauricio Carneiro b6c3ed0295 Added REVERT SOFTCLIPPED bases to ClipReads 2013-09-06 09:30:01 -04:00