gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Valentin Ruano-Rubio	18deeec6b0	Addresses issue with strict validation on GVCF files. More concretely, Picard's strict VCF validation does not like that there are alternative alleles that do not participate in any genotype call across samples. This is an issue with GVCF in the single-sample pipeline, where this is certainly expected with <NON_REF> and other relatively unlikely alleles. To solve this issue we allow the user to exclude some of the strict validations using a new argument --validationTypeToExclude. In order to avoid the validation issue with GVCF the user needs to add the following to the command line: '--validationTypeToExclude ALLELES' Story: https://www.pivotaltracker.com/story/show/68725164 Changes: - Added validationTypeToExclude argument to ValidateVariants walker. - Implemented the selective exclusion of validation types. - Added new info and improved existing documentation of the ValidateVariants walker. Tests: - ValidateVariantsIntegrationTest#testUnusedAlleleError - ValidateVariantsIntegrationTest#testUnusedAlleleFix	2014-04-04 14:37:10 -04:00
Eric Banks	7174f8cfeb	IndelRealigner throws a user error when it encounters reads with I operators greater than the number of read bases. Added test to ensure it works.	2014-04-03 18:16:24 -04:00
Eric Banks	a3d55b3341	Make sure to fail in all cases where the BAM being used was created by ReduceReads. In some cases, the program records were being removed from the BAM headers by the GATK engine before we applied the check for reduced reads (so we did not fail appropriately). Pushed up the check to happen before the PG tags are modified and added a unit test to ensure it stays that way. It turns out that some UG tests still used reduced bams so I switched to use different ones. Based on reviewer feedback, made it more generic so that it's easy to add new unsupported tools.	2014-04-03 16:52:41 -04:00
Eric Banks	0b73573abc	Slightly modifying the way to use the IUPAC ambiguity codes in the FastaAlternateReferenceMaker. Previously it required you to create a single sample VCF and then to pass that in to the tool, but Geraldine convinced me that this was a pain for users (because they usually have multi-sample VCFs). Instead now you can pass in a multi-sample VCF and specify which sample's genotypes should be used for the IUPAC encoding. Therefore the argument changed from '--useIUPAC' to '--use_IUPAC_sample NA12878'.	2014-04-02 21:34:25 -04:00
Valentin Ruano-Rubio	84711b8e90	Fixed bug using GraphBased due to infinite likelihoods resulting from the calculation of alignment cost of very long insertions or deletions (done in linear scale) Stories: https://www.pivotaltracker.com/story/show/66263868 Bug: The problem was due to the way we were calculating the fix penalty of a large deletion or insertion. In this case we calculate the alignment likelihood of the portion of read or haplotype deletion as the penalty of that deletion/insertion without going through the full pair-hmm process. For large events this resulted in a 0 in linear-scale computations that is transformed into an infinity in log scale. Changes: - Changed to use log10 scale to calculate those penalties. - Minor addition of .gitignore to hide ./public/external-example/target which is generated by the building process.	2014-04-01 16:14:52 -04:00
Eric Banks	821fbe7260	Merge pull request #582 from broadinstitute/vrr_hc_bugfixes_dangling_heads Fix loss of key alternative haplotypes due to a change on threading star...	2014-03-31 10:42:08 -04:00
Joel Thibault	2049eb1658	Rev Picard 1.110.1763 - SamPairUtils migrated in Picard r1737 - Revert IndelRealigner changes made in commit 4f4b85 -- Those changes were based on Picard revision 1722 to net/sf/picard/sam/SamPairUtil.java -- Picard revision 1723 reverts these changes, so we also revert to match	2014-03-30 09:33:57 -04:00
Valentin Ruano-Rubio	258b2bce28	Fix loss of key alternative haplotypes due to a change on threading start policy required when recovering dangling heads. Story: - https://www.pivotaltracker.com/story/show/67601310 Change: - Unless recover-dangling-heads is active, the threading starting location policy is the original one, i.e. just at already existing unique kmer vertices. Tests: - HaplotypeCallerIntegrationTest#testMissingKeyAlternativeHaplotypesBugFix	2014-03-29 22:40:26 -04:00
Ryan Poplin	6566dd6ca9	Fix for dropping of reference sample depth in the DP annotation. -- In the case of hierarchical merge we can't assume that we have only one genotype. -- Removed use of deprecated VC annotation access functions.	2014-03-24 14:01:50 -04:00
Eric Banks	32a96e3ab3	Fix for reads that are all insertions (e.g. 50I) and causing the IndelRealigner to error out.	2014-03-21 15:01:34 -04:00
Eric Banks	7c8ce3cd6a	Several improvements to GenotypeGVCFs: --includeNonVariantSites now actually works and we propagate AD to hom ref samples	2014-03-20 00:35:54 -04:00
Eric Banks	824983af1d	Enable CombineGVCFs to process gVCFs that were created with basepair resolution.	2014-03-19 19:23:05 -04:00
Eric Banks	3b1c337401	Have CombineVariants throw a UserError when trying to combine GVCFs from the HaplotypeCaller. Was previously throwing an IllegalArgumentException (in the wrong place in the code). Error message tells users to use CombineGVCFs.	2014-03-19 19:11:40 -04:00
Valentin Ruano-Rubio	905b6066b2	Reduce runtime of very long integration test	2014-03-18 21:48:13 -04:00
David Roazen	2d8653f493	Update pom versions to mark the start of GATK 3.2 development	2014-03-18 01:18:59 -04:00
David Roazen	a6a41c777c	Update pom versions for 3.1	2014-03-18 01:09:29 -04:00
Alec Wysoker	0369f93b24	GATK changes to conform to Tribble refactoring as part of improving Tabix support in Tribble (among other things). 1. Enable on-the-fly indexing for vcf.gz. 2. Handle on-the-fly indexing where file to be indexed is not a regular file, thus index should not be created. 3. Add method setProgressLogger to all SAMFileWriter implementations. 4. Revved picard to 1.109.1722 5. IndelRealigner md5s change because the MC tag is added to records now. Fixed up and signed off by ebanks.	2014-03-17 11:56:22 -04:00
Eric Banks	34c697bf12	Merge pull request #554 from broadinstitute/bh_SOR_new_annotation Bh sor new annotation	2014-03-17 10:58:13 -04:00
Laura Gauthier	40c13d446a	Added documentation category for CalculateGenotypePosteriors	2014-03-17 10:36:19 -04:00
Valentin Ruano-Rubio	2e964c59b4	Improved criteria to select best haplotypes out from the assembly graph. Currently the best haplotypes are those that accumulate the largest ABSOLUTE edge multiplicity sum across their path in the assembly graph. The edge multiplicity is equal to the number of reads that expand through that edge, i.e. have a kmer that uniquely maps to some vertex upstream from the edge and the following base calls extend across that edge to vertices downstream from it. Although it is obvious that higher multiplicities correlate with haplotype probability, this criterion falls short in some regards, of which the most relevant is: as it is evaluated in the condensed seq-graph (as opposed to uncompressed read-threading-graphs) it is biased toward haplotypes that have more short-sequence vertices ( -> ATGC -> CA -> has worse score than -> A -> T -> G -> C -> C -> A ->). This is partly a result of how we modify the edge multiplicities when we merge vertices from a linear chain. This pull-request addresses the problem by changing to a new scoring schema based on likelihood estimates: Each haplotype's likelihood can be calculated as the product of the likelihoods of "taking" its edges in the assembly graph. The likelihood of "taking" an edge in the assembly graph is calculated as its multiplicity divided by the sum of the multiplicities of edges that share the same source vertex. This pull-request addresses the following stories: https://www.pivotaltracker.com/story/show/66691418 https://www.pivotaltracker.com/story/show/64319760 Change Summary: 1. Change to the new scoring schema. 2. Added a graph DOT printing code to KBestHaplotypeFinder in order to diagnose scoring. 3. Graph transformations have been modified in order to generate no 0-multiplicity edges. (Nevertheless the schema above should work with 0 edges assuming that they are in fact 0.5)	2014-03-14 18:37:01 -04:00
Bertrand Haas	82108d110f	New abstract class StrandBiasTest() with old sub-class FisherStrand() and new sub-class StrandOddsRatio(). Latter is test based on symmetric odds ratio more appropriate than Fisher exact test when number of samples is large. https://www.pivotaltracker.com/story/show/66087886	2014-03-14 18:33:21 -04:00
Eric Banks	7c7ff90266	Merge pull request #558 from broadinstitute/rp_vqsr_nondeterminism_fix Fix for non-determinism in the VQSR with very large data sets	2014-03-12 14:35:51 -04:00
Eric Banks	ffaf92f871	Added new functionality to the FastaAlternateReferenceMaker to have it output IUPAC codes for het sites. Enable it with the new --useIUPAC argument. Added both unit and integration tests for the new functionality - and fixed up the existing tests once I was in there.	2014-03-12 14:31:57 -04:00
Ryan Poplin	907d1d6160	Fix for non-determinism in the VQSR with very large data sets	2014-03-12 10:25:12 -04:00
ldgauthier	4e74e77e74	Merge pull request #555 from broadinstitute/eb_add_option_to_CGVCFs_for_all_sites_GVCF Added an option to CombineGVCFs to create basepair resolution gVCFs from...	2014-03-12 10:01:18 -04:00
David Roazen	c67ced5f3b	Emit a warning whenever the VectorLoglessPairHMM is used	2014-03-12 09:55:35 -04:00
Eric Banks	d697e0144f	Added an option to CombineGVCFs to create basepair resolution gVCFs from banded ones. Use the --convertToBasePairResolution argument to enable this functionality.	2014-03-12 01:32:51 -04:00
Ryan Poplin	34d11fe40c	Added the consensus mode used for the 1000 Genomes Project to the HaplotypeCaller. -- All the provided alleles are added to the assembly graph as potential haplotypes but they aren't forcibly genotyped like in GGA mode. -- Added integration test for this mode	2014-03-11 09:56:35 -04:00
droazen	8b53567dc7	Merge pull request #553 from broadinstitute/dr_rename_pipeline_tests Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests	2014-03-10 21:36:45 -04:00
David Roazen	78562c14bb	Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests -These tests are really integration tests for Queue rather than generalized pipeline tests, so it makes sense to call them QueueTests. -Rename test classes and maven build targets, and update shell scripts to reflect new naming.	2014-03-10 21:24:03 -04:00
David Roazen	7c34f05082	Merge remote-tracking branch 'origin/master' into intel	2014-03-10 14:07:36 -04:00
David Roazen	5a6aa54673	Revert "Update HaplotypeCaller and VariantAnnotator test MD5s" This reverts commit 7faa44d576b06d7aef29562e82590a7855f216f4.	2014-03-10 14:06:51 -04:00
David Roazen	e7d6db033b	Revert "Revert "Change default HaplotypeCaller PairHMM implementation back to LOGLESS_CACHING"" This reverts commit c8a34749e631b92214a57bba162c6e0d849425f1.	2014-03-10 14:05:51 -04:00
David Roazen	f070583f29	Update HaplotypeCaller and VariantAnnotator test MD5s There are a few innocuous test failures on this branch -- updating MD5s after reviewing the differences in output	2014-03-07 10:54:27 -05:00
Karthik Gururaj	6e98e9e589	Removed g_haplotype* global variables in native code so that it works with multi-threading in Java. Modified VectorLoglessPairHMM.java so that jniInitializeRegion and jniFinalizeRegion are empty	2014-03-06 22:08:35 -08:00
David Roazen	3f3df90412	Revert "Change default HaplotypeCaller PairHMM implementation back to LOGLESS_CACHING" This reverts commit cef03f089fb3f131f3a77664b71feaec51a74cc8.	2014-03-06 10:15:35 -05:00
David Roazen	9df59bd8cc	Update pom versions to mark the start of GATK 3.1 development	2014-03-06 00:05:58 -05:00
David Roazen	34edcb8ddf	Update pom versions for the 3.0 release	2014-03-05 23:37:21 -05:00
David Roazen	53895e15cd	Change default HaplotypeCaller PairHMM implementation back to LOGLESS_CACHING	2014-03-05 19:26:37 -05:00
Eric Banks	d3de6413c9	Move warnings to debug logging status because they will definitely scare users	2014-03-05 15:05:21 -05:00
Karthik Gururaj	51b8ea5d59	Reset version	2014-03-05 11:19:08 -08:00
Karthik Gururaj	b9afe800ae	Merge correction	2014-03-05 10:06:45 -08:00
Karthik Gururaj	8fcbf9272c	Merge branch 'intel_pairhmm' of /data/broad/gsa-unstable into intel_pairhmm Conflicts: protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java public/VectorPairHMM/src/main/c++/Sandbox.java	2014-03-05 09:35:50 -08:00
Intel Repocontact	d81116eb1d	Added vectorized PairHMM implementation by Mohammad and Mustafa into the Maven build of GATK. C++ code has PAPI calls for reading hardware counters Followed Khalid's suggestion for packing libVectorLoglessCaching into the jar file with Maven Native library part of git repo 1. Renamed directory structure from public/c++/VectorPairHMM to public/VectorPairHMM/src/main/c++ as per Khalid's suggestion 2. Use java.home in public/VectorPairHMM/pom.xml to pass environment variable JRE_HOME to the make process. This is needed because the Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among others). Assuming that the Maven build process uses a JDK (and not just a JRE), the variable java.home points to the JRE inside maven. 3. Dropped all pretense at cross-platform compatibility. Removed Mac profile from pom.xml for VectorPairHMM Moved JNI_README 1. Added the catch UnsatisfiedLinkError exception in PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING in case the native library could not be loaded. Made VECTOR_LOGLESS_CACHING as the default implementation. 2. Updated the README with Mauricio's comments 3. baseline.cc is used within the library - if the machine supports neither AVX nor SSE4.1, the native library falls back to un-vectorized C++ in baseline.cc. 4. pairhmm-1-base.cc: This is not part of the library, but is being heavily used for debugging/profiling. Can I request that we keep it there for now? In the next release, we can delete it from the repository. 5. I agree with Mauricio about the ifdefs. I am sure you already know, but just to reassure you the debug code is not compiled into the library (because of the ifdefs) and will not affect performance. 1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java 2. Committing the right set of files after rebase Added public license text to all C++ files Added license to Makefile Add package info to Sandbox.java Conflicts: protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/DebugJNILoglessPairHMM.java protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/JNILoglessPairHMM.java protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/VectorLoglessPairHMM.java public/VectorPairHMM/src/main/c++/.gitignore public/VectorPairHMM/src/main/c++/LoadTimeInitializer.cc public/VectorPairHMM/src/main/c++/LoadTimeInitializer.h public/VectorPairHMM/src/main/c++/Makefile public/VectorPairHMM/src/main/c++/Sandbox.cc public/VectorPairHMM/src/main/c++/Sandbox.h public/VectorPairHMM/src/main/c++/Sandbox.java public/VectorPairHMM/src/main/c++/Sandbox_JNIHaplotypeDataHolderClass.h public/VectorPairHMM/src/main/c++/Sandbox_JNIReadDataHolderClass.h public/VectorPairHMM/src/main/c++/baseline.cc public/VectorPairHMM/src/main/c++/define-double.h public/VectorPairHMM/src/main/c++/define-float.h public/VectorPairHMM/src/main/c++/define-sse-double.h public/VectorPairHMM/src/main/c++/define-sse-float.h public/VectorPairHMM/src/main/c++/headers.h public/VectorPairHMM/src/main/c++/jnidebug.h public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.cc public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.h public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.cc public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.h public/VectorPairHMM/src/main/c++/pairhmm-template-kernel.cc public/VectorPairHMM/src/main/c++/pairhmm-template-main.cc public/VectorPairHMM/src/main/c++/run.sh public/VectorPairHMM/src/main/c++/shift_template.c public/VectorPairHMM/src/main/c++/utils.cc public/VectorPairHMM/src/main/c++/utils.h public/VectorPairHMM/src/main/c++/vector_function_prototypes.h	2014-03-05 09:30:29 -08:00
Laura Gauthier	43fdd38342	Add error handling to CalculateGenotypePosteriors to catch multiallelic variants with wrong number of ACs -- throws UserException; added tests in PosteriorLikelihoodsUtilsUnitTests Add error handling to CalculateGenotypePosteriors for cases where MLEAC>AN; add tests in PosteriorLikelihoodsUtilsUnitTests Add unit tests to confirm that CalculateGenotypePosteriors has the ability to switch genotypes for four cases	2014-03-05 12:03:18 -05:00
Laura Gauthier	7f9f58dbd1	Added hidden flag to GenotypeConcordance to output sites of discordant genotypes (to System.out) Revised ConcordanceMetrics tests to adapt to change Added comments to PosteriorLikelihoodsUtils	2014-03-05 12:03:18 -05:00
Karthik Gururaj	2648b41398	Added vectorized PairHMM implementation by Mohammad and Mustafa into the Maven build of GATK. C++ code has PAPI calls for reading hardware counters Followed Khalid's suggestion for packing libVectorLoglessCaching into the jar file with Maven Native library part of git repo 1. Renamed directory structure from public/c++/VectorPairHMM to public/VectorPairHMM/src/main/c++ as per Khalid's suggestion 2. Use java.home in public/VectorPairHMM/pom.xml to pass environment variable JRE_HOME to the make process. This is needed because the Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among others). Assuming that the Maven build process uses a JDK (and not just a JRE), the variable java.home points to the JRE inside maven. 3. Dropped all pretense at cross-platform compatibility. Removed Mac profile from pom.xml for VectorPairHMM Moved JNI_README 1. Added the catch UnsatisfiedLinkError exception in PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING in case the native library could not be loaded. Made VECTOR_LOGLESS_CACHING as the default implementation. 2. Updated the README with Mauricio's comments 3. baseline.cc is used within the library - if the machine supports neither AVX nor SSE4.1, the native library falls back to un-vectorized C++ in baseline.cc. 4. pairhmm-1-base.cc: This is not part of the library, but is being heavily used for debugging/profiling. Can I request that we keep it there for now? In the next release, we can delete it from the repository. 5. I agree with Mauricio about the ifdefs. I am sure you already know, but just to reassure you the debug code is not compiled into the library (because of the ifdefs) and will not affect performance. 1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java 2. Committing the right set of files after rebase Added public license text to all C++ files Added license to Makefile Add package info to Sandbox.java	2014-03-05 08:31:24 -08:00
Intel Repocontact	6aa67a2585	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2014-03-05 08:14:32 -08:00
Intel Repocontact	1de2d2546e	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2014-03-05 00:04:21 -08:00
Valentin Ruano-Rubio	69bf2b3247	Added a more efficient implementation of the KBest haplotype finder code (CONT.) Changes: 1. Addressed review comments on new K-best haplotype assembly graph finder. 2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices. 3. Updated test to use KBestHaplotypeFinder instead of KBestPaths 4. Retired KBestPaths to the archive. 5. Small improvements to the code and documentation.	2014-03-04 23:22:27 -05:00
Valentin Ruano-Rubio	7acf2eb0e7	Added a more efficient implementation of the KBest haplotype finder code. Story: https://www.pivotaltracker.com/story/show/66238286 Changes: 1. Created a new k-best haplotype search implementation in class KBestHaplotypeFinder. 2. Changed HC code to use the new implementation. This seems to fix the original problem without causing significant changes in outputs using some empirical data test cases 3. Moved haplotype's cigar calculation code from Path to CigarUtils; need that in order to gain independence from Path in some parts of the code. In any case that seems like a more natural location for that functionality.	2014-03-04 12:22:14 -05:00
Eric Banks	b99bf85ec8	Fixed bug where dangling tail merging occasionally created a cycle in the graph. Added unit tests to cover this case. Delivers PT#66690470.	2014-03-03 22:42:56 -05:00
Eric Banks	4d69af189e	Minor change: make the --dontUseSoftClippedBases @Advanced instead of @Hidden	2014-03-03 15:59:32 -05:00
Eric Banks	fa65716fe9	Added code to retrieve dangling heads from the read threading graph (previously we were rescuing just the tails). The purpose of this is to be able to call SNPs that fall at the beginning of a capture region (or exon). Before, the read threading code would only start threading from the first kmer that matched the reference. But that means that, in the case of a SNP at the beginning of an exome, it wouldn't start threading the read until after the SNP position - so we'd lose the SNP. For now, this is still very experimental. It works well for RNAseq data, but does introduce FPs in normal exomes. I know why this is and how to fix it, but it requires a much larger fix to the HC: the HC needs to pass all reads and bases to the annotation engine (like UG does) instead of just the high quality ones. So for now, the head merging is disabled by default. As per reviewer comments, I moved the head and tail merging code out into their own class.	2014-03-03 15:59:26 -05:00
amilev	cecdd2f2c5	Merge pull request #539 from broadinstitute/eb_hard_clip_exon_overhangs_for_ami Add the capability to the N-cigar splitter to also hard-clip off overhan...	2014-03-03 12:23:11 -05:00
Eric Banks	6c872308d8	Add the capability to the N-cigar splitter to also hard-clip off overhangs based on observed split positions. We use a "manager" to keep track of observed splits and previous reads. This can be extended/modified in the future to try to salvage those overhangs instead of hard-clipping them and/or try other possible strategies. Added unit tests and more integration tests.	2014-03-02 21:10:34 -05:00
Eric Banks	22ad18b919	Moving Reduce Reads to the archive. The GATK now fails with a user error if you try to run with a reduced bam. (I added a unit test for that; everything else here is just the removal of all traces of RR)	2014-03-02 02:03:14 -05:00
Karthik Gururaj	1b395a871a	1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java 2. Committing the right set of files after rebase	2014-02-28 16:08:28 -08:00
Karthik Gururaj	37526dfad5	1. Added the catch UnsatisfiedLinkError exception in PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING in case the native library could not be loaded. Made VECTOR_LOGLESS_CACHING as the default implementation. 2. Updated the README with Mauricio's comments 3. baseline.cc is used within the library - if the machine supports neither AVX nor SSE4.1, the native library falls back to un-vectorized C++ in baseline.cc. 4. pairhmm-1-base.cc: This is not part of the library, but is being heavily used for debugging/profiling. Can I request that we keep it there for now? In the next release, we can delete it from the repository. 5. I agree with Mauricio about the ifdefs. I am sure you already know, but just to reassure you the debug code is not compiled into the library (because of the ifdefs) and will not affect performance.	2014-02-28 08:59:55 -08:00
Karthik Gururaj	0fe843bfd9	Followed Khalid's suggestion for packing libVectorLoglessCaching into the jar file with Maven	2014-02-26 11:47:42 -08:00
Karthik Gururaj	15fe244e4b	Now has PAPI values	2014-02-26 11:47:42 -08:00
Intel Repocontact	e32e9e6af6	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2014-02-26 11:47:01 -08:00
Intel Repocontact	ff2a972ab5	Merge branch 'master' of github.com:broadinstitute/gsa-unstable Conflicts: .gitignore	2014-02-25 20:56:28 -08:00
Khalid Shakir	f02ce6eca7	Added tests for cleaning up scattered .bai files, and using the log directory. Re-added import java.io.File for BamGatherFunction. Other cleanup to resolve scala syntax warnings from intellij. Moved Example UG script from protected to public.	2014-02-26 02:11:28 +08:00
Eric Banks	0f30df0356	Stopgap procedure to rescue Fisher Strand for cases where there's lots of data. This commit consists of 2 main changes: 1. When the strand table gets too large, we normalize it down to values that are more reasonable. 2. We don't include a particular sample's contribution unless the total ref and alt counts are at least 2 each; this is a heuristic method for dealing only with hets. MD5s change as expected. Hopefully we'll have a more robust implementation for GATK 3.1.	2014-02-25 01:04:27 -05:00
droazen	e8ea9f58d3	Merge pull request #531 from broadinstitute/ks_build_patches Build patches	2014-02-24 15:13:16 -05:00
Valentin Ruano-Rubio	0b3a70b8c1	Fix for a bug in (Assembly Graph) Routes. The slicePrefix method functionality was broken. Story: https://www.pivotaltracker.com/story/show/64595624 Changes: 1. Fixed the bug. 2. Added unit test to check on the method functionality. 3. Added an integration test to verify the bug has been fixed in a reproducible empirical data case.	2014-02-24 10:54:39 -05:00
Khalid Shakir	7e516b294f	Replaced local drmaa and Jama artifacts with versions from maven central. Removed unused caliper binary from local repo.	2014-02-22 01:21:35 +08:00
Valentin Ruano-Rubio	463af7143f	Activate reverse allele trimming in GVCF Story: https://www.pivotaltracker.com/s/projects/1007536 Changes: 1. HC's GenotypingEngine now invokes reverseAlleleTrimming on GVCF variant output lines. 2. GenotypeGVCFs also reverse trim after regenotyping as some alt. alleles are dropped (observed in real-data).	2014-02-20 03:17:24 -05:00
Eric Banks	53a7d5cbae	Fixing a bug in the GVCF writer. The writer was never resetting the pointer to the end of the last non-ref VariantContext that it saw. This was fine except when it jumped to a new contig - and a lower position on that contig - where it thought that it was still part of that previous non-ref VariantContext so wouldn't emit a reference block. Therefore, ref blocks were missing from the beginnings of all chromosomes (except chr1). Added unit test to cover this case.	2014-02-20 02:33:43 -05:00
Valentin Ruano-Rubio	c167fb5fdf	Fixing GenotypesGVCF. Bug uncovered by some untrimmed alleles in the single sample pipeline output. Note, however, that this does not fix the untrimmed alleles in general. Story: https://www.pivotaltracker.com/story/show/65481104 Changes: 1. Fixed the bug itself. 2. Fixed non-working tests (silently skipped due to exception in dataProvider).	2014-02-19 14:20:39 -05:00
Ryan Poplin	43c20264b0	Initial commit of the random forest classifier.	2014-02-17 13:07:27 -05:00
droazen	688792c5b0	Merge pull request #520 from broadinstitute/jt_fix_failing_tests_post_maven Fix for the Array Out of Bounds test error	2014-02-14 14:02:17 -05:00
Eric Banks	3724d4e5f3	Various small fixes for CalculateGenotypePosteriors based on feedback from guys in Ben Neale's group. Note that this tool is still a work in progress and very experimental, so isn't 100% stable. Most of the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented it right before he left, and we're going to need to add tests at some point soon. I added a first integration test in this commit, but it's just a start. The fixes include: 1. Stop having the genotyping code strip out AD values. It doesn't make sense that it should do this so I don't know why it was doing that at all. Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore. This also helps CalculateGenotypePosteriors which was losing the AD values. Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before. Updated the integration test for that walker to include such data. 2. Chris was calling Math.pow directly on the normalized posteriors which isn't safe. Instead, the normalization routine itself can revert back to log scale in a safe manner so let's use it. Also, renamed the variable to posteriorProbabilities (and not likelihoods). 3. Have CGP update the AC/AF/AN counts after fixing GTs.	2014-02-14 13:48:14 -05:00
Joel Thibault	cb7ad01202	Re-enable the relevant tests	2014-02-14 12:34:08 -05:00
Joel Thibault	c8a5007c85	Add a comment to the method where the error appears	2014-02-14 11:40:22 -05:00
Joel Thibault	ec16439387	Clear the ReadCovariates keysCache before runs of individual Unit Tests - normal runs have a constant covariate count, so this is not necessary	2014-02-14 10:41:28 -05:00
Eric Banks	7095a60c8e	Merge pull request #516 from broadinstitute/dr_reenable_tests_failing_due_to_java_update Re-enable tests that were failing post-maven due to changes in Java's Math.pow() implementation	2014-02-13 21:05:18 -05:00
David Roazen	4b4b93ad1b	Re-enable tests that were failing post-maven due to changes in Java's Math.pow() implementation After extensive detective work, Joel determined that these tests were failing due to changes in the implementation of Math.pow() in newer versions of Java 1.7. All GSA members should ensure that they're using a JDK that is at least as current as the one in the Java-1.7 dotkit on the Broad servers (build 1.7.0_51-b13).	2014-02-12 16:08:16 -05:00
Joel Thibault	cc9477aedb	Minimal test for the multi-allelic reordering bug	2014-02-12 13:38:32 -05:00
Eric Banks	300b474c96	Several improvements to the single sample combining steps. 1. updated QualByDepth not to use AD-restricted depth if it is zero. Added unit test for this change. 2. Fixed small bug in CombineGVCFs where spanning deletions were not being treated consistently throughout. Added test for this situation. 3. Make sure GenotypeGVCFs puts in the required headers. Updated test files to make sure this is covered. 4. Have GenotypeGVCFs propagate up the MLEAC/AF (which were getting clobbered out). Tests updated to account for this.	2014-02-12 10:15:12 -05:00
David Roazen	95e1402d21	Add ability to run *KnowledgeBaseTests to maven Run with: mvn verify -Dsting.knowledgebasetests.skipped=false	2014-02-11 14:08:24 -05:00
Eric Banks	303a60c8c6	Adding smarts to the QD annotation: when the AD annotation is present for a given genotype then we only use its depth for QD if the variant depth > 1. Added new unit tests for QualByDepth.	2014-02-11 12:56:49 -05:00
Eric Banks	2e36dd9001	Refactoring of CombineGVCFs to make it run a lot faster. Creating new VariantContexts each time we broke up a block was very expensive because we break up blocks so often. Also, calling into GATKVariantContextUtils.simpleMerge was really hurting performance. MD5 changes because we no longer propagate any INFO fields (except for END) for reference blocks; the tests have the now unused BLOCK_SIZE field that now gets dropped.	2014-02-11 03:18:52 -05:00
Eric Banks	abef6cfcb6	Removing parameters that were incorrectly copied over from RegenotypeVariants.	2014-02-08 23:44:32 -05:00
Eric Banks	659a9f0e79	Removing the test for BLOCK_SIZE since we no longer emit it	2014-02-08 21:28:07 -05:00
Valentin Ruano-Rubio	bf630abe88	Fixed nocall (./.) without PLs bug in GVCF output Story: https://www.pivotaltracker.com/story/show/65388246 Additional changes and notes: 1. The fix consists of forcing the output of all PLs by setting the standard flag for that, '-allSitePLs'. 2. BP_RESOLUTION was handled differently from GVCF in some aspects that should be common. That has been fixed.	2014-02-07 19:30:26 -05:00
Karthik Gururaj	20a46e4098	Check only for SSE 4.1 (rather than SSE 4.2) when trying to use the SSE implementation of PairHMM	2014-02-07 15:19:55 -08:00
Karthik Gururaj	dc44b64ad8	1. Added support for building the PairHMM vector library into build.xml. The library is compiled using makefile and copied into the directory: build/java/classes/org/broadinstitute/sting/utils/pairhmm/ 2. Bundled the library into StingUtils.jar. Unpacked and loaded at runtime without the need to set java.library.path Caveats: Platform independence has probably been thrown out of the window. Assumptions: a. make command exists at /usr/bin/make b. rsync command exists at /usr/bin/rsync c. icc is in the PATH of the user	2014-02-07 13:13:59 -08:00
Eric Banks	d689f61005	Fixed up some of the genotype-level annotations being propagated in the single sample HC pipeline. 1. AD values now propagate up (they weren't before). 2. MIN_DP gets transferred over to DP and removed. 3. SB gets removed after FS is calculated. Also, added a bunch of new integration tests for GenotypeGVCFs.	2014-02-07 12:47:54 -05:00
Eric Banks	67ed0d2403	The UG engine can return a null VC if there are tons of alt alleles, causing Tim's merge jobs to fail. Pushing the null check up so that it doesn't error out in such cases.	2014-02-07 12:41:20 -05:00
Valentin Ruano-Rubio	4a3c8e68fa	Fixed out of order non-variant gVCF entries when trimming is active. Story: https://www.pivotaltracker.com/story/show/65319564	2014-02-07 11:03:26 -05:00
Eric Banks	eb463b505d	Remove a whole bunch of unused annotations from gVCF output. AC,AF,AN,FS,QD - they'll all be recomputed later. BLOCK_SIZE and MIN_GQ were not necessary. I also made the StrandBiasBySample annotation forced on when in gVCF mode. It turns out that its output wasn't compatible with BCF so I patched it (and the variant jar too).	2014-02-07 08:49:36 -05:00
Eric Banks	2648219c42	Implementation of a hierarchical merger for gVCFs, called CombineGVCFs. This tool will take any number of gVCFs and create a merged gVCF (as opposed to GenotypeGVCFs which produces a standard VCF). Added unit/integration tests and fixed up GATK docs.	2014-02-07 08:49:18 -05:00
Eric Banks	71b47a6148	Rename CombineReferenceCalculationVariants to GenotypeGVCFs	2014-02-06 15:46:19 -05:00
Khalid Shakir	3848159086	Added a set of serial tests to gatk/queue packages, which runs all tests under their package in one TestNG execution. New properties to disable regenerating example resources artifact when each parallel test runs under packagetest. Moved collection of packagetest parameters from shell scripts into maven profiles. Fixed necessity of test-utils jar by removing incorrect dependenciesToScan element during packagetests. When building picard libraries, run clean first. Fixed tools jar dependency in picard pom. Integration tests properly use the ant-bridge.sh test.debug.port variable, like unit tests.	2014-02-06 08:25:38 -05:00
Valentin Ruano Rubio	988e3b4890	Merge pull request #487 from broadinstitute/vrr_reference_model_with_trimming Get gVCF to work without --dontTrimActiveRegions	2014-02-05 22:52:17 -05:00
Valentin Ruano-Rubio	98ffcf6833	Get gVCF to work without --dontTrimActiveRegions Story: https://www.pivotaltracker.com/story/show/65048706 https://www.pivotaltracker.com/story/show/65116908 Changes: ActiveRegionTrimmer is now an argument collection and it returns not only the trimmed down active region but also the non-variant containing flanking regions. HaplotypeCaller code has been simplified significantly pushing some functionality to other classes like ActiveRegion and AssemblyResultSet. Fixed a problem with the way the trimming was done causing some gVCF non-variant records no have conservative 0,0,0 PLs	2014-02-05 22:50:45 -05:00
Ryan Poplin	693bfac341	Bug fix for missing annotations in CombineReferenceCalculationVariants. They were being dropped in the handoff between engines in a couple of places. -- Updated single sample pipeline test data using Valentin's files and re-enabled CRCV tests	2014-02-05 12:58:48 -05:00
Eric Banks	91bdf069d3	Some updates to CRCV. 1. Throw a user error when the input data for a given genotype does not contain PLs. 2. Add VCF header line for --dbsnp input 3. Need to check that the UG result is not null 4. Don't error out at positions with no gVCFs (which is possible when using a dbSNP rod)	2014-02-05 10:12:37 -05:00

1109 Commits (bffc9fbabd12eddaed5deaad600e68ba7e9084e1)