* Removed unused annotations (CCC and HWP)
* Renamed one of the two GC annotations to "IGC" (for Interval GC)
* Revved picard & htsjdk (GATK constants are now removed from htsjdk)
* PT 82046038
-- Active Region Traversal was using per-sample limits on the number of reads that were too low, especially now that we run one sample at a time. This caused high-confidence variants to be dropped in high-coverage data.
-- HaplotypeCallerGVCFIntegrationTest PL/annotation changes due to using more reads in those tests
-- Removed a CountReadsInActiveRegionsIntegrationTest test for excessive coverage because the read coverage no longer goes over the limits in ART
Story:
=====
- https://www.pivotaltracker.com/story/show/83803796
Changes:
=======
- Changed the indel RCM likelihood cache from a fixed maximum ploidy to a
dynamically resizable one (see the sketch below).
- Used the occasion to remove an unused, deprecated method from ReferenceConfidenceModel
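A minimal sketch of the resizable-cache pattern described above, assuming a simple array-backed layout; the class and method names are hypothetical, not the actual reference confidence model code:

```java
import java.util.Arrays;

// Hypothetical sketch: grow the per-ploidy cache on demand instead of
// enforcing a fixed maximum ploidy.
final class ResizableLikelihoodCache {
    private double[][] cacheByPloidy = new double[8][];

    // Returns the cached likelihood array for a ploidy, growing the cache
    // geometrically when the requested ploidy exceeds the current capacity.
    double[] get(final int ploidy) {
        if (ploidy >= cacheByPloidy.length) {
            int newLength = cacheByPloidy.length;
            while (newLength <= ploidy)
                newLength *= 2;                       // grow geometrically
            cacheByPloidy = Arrays.copyOf(cacheByPloidy, newLength);
        }
        if (cacheByPloidy[ploidy] == null)
            cacheByPloidy[ploidy] = computeLikelihoods(ploidy);  // fill lazily
        return cacheByPloidy[ploidy];
    }

    // Placeholder for the actual per-ploidy likelihood computation.
    private double[] computeLikelihoods(final int ploidy) {
        return new double[ploidy + 1];
    }
}
```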
Testing:
=======
- Added an integration test to check ploidies larger than the previous limit of 20.
Add multi-allele test for info field annotations
Fix to process all types of INFO annotations
roll back to previous version, removes INFO and FORMAT
Correct @return for VariantAnnotatorEngine.getNonReferenceAlleles()
Enhance comments and clean up multi-allelic logic, handle header info number = R
only parse counts of A & R
Add INFO for AC
update MD5
Performance enhancement: only parse multiallelic records with a count of A or R
Make argument final in getNonReferenceAlleles()
Code cleanup, add exceptions for bad expression/allele size mismatch and missing header info for an expression
Change exception to warning for expression value/number of alleles check
remove advertised exceptions
-- Ignore SNP matches that lie outside the clipped read window
-- This fixes an issue where GATK would skip the entire read if a SNP is entirely
contained within a sequencing adapter.
Story:
=====
- https://www.pivotaltracker.com/story/show/83259038
Changes:
=======
- Made minimal changes to implement the fix, after an arduous attempt to understand
the CombineGVCFs code.
Test:
====
- Added an integration test to explicitly test for the bug.
- Updated md5s, as the bug was actually affecting one of the existing
integration tests.
* PT 84242218
* Note that FORMAT fields behave the same as INFO fields - if the annotation has a count of A (one entry per Alt Allele), it is split across the multiple output lines. Otherwise, the entire list is output with each field
Story:
-----
- https://www.pivotaltracker.com/story/show/83800586
Changes:
-------
- In GVCFWriter, GQ is now recalculated from the final PL array for the block (see the sketch below).
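For reference, a sketch of the usual PL-to-GQ convention; the helper is hypothetical, not the actual GVCFWriter code:

```java
// PLs are normalized so the best genotype has PL == 0, so GQ is the
// second-smallest PL; GATK caps GQ at 99.
final class GQFromPLs {
    static int gqFromPLs(final int[] pls) {
        int best = Integer.MAX_VALUE;
        int secondBest = Integer.MAX_VALUE;
        for (final int pl : pls) {
            if (pl < best) { secondBest = best; best = pl; }
            else if (pl < secondBest) { secondBest = pl; }
        }
        // Subtracting best makes this robust to un-normalized PL arrays.
        return Math.min(secondBest - best, 99);
    }
}
```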
Testing:
-------
- Updated affected integration test md5s
Add more logging to annotators, change loggers from info to warn
Add comments to testStrandBiasBySample()
Clarify comments in testStrandBiasBySample
remove logic for not processing an indel if strand bias (SB) was not computed
remove per variant warnings in annotate()
Log warnings if using the wrong annotator or missing a pedigree file
Log test failures once in annotate(), because HaplotypeCaller does not call initialize(). Avoid using exceptions
Fix to log only once in annotate(); Hardy-Weinberg does not require pedigree files; fix test MD5s so they pass
Check if founderIds == null
Update MD5s from HaplotypeCaller integrations tests and clean up code
Change logic so SnpEff does not throw exceptions, change engine to utils in imports
Update test MD5s, return immediately if cannot annotate in SnpEff.initialization()
Post peer review, add more logging warnings
Update MD5 for testHaplotypeCallerMultiSampleComplex1, return null if PossibleDeNovo.annotate() is not called by VariantAnnotator
Story:
-----
https://www.pivotaltracker.com/story/show/80684230
Changes:
-------
- Corrected the bug: AlignmentUtils#createReadAlignedToRef was
not realigning against the reference but the best haplotype for
the read.
Test:
----
- Added integration test in HaplotypeCallerIntegrationTest to check
that the bug has been fixed.
- Fixed md5s modified by this change; these are caused by small
changes in the state of the random-number generator and in read vs.
variant site overlapping.
CombineGVCFs now outputs ref conf for the duration of deletions so that SNPs occurring in other samples aligned with those deletions will be genotyped correctly
Reading the multiple GATKText files as a single stream, especially with new top level target executable jar files pointing to a lib folder.
Don't dirty the build with a new GATKText.properties if input files are unmodified.
Stop warning on undocumented abstract classes.
Fixed ClassNotFoundException/NoClassDefFoundError by fixing ResourceBundleExtractorDoclet artifact.
Excluding Exceptions from documentation.
Removed custom log4j dependency from ResourceBundleExtractorDoclet.
Stop generating the dependency reduced pom during shade.
Stop regenerating gsalib when the files are already up to date.
Disabled mvn site generation from external-example.
Moved top level target symlinks to package jar files to under target/package.
Executable jar files are placed under target/executable with the new target[/lib] directories.
Under top level target, symlinks to *either* the package *or* the executable jars replace what was a symlink to the package jar path.
Allow disabling of the shade package.
ant-bridge.sh by default only builds executable jars, and doesn't package by default, as did the old ant build.xml.
Added a new package_path.sh utility script for other scripts to use instead of anything in the target folder.
remove final keyword from refMap and altMap; constructHaplotype() changes their values
return ArtificialHaplotype from constructHaplotype instead of passing it as an argument
Add logic so arraycopy does not throw an IndexOutOfBoundsException, add test for a long insert
* This argument is intended to be used in conjunction with -bamout, and disables early-exit optimizations to allow reference regions to be contained in the output bam
* Also forcibly includes the reference haplotype in the set of haplotypes given to the BAMWriter
* Made -dontTrimActiveRegions visible, as it is likely also desirable in this use case
* Addresses PT 77731660
remove TODO comment after activeProbThreshold
recover static ACTIVE_PROB_THRESHOLD for unit tests
Add min/max values for active_probability_threshold parameter
Move activeProbThreshold parameter to GATKArgumentCollection
define ACTIVE_PROB_THRESHOLD in unit tests
add construction of argCollection in ctor
Move arguments from GATKArgumentCollection to ActiveRegionWalker
Throw exception if threshold < 0 or > 1 in ActivityProfile ctor
Add max propagation distance parameter to ActiveRegionWalker for ActivityProfile
Use polymorphic getMaxProbPropagationDistance() so BandPassActivityProfile computes the correct region size cutoff
Get the maxProbPropagationDistance from the superclass's method instead of directly; this is safer
Removed extraneous command line imports and make maxProbPropagationDistance a hidden argument
remove limit check for activeProbThreshold; not necessary because the check is made when input as a command line arg
Remove extra 'region' in the doxygen param description for maxProbPropagationDistance
Rename parameters using camel case and add to integration test
Correct documentation for maxReadsInRegionPerSample and minReadsPerAlignmentStart
Change the argument--minReadsPerAlignmentStart in the integration test from 50 to 5
'each genomic location' only pertains to minReadsPerAlignmentStart, not maxReadsInRegionPerSample
The QUAL value calculated by this Exact AF Calculator is severely underestimated when
there is more than one alternative allele (non-biallelic sites). The reason is
that the QUAL was roughly calculated by adding the QUALs resulting from each alternative
allele vs. all other alleles, reference and alts, collapsed. This is fine for MLEAC
calculations but not for QUAL.
Now, to calculate the QUAL we collapse all the alternatives into a single one. This change
improves sensitivity at the cost of additional false positives, which is naturally expected.
The resulting QUAL column is much closer to the one returned by the reference implementation.
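To make the collapse concrete, here is an illustrative diploid helper (not the actual exact-model code) that lumps every alternative allele into a single ALT*, keeping the best PL among the genotypes mapping to each collapsed class; allele index 0 is the reference and the PL array follows the standard VCF diploid ordering:

```java
// Illustrative only: collapse a diploid multiallelic PL array into a
// biallelic one in which all non-reference alleles become a single ALT*.
final class CollapseAlts {
    // pls.length must be numAlleles*(numAlleles+1)/2 (VCF diploid ordering).
    static int[] collapse(final int[] pls, final int numAlleles) {
        final int[] out = {Integer.MAX_VALUE, Integer.MAX_VALUE, Integer.MAX_VALUE};
        for (int j = 0; j < numAlleles; j++) {
            for (int i = 0; i <= j; i++) {
                final int plIndex = j * (j + 1) / 2 + i;   // genotype (i,j), i <= j
                final int altCount = (i > 0 ? 1 : 0) + (j > 0 ? 1 : 0);
                out[altCount] = Math.min(out[altCount], pls[plIndex]);
            }
        }
        return out;  // PLs for REF/REF, REF/ALT*, ALT*/ALT*
    }
}
```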
Story:
https://www.pivotaltracker.com/story/show/75926368.
Changes:
Changed the QUAL calculation as described above.
Updated MD5s.
Fixed MD5s
The problem was that the MLE table calculation aborted "unlikely"
genotype combinations too aggressively.
This also uncovered another bug: GeneralPloidyExactAFCalculation
makes slightly different use of StateTracker
than DiploidExactAFCalculation does. We have changed StateTracker,
generalizing it to work with both usage patterns.
Story:
-----
* https://www.pivotaltracker.com/story/show/78920568
Changes:
-------
* Fixes in GeneralPloidyExactAFCalculator.
* Needed changes in StateTracker API and its consequences in DiploidExactAFCalculation.
* Updated affected integration tests' MD5s after fixing the GeneralPloidyExactAF.
Changes:
-------
* Updated current unit and integration test to use the new API components.
* Added unit tests for new classes AFPriorProvider and AFCalculatorProviders.
* Added integration test for mixed ploidy GenotypeGVCFs and CombineGVCFs
Changes:
-------
* GenotypingEngine now uses an AFCalc provider instead of
its own thread-local, one-time-initialized, fixed
AF calculator.
* All walkers that use a GenotypingEngine now pass
the appropriate AF calculator provider. For now most
just use a fixed calculator (FixedAFCalculatorProvider),
except GenotypeGVCFs, as this one can now cope with
a mixture of ploidies, failing over to a general-ploidy
calculator when the preferred implementation is not
capable of handling a site's analysis.
to the total-ploidy (added ploidy across samples).
Changes:
--------
* Instead of calculating a fixed log10 prior array with a fixed
total ploidy, we use a new component, the AFPriorProvider,
to generate the priors for different total ploidies on
demand; these are cached, however, so there is no unnecessary
recomputation involved (see the sketch below).
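A minimal sketch of the on-demand, cached prior pattern just described; only the AFPriorProvider name comes from the change notes, the class shape is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape: priors are built lazily per total ploidy and cached,
// so repeated sites with the same total ploidy pay no recomputation cost.
abstract class AFPriorProviderSketch {
    private final Map<Integer, double[]> cache = new HashMap<>();

    // Returns the log10 AF prior array for a total ploidy, computing it at
    // most once per distinct ploidy.
    final double[] forTotalPloidy(final int totalPloidy) {
        return cache.computeIfAbsent(totalPloidy, this::buildPriors);
    }

    // Subclasses supply the actual prior, e.g. heterozygosity-based.
    protected abstract double[] buildPriors(int totalPloidy);
}
```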
with mixed ploidies and max-alt-allele number changes dynamically.
Changes:
--------
* Moved the AFCalcFactory.Calculation enum into a top-level class
AFCalculatorImplementation.
* Gave more responsibilities to the enum, like resolving the constructor
method once per implementation and the best-model selection algorithm.
* Removed test-code-only fields and methods from AFCalc; these were used only
for unit testing and not for any actual functionality of this component.
* Removed the fixed ploidy constraint of the GeneralPloidyExactAFCalc
implementation... it can now deal with mixed ploidies that may change
per site and sample.
* Removed the fixed maxAltAllele restriction by allowing resizing of
the stateTracker structures.
* Due to the previous two points, calls to the AFCalc object are now passed
the default ploidy to assume (in case some genotype in the input
VC does not have it) and the max alt-allele count.
* Also due to those changes, removed the now totally useless 3 int
parameters from all AFCalc constructors.
* Cleaned up the code a bit, removing no-longer-used components and methods.
Dangling head merging (like with tails) is now enabled by default.
The --recoverDanglingHeads argument is now deprecated so that users know not to use it anymore.
We now also allow the user to set the minimum branch length for merging. This will be different
for exomes and RNA (see below).
The other changes in the code itself:
1. We no longer allow an arbitrarily large number of mismatches in the dangling head for merging
2. The max number of mismatches allowed in a dangling head is proportional to the kmer size
There will be a difference in the RNA calling pipeline. Instead of invoking '--recoverDanglingHeads'
the user will instead want to use '--minDanglingBranchLength 0'.
Below are the knowledgebase results of the master branch vs. this one.
For NA12878 DNA Exome:
master SNPS TRUE_POSITIVE 36722
master SNPS CALLED_NOT_IN_DB_AT_ALL 2699
master SNPS REASONABLE_FILTERS_WOULD_FILTER_FP_SITE 292
master SNPS FALSE_POSITIVE_SITE_IS_FP 70
branch SNPS TRUE_POSITIVE 36867
branch SNPS CALLED_NOT_IN_DB_AT_ALL 2952
branch SNPS REASONABLE_FILTERS_WOULD_FILTER_FP_SITE 387
branch SNPS FALSE_POSITIVE_SITE_IS_FP 94
As I discussed with Ryan in person, there are a good number of FPs that are called in the new
code, but they nearly all have bad strand bias and should be easily filtered by VQSR.
Note that there is no change for indels.
For NA12878 RNA from Ami:
master SNPS TRUE_POSITIVE 11055
master SNPS CALLED_NOT_IN_DB_AT_ALL 831
master SNPS REASONABLE_FILTERS_WOULD_FILTER_FP_SITE 44
master SNPS FALSE_POSITIVE_SITE_IS_FP 96
branch SNPS TRUE_POSITIVE 11113
branch SNPS CALLED_NOT_IN_DB_AT_ALL 874
branch SNPS REASONABLE_FILTERS_WOULD_FILTER_FP_SITE 47
branch SNPS FALSE_POSITIVE_SITE_IS_FP 92
Again, there's basically no change for indels.
* Arguments involved are --no_cmdline_in_header, --sites_only, and --bcf for VCF files and --bam_compression, --simplifyBAM, --disable_bam_indexing, and --generate_md5 for BAM files
* PT 52740563
* Removed ReadUtils.createSAMFileWriterWithCompression(), replaced with ReadUtils.createSAMFileWriter(), which applies all appropriate engine-level arguments
* Replaced hard-coded field names in ArgumentDefinitionField (Queue extension generator) with a Reflections-based lookup that will fail noisily during extension generation if there's an error
Explicitly including gatk/queue test-jar artifacts in package test classpaths.
SelectVariantsIntegrationTest#testInvalidJexl now resets the JexlEngine silent flag that VariantFiltration.initialize() toggles.
External example no longer tries to unpack nonexistent gatk artifact jars during package tests.
Same changes fixed the problem for GenotypeGVCFs and CombineGVCFs.
Stories:
- https://www.pivotaltracker.com/story/show/77626044
- https://www.pivotaltracker.com/story/show/77626854
Changes:
- Generalized the code for the merging in GATKVariantContextUtils to cope
with ploidy != 2.
- GenotypeGVCFs now checks that the input's ploidy conforms to the '-ploidy'
argument.
- Moved Reference Confidence VC merging code out of GATKVariantContextUtils
so that we can keep the new code in protected.
Caveats:
- GenotypeGVCFs can only deal with input files that have the same ploidy at
all positions; this is the ploidy that the user MUST indicate in the -ploidy argument
(if different from the default of 2).
- CombineGVCFs won't necessarily complain if it is passed mixed-ploidy
inputs, but you won't be able to genotype them with GenotypeGVCFs.
Test:
- Removed deprecated unit tests for GATKVariantContextUtils.
- Moved unit-tests regarding GVCF merging from GATKVariantContextUtilsUnitTest
to ReferenceConfidenceVariantContextUtilsUnitTest.
- Added unit test for new code for mapping genotype indices between allele
index encoding in GenotypeLikelihoodCalculator.
- The original GenotypeGVCFs and CombineGVCFs integration tests are unaffected
by the change.
- Added tetraploid run integration tests to check on non-diploid execution
of GenotypeGVCFs and CombineGVCFs.
Changed tests and scripts to use the gatkdir full path instead of relative testdata/qscripts symbolic links.
Although the symlinks are no longer created, left the symlink deletion script execution in place with a comment about future removal.
Re-enabled example UG pipeline queue test.
Replaced all hardcoded strings of {public,private}/testdata with BaseTest variables.
Refactored temp list creation method from ListFileUtilsUnitTest to BaseTest.createTempListFile.
Removed list files with hardcoded paths, now using createTempListFile instead with private test dir variable.
We do this for technical reasons, mostly because we don't genotype in the HC anymore; it's all
done downstream by GenotypeGVCFs so we can't be sure that the genotype will be hom var. Also,
there are steps in the downstream pipeline where genotypes can change, so assuming anything in
the HC is a bad idea, and if we have phasing info in the het state, we want to propagate that forward.
Now, PGT tag fixing happens downstream in GenotypeGVCFs.
While I was in there I also cleaned up the code a bit and fixed a bug where annotation was happening
before genotype creation when using the --includeNonVariantSites argument.
Added tests accordingly.
* This is a shortcut for people who have multi-sample BAMs but would like to use GVCF mode. Rather than creating single-sample BAMs with PrintReads, one could use the --sample_name argument to HaplotypeCaller to specify the single sample to make calls on
* Completes PT 73075482
Story:
https://www.pivotaltracker.com/story/show/77250524
Changes:
- Removed the annotating code in the GeneralPloidyExactAFCalc (GPEAFC) class.
- Added asAlleleList to the GenotypeAlleleCounts class and got GPEAFC to use that instead of implementing its own (nicer and more reusable code).
- Removed the explicit addition of AlleleCountBySample fields to the VCF header by the walker's initialize
- Added utility methods in Utils to efficiently wrap an int[] array into a List<Integer> and a double[] array into a List<Double> (see the sketch below).
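A sketch of the no-copy wrapping idea behind such methods; the class name here is illustrative, the real methods live in Utils:

```java
import java.util.AbstractList;
import java.util.List;

// Expose the primitive array as a List view without copying it;
// boxing happens lazily, one element per access.
final class PrimitiveLists {
    static List<Integer> asList(final int[] values) {
        return new AbstractList<Integer>() {
            @Override public Integer get(final int index) { return values[index]; }
            @Override public int size() { return values.length; }
        };
    }
}
```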
Test:
- Added unit-testing for asAlleleList in GenotypeAlleleCountsUnitTest (within testFirst and testNext).
- Added unit-testing for new methods in Utils : asList(int[]) and asList(double[])
- Changed UG General Ploidy test to add explicitly those annotations.
- Non-trivial changes in integration tests involving non-diploid runs (namely haploid and tetraploid), as they no longer show
those annotations; the MD5s have been changed accordingly.
It turns out that there can be some really complex situations even with a single sample where
there are lots of unphasable hets around a hom. Previously we were trying to phase each of the
hets against the hom, but that wasn't correct. Instead we now detect that situation and don't
attempt to phase anything.
Added a unit test to cover this situation.
New annotation for low- and high-confidence de novos (only annotates biallelics)
FamilyLikelihoodsUtils now adds joint likelihood and joint posterior annotations
Restrict population priors based on discovered allele count to be valid for 10 or more samples.
VariantAnnotator/FS behavior changes slightly: VA used to output zeros for FS if there was no strand bias info, now skips FS output (but will still show FS in header)
Changes in several walkers to use the new sample/allele closed lists and the new GenotypingEngine constructor signatures
Rebase adoption of new calculation system in walkers
1. It is now turned on by default
2. It now phases homozygous variants
3. Most importantly, it also phases variants that are always on opposite haplotypes
Changed the INFO keys to be PID and PGT, as described in the header.
If any pair of variants occurs on all used haplotypes together, then we propagate that information into the gVCF.
Can be enabled with the --tryPhysicalPhasing argument.
Stories:
https://www.pivotaltracker.com/story/show/70222086
https://www.pivotaltracker.com/story/show/67961652
Changes:
Made some changes that I missed in relation to making sure that all PairHMM implementations use the same interface; as a consequence we were previously always running the standard PairHMM.
Fixed some additional bugs detected when running on a full WGS single-sample and an exome multi-sample data set.
Updated some integration test md5s.
Fixing GraphBased bugs with new master code
Fixed a difficult-to-spot bug in ReadLikelihoods.changeReads.
Changed PairHMM interface to fix a bug
Fixed missing changes for various PairHMM implementations to get them to use the new structure.
Fixed various bugs only detectable when running with full sample(s).
Believed to have fixed the lack of annotations in UG runs
Fixed integration test MD5s
Updating some md5s
Fixed yet another md5 probably left out by mistake
The array structure should be faster to populate and query (not properly benchmarked) and reduces the memory footprint considerably.
Nevertheless, removing the PairHMM factor (using the likelihoodEngine Random), it only achieves a speed-up of 15% on an example WGS dataset,
i.e. there are other, bigger bottlenecks in the system. Bamboo tests also seem to run significantly faster with this change.
Stories:
https://www.pivotaltracker.com/story/show/70222086
https://www.pivotaltracker.com/story/show/67961652
Changes:
- Added ReadLikelihoods to replace Map<String,PerSampleReadLikelihoods> (see the sketch below).
- Operations that involve changes to full sets of read likelihoods have been moved into that class.
- Somewhat simplified the code that handles the downsampling of reads based on contamination.
Caveats:
- We still keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators...; I didn't feel like changing the interface of so many public classes in this pull request.
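A hypothetical sketch of the flat array layout that motivates the change: one matrix per sample, indexed by allele and read, so integer indexing replaces per-lookup hashing (the field and method names are illustrative, not the real ReadLikelihoods API):

```java
// valuesBySample[sampleIndex][alleleIndex][readIndex] = log10 likelihood
final class ReadLikelihoodsSketch {
    private final double[][][] valuesBySample;

    ReadLikelihoodsSketch(final int samples, final int alleles, final int[] readsPerSample) {
        valuesBySample = new double[samples][][];
        for (int s = 0; s < samples; s++)
            valuesBySample[s] = new double[alleles][readsPerSample[s]];
    }

    double get(final int sample, final int allele, final int read) {
        return valuesBySample[sample][allele][read];
    }

    void set(final int sample, final int allele, final int read, final double log10Lk) {
        valuesBySample[sample][allele][read] = log10Lk;
    }
}
```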
In particular, it was possible to specify arguments for Files or Compound types without values
Added a special "none" value for annotations, since a bare "-A" is no longer allowed
Delivers PT 71792842 and 59360374
Story:
https://www.pivotaltracker.com/story/show/73440292
Changes:
- Just add the conditional in HaplotypeCaller#initialize
Testing:
- Nothing added, checked locally, trivial change that would eventually be removed anyway.
Don't expand out source nodes for tail merging, since that's a head merging action only.
This shows up as a bug only because we now allow merging tails against non-reference paths.
- Edited intervals merging docs for correctness & clarity
- Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests)
- Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs
- Moved GenotypeGVCFs to Variant Discovery category and clarified a few points
- Clarified that the -resource argument depends on using the -V:tag format
- Clarified how the pcr indel model works
- Added caveat for -U ALLOW_N_CIGAR_READS
- Added MathJax support for displaying equations in GATKDocs
- Updated HC example commands and caveats
This is useful, e.g., for cases where there are SNPs on insertions. Before, tails were forced to be merged
(incorrectly) only to a reference node, but now they can be merged to any path in the graph from which they
directly branch.
Also, I've transferred over Ryan's code to refuse to process kmer sizes for which there are non-unique kmers
in the reference sequence (see the sketch below).
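A sketch of the refusal criterion; the helper is illustrative, not the actual assembly-graph code:

```java
import java.util.HashSet;
import java.util.Set;

// A kmer size is usable only if every kmer in the region's reference
// sequence is unique.
final class KmerUniqueness {
    static boolean hasNonUniqueKmers(final byte[] reference, final int kmerSize) {
        final Set<String> seen = new HashSet<>();
        for (int start = 0; start + kmerSize <= reference.length; start++) {
            final String kmer = new String(reference, start, kmerSize);
            if (!seen.add(kmer))
                return true;  // duplicate kmer: refuse this kmer size
        }
        return false;
    }
}
```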
-- Global mismapping penalty was only applied to the reference haplotype. This led to problems with overlapping events, mostly STR haplotypes. Now the penalty is applied to every haplotype.
-- We subset the reads down to only those which overlap the event (after assembly based realignment) for likelihood calculations.
In these cases, where the alignment contains multiple indels, we output a single complex
variant instead of the multiple partial indels.
We also re-enable dangling tail recovery by default.
-- AD,DP will now correspond directly to the reads that were used to construct the PLs
-- RankSumTests, etc. will use the bases from the realigned reads instead of the original alignments
-- There is now no additional runtime cost to realign the reads when using bamout or GVCF mode
-- bamout mode no longer sets the mapping quality to zero for uninformative reads, instead the read will not be given an HC tag
(Right now it only works if all members of the trio are called.)
Takes posteriors as input, defaulting to PLs
Added annotations for possible de novos for use in the full genotype refinement pipeline
Added family priors to CGP integration test.
Changed CGP to use PP tag instead of GP tag because posteriors are Phred-scaled. Updated CGP integration test md5s to reflect change.
- New arguments are nda, hets, indelHeterozygosity, stand_call_conf, stand_emit_conf, ploidy, and maxAltAlleles
- Addresses PT 70110918
- To do this, moved those arguments out of the StandardCallerArgumentCollection into a new GenotypeCalculationArgumentCollection, which is now included as a member of SCAC
-They are now only computed when necessary
-Log10Cache is dynamically resizable, either by calling get() on an out-of-range value or by calling ensureCacheContains (see the sketch below)
-Log10FactorialCache and JacobianLogTable are initialized to a fixed size on first access and are not resizable
-Addresses PT 69124396
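A sketch of the resizable-cache pattern following the description above; the body is illustrative, not the actual Log10Cache class:

```java
import java.util.Arrays;

final class Log10CacheSketch {
    private static double[] cache = {Double.NEGATIVE_INFINITY, 0.0};  // log10(0), log10(1)

    // Guarantees the cache covers values 0..maxValue.
    static synchronized void ensureCacheContains(final int maxValue) {
        if (maxValue < cache.length) return;
        final int oldLength = cache.length;
        cache = Arrays.copyOf(cache, Math.max(maxValue + 1, 2 * oldLength));
        for (int i = oldLength; i < cache.length; i++)
            cache[i] = Math.log10(i);
    }

    // get() on an out-of-range value triggers a resize, as described above.
    static double get(final int value) {
        if (value >= cache.length) ensureCacheContains(value);
        return cache[value];
    }
}
```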
-Make BaseTest.createTempFile() mark any possible corresponding index files for deletion on exit
-Make WalkerTest mark shadow BCF files and auxiliary for deletion on exit
-Make VariantRecalibrationWalkersIntegrationTest mark PDF files for deletion on exit
-- disabling HC+VA integration test because, as noted in the comments, it keeps switching PairHMM implementations and giving different results at a particular site used in that particular test
Stories:
- https://www.pivotaltracker.com/story/show/69577868
Changes:
- Added an epsilon difference tolerance in weight comparisons.
Tests:
- Added HaplotypeCallerIntegrationTest#testDifferentIndelLocationsDueToSWExactDoubleComparisonsFix
- Updated md5 due to minor likelihood changes.
- Disabled a test for PathUtils.calculateCigar since it does not work and it is unclear what is causing the error (needs original author input)
To reduce merge conflicts, this commit modifies contents of files, while file renamings are in previous commit.
See previous commit message for list of changes.
To reduce merge conflicts, this commit only renames files, while file modifications are in next commit.
Some updates/fixes here are actually included in the next commit.
= Maven updates
Moved artifacts to new package names:
* private/queue-private -> private/gatk-queue-private
* private/gatk-private -> private/gatk-tools-private
* public/gatk-package -> protected/gatk-package-distribution
* public/queue-package -> protected/gatk-queue-package-distribution
* protected/gatk-protected -> protected/gatk-tools-protected
* public/queue-framework -> public/gatk-queue
* public/gatk-framework -> public/gatk-tools-public
New poms for new artifacts and packages:
* private/gatk-package-internal
* private/gatk-queue-package-internal
* private/gatk-queue-extensions-internal
* protected/gatk-queue-extensions-distribution
* public/gatk-engine
Updated references to StingText.properties to GATKText.properties.
Updated ant-bridge.sh to use gatk.* properties instead of sting.*.
= Engine updates
Renaming files containing engine parts from o.b.gatk.tools to o.b.gatk.engine.
Changed package references from tools to engine for CommandLineGATK, GenomeAnalysisEngine, ReadMetrics, ReadProperties, and WalkerManager.
Changed package reference tools.phonehome to engine.phonehome.
Renamed classes *Sting* to *GATK*, such as ReviewedGATKException.
= Test updates
Moved gatk example resources.
Moved test engine files from tools to engine packages.
Moved resources for phonehome to proper package.
Moved test classes under o.b.gatk into packages:
* o.b.g.utils.{BaseTest,ExampleToCopyUnitTest,GATKTextReporter,MD5DB,MD5Mismatch,TestNGTestTransformer}
* o.b.g.engine.walkers.WalkerTest
Updated package names in DependencyAnalyzerOutputLoaderUnitTest's data.
= Queue updates
Moving queue scripts to location where generated extensions can be used.
Renamed *.q to *.scala, updating licenses previously missed by git hooks.
Moved queue extensions to new artifact gatk-queue-extensions.
Fixed import statements frequently merge-conflicting on FullProcessingPipeline.scala.
= BWA
Added README on how to obtain and include bwa as a library.
Updated libbwa build.
Fixed packaged names under bwa/java implementation.
Updated contents of BWCAligner native implementation.
= Other fixes
Don't duplicate the resource bundle entries by both unpacking *and* appending.
(partial fix) Staged engine and utils poms to build GATKText.properties, once Utils random generator dependency on GATK engine is fixed.
Re-enabled custom testng listeners/reporters and moved testng dependencies to the gatk-root.
Updated comments referencing Sting with GATK.
Moved a couple untangled classes from gatk-tools-public to gatk-utils and gatk-engine.
The JNI treats shared memory as critical memory and doesn't allow any
parallel reads or writes to it until the native code finishes. This is
not a problem *per se* (it is the right thing to do), but we need to
enable **-nct** when running the HaplotypeCaller and, with it, have
multiple native PairHMMs running for each map call.
Moved to a copy-based memory sharing where the JNI simply copies the
memory over to C++ and then holds no blocked critical memory when running,
allowing -nct to work.
This version is slightly (almost unnoticeably) slower with -nct 1, but
scales better with -nct 2-4 (we haven't tested anything beyond that
because we know the GATK falls apart at higher levels of parallelism).
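For illustration, a Java-side sketch of the two access modes; everything here is a stand-in (method name, signature, library name), not the real JNI binding:

```java
// With "critical" access (GetPrimitiveArrayCritical in the native code)
// the VM treats the arrays as critical memory and blocks parallel access
// until the native side releases them; with plain GetByteArrayElements
// the native side works on a copy, so several -nct threads can each
// drive their own native PairHMM concurrently.
final class NativePairHMMSketch {
    // The copy-based native implementation copies readBases/haplotypeBases
    // into C++ buffers up front instead of pinning the Java arrays.
    native double computeLog10Likelihood(byte[] readBases, byte[] haplotypeBases);

    static {
        System.loadLibrary("VectorLoglessPairHMM");  // stand-in library name
    }
}
```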
* Make VECTOR_LOGLESS_CACHING the default implementation for PairHMM.
* Changed version number in pom.xml under public/VectorPairHMM
* VectorPairHMM can now be compiled using gcc 4.8.x
* Modified define-* to get rid of gcc warnings for extra tokens after #undefs
* Added a Linux kernel version check for AVX - gcc's __builtin_cpu_supports function does not check whether the kernel supports AVX or not.
* Updated PairHMM profiling code to update and print numbers only in single-thread mode
* Edited README.md, pom.xml and Makefile for users to pass path to gcc 4.8.x if necessary
* Moved all cpuid inline assembly to a single function
* Changed info message to clog from cinfo
* Modified version in pom.xml in VectorPairHMM from 3.1 to 3.2
* Deleted some unnecessary code
* Modified C++ sandbox to print per interval timing
Story:
https://www.pivotaltracker.com/story/show/68220438
Changes:
- PL-less input genotypes are now uncalled, and are therefore treated as non-variant sites when combining GVCFs.
- HC GVCF/BP_RESOLUTION mode now outputs non-variant records at sites covered by deletions.
- Fixed existing tests
Test:
- HaplotypeCallerGVCFIntegrationTest
- ReferenceConfidenceModelUnitTest
- CombineGVCFsIntegrationTest
story:
https://www.pivotaltracker.com/story/show/69648104
description:
This read transformer will refactor cigar strings that contain N-D-N elements into one N element (with the total length of the three refactored elements).
This is intended primarily for users of RNA-Seq data handling programs such as TopHat2.
Currently we consider the internal N-D-N motif illegal and we error out when we encounter it. By refactoring the cigar string of
those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset (see the sketch below).
edit: address review comments - changed the tool's name and changed the tool to be a readTransformer instead of a read filter
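A sketch of the transformer's core, using modern htsjdk class names (the GATK 3.x equivalents live under net.sf.samtools); the real tool also has to handle edge cases this single pass ignores:

```java
import htsjdk.samtools.Cigar;
import htsjdk.samtools.CigarElement;
import htsjdk.samtools.CigarOperator;
import java.util.ArrayList;
import java.util.List;

// Collapse every N-D-N run into a single N whose length is the sum of
// the three elements.
final class NDNCigarSketch {
    static Cigar refactorNDN(final Cigar cigar) {
        final List<CigarElement> in = cigar.getCigarElements();
        final List<CigarElement> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            if (i + 2 < in.size()
                    && in.get(i).getOperator() == CigarOperator.N
                    && in.get(i + 1).getOperator() == CigarOperator.D
                    && in.get(i + 2).getOperator() == CigarOperator.N) {
                final int merged = in.get(i).getLength()
                        + in.get(i + 1).getLength()
                        + in.get(i + 2).getLength();
                out.add(new CigarElement(merged, CigarOperator.N));
                i += 3;
            } else {
                out.add(in.get(i++));
            }
        }
        // Note: a single pass; longer chains like N-D-N-D-N would need
        // repeated application until a fixed point is reached.
        return new Cigar(out);
    }
}
```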
CalculateGenotypePosteriors now only computes posterior probs for SNP sites with SNP priors
(other sites have flat priors applied)
CalibrateGenotypeLikelihoods had originally applied HOM_REF/HET/HOM_VAR frequencies in the callset as priors before empirical quality analysis. Now has an option (-noPriors) to not apply priors (i.e. to apply flat priors). Also takes in new external probabilities files, such as those generated by CGP, from which the genotype posterior probability qualities will be read.
Integration test was changed to account for new SNP-only behavior and default behavior to not use missing priors.
(Also, new numRefIfMissing is 0, which should only matter in cases using few samples when you probably don't want to be doing that anyway!)
Description:
Transforms a delegation dependency from the HC to the UG genotyping engine into reuse by inheritance, where the HC and UG engines inherit from a common superclass, GenotyperEngine,
that implements the common parts. As a side effect, some of the code is now clearer and redundant code has been removed.
The changes have a few consequences for the end user. HC now has a few more user arguments, those that control the functionality that HC was borrowing directly from UGE.
Added a -ploidy argument, although it is constrained to be 2 for now.
Added -out_mode EMIT_ALL_SITES|EMIT_VARIANTS_ONLY ...
Added -allSitePLs flag.
Stories:
https://www.pivotaltracker.com/story/show/68017394
Changes:
- Moved (HC's) GenotyperEngine to HaplotypeCallerGenotyperEngine (HCGE). Then created an engine superclass, GenotypingEngine (GE), that contains the parts common to HCGE and the UG counterpart 'UnifiedGenotypingEngine' (UGE). Simplified the code and applied the template pattern to accommodate small differences in behaviour between the two caller
engines. (There is still room for improvement though.)
- Moved inner classes and enums to top-level components for various reasons including making them shorter and simpler names to refer to them.
- Created a HomoSapiens class for Human-specific constants; even if they are good defaults for most users, we need to clearly identify the human assumption across the code if we want to make
GATK work with any species in general; i.e. any reference to HomoSapiens, except as a default value for a user argument, should smell.
- Fixed a bug deep in the genotyping calculation: we were taking fixed values for SNP and indel heterozygosity (the Human defaults), ignoring user arguments.
- Renamed GenotypeLikelihoodsCalculationModel.Model to Gen.*Like.*Calc.*Model.Name; not a definitive solution though, as names are often used in conditionals that perhaps should be member methods of the
GenLikeCalc classes.
- Renamed LikelihoodCalculationEngine to ReadLikelihoodCalculationEngine to distinguish them clearly from Genotype likelihood calculation engines.
- Changed copying by explicit argument listing to a clone/reflection solution for casting between genotyper argument collection classes.
- Created GenotypeGivenAllelesUtils to collect methods needed nearly exclusively by the GGA mode.
Tests :
- StandardCallerArgumentCollectionUnitTest (checks copy by cloning/reflection).
- All existing integration and unit tests for modified classes.
Following reviewers' comments, the command line interface has been simplified.
All extra-strict validations are performed by default (as before) and the
user has to indicate which ones they do not want to use with --validationTypeToExclude.
Before, they were able to indicate the only ones to apply with --validationType, but that has been scrapped.
Stories:
- https://www.pivotaltracker.com/story/show/68725164
Changes:
- Removed validateType argument.
- Improved documentation.
- Added some warning log messages on suspicious argument combinations.
Tests:
- ValidateVariantsIntegrationTest#*
-- This is needed so the ref model pipeline can cut down to sites-only files without losing these useful statistics.
-- Added new unit test to test this info field annotation.
-- GenotypeGVCF integration tests change because new annotations are present in the output
More concretely, Picard's strict VCF validation does not like alternative alleles that do not participate in any genotype call across samples.
This is an issue with GVCF in the single-sample pipeline, where this is certainly expected with <NON_REF> and other relatively unlikely alleles.
To solve this issue we allow the user to exclude some of the strict validations using a new argument --validationTypeToExclude. In order to avoid the validation
issue with GVCF the user needs to add the following to the command line: '--validationTypeToExclude ALLELES'
Story:
https://www.pivotaltracker.com/story/show/68725164
Changes:
- Added validateTypeToExclude argument to ValidateVariants walker.
- Implemented the selective exclusion of validation types.
- Added new info and improved existing documentation of the ValidateVariants walker.
Tests:
- ValidateVariantsIntegrationTest#testUnusedAlleleError
- ValidateVariantsIntegrationTest#testUnusedAlleleFix
In some cases, the program records were being removed from the BAM headers by the GATK engine
before we applied the check for reduced reads (so we did not fail appropriately). Pushed up the
check to happen before the PG tags are modified and added a unit test to ensure it stays that way.
It turns out that some UG tests still used reduced bams so I switched to use different ones.
Based on reviewer feedback, made it more generic so that it's easy to add new unsupported tools.
Previously it required you to create a single sample VCF and then to pass that in to the tool, but
Geraldine convinced me that this was a pain for users (because they usually have multi-sample VCFs).
Instead now you can pass in a multi-sample VCF and specify which sample's genotypes should be used
for the IUPAC encoding. Therefore the argument changed from '--useIUPAC' to '--use_IUPAC_sample NA12878'.
Stories:
https://www.pivotaltracker.com/story/show/66263868
Bug:
The problem was due to the way we were calculating the fixed penalty of a large deletion or insertion. In this case we calculate the alignment likelihood of the deleted portion
of the read or haplotype as the penalty of that deletion/insertion, without going through the full pair-HMM process. For large events this resulted in a 0 in
linear-scale computations, which is transformed into an infinity in log scale.
Changes:
- Changed to use log10 scale to calculate those penalties (see the sketch below).
- Minor addition of .gitignore to hide ./public/external-example/target which is generated by the building process.
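A tiny illustration of the underflow described above; the per-base penalty and event length are made-up numbers:

```java
// A per-base penalty of 1e-4 over a 100bp event underflows in linear
// scale (1e-400 is below the smallest representable double), but is a
// perfectly ordinary number in log10 scale.
public final class LogScalePenalty {
    public static void main(final String[] args) {
        final double perBase = 1e-4;
        final int eventLength = 100;

        double linear = 1.0;
        for (int i = 0; i < eventLength; i++) linear *= perBase;  // underflows to 0.0
        final double log10FromLinear = Math.log10(linear);         // -Infinity

        final double log10Direct = eventLength * Math.log10(perBase);  // -400.0, well-behaved

        System.out.println(log10FromLinear + " vs " + log10Direct);
    }
}
```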
- SamPairUtils migrated in Picard r1737
- Revert IndelRealigner changes made in commit 4f4b85
-- Those changes were based on Picard revision 1722 to net/sf/picard/sam/SamPairUtil.java
-- Picard revision 1723 reverts these changes, so we also revert to match
Story:
- https://www.pivotaltracker.com/story/show/67601310
Change:
- Unless recover-dangling-heads is active, the threading starting location policy is the original one, i.e. just at already-existing unique kmer vertices.
Tests:
- HaplotypeCallerIntegrationTest#testMissingKeyAlternativeHaplotypesBugFix
1. Enable on-the-fly indexing for vcf.gz.
2. Handle on-the-fly indexing where file to be indexed is not a regular file, thus index should not be created.
3. Add method setProgressLogger to all SAMFileWriter implementations.
4. Revved picard to 1.109.1722
5. IndelRealigner md5s change because the MC tag is added to records now.
Fixed up and signed off by ebanks.
Currently the best haplotypes are those that accumulate the largest ABSOLUTE edge *multiplicity* sum across their path in the assembly graph.
The edge *multiplicity* is equal to the number of reads that expand through that edge, i.e. that have a kmer that maps uniquely to some vertex upstream of the edge and whose following base calls extend across that edge to vertices downstream of it.
Although it is obvious that higher multiplicities correlate with haplotype probability, this criterion falls short in some regards, of which the most relevant is:
as it is evaluated on the condensed seq-graph (as opposed to the uncompressed read-threading graph), it is biased toward haplotypes that have more short-sequence vertices
( -> ATGC -> CA -> has a worse score than -> A -> T -> G -> C -> C -> A ->). This is partly a result of how we modify the edge multiplicities when we merge vertices in a linear chain.
This pull request addresses the problem by changing to a new scoring schema based on likelihood estimates:
each haplotype's likelihood can be calculated as the product of the likelihoods of "taking" its edges in the assembly graph. The likelihood of "taking" an edge in the assembly
graph is calculated as its multiplicity divided by the sum of the multiplicities of the edges that share the same source vertex (see the sketch below).
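A sketch of the edge likelihood just defined; the shapes are illustrative, not the real seq-graph types:

```java
// A haplotype's log10 likelihood is the sum of its edges' log10
// likelihoods, which avoids underflow on long paths.
final class HaplotypeScore {
    // Likelihood of "taking" one edge: its multiplicity divided by the
    // summed multiplicity of all edges leaving the same source vertex.
    static double log10EdgeLikelihood(final int taken, final int[] outgoingMultiplicities) {
        int total = 0;
        for (final int m : outgoingMultiplicities) total += m;
        return Math.log10((double) taken / total);
    }
}
```

Because each vertex's outgoing likelihoods sum to 1, a haplotype's score should no longer depend on how many vertices a linear chain was condensed into, which removes the short-vertex bias described above.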
This pull-request addresses the following stories:
https://www.pivotaltracker.com/story/show/66691418
https://www.pivotaltracker.com/story/show/64319760
Change Summary:
1. Change to the new scoring schema.
2. Added graph DOT printing code to KBestHaplotypeFinder in order to diagnose scoring.
3. Graph transformations have been modified so as not to generate 0-multiplicity edges. (Nevertheless, the schema above should work with 0-multiplicity edges, assuming they are in fact 0.5.)
Enable it with the new --useIUPAC argument.
Added both unit and integration tests for the new functionality - and fixed up the
existing tests while I was in there.
-- All the provided alleles are added to the assembly graph as potential haplotypes but they aren't forcibly genotyped like in GGA mode.
-- Added integration test for this mode
-These tests are really integration tests for Queue rather than generalized
pipeline tests, so it makes sense to call them QueueTests.
-Rename test classes and maven build targets, and update shell scripts
to reflect new naming.
C++ code has PAPI calls for reading hardware counters
Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven
Native library part of git repo
1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM
Moved JNI_README
1. Added a catch for the UnsatisfiedLinkError exception in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.
1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase
Added public license text to all C++ files
Added license to Makefile
Add package info to Sandbox.java
Conflicts:
protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/DebugJNILoglessPairHMM.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/JNILoglessPairHMM.java
protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/VectorLoglessPairHMM.java
public/VectorPairHMM/src/main/c++/.gitignore
public/VectorPairHMM/src/main/c++/LoadTimeInitializer.cc
public/VectorPairHMM/src/main/c++/LoadTimeInitializer.h
public/VectorPairHMM/src/main/c++/Makefile
public/VectorPairHMM/src/main/c++/Sandbox.cc
public/VectorPairHMM/src/main/c++/Sandbox.h
public/VectorPairHMM/src/main/c++/Sandbox.java
public/VectorPairHMM/src/main/c++/Sandbox_JNIHaplotypeDataHolderClass.h
public/VectorPairHMM/src/main/c++/Sandbox_JNIReadDataHolderClass.h
public/VectorPairHMM/src/main/c++/baseline.cc
public/VectorPairHMM/src/main/c++/define-double.h
public/VectorPairHMM/src/main/c++/define-float.h
public/VectorPairHMM/src/main/c++/define-sse-double.h
public/VectorPairHMM/src/main/c++/define-sse-float.h
public/VectorPairHMM/src/main/c++/headers.h
public/VectorPairHMM/src/main/c++/jnidebug.h
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.cc
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.h
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.cc
public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.h
public/VectorPairHMM/src/main/c++/pairhmm-template-kernel.cc
public/VectorPairHMM/src/main/c++/pairhmm-template-main.cc
public/VectorPairHMM/src/main/c++/run.sh
public/VectorPairHMM/src/main/c++/shift_template.c
public/VectorPairHMM/src/main/c++/utils.cc
public/VectorPairHMM/src/main/c++/utils.h
public/VectorPairHMM/src/main/c++/vector_function_prototypes.h
-- throws UserException; added tests in PosteriorLikelihoodsUtilsUnitTests
Add error handling to CalculateGenotypePosteriors for cases where MLEAC>AN; add tests in PosteriorLikelihoodsUtilsUnitTests
Add unit tests to confirm that CalculateGenotypePosteriors has the ability to switch genotypes for four cases