gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	2df2a153e6	Merge pull request #658 from broadinstitute/ldg_PbyTwithPriors Updated CalculateGenotypePosteriors to compute genotype posteriors using...	2014-06-18 15:04:39 -04:00
Laura Gauthier	2356d5d63f	Updated CalculateGenotypePosteriors to compute genotype posteriors using likelihoods from all members of the trio. (Right now it only works if all members of the trio are called.) Takes posteriors as input, defaulting to PLs. Added annotations for possible de novos for use in full genotype refinement pipeline. Added family priors to CGP integration test. Changed CGP to use PP tag instead of GP tag because posteriors are Phred-scaled. Updated CGP integration test md5s to reflect change.	2014-06-18 11:17:15 -04:00
Phillip Dexheimer	2e78815055	Added missing arguments to GenotypeGVCFs - New arguments are nda, hets, indelHeterozygosity, stand_call_conf, stand_emit_conf, ploidy, and maxAltAlleles - Addresses PT 70110918 - To do this, moved those arguments out of the StandardCallerArgumentCollection into a new GenotypeCalculationArgumentCollection, which is now included as a member of SCAC	2014-06-16 08:10:54 -04:00
droazen	3079755b4c	Merge pull request #646 from broadinstitute/ks_disable_distribution_with_private Add maven -Pgsadev flag to build private jars only	2014-06-11 11:00:31 -04:00
Khalid Shakir	f082572593	If passed -Pgsadev, don't build the distribution package.	2014-06-10 23:33:33 -04:00
Valentin Ruano Rubio	db96891d4b	Merge pull request #638 from broadinstitute/vrr_createTempFile_testfix Changed File.createTempFile to BaseTest.createTempFile calls Test	2014-05-29 10:15:05 -04:00
Valentin Ruano-Rubio	07567fdae3	Removed debug code outputting files not removed after VM exits in ReadThreadingLikelihoodCalculationEngineUnitTest. Notice however that this should not be the cause of recent problems as the code was deactivated.	2014-05-28 19:03:25 -04:00
Valentin Ruano-Rubio	e0c221470c	Changed File.createTempFile to BaseTest.createTempFile	2014-05-28 18:59:48 -04:00
EvolvedMicrobe	ef7531d4a5	Merge pull request #640 from broadinstitute/IntegerSWImplementation Change SmithWaterman to use integers instead of doubles.	2014-05-28 15:10:05 -04:00
Nigel Delaney	cc45e62e8e	Change SmithWaterman to use integers instead of doubles.	2014-05-28 13:13:14 -04:00
droazen	ac52fa581a	Merge pull request #644 from broadinstitute/ks_queue_test_temp_fix Disabled ExampleUG Queue Tests, fixed internal extensions dependency.	2014-05-28 11:29:08 -04:00
Phillip Dexheimer	c15e6fcc0e	Refactored the static lookup arrays in MathUtils (log10Cache, log10FactorialCache, jacobianLogTable) -They are now only computed when necessary -Log10Cache is dynamically resizable, either by calling get() on an out-of-range value or by calling ensureCacheContains -Log10FactorialCache and JacobianLogTable are initialized to a fixed size on first access and are not resizable -Addresses PT 69124396	2014-05-27 22:27:57 -04:00
Eric Banks	b77589696e	Merge pull request #643 from broadinstitute/rp_remove_hwp Removing HWP from GenotypeSummaries because of integer overflow issues w...	2014-05-27 17:21:19 -04:00
Khalid Shakir	6c9e68ef41	Disabled ExampleUG Queue Tests, fixed internal extensions dependency. EUG tests disabled due to new protected qscript directory path, post GATK artifact splitting.	2014-05-27 16:16:53 -04:00
David Roazen	74b51c5c7a	Improve test suite tmp file cleanup -Make BaseTest.createTempFile() mark any possible corresponding index files for deletion on exit -Make WalkerTest mark shadow BCF files and auxiliary for deletion on exit -Make VariantRecalibrationWalkersIntegrationTest mark PDF files for deletion on exit	2014-05-27 13:41:44 -04:00
Ryan Poplin	b24cff780b	Removing HWP from GenotypeSummaries because of integer overflow issues with 91K samples. Removing CCC because it is redundant.	2014-05-27 10:14:49 -04:00
Ryan Poplin	ec7c4ea2ba	Unfortunately dangling tail recovery is dangerous in exome data. Turning it off by default for now. -- disabling HC+VA integration test because, as noted in the comments, it keeps switching PairHMM implementations and giving different results at a particular site used in that particular test	2014-05-23 14:33:44 -04:00
Valentin Ruano-Rubio	979ab0453e	Moved GlobalEdgeGreedySWPairwiseAlignment to the archive	2014-05-23 01:48:48 -04:00
Valentin Ruano-Rubio	7c8a1ae892	Fix for SW to make double comparisons with a tolerance Stories: - https://www.pivotaltracker.com/story/show/69577868 Changes: - Added an epsilon difference tolerance in weight comparisons. Tests: - Added HaplotypeCallerIntegrationTest#testDifferentIndelLocationsDueToSWExactDoubleComparisonsFix - Updated md5 due to minor likelihood changes. - Disabled a test for PathUtils.calculateCigar since it does not work and it is unclear what is causing the error (needs original author input)	2014-05-23 01:48:48 -04:00
Khalid Shakir	b7e98bdae9	Fixed GATK docs artifact, moved protected ExampleUG tests.	2014-05-22 21:03:55 -04:00
Ryan Poplin	581843d994	Minor updates to HC docs.	2014-05-20 10:01:11 -04:00
Khalid Shakir	88d7e23c44	After talking with Mauricio and Karthik, updated MD5s and added a note about PairHMM causing test variability.	2014-05-19 17:36:41 -04:00
Karthik Gururaj	972a82d386	Changed 'sting' to 'gatk' in the VectorLoglessPairHMM classes and the C++ code	2014-05-19 17:36:41 -04:00
Khalid Shakir	3939971d78	After renaming the packages, instead of updating the JNI library used for testing bwa, moving the classes to the archive. NOTE: The migrated README.md has been added that will allow others to possibly resurrect this code as needed.	2014-05-19 17:36:41 -04:00
Khalid Shakir	2c854e554a	Refactored maven directories and java packages replacing "sting" with "gatk". To reduce merge conflicts, this commit modifies contents of files, while file renamings are in previous commit. See previous commit message for list of changes.	2014-05-19 17:36:39 -04:00
Khalid Shakir	4e6d43d003	Refactored maven directories and java packages replacing "sting" with "gatk". To reduce merge conflicts, this commit only renames files, while file modifications are in next commit. Some updates/fixes here are actually included in the next commit. = Maven updates Moved artifacts to new package names: * private/queue-private -> private/gatk-queue-private * private/gatk-private -> private/gatk-tools-private * public/gatk-package -> protected/gatk-package-distribution * public/queue-package -> protected/gatk-queue-package-distribution * protected/gatk-protected -> protected/gatk-tools-protected * public/queue-framework -> public/gatk-queue * public/gatk-framework -> public/gatk-tools-public New poms for new artifacts and packages: * private/gatk-package-internal * private/gatk-queue-package-internal * private/gatk-queue-extensions-internal * protected/gatk-queue-extensions-distribution * public/gatk-engine Updated references to StingText.properties to GATKText.properties. Updated ant-bridge.sh to use gatk.* properties instead of sting.. = Engine updates Renaming files containing engine parts from o.b.gatk.tools to o.b.gatk.engine. Changed package references from tools to engine for CommandLineGATK, GenomeAnalysisEngine, ReadMetrics, ReadProperties, and WalkerManager. Changed package reference tools.phonehome to engine.phonehome. Renamed classes Sting* to GATK, such as ReviewedGATKException. = Test updates Moved gatk example resources. Moved test engine files from tools to engine packages. Moved resources for phonehome to proper package. Moved test classes under o.b.gatk into packages: * o.b.g.utils.{BaseTest,ExampleToCopyUnitTest,GATKTextReporter,MD5DB,MD5Mismatch,TestNGTestTransformer} * o.b.g.engine.walkers.WalkerTest Updated package names in DependencyAnalyzerOutputLoaderUnitTest's data. = Queue updates Moving queue scripts to location where generated extensions can be used. 
Renamed .q to .scala, updating licenses previously missed by git hooks. Moved queue extensions to new artifact gatk-queue-extensions. Fixed import statements frequently merge-conflicting on FullProcessingPipeline.scala. = BWA Added README on how to obtain and include bwa as a library. Updated libbwa build. Fixed package names under bwa/java implementation. Updated contents of BWCAligner native implementation. = Other fixes Don't duplicate the resource bundle entries by both unpacking and appending. (partial fix) Staged engine and utils poms to build GATKText.properties, once Utils random generator dependency on GATK engine is fixed. Re-enabled custom testng listeners/reporters and moved testng dependencies to the gatk-root. Updated comments referencing Sting with GATK. Moved a couple untangled classes from gatk-tools-public to gatk-utils and gatk-engine.	2014-05-19 16:43:47 -04:00
Khalid Shakir	67e44985b1	Java/Scala imports updated for new package names. Fourth of four commits for picard/htsjdk package rename.	2014-05-08 19:13:31 +08:00
Laura Gauthier	bf7b97393e	Add ability to output to a file discordant loci and their respective genotypes for each sample	2014-05-07 10:12:45 -04:00
MauricioCarneiro	f03a12263a	Merge pull request #625 from broadinstitute/intel_updateCell_inlined (Optional) Inlined the code from updateCell	2014-05-07 10:11:09 -04:00
Karthik Gururaj	d9c489f928	Removed scary warning messages for VectorPairHMM	2014-05-06 10:59:24 -07:00
Karthik Gururaj	fb8578ec8e	Inlined the code from updateCell - helps Java JIT to detect hotspots and produce good native code	2014-05-06 10:37:10 -07:00
Karthik Gururaj	f6ea25b4d1	Parallel version of the JNI for the PairHMM. The JNI treats shared memory as critical memory and doesn't allow any parallel reads or writes to it until the native code finishes. This is not a problem per se; it is the right thing to do, but we need to enable -nct when running the haplotype caller and with it have multiple native PairHMM running for each map call. Move to a copy based memory sharing where the JNI simply copies the memory over to C++ and then has no blocked critical memory when running, allowing -nct to work. This version is slightly (almost unnoticeably) slower with -nct 1, but scales better with -nct 2-4 (we haven't tested anything beyond that because we know the GATK falls apart with higher levels of parallelism). * Make VECTOR_LOGLESS_CACHING the default implementation for PairHMM. * Changed version number in pom.xml under public/VectorPairHMM * VectorPairHMM can now be compiled using gcc 4.8.x * Modified define-* to get rid of gcc warnings for extra tokens after #undefs * Added a Linux kernel version check for AVX - gcc's __builtin_cpu_supports function does not check whether the kernel supports AVX or not. * Updated PairHMM profiling code to update and print numbers only in single-thread mode * Edited README.md, pom.xml and Makefile for users to pass path to gcc 4.8.x if necessary * Moved all cpuid inline assembly to single function Changed info message to clog from cinfo * Modified version in pom.xml in VectorPairHMM from 3.1 to 3.2 * Deleted some unnecessary code * Modified C++ sandbox to print per interval timing	2014-05-02 19:12:48 -04:00
Valentin Ruano-Rubio	d563072282	Fix for CombineGVCFs and GenotypeGVCFs recurrent exception about missing PLs Story: https://www.pivotaltracker.com/story/show/68220438 Changes: - PL-less input genotypes are now uncalled and so non-variant sites when combining GVCFs. - HC GVCF/BP_RESOLUTION Mode now outputs non-variant sites in sites covered by deletions. - Fixed existing tests Test: - HaplotypeCallerGVCFIntegrationTest - ReferenceConfidenceModelUnitTest - CombineGVCFsIntegrationTest	2014-05-02 09:21:06 -04:00
Ryan Poplin	41d3069213	When we subset PLs because Alleles are removed during genotyping we also need to subset AD.	2014-04-28 15:52:26 -04:00
Ryan Poplin	06dbe74a23	Merge pull request #609 from kcibul/kc_cancersimreads extended SimulateReadsForVariants to optionally use the AF field to indi...	2014-04-28 13:31:56 -04:00
Ami Levy-Moonshine	13dd755468	create a new read transformer that refactors NDN cigar elements into one N element. story: https://www.pivotaltracker.com/story/show/69648104 description: This read transformer will refactor cigar strings that contain N-D-N elements to one N element (with total length of the three refactored elements). This is intended primarily for users of RNA-Seq data handling programs such as TopHat2. Currently we consider that the internal N-D-N motif is illegal and we error out when we encounter it. By refactoring the cigar string of those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset. edit: address review comments - change the tool's name and change the tool to be a readTransformer instead of read filter	2014-04-28 11:29:00 -04:00
Ryan Poplin	221b999cb0	GenotypeGVCF was pulling the headers from all input rods including DBsnp. Now it pulls from just the input variant rods.	2014-04-25 13:16:28 -04:00
Laura Gauthier	9f3cbb2ef1	Improvements to CalculateGenotypePosteriors and CalibrateGenotypeLikelihoods CalculateGenotypePosteriors now only computes posterior probs for SNP sites with SNP priors (other sites have flat priors applied) CalibrateGenotypeLikelihoods had originally applied HOM_REF/HET/HOM_VAR frequencies in callset as priors before empirical quality analysis. Now has option (-noPriors) to not apply/apply flat priors. Also takes in new external probabilities files, such as those generated by CGP, from which the genotype posterior probability qualities will be read. Integration test was changed to account for new SNP-only behavior and default behavior to not use missing priors. (Also, new numRefIfMissing is 0, which should only matter in cases using few samples when you probably don't want to be doing that anyway!)	2014-04-24 08:49:42 -04:00
Valentin Ruano-Rubio	e610373169	Fixed integration test problems from previous premature merge	2014-04-20 17:11:51 -04:00
Valentin Ruano-Rubio	4e5850966a	Reengineer engine constructors	2014-04-19 17:58:14 -04:00
Valentin Ruano-Rubio	7455ac9796	Addressed revisions	2014-04-19 16:48:48 -04:00
Kristian Cibulskis	6b9e38c8bb	incorporated comments from review, made variables final, made AF parameter hidden, and added bounds checking to AF value	2014-04-16 19:29:25 -04:00
Kristian Cibulskis	7115cadbd8	extended SimulateReadsForVariants to optionally use the AF field to indicate allele fraction of the simulated event, useful in cancer and other variable ploidy use cases	2014-04-16 16:20:02 -04:00
Valentin Ruano-Rubio	08203b516e	Disentangle UG and HC Genotyper engines. Description: Transforms a delegation dependency from HC to UG genotyping engine into reuse by inheritance, where HC and UG engines inherit from a common superclass GenotyperEngine that implements the common parts. As a side effect, some of the code is now clearer and redundant code has been removed. Changes have a few consequences for the end user. HC now has a few more user arguments, those that control the functionality that HC was borrowing directly from UGE. Added -ploidy argument although it is constrained to be 2 for now. Added -out_mode EMIT_ALL_SITES|EMIT_VARIANTS_ONLY ... Added -allSitePLs flag. Stories: https://www.pivotaltracker.com/story/show/68017394 Changes: - Moved (HC's) GenotyperEngine to HaplotypeCallerGenotyperEngine (HCGE). Then created an engine superclass GenotypingEngine (GE) that contains common parts between HCGE and the UG counterpart 'UnifiedGenotypingEngine' (UGE). Simplified the code and applied the template pattern to accommodate small differences in behaviour between both caller engines. (There is still room for improvement though). - Moved inner classes and enums to top-level components for various reasons, including giving them shorter and simpler names to refer to them. - Created a HomoSapiens class for Human specific constants; even if they are good defaults for most users we need to clearly identify the human assumption across the code if we want to make GATK work with any species in general; i.e. any reference to HomoSapiens, except as a default value for a user argument, should smell. - Fixed a bug deep in the genotyping calculation: we were taking fixed values for snp and indel heterozygosity to be the default for Human, ignoring user arguments. 
- GenotypingLikehooldCalculationCModel.Model to Gen.Like.Calc.*Model.Name; not a definitive solution though as names are used often in conditionals that perhaps should be member methods of the GenLikeCalc classes. - Renamed LikelihoodCalculationEngine to ReadLikelihoodCalculationEngine to distinguish them clearly from Genotype likelihood calculation engines. - Changed copy by explicit argument listing to a clone/reflection solution for casting between genotypers argument collection classes. - Created GenotypeGivenAllelesUtils to collect methods needed nearly exclusively by the GGA mode. Tests : - StandardCallerArgumentCollectionUnitTest (check copy by cloning/reflection). - All existing integration and unit tests for modified classes.	2014-04-13 03:09:55 -04:00
Khalid Shakir	a6b0754990	After comments from @nh13, updated latest picard and setMateInfo call.	2014-04-08 15:22:45 -04:00
Khalid Shakir	3047d6ff32	BQSRGatherer handles missing read groups from some input files. [#68720468 ]	2014-04-08 23:58:54 +08:00
Eric Banks	ad336375dc	Merge pull request #590 from broadinstitute/vrr_validate_variants_unused_alleles_fix Addresses issue with strict validation on GVCF files.	2014-04-07 22:10:49 -04:00
Valentin Ruano-Rubio	5afcc8e05f	Change in the command line interface of ValidateVariants. Following reviewers' comments the command line interface has been simplified. All extra strict validations are performed by default (as before) and the user has to indicate which ones he/she does not want to use with --validationTypeToExclude. Before, he/she was able to indicate the only ones to apply with --validationType, but that has been scrapped. Stories: - https://www.pivotaltracker.com/story/show/68725164 Changes: - Removed validateType argument. - Improved documentation. - Added some warning log messages on suspicious argument combinations. Tests: - ValidateVariantsIntegrationTest#*	2014-04-07 16:27:11 -04:00
Ryan Poplin	f058224b3e	Adding GenotypeSummaries as INFO field annotations. -- This is needed so the ref model pipeline can cut down to sites-only files without losing these useful statistics. -- Added new unit test to test this info field annotation. -- GenotypeGVCF integration tests change because new annotations are present in the output	2014-04-06 11:50:10 -04:00
MauricioCarneiro	84861fa10a	Merge pull request #587 from broadinstitute/eb_actually_fail_on_reduced_bams Make sure to fail in all cases where the BAM being used was created by ReduceReads.	2014-04-04 17:27:57 -04:00


1109 Commits (bffc9fbabd12eddaed5deaad600e68ba7e9084e1)