1109 Commits (bffc9fbabd12eddaed5deaad600e68ba7e9084e1)

Author SHA1 Message Date
Eric Banks 2df2a153e6 Merge pull request #658 from broadinstitute/ldg_PbyTwithPriors
Updated CalculateGenotypePosteriors to compute genotype posteriors using...
2014-06-18 15:04:39 -04:00
Laura Gauthier 2356d5d63f Updated CalculateGenotypePosteriors to compute genotype posteriors using likelihoods from all members of the trio.
(Right now it only works if all members of the trio are called.)
Takes posteriors as input, defaulting to PLs
Added annotations for possible de novos for use in the full genotype refinement pipeline
Added family priors to CGP integration test.
Changed CGP to use PP tag instead of GP tag because posteriors are Phred-scaled. Updated CGP integration test md5s to reflect change.
2014-06-18 11:17:15 -04:00
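
The switch to a Phred-scaled PP tag can be illustrated with a minimal sketch. It assumes per-genotype Phred-scaled likelihoods (PLs) and linear-scale priors that are all greater than zero; the function name and the normalization (best genotype at 0, as is conventional for PLs) are assumptions of this example, not the tool's actual code.

```python
import math

def phred_posteriors(pls, priors):
    """Combine Phred-scaled likelihoods with linear-scale priors.

    Returns Phred-scaled posteriors, shifted so the most likely genotype is 0.
    """
    # A PL of x corresponds to a log10 likelihood of -x / 10.
    log10_post = [-pl / 10.0 + math.log10(p) for pl, p in zip(pls, priors)]
    best = max(log10_post)
    # Convert back to the Phred scale relative to the best genotype.
    return [int(round(-10.0 * (x - best))) for x in log10_post]

# A strong prior on the heterozygous genotype shifts the call away from HOM_REF.
print(phred_posteriors([0, 10, 30], [0.01, 0.98, 0.01]))
```

With flat priors the output equals the input PLs, which is why a missing prior leaves the call unchanged in this sketch.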
Phillip Dexheimer 2e78815055 Added missing arguments to GenotypeGVCFs
- New arguments are nda, hets, indelHeterozygosity, stand_call_conf, stand_emit_conf, ploidy, and maxAltAlleles
 - Addresses PT 70110918
 - To do this, moved those arguments out of the StandardCallerArgumentCollection into a new GenotypeCalculationArgumentCollection, which is now included as a member of SCAC
2014-06-16 08:10:54 -04:00
droazen 3079755b4c Merge pull request #646 from broadinstitute/ks_disable_distribution_with_private
Add maven -Pgsadev flag to build private jars only
2014-06-11 11:00:31 -04:00
Khalid Shakir f082572593 If passed -Pgsadev, don't build the distribution package. 2014-06-10 23:33:33 -04:00
Valentin Ruano Rubio db96891d4b Merge pull request #638 from broadinstitute/vrr_createTempFile_testfix
Changed File.createTempFile to BaseTest.createTempFile calls Test
2014-05-29 10:15:05 -04:00
Valentin Ruano-Rubio 07567fdae3 Removed debug code outputting files not removed after the VM exits in ReadThreadingLikelihoodCalculationEngineUnitTest.
Note, however, that this should not be the cause of recent problems, as the code was deactivated.
2014-05-28 19:03:25 -04:00
Valentin Ruano-Rubio e0c221470c Changed File.createTempFile to BaseTest.createTempFile 2014-05-28 18:59:48 -04:00
EvolvedMicrobe ef7531d4a5 Merge pull request #640 from broadinstitute/IntegerSWImplementation
Change SmithWaterman to use integers instead of doubles.
2014-05-28 15:10:05 -04:00
Nigel Delaney cc45e62e8e Change SmithWaterman to use integers instead of doubles. 2014-05-28 13:13:14 -04:00
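
As a rough illustration of why integer scores help, here is a simplified local-alignment scorer that uses only integer arithmetic, so equal scores compare exactly and tie-breaking cannot depend on rounding. The function name, the parameter values, and the linear gap penalty are assumptions made to keep the sketch short; it is not the GATK implementation.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b, using integer scores only."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best

print(sw_score("ACGT", "ACGT"))
```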
droazen ac52fa581a Merge pull request #644 from broadinstitute/ks_queue_test_temp_fix
Disabled ExampleUG Queue Tests, fixed internal extensions dependency.
2014-05-28 11:29:08 -04:00
Phillip Dexheimer c15e6fcc0e Refactored the static lookup arrays in MathUtils (log10Cache, log10FactorialCache, jacobianLogTable)
-They are now only computed when necessary
 -Log10Cache is dynamically resizable, either by calling get() on an out-of-range value or by calling ensureCacheContains
 -Log10FactorialCache and JacobianLogTable are initialized to a fixed size on first access and are not resizable
 -Addresses PT 69124396
2014-05-27 22:27:57 -04:00
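
The lazily computed, resizable cache described above might look roughly like the following sketch. The class and method names mirror the commit message but are otherwise hypothetical, and the growth policy (extend exactly to the requested index) is an assumption.

```python
import math

class Log10Cache:
    """Cache of log10(i) for i = 0..n, filled only when first needed."""

    def __init__(self):
        self._values = []  # nothing is computed at construction time

    def __len__(self):
        return len(self._values)

    def ensure_cache_contains(self, n):
        """Grow the cache so that index n is valid; no-op if it already is."""
        if n < 0:
            raise ValueError("n must be non-negative")
        for i in range(len(self._values), n + 1):
            # log10(0) is undefined in math.log10, so store -inf explicitly.
            self._values.append(math.log10(i) if i > 0 else float("-inf"))

    def get(self, n):
        """Return log10(n), resizing the cache on an out-of-range request."""
        self.ensure_cache_contains(n)
        return self._values[n]

cache = Log10Cache()
print(len(cache), cache.get(100), len(cache))
```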
Eric Banks b77589696e Merge pull request #643 from broadinstitute/rp_remove_hwp
Removing HWP from GenotypeSummaries because of integer overflow issues w...
2014-05-27 17:21:19 -04:00
Khalid Shakir 6c9e68ef41 Disabled ExampleUG Queue Tests, fixed internal extensions dependency.
EUG tests disabled due to new protected qscript directory path, post GATK artifact splitting.
2014-05-27 16:16:53 -04:00
David Roazen 74b51c5c7a Improve test suite tmp file cleanup
-Make BaseTest.createTempFile() mark any possible corresponding index files for deletion on exit

-Make WalkerTest mark shadow BCF files and auxiliary for deletion on exit

-Make VariantRecalibrationWalkersIntegrationTest mark PDF files for deletion on exit
2014-05-27 13:41:44 -04:00
Ryan Poplin b24cff780b Removing HWP from GenotypeSummaries because of integer overflow issues with 91K samples. Removing CCC because it is redundant. 2014-05-27 10:14:49 -04:00
Ryan Poplin ec7c4ea2ba Unfortunately dangling tail recovery is dangerous in exome data. Turning it off by default for now.
-- disabling HC+VA integration test because, as noted in the comments, it keeps switching PairHMM implementations and giving different results at a particular site used in that particular test
2014-05-23 14:33:44 -04:00
Valentin Ruano-Rubio 979ab0453e Moved GlobalEdgeGreedySWPairwiseAlignment to the archive 2014-05-23 01:48:48 -04:00
Valentin Ruano-Rubio 7c8a1ae892 Fix for SW to make double comparisons with a tolerance
Stories:

  - https://www.pivotaltracker.com/story/show/69577868

Changes:

  - Added an epsilon difference tolerance in weight comparisons.

Tests:

  - Added HaplotypeCallerIntegrationTest#testDifferentIndelLocationsDueToSWExactDoubleComparisonsFix
  - Updated md5 due to minor likelihood changes.
  - Disabled a test for PathUtils.calculateCigar since it does not work and it is unclear what is causing the error (needs original author input)
2014-05-23 01:48:48 -04:00
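
A minimal sketch of comparing weights with a tolerance, assuming a hypothetical epsilon of 1e-9 and hypothetical move names. It shows how exact double comparison can pick a different candidate purely because of rounding noise.

```python
EPSILON = 1e-9  # hypothetical tolerance, chosen only for this example

def weights_equal(x, y, eps=EPSILON):
    """Treat two weights as equal when they differ by at most eps."""
    return abs(x - y) <= eps

def best_move(candidates, eps=EPSILON):
    """Return the first candidate whose weight is within eps of the maximum.

    candidates is a list of (name, weight) pairs; ties are broken by list
    order rather than by floating-point rounding.
    """
    top = max(weight for _, weight in candidates)
    for name, weight in candidates:
        if weights_equal(weight, top, eps):
            return name

# 0.1 + 0.2 is slightly larger than 0.3 in floating point.
print(best_move([("diagonal", 0.3), ("insertion", 0.1 + 0.2)]))
```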
Khalid Shakir b7e98bdae9 Fixed GATK docs artifact, moved protected ExampleUG tests. 2014-05-22 21:03:55 -04:00
Ryan Poplin 581843d994 Minor updates to HC docs. 2014-05-20 10:01:11 -04:00
Khalid Shakir 88d7e23c44 After talking with Mauricio and Karthik, updated MD5s and added a note about PairHMM causing test variability. 2014-05-19 17:36:41 -04:00
Karthik Gururaj 972a82d386 Changed 'sting' to 'gatk' in the VectorLoglessPairHMM classes and the
C++ code
2014-05-19 17:36:41 -04:00
Khalid Shakir 3939971d78 After renaming the packages, instead of updating the JNI library used for testing bwa, moving the classes to the archive.
NOTE: A migrated README.md has been added that will allow others to resurrect this code as needed.
2014-05-19 17:36:41 -04:00
Khalid Shakir 2c854e554a Refactored maven directories and java packages replacing "sting" with "gatk".
To reduce merge conflicts, this commit modifies contents of files, while file renamings are in previous commit.
See previous commit message for list of changes.
2014-05-19 17:36:39 -04:00
Khalid Shakir 4e6d43d003 Refactored maven directories and java packages replacing "sting" with "gatk".
To reduce merge conflicts, this commit only renames files, while file modifications are in next commit.
Some updates/fixes here are actually included in the next commit.
= Maven updates
Moved artifacts to new package names:
* private/queue-private -> private/gatk-queue-private
* private/gatk-private -> private/gatk-tools-private
* public/gatk-package -> protected/gatk-package-distribution
* public/queue-package -> protected/gatk-queue-package-distribution
* protected/gatk-protected -> protected/gatk-tools-protected
* public/queue-framework -> public/gatk-queue
* public/gatk-framework -> public/gatk-tools-public
New poms for new artifacts and packages:
* private/gatk-package-internal
* private/gatk-queue-package-internal
* private/gatk-queue-extensions-internal
* protected/gatk-queue-extensions-distribution
* public/gatk-engine
Updated references to StingText.properties to GATKText.properties.
Updated ant-bridge.sh to use gatk.* properties instead of sting.*.
= Engine updates
Renaming files containing engine parts from o.b.gatk.tools to o.b.gatk.engine.
Changed package references from tools to engine for CommandLineGATK, GenomeAnalysisEngine, ReadMetrics, ReadProperties, and WalkerManager.
Changed package reference tools.phonehome to engine.phonehome.
Renamed classes *Sting* to *GATK*, such as ReviewedGATKException.
= Test updates
Moved gatk example resources.
Moved test engine files from tools to engine packages.
Moved resources for phonehome to proper package.
Moved test classes under o.b.gatk into packages:
* o.b.g.utils.{BaseTest,ExampleToCopyUnitTest,GATKTextReporter,MD5DB,MD5Mismatch,TestNGTestTransformer}
* o.b.g.engine.walkers.WalkerTest
Updated package names in DependencyAnalyzerOutputLoaderUnitTest's data.
= Queue updates
Moving queue scripts to location where generated extensions can be used.
Renamed *.q to *.scala, updating licenses previously missed by git hooks.
Moved queue extensions to new artifact gatk-queue-extensions.
Fixed import statements frequently merge-conflicting on FullProcessingPipeline.scala.
= BWA
Added README on how to obtain and include bwa as a library.
Updated libbwa build.
Fixed package names under bwa/java implementation.
Updated contents of BWCAligner native implementation.
= Other fixes
Don't duplicate the resource bundle entries by both unpacking *and* appending.
(partial fix) Staged engine and utils poms to build GATKText.properties, once Utils random generator dependency on GATK engine is fixed.
Re-enabled custom testng listeners/reporters and moved testng dependencies to the gatk-root.
Updated comments referencing Sting with GATK.
Moved a couple untangled classes from gatk-tools-public to gatk-utils and gatk-engine.
2014-05-19 16:43:47 -04:00
Khalid Shakir 67e44985b1 Java/Scala imports updated for new package names.
Fourth of four commits for picard/htsjdk package rename.
2014-05-08 19:13:31 +08:00
Laura Gauthier bf7b97393e Add ability to output to a file discordant loci and their respective genotypes for each sample 2014-05-07 10:12:45 -04:00
MauricioCarneiro f03a12263a Merge pull request #625 from broadinstitute/intel_updateCell_inlined
(Optional) Inlined the code from updateCell
2014-05-07 10:11:09 -04:00
Karthik Gururaj d9c489f928 Removed scary warning messages for VectorPairHMM 2014-05-06 10:59:24 -07:00
Karthik Gururaj fb8578ec8e Inlined the code from updateCell - helps Java JIT to detect hotspots and
produce good native code
2014-05-06 10:37:10 -07:00
Karthik Gururaj f6ea25b4d1 Parallel version of the JNI for the PairHMM
The JNI treats shared memory as critical memory and doesn't allow any
parallel reads or writes to it until the native code finishes. This is
not a problem *per se* (it is the right thing to do), but we need to
enable **-nct** when running the haplotype caller and, with it, have
multiple native PairHMM instances running for each map call.

Move to copy-based memory sharing, where the JNI simply copies the
memory over to C++ and then has no blocked critical memory when running,
allowing -nct to work.

This version is slightly (almost unnoticeably) slower with -nct 1, but
scales better with -nct 2-4 (we haven't tested anything beyond that
because we know the GATK falls apart with higher levels of parallelism).

* Make VECTOR_LOGLESS_CACHING the default implementation for PairHMM.
* Changed version number in pom.xml under public/VectorPairHMM
* VectorPairHMM can now be compiled using gcc 4.8.x
* Modified define-* to get rid of gcc warnings for extra tokens after #undefs
* Added a Linux kernel version check for AVX - gcc's __builtin_cpu_supports function does not check whether the kernel supports AVX or not.
* Updated PairHMM profiling code to update and print numbers only in single-thread mode
* Edited README.md, pom.xml and Makefile for users to pass path to gcc 4.8.x if necessary
* Moved all cpuid inline assembly to a single function
* Changed info message to clog from cinfo
* Modified version in pom.xml in VectorPairHMM from 3.1 to 3.2
* Deleted some unnecessary code
* Modified C++ sandbox to print per interval timing
2014-05-02 19:12:48 -04:00
Valentin Ruano-Rubio d563072282 Fix for CombineGVCFs and GenotypeGVCFs recurrent exception about missing PLs
Story:

  https://www.pivotaltracker.com/story/show/68220438

Changes:

   - PL-less input genotypes are now treated as uncalled, and so as non-variant, when combining GVCFs.
   - HC GVCF/BP_RESOLUTION Mode now outputs non-variant sites in sites covered by deletions.
   - Fixed existing tests

Test:

   - HaplotypeCallerGVCFIntegrationTest
   - ReferenceConfidenceModelUnitTest
   - CombineGVCFsIntegrationTest
2014-05-02 09:21:06 -04:00
Ryan Poplin 41d3069213 When we subset PLs because Alleles are removed during genotyping we also need to subset AD. 2014-04-28 15:52:26 -04:00
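
To make the PL/AD relationship concrete, here is a sketch that subsets both fields together when alleles are dropped. It relies on the standard VCF ordering of PL values, where genotype j/k (j <= k) sits at index k*(k+1)/2 + j. The function names are hypothetical, and re-normalizing the subset PLs so the smallest is 0 is an assumption of this example.

```python
def pl_index(j, k):
    """Index of diploid genotype j/k (j <= k) in VCF PL ordering."""
    return k * (k + 1) // 2 + j

def subset_pl_and_ad(pls, ads, kept):
    """Keep only the alleles in `kept` (original indices, reference first).

    Returns (new_pls, new_ads) so the two fields stay consistent.
    """
    new_pls = []
    for k_new, k_old in enumerate(kept):
        for j_old in kept[:k_new + 1]:
            lo, hi = sorted((j_old, k_old))
            new_pls.append(pls[pl_index(lo, hi)])
    floor = min(new_pls)
    new_pls = [pl - floor for pl in new_pls]  # assumed re-normalization
    new_ads = [ads[i] for i in kept]          # AD has one entry per allele
    return new_pls, new_ads

# Three alleles (ref, alt1, alt2); drop alt1 and keep ref and alt2.
print(subset_pl_and_ad([50, 0, 60, 40, 70, 90], [10, 12, 1], [0, 2]))
```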
Ryan Poplin 06dbe74a23 Merge pull request #609 from kcibul/kc_cancersimreads
extended SimulateReadsForVariants to optionally use the AF field to indi...
2014-04-28 13:31:56 -04:00
Ami Levy-Moonshine 13dd755468 create a new read transformer that refactors NDN cigar elements to one N element.
story:
https://www.pivotaltracker.com/story/show/69648104

description:
This read transformer will refactor cigar strings that contain N-D-N elements to one N element (with total length of the three refactored elements).
This is intended primarily for users of RNA-Seq data handling programs such as TopHat2.
Currently we consider that the internal N-D-N motif is illegal and we error out when we encounter it. By refactoring the cigar string of
those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset.

edit: address review comments - change the tool's name and change the tool to be a readTransformer instead of read filter
2014-04-28 11:29:00 -04:00
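
The N-D-N refactoring described above can be sketched on a plain CIGAR string as follows. The function name is hypothetical and the sketch works on text rather than on the read objects the transformer actually handles.

```python
import re

NDN = re.compile(r"(\d+)N(\d+)D(\d+)N")

def merge_ndn(cigar):
    """Replace every N-D-N run with a single N of the combined length."""
    def combined(match):
        return f"{sum(int(length) for length in match.groups())}N"

    previous = None
    # Repeat until stable so chains such as N-D-N-D-N collapse fully.
    while previous != cigar:
        previous = cigar
        cigar = NDN.sub(combined, cigar)
    return cigar

print(merge_ndn("10M5N3D7N20M"))
```

Cigar strings without the motif pass through unchanged, so the rest of the dataset is unaffected.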
Ryan Poplin 221b999cb0 GenotypeGVCF was pulling the headers from all input rods including DBsnp. Now it pulls from just the input variant rods. 2014-04-25 13:16:28 -04:00
Laura Gauthier 9f3cbb2ef1 Improvements to CalculateGenotypePosteriors and CalibrateGenotypeLikelihoods
CalculateGenotypePosteriors now only computes posterior probs for SNP sites with SNP priors
(other sites have flat priors applied)

CalibrateGenotypeLikelihoods had originally applied HOM_REF/HET/HOM_VAR frequencies in callset as priors before empirical quality analysis. Now has option (-noPriors) to not apply/apply flat priors. Also takes in new external probabilities files, such as those generated by CGP, from which the genotype posterior probability qualities will be read.

Integration test was changed to account for new SNP-only behavior and default behavior to not use missing priors.

(Also, new numRefIfMissing is 0, which should only matter in cases using few samples when you probably don't want to be doing that anyway!)
2014-04-24 08:49:42 -04:00
Valentin Ruano-Rubio e610373169 Fixed integration test problems from previous premature merge 2014-04-20 17:11:51 -04:00
Valentin Ruano-Rubio 4e5850966a Reengineer engine constructors 2014-04-19 17:58:14 -04:00
Valentin Ruano-Rubio 7455ac9796 Addressed revisions 2014-04-19 16:48:48 -04:00
Kristian Cibulskis 6b9e38c8bb incorporated comments from review, made variables final, made AF parameter hidden, and added bounds checking to AF value 2014-04-16 19:29:25 -04:00
Kristian Cibulskis 7115cadbd8 extended SimulateReadsForVariants to optionally use the AF field to indicate allele fraction of the simulated event, useful in cancer and other variable ploidy use cases 2014-04-16 16:20:02 -04:00
Valentin Ruano-Rubio 08203b516e Disentangle UG and HC Genotyper engines.
Description:

  Transforms a delegation dependency from HC to UG genotyping engine into reuse by inheritance, where HC and UG engines inherit from a common superclass GenotyperEngine
  that implements the common parts. As a side effect, some of the code is now clearer and redundant code has been removed.

  Changes have a few consequences for the end user. HC now has a few more user arguments, those that control the functionality that HC was borrowing directly from UGE.

     Added -ploidy argument, although it is constrained to be 2 for now.
     Added -out_mode EMIT_ALL_SITES|EMIT_VARIANTS_ONLY ...
     Added -allSitePLs flag.

Stories:

   https://www.pivotaltracker.com/story/show/68017394

Changes:

   - Moved (HC's) GenotyperEngine to HaplotypeCallerGenotyperEngine (HCGE). Then created an engine superclass GenotypingEngine (GE) that contains the parts common to HCGE and the UG counterpart 'UnifiedGenotypingEngine' (UGE). Simplified the code and applied the template pattern to accommodate small differences in behaviour between the two caller
   engines. (There is still room for improvement though.)

   - Moved inner classes and enums to top-level components for various reasons, including giving them shorter and simpler names to refer to.

   - Created a HomoSapiens class for human-specific constants; even if they are good defaults for most users, we need to clearly identify the human assumption across the code if we want to make
   GATK work with any species in general; i.e. any reference to HomoSapiens, except as a default value for a user argument, should smell.

   - Fixed a bug deep in the genotyping calculation where we were using fixed human-default values for SNP and indel heterozygosity, ignoring user arguments.

   - GenotypingLikehooldCalculationCModel.Model to Gen.*Like.*Calc.*Model.Name; not a definitive solution though as names are used often in conditionals that perhaps should be member methods of the
     GenLikeCalc classes.

   - Renamed LikelihoodCalculationEngine to ReadLikelihoodCalculationEngine to distinguish them clearly from Genotype likelihood calculation engines.

   - Changed copy by explicit argument listing to a clone/reflection solution for casting between genotypers argument collection classes.

   - Created GenotypeGivenAllelesUtils to collect methods needed nearly exclusively by the GGA mode.

Tests :

    - StandardCallerArgumentCollectionUnitTest (check copy by cloning/reflection).
    - All existing integration and unit tests for modified classes.
2014-04-13 03:09:55 -04:00
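
The template pattern mentioned in this commit can be sketched as a shared base class whose fixed calling sequence delegates the caller-specific steps to subclasses. All class, method, and field names below are hypothetical, and the default heterozygosity value is illustrative; the point is only that the shared code reads the user-supplied value instead of a hard-coded constant.

```python
from abc import ABC, abstractmethod

DEFAULT_SNP_HETEROZYGOSITY = 1e-3  # illustrative default, overridable by the user

class GenotypingEngineSketch(ABC):
    """Shared skeleton; subclasses supply the caller-specific steps."""

    def __init__(self, snp_heterozygosity=DEFAULT_SNP_HETEROZYGOSITY):
        self.snp_heterozygosity = snp_heterozygosity

    def call(self, site):
        """Template method: a fixed sequence of steps with overridable hooks."""
        alleles = self.candidate_alleles(site)
        if len(alleles) < 2 and not self.emit_non_variant_sites():
            return None
        return {"pos": site["pos"], "alleles": alleles,
                "prior": self.snp_heterozygosity}

    @abstractmethod
    def candidate_alleles(self, site):
        """Return the alleles to genotype, reference first."""

    def emit_non_variant_sites(self):
        return False

class PileupEngineSketch(GenotypingEngineSketch):
    def candidate_alleles(self, site):
        alleles = [site["ref"]]
        for base in site["bases"]:
            if base not in alleles:
                alleles.append(base)
        return alleles

class HaplotypeEngineSketch(GenotypingEngineSketch):
    def candidate_alleles(self, site):
        alts = [a for a in site["haplotype_alleles"] if a != site["ref"]]
        return [site["ref"]] + alts

    def emit_non_variant_sites(self):
        return True

site = {"pos": 7, "ref": "A", "bases": "AAGA", "haplotype_alleles": ["A"]}
print(PileupEngineSketch().call(site))
print(HaplotypeEngineSketch(snp_heterozygosity=0.01).call(site))
```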
Khalid Shakir a6b0754990 After comments from @nh13, updated latest picard and setMateInfo call. 2014-04-08 15:22:45 -04:00
Khalid Shakir 3047d6ff32 BQSRGatherer handles missing read groups from some input files. [#68720468] 2014-04-08 23:58:54 +08:00
Eric Banks ad336375dc Merge pull request #590 from broadinstitute/vrr_validate_variants_unused_alleles_fix
Addresses issue with strict validation on GVCF files.
2014-04-07 22:10:49 -04:00
Valentin Ruano-Rubio 5afcc8e05f Change in the command line interface of ValidateVariants.
Following reviewers' comments, the command line interface has been simplified.
All extra strict validations are performed by default (as before) and the
user has to indicate which ones he/she does not want to use with --validationTypeToExclude.

Before, he/she was able to indicate the only ones to apply with --validationType, but that has been removed.

Stories:

    - https://www.pivotaltracker.com/story/show/68725164

Changes:

    - Removed validateType argument.
    - Improved documentation.
    - Added some warning log messages on suspicious argument combinations.

Tests:

    - ValidateVariantsIntegrationTest#*
2014-04-07 16:27:11 -04:00
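
The exclude-only interface can be sketched as "start from everything, subtract what the user opted out of". The validation type names below are hypothetical placeholders, and the warning on excluding every check is just one example of a suspicious combination; neither is taken from the tool itself.

```python
import logging

# Hypothetical type names, used only for illustration.
ALL_VALIDATION_TYPES = ("REF", "IDS", "ALLELES", "CHR_COUNTS")

def validations_to_run(excluded=()):
    """Return the validations that remain after removing the excluded ones."""
    excluded = set(excluded)
    unknown = excluded - set(ALL_VALIDATION_TYPES)
    if unknown:
        raise ValueError(f"unknown validation type(s): {sorted(unknown)}")
    remaining = [v for v in ALL_VALIDATION_TYPES if v not in excluded]
    if not remaining:
        # Example of a suspicious combination that deserves a warning.
        logging.warning("all strict validations were excluded")
    return remaining

print(validations_to_run(["IDS"]))
```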
Ryan Poplin f058224b3e Adding GenotypeSummaries as INFO field annotations.
-- This is needed so the ref model pipeline can cut down to sites-only files without losing these useful statistics.
-- Added new unit test to test this info field annotation.
-- GenotypeGVCF integration tests change because new annotations are present in the output
2014-04-06 11:50:10 -04:00
MauricioCarneiro 84861fa10a Merge pull request #587 from broadinstitute/eb_actually_fail_on_reduced_bams
Make sure to fail in all cases where the BAM being used was created by ReduceReads.
2014-04-04 17:27:57 -04:00