Commit Graph

4585 Commits (ed7ff65b2ecbc1ccd7f02cf066ea3caa00cc6775)

Author SHA1 Message Date
Phillip Dexheimer 7e77875c81 Improvements to read-group filtering in PrintReads
- Read groups that are excluded by sample_name, platform, or read_group arguments no longer appear in the header
 - The performance penalty associated with filtering by read group has been essentially eliminated
 - Partial fulfillment of PT 73075482
2014-08-11 20:08:16 -04:00
Valentin Ruano-Rubio 9a9a68409e ReadLikelihoods class introduction final changes before merging
Stories:

        https://www.pivotaltracker.com/story/show/70222086
        https://www.pivotaltracker.com/story/show/67961652

Changes:

  Done some changes that I missed in relation with making sure that all PairHMM implentations use the same interface; as a consequence we were running always the standard PairHMM.
  Fixed some additional bugs detected when running it on full wgs single sample and exom multi sample data set.
  Updated some integration test md5s.

Fixing GraphBased bugs with new master code
Fixed ReadLikelihoods.changeReads difficult to spot bug.
Changed PairHMM interface to fix a bug
Fixed missing changes for various PairHMM implementations to get them to use the new structure.
Fixed various bugs only detectable when running with full sample(s).
Believe to have fixed the lack of annotations in UG runs
Fixed integrationt test MD5s
Updating some md5s
Fixed yet another md5 probably left out by mistake
2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio 0b472f6bff Added new test to verify the functionality of ReadLikelihoods.java and its use in HC. Updated existing integration test md5s.
Stories:

    https://www.pivotaltracker.com/story/show/70222086
    https://www.pivotaltracker.com/story/show/67961652
2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio 2914ecb585 Change the Map-of-maps-of-maps for an array based implementation ReadLikelihoods to hold read likelihoods.
The array structure should be faster to populate and query (no properly benchmarked) and reduce memory footprint considerably.
    Nevertheless removing PairHMM factor (using likelihoodEngine Random) it only achieves a speed up of 15% in some example WGS dataset
    i.e. there are other bigger bottle necks in the system. Bamboo tests also seem to run significantly faster with this change.

    Stories:

      https://www.pivotaltracker.com/story/show/70222086
      https://www.pivotaltracker.com/story/show/67961652

    Changes:

       - ReadLikelihoods added to substitute  Map<String,PerSampleReadLikelihoods>
       - Operation that involve changes in full sets of ReadLikelihoods have been moved into that class.
       - Simplified a bit the code that handles the downsampling of reads based on contamination

    Caveats:

       - Still we keep Map<String,PerReadAlleleLikelihoodsMap> around to pass to annotators..., didn't feel like change the interface of so many public classes in this pull-request.
2014-08-11 17:46:28 -04:00
Valentin Ruano-Rubio 09ac3779d6 Added ReadLikelihoods component to substitute Map<String,PerReadAlleleLikelihoodMap>.
It uses a more efficient java array[] based implementation and encapsulates operations perform with such a
read-likelihood collection such as marginalization, filtering by position, poor modeling or capping
worst likelihoods and so forth.

Stories:

          https://www.pivotaltracker.com/story/show/70222086
          https://www.pivotaltracker.com/story/show/67961652
2014-08-11 17:46:28 -04:00
Eric Banks 5f31e54d67 Merge pull request #696 from broadinstitute/pd_DoC_sorting
Fix sample sort order bug in DepthOfCoverage
2014-08-06 08:35:35 -04:00
Phillip Dexheimer b0c026e671 Fix sample sort order bug in DepthOfCoverage
Rare bug triggered by hash collision between sample names
 PT 66183936
2014-08-05 21:55:34 -04:00
Phillip Dexheimer 593663d9b6 Improved detection of missing argument values
In particular, it was possible to specify arguments for Files or Compound types without values
 Added a special "none" value for annotations, since a bare "-A" is no longer allowed
 Delivers PT 71792842 and 59360374
2014-08-05 20:31:31 -04:00
Phillip Dexheimer 359fe150c9 Documentation fix (closed HTML tag) 2014-08-04 23:19:16 -04:00
Laura Gauthier 4373922ee6 Update GATK to work with latest htsjdk
ValidationStringency was moved from htsjdk.samtools.SAMFileReader to htsjdk.samtools
samtools find BAM index file method was also moved (and made public!)
2014-07-30 12:05:14 -04:00
Eric Banks 84af1fc75f The copy constructor for a GATKSAMRecord (used for testing only) should use the actual read's contig index, not its mate's. 2014-07-23 15:31:03 -04:00
David Roazen 0798a4b768 Update pom versions to mark the start of GATK 3.3 development 2014-07-17 12:09:33 -04:00
David Roazen 323f22f852 Update pom versions for the 3.2 release 2014-07-17 12:06:22 -04:00
Geraldine Van der Auwera a6f632874b Various documentation improvements
- Edited intervals merging docs for correctness & clarity
- Edited VQSR arg docs and made mode required (+added -mode SNP to VQSR tests)
- Moved PaperGenotyper to Toy Walkers to declutter the actually useful docs
- Moved GenotypeGVCFs to Variant Discovery category and clarified a few points
- Clarified that the -resource argument depends on using the -V:tag format
- Clarified how the pcr indel model works
- Added caveat for -U ALLOW_N_CIGAR_READS
- Added MathJax support for displaying equations in GATKDocs
- Updated HC example commands and caveats
2014-07-14 12:03:03 -04:00
Eric Banks ecefcb383d Disable the complex variant merging for now, as requested by ATGU 2014-07-11 17:27:40 -04:00
droazen b8751ad598 Merge pull request #680 from broadinstitute/ldg_VQSRscript
Update VQSR Rnd BQSR  script generation code for compatibility with late...
2014-07-11 10:16:37 -04:00
Khalid Shakir 18f6d56b4c Revert "Using the base directory for each test run when outputting MD5DB mismatches."
This reverts commit f192f032a153755a84b1d682f6e652a7c6787fb9.
2014-07-11 01:11:25 +08:00
Khalid Shakir cc09ef9190 Revert "Appending to md5db in the gatkdir, with additional logging."
This reverts commit 0aa2884f7b006f5d48c325bf942b92c183e45074.
2014-07-11 01:11:20 +08:00
kshakir aecd34d274 Merge pull request #677 from broadinstitute/ks_md5_db_per_test_type
Appending to md5db in the gatkdir, with additional logging.
2014-07-10 17:53:24 +08:00
Khalid Shakir a7d1904c63 Appending to md5db in the gatkdir, with additional logging. 2014-07-10 03:58:47 +08:00
Laura Gauthier 99026eb51b Update VQSR Rnd BQSR script generation code for compatibility with latest ggplot version. Update queueJobReport.R and public/gsalib/src/R/R/gsa.variantqc.utils.R also 2014-07-09 15:36:58 -04:00
David Roazen 719e685759 Remove junit imports in the test suite 2014-07-09 12:09:27 -04:00
Khalid Shakir 2129aa05d8 Bug fix for poms missing package test artifacts. 2014-07-08 06:34:26 +08:00
Khalid Shakir e5be9c7073 Using the base directory for each test run when outputting MD5DB mismatches. 2014-07-08 06:34:25 +08:00
Eric Banks bad7865078 When converting a haplotype to a set of variants we now check for cases that are overly complex.
In these cases, where the alignment contains multiple indels, we output a single complex
variant instead of the multiple partial indels.

We also re-enable dangling tail recovery by default.
2014-07-01 14:18:59 -04:00
Ryan Poplin 0127799cba Reads are now realigned to the most likely haplotype before being used by the annotations.
-- AD,DP will now correspond directly to the reads that were used to construct the PLs
-- RankSumTests, etc. will use the bases from the realigned reads instead of the original alignments
-- There is now no additional runtime cost to realign the reads when using bamout or GVCF mode
-- bamout mode no longer sets the mapping quality to zero for uninformative reads, instead the read will not be given an HC tag
2014-06-30 10:35:50 -04:00
Khalid Shakir 7b5f88a49c Refactored DoC custom Queue wrappers to a non-package object.
Now, "mvn verify && mvn verify" should work again.
2014-06-26 00:59:18 +08:00
droazen b935ed0df1 Merge pull request #665 from broadinstitute/ks_force_delete_bad_symlinks
Executing a version of the delete_maven_links.sh
2014-06-25 00:13:05 -04:00
Phillip Dexheimer 06d619e9aa Removed redundant SelectVariantsIntegrationTest, merged it's only test into protected version 2014-06-24 18:59:59 -04:00
Khalid Shakir 45d819a00e For now, executing the delete_maven_links.sh just ahead of creating the symbolic links during the process-test-resources phase.
Better than running it during the "clean" phase, since these users may not run "mvn clean" before attempting to build.
2014-06-25 02:32:15 +08:00
Phillip Dexheimer 65eeb4a7ab Recast the "Invalid JEXL expression detected" error in SelectVariants from a RuntimeException to a UserException
- PT 68931448
2014-06-20 00:05:23 -04:00
Phillip Dexheimer da5e567b73 Added functionality to CatVariants to process .list files with -V
- Pivotal 70305712
2014-06-19 21:46:13 -04:00
Ryan Poplin da1dab6c32 Merge pull request #661 from broadinstitute/jw_allele_balance_gvcf
Enable AB annotation in reference model pipeline. Incorporates patches f...
2014-06-19 13:10:41 -04:00
Eric Banks 1092dd6e25 From Carlos Barroto: switch outputRoot in SplitSamFile to an empty string instead of null. 2014-06-19 11:06:55 -04:00
Eric Banks 9212edba41 From Carlos Barroto: made 'level' in Picard's CalculateHsMetrics Scala Queue extension an argument. 2014-06-19 11:06:50 -04:00
Ryan Poplin 8b75428a90 Enable AB annotation in reference model pipeline. Incorporates patches from John Wallace to public github account 2014-06-19 09:35:04 -04:00
Nigel Delaney 7570666f2a Merge pull request #655 from broadinstitute/nfd_mathutil_opts
Optimization of function to calculate the logged sum of exponentiated values
2014-06-17 17:07:42 -04:00
Nigel Delaney 5e258bfeff Minor optimization to function to calculate the log of exponentials.
* Avoids calling Math.Pow whenever possible (skips -Inf and 0 values),
leads to better performance.
2014-06-17 15:26:10 -04:00
Chris Whelan ba1d23e535 Created a new tool, SiblingIBD, which finds Identical-By-Descent regions in two siblings.
-When parental genotypes are available, implements an HMM on genotype observations in the quartet.
   -Outputs IBD regions as well as per-site posterior probabilities of being in each IBD state.
   -Includes an experimental heuristic based mode for when parental genotypes are not available.
   -Made a method in MendelianViolation public static to reuse code.
   -Added the mockito library to private/gatk-tools-private/pom.xml
2014-06-13 09:41:37 -04:00
Menachem Fromer a1868e8b82 For XHMM and Depth-of-Coverage Qscripts, add ability for user to input sample renaming file at the GATK level using existing GATK flag (--sample_rename_mapping_file) and custom pre-processing code. For XHMM Qscript, add scatter-gather for Discovery and Genotype stages. 2014-06-09 23:49:54 -04:00
Phillip Dexheimer 4eb9858461 Ensure that output files are specified in a writeable location
-PT 69579780
2014-06-02 21:13:59 -04:00
Valentin Ruano Rubio db96891d4b Merge pull request #638 from broadinstitute/vrr_createTempFile_testfix
Changed File.createTempFile to BaseTest.createTempFile calls Test
2014-05-29 10:15:05 -04:00
Valentin Ruano-Rubio 938172d7f0 Removed redundant overrride createTempFileFromBase (same code as super class) and added some finals to DepthOfCoverageB36IntegrationTest 2014-05-28 19:02:04 -04:00
Valentin Ruano-Rubio e0c221470c Changed File.createTempFile to BaseTest.createTempFile 2014-05-28 18:59:48 -04:00
EvolvedMicrobe ef7531d4a5 Merge pull request #640 from broadinstitute/IntegerSWImplementation
Change SmithWaterman to use integers instead of doubles.
2014-05-28 15:10:05 -04:00
Nigel Delaney cc45e62e8e Change SmithWaterman to use integers instead of doubles. 2014-05-28 13:13:14 -04:00
Eric Banks ff43b1f298 Merge pull request #636 from broadinstitute/pd_log10_refactor
Replaced the static, fixed MathUtils.log10Cache array with a dynamic Log...
2014-05-28 08:46:49 -04:00
Phillip Dexheimer 6122b2805d Legibility improvements to ProgressMeter
- Fields in the header are delimited with the pipe character
 - Header is now split into two lines to improve spacing
 - Field width in header and progress lines auto-adjusts to length of "processing units" label (sites, active regions, etc)
 - Addresses PT 69725930
2014-05-27 23:52:42 -04:00
Phillip Dexheimer c15e6fcc0e Refactored the static lookup arrays in MathUtils (log10Cache, log10FactorialCache, jacobianLogTable)
-They are now only computed when necessary
 -Log10Cache is dynamically resizable, either by calling get() on an out-of-range value or by calling ensureCacheContains
 -Log10FactorialCache and JacobianLogTable are initialized to a fixed size on first access and are not resizable
 -Addresses PT 69124396
2014-05-27 22:27:57 -04:00
David Roazen 74b51c5c7a Improve test suite tmp file cleanup
-Make BaseTest.createTempFile() mark any possible corresponding index files for deletion on exit

-Make WalkerTest mark shadow BCF files and auxiliary for deletion on exit

-Make VariantRecalibrationWalkersIntegrationTest mark PDF files for deletion on exit
2014-05-27 13:41:44 -04:00
Valentin Ruano-Rubio 7c8a1ae892 Fix for SW to make double comparisons with a tolerance
Stories:

  - https://www.pivotaltracker.com/story/show/69577868

Changes:

  - Added a epsilon difference tolerance in weight comparisons.

Tests:

  - Added HaplotypeCallerIntegrationTest#testDifferentIndelLocationsDueToSWExactDoubleComparisonsFix
  - Updated md5 due to minor likelihood changes.
  - Disabled a test for PathUtils.calculateCigar since does not work and is unclear what is causing the error (needs original author input)
2014-05-23 01:48:48 -04:00
Khalid Shakir b7e98bdae9 Fixed GATK docs artifact, moved protected ExampleUG tests. 2014-05-22 21:03:55 -04:00
Karthik Gururaj 972a82d386 Changed 'sting' to 'gatk' in the VectorLoglessPairHMM classes and the
C++ code
2014-05-19 17:36:41 -04:00
Khalid Shakir 3939971d78 After renaming the packages, instead of updating the JNI library used for testing bwa, moving the classes to the archive.
NOTE: The migrated READEME.md has been added that will allow others to possibly ressurect this code as needed.
2014-05-19 17:36:41 -04:00
Khalid Shakir 2c854e554a Refactored maven directories and java packages replacing "sting" with "gatk".
To reduce merge conflicts, this commit modifies contents of files, while file renamings are in previous commit.
See previous commit message for list of changes.
2014-05-19 17:36:39 -04:00
Khalid Shakir 4e6d43d003 Refactored maven directories and java packages replacing "sting" with "gatk".
To reduce merge conflicts, this commit only renames files, while file modifications are in next commit.
Some updates/fixes here are actually included in the next commit.
= Maven updates
Moved artifacts to new package names:
* private/queue-private -> private/gatk-queue-private
* private/gatk-private -> private/gatk-tools-private
* public/gatk-package -> protected/gatk-package-distribution
* public/queue-package -> protected/gatk-queue-package-distribution
* protected/gatk-protected -> protected/gatk-tools-protected
* public/queue-framework -> public/gatk-queue
* public/gatk-framework -> public/gatk-tools-public
New poms for new artifacts and packages:
* private/gatk-package-internal
* private/gatk-queue-package-internal
* private/gatk-queue-extensions-internal
* protected/gatk-queue-extensions-distribution
* public/gatk-engine
Updated references to StingText.properties to GATKText.properties.
Updated ant-bridge.sh to use gatk.* properties instead of sting.*.
= Engine updates
Renaming files containing engine parts from o.b.gatk.tools to o.b.gatk.engine.
Changed package references from tools to engine for CommandLineGATK, GenomeAnalysisEngine, ReadMetrics, ReadProperties, and WalkerManager.
Changed package reference tools.phonehome to engine.phonehome.
Renamed classes *Sting* to *GATK*, such as ReviewedGATKException.
= Test updates
Moved gatk example resources.
Moved test engine files from tools to engine packages.
Moved resources for phonehome to proper package.
Moved test classes under o.b.gatk into packages:
* o.b.g.utils.{BaseTest,ExampleToCopyUnitTest,GATKTextReporter,MD5DB,MD5Mismatch,TestNGTestTransformer}
* o.b.g.engine.walkers.WalkerTest
Updated package names in DependencyAnalyzerOutputLoaderUnitTest's data.
= Queue updates
Moving queue scripts to location where generated extensions can be used.
Renamed *.q to *.scala, updating licenses previously missed by git hooks.
Moved queue extensions to new artifact gatk-queue-extensions.
Fixed import statments frequently merge-conflicting on FullProcessingPipeline.scala.
= BWA
Added README on how to obtain and include bwa as a library.
Updated libbwa build.
Fixed packaged names under bwa/java implementation.
Updated contents of BWCAligner native implementation.
= Other fixes
Don't duplicate the resource bundle entries by both unpacking *and* appending.
(partial fix) Staged engine and utils poms to build GATKText.properties, once Utils random generator dependency on GATK engine is fixed.
Re-enabled custom testng listeners/reporters and moved testng dependencies to the gatk-root.
Updated comments referencing Sting with GATK.
Moved a couple untangled classes from gatk-tools-public to gatk-utils and gatk-engine.
2014-05-19 16:43:47 -04:00
Phillip Dexheimer a5abc079dc Revised final Queue status line to display number of jobs in each state when the script fails
* Addresses PT 61552466
* Included a simple scala script in private/testdata that will always fail
2014-05-15 21:30:44 -04:00
jmthibault79 78560212d0 Merge pull request #630 from broadinstitute/pd_blank_lines_in_listfile
Allow blank lines in a (non-BAM) list file
2014-05-14 11:32:44 -04:00
droazen 8297cd1a1a Merge pull request #619 from broadinstitute/pd_intervalmerge_doc
Made IntervalSharder respect the IntervalMergingRule specified on the co...
2014-05-14 11:22:18 -04:00
Phillip Dexheimer 77449961ab Allow blank lines in a (non-BAM) list file
* Addresses PT Bug 67841052
 * Added Unit Test
2014-05-13 23:14:15 -04:00
Khalid Shakir 67e44985b1 Java/Scala imports updated for new package names.
Fourth of four commits for picard/htsjdk package rename.
2014-05-08 19:13:31 +08:00
Khalid Shakir cc3f1f2b96 Revved picard libraries.
Third of four commits for picard/htsjdk package rename.
2014-05-08 19:13:27 +08:00
Khalid Shakir a894a2dddb Updates to GATK classes and POMs that need updating, plus RodSystemValidation md5 updates.
GATK classes accessing package protected htsjdk classes changed to new package names.
POMs updated to support merging of sam/tribble/variant -> htsjdk and changes to picard artifact.
RodSystemValidation outputs changed due to variant codec packages changes, requiring test md5 updates.
Second of four commits for picard/htsjdk package rename.
2014-05-08 19:13:27 +08:00
Khalid Shakir 3ce3e27aa1 Moved GATK classes and POMs that will need updating.
GATK classes accessing package protected htsjdk classes will need new package names.
POMs will merge sam/tribble/variant into htsjdk.
Move only, contents updated in next commit.
First of four commits for picard/htsjdk package rename.
2014-05-08 19:13:27 +08:00
Laura Gauthier bf7b97393e Add ability to output to a file discordant loci and their respective genotypes for each sample 2014-05-07 10:12:45 -04:00
Karthik Gururaj d9c489f928 Removed scary warning messages for VectorPairHMM 2014-05-06 10:59:24 -07:00
Karthik Gururaj f6ea25b4d1 Parallel version of the JNI for the PairHMM
The JNI treats shared memory as critical memory and doesn't allow any
parallel reads or writes to it until the native code finishes. This is
not a problem *per se* it is the right thing to do, but we need to
enable **-nct** when running the haplotype caller and with it have
multiple native PairHMM running for each map call.

Move to a copy based memory sharing where the JNI simply copies the
memory over to C++ and then has no blocked critical memory when running,
allowing -nct to work.

This version is slightly (almost unnoticeably) slower with -nct 1, but
scales better with -nct 2-4 (we haven't tested anything beyond that
because we know the GATK falls apart with higher levels of parallelism

* Make VECTOR_LOGLESS_CACHING the default implementation for PairHMM.
* Changed version number in pom.xml under public/VectorPairHMM
* VectorPairHMM can now be compiled using gcc 4.8.x
* Modified define-* to get rid of gcc warnings for extra tokens after #undefs
* Added a Linux kernel version check for AVX - gcc's __builtin_cpu_supports function does not check whether the kernel supports AVX or not.
* Updated PairHMM profiling code to update and print numbers only in single-thread mode
* Edited README.md, pom.xml and Makefile for users to pass path to gcc 4.8.x if necessary
* Moved all cpuid inline assembly to single function Changed info message to clog from cinfo
* Modified version in pom.xml in VectorPairHMM from 3.1 to 3.2
* Deleted some unnecessary code
* Modified C++ sandbox to print per interval timing
2014-05-02 19:12:48 -04:00
Valentin Ruano-Rubio d563072282 Fix for CombineGVCFs and GenotypeGVCFs recurrent exception about missing PLs
Story:

  https://www.pivotaltracker.com/story/show/68220438

Changes:

   - PL-less input genotypes are now uncalled and so non-variant sites when combining GVCFs.
   - HC GVCF/BP_RESOLUTION Mode now outputs non-variant sites in sites covered by deletions.
   - Fixed existing tests

Test:

   - HaplotypeCallerGVCFIntegrationTest
   - ReferenceConfidenceModelUnitTest
   - CombineGVCFsIntegrationTest
2014-05-02 09:21:06 -04:00
Phillip Dexheimer 7a2b70a10f Made IntervalSharder respect the IntervalMergingRule specified on the command line
* This addresses PT Bug 69741902
* Added a required IMR argument to FilePointer, BAMScheduler, IntervalSharder, and SAMDataSource
* This rule is used by FilePointer.combine and FilePointer.union
* Added unit and integration tests
2014-04-30 22:07:22 -04:00
Michael McCowan fe3c68cb2d Java 8 compatability fix: `Reflections` NPE bugfix. 2014-04-29 13:34:03 -04:00
Ryan Poplin 41d3069213 When we subset PLs because Alleles are removed during genotyping we also need to subset AD. 2014-04-28 15:52:26 -04:00
kshakir 10ee35eafa Merge pull request #616 from broadinstitute/ks_cjav_pbsengine_no_default_queue
Removed setting of a default queue in PbsEngineJobRunner.
2014-04-28 14:24:51 -04:00
Ryan Poplin 06dbe74a23 Merge pull request #609 from kcibul/kc_cancersimreads
extended SimulateReadsForVariants to optionally use the AF field to indi...
2014-04-28 13:31:56 -04:00
Carlos Borroto b7a59e01aa Removed setting of a default queue in PbsEngineJobRunner. Discussed here: http://gatkforums.broadinstitute.org/discussion/3959/would-it-be-possible-for-pbsengine-jobrunner-not-to-set-a-default-queue
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-04-29 00:44:12 +08:00
Ami Levy-Moonshine 13dd755468 create a new read transformer that refactor NDN cigar elements to one N element.
story:
https://www.pivotaltracker.com/story/show/69648104

description:
This read transformer will refactor cigar strings that contain N-D-N elements to one N element (with total length of the three refactored elements).
This is intended primarily for users of RNA-Seq data handling programs such as TopHat2.
Currently we consider that the internal N-D-N motif is illegal and we error out when we encounter it. By refactoring the cigar string of
those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset.

edit: address review comments - change the tool's name and change the tool to be a readTransformer instead of read filter
2014-04-28 11:29:00 -04:00
Michael McCowan 8290d3c8ac Allow for non-tab whitespace in sample names when performing on-the-fly sample-renaming. 2014-04-22 11:07:13 -04:00
MauricioCarneiro f03e5ffeb1 Merge pull request #604 from broadinstitute/vrr_hc_omniploidy_general_api
Disentangle UG and HC Genotyper engines.
2014-04-20 07:43:23 -04:00
Valentin Ruano-Rubio 7455ac9796 Addressed revisions 2014-04-19 16:48:48 -04:00
Ryan Poplin a9a48f2459 Merge pull request #607 from broadinstitute/mm_bugfix_raise_mathutils_n_ceiling
Support more samples in math utilities.
2014-04-17 13:32:34 -04:00
Joel Thibault 1ab50f4ba8 CatVariants now handles BCF and Block-Compressed VCF
[Delivers #67461500]
2014-04-17 12:31:38 -04:00
Kristian Cibulskis 7115cadbd8 extended SimulateReadsForVariants to optionally use the AF field to indicate allele fraction of the simulated event, useful in cancer and other variable ploidy use cases 2014-04-16 16:20:02 -04:00
Joel Thibault 4c74319578 Update for Picard refactoring which improves block-compressed VCF reading
[Delivers #69215404]
2014-04-16 14:39:23 -04:00
Joel Thibault fd09cb7143 Rev Picard 1.111.1920 2014-04-16 14:39:19 -04:00
Joel Thibault f98df5c071 Integration test for the file extensions CatVariants should handle 2014-04-16 13:25:47 -04:00
Joel Thibault bdd7024d00 Integration test for block-compressed VCF reading 2014-04-16 13:09:40 -04:00
Joel Thibault ce770b032a Move execAndCheck() to ProcessController 2014-04-16 13:09:40 -04:00
Joel Thibault b197618d13 This comment is no longer true 2014-04-15 15:42:39 -04:00
Mike f0732d386c Support more samples in math utilities.
- Amend `MathUtils`' constants such that they support callings in excess of 70,000 samples (instead, 100,000).
2014-04-14 12:05:38 -04:00
Valentin Ruano-Rubio 08203b516e Disentangle UG and HC Genotyper engines.
Description:

  Transforms a delegation dependency from HC to UG genotyping engine into a reusage by inhertance where HC and UG engines inherit from a common superclass GenotyperEngine
  that implements the common parts. A side-effect some of the code is now more clear and redundant code has been removed.

  Changes have a few consequence for the end user. HC has now a few more user arguments, those that control the functionality that HC was borrowing directly from UGE.

     Added -ploidy argument although it is contraint to be 2 for now.
     Added -out_mode EMIT_ALL_SITES|EMIT_VARIANTS_ONLY ...
     Added -allSitePLs flag.

Stories:

   https://www.pivotaltracker.com/story/show/68017394

Changes:

   - Moved (HC's) GenotyperEngine to HaplotypeCallerGenotyperEngine (HCGE). Then created a engine superclass class GenotypingEngine (GE) that contains common parts between HCGE and the UG counterpart 'UnifiedGenotypingEngine' (UGE). Simplified the code and applied the template pattern to accomodate for small diferences in behaviour between both caller
   engines. (There is still room for improvement though).

   - Moved inner classes and enums to top-level components for various reasons including making them shorter and simpler names to refer to them.

   - Create a HomoSpiens class for Human specific constants; even if they are good default for most users we need to clearly identify the human assumption across the code if we want to make
   GATK work with any species in general; i.e. any reference to HomoSapiens, except as a default value for a user argument, should smell.

   - Fixed a bug deep in the genotyping calculation we were taking on fixed values for snp and indel heterozygisity to be the default for Human ignoring user arguments.

   - GenotypingLikehooldCalculationCModel.Model to Gen.*Like.*Calc.*Model.Name; not a definitive solution though as names are used often in conditionals that perhaps should be member methods of the
     GenLikeCalc classes.

   - Renamed LikelihoodCalculationEngine to ReadLikelihoodCalculationEngine to distinguish them clearly from Genotype likelihood calculation engines.

   - Changed copy by explicity argument listing to a clone/reflexion solution for casting between genotypers argument collection classes.

   - Created GenotypeGivenAllelesUtils to collect methods needed nearly exclusively by the GGA mode.

Tests :

    - StandardCallerArgumentCollectionUnitTest (check copy by cloning/reflexion).
    - All existing integration and unit tests for modified classes.
2014-04-13 03:09:55 -04:00
Joel Thibault c84126205b Test that stdout redirects and log files do not affect output 2014-04-09 13:52:42 -04:00
Joel Thibault 1103fd231a Better exception message 2014-04-09 10:51:45 -04:00
Eric Banks b07c0a6b4c Merge pull request #594 from broadinstitute/dr_vcf_sample_renaming
Extend on-the-fly sample renaming feature to vcfs
2014-04-08 11:47:45 -04:00
David Roazen af6a897479 Extend on-the-fly sample renaming feature to vcfs
-Only works with single-sample vcfs

-As with bams, the user must provide a file mapping the absolute path to
 each vcf whose samples are to be renamed to the new sample name for that
 vcf. The argument is the same as for bams: --sample_rename_mapping_file,
 and the mapping file may contain a mix of bam and vcf files should the
 user wish.

-It's an error to attempt to remap the sample names of a multi-sample
 or sites-only vcf

-Implemented at the codec level at the instant the vcf header is first
 read in to minimize the chances of downstream code examining vcf
 headers/records before renaming occurs.

-Integration tests are in sting, unit tests are in picard

-Rev picard et. al. to 1.111.1902
2014-04-08 11:07:00 -04:00
Eric Banks e690ed1a67 The contig is named MT not M in b36. Delivers PT68890442. 2014-04-08 10:03:47 -04:00
Eric Banks ad336375dc Merge pull request #590 from broadinstitute/vrr_validate_variants_unused_alleles_fix
Addresses issue with strict validation on GVCF files.
2014-04-07 22:10:49 -04:00
Valentin Ruano-Rubio 5afcc8e05f Change in the command line interface of ValidateVariants.
Following reviewers comments the command line interface has been simplified.
All extra strict validations are performed by default (as before) and the
user has to indicate which one he/she does not want to use with --validationTypeToExclude.

Before he/she was able to indicate the only ones to apply with --validationType but that has been scrapped out.

Stories:

    - https://www.pivotaltracker.com/story/show/68725164

Changes:

    - Removed validateType argument.
    - Improved documentation.
    - Added some warnning log message on suspicious argument combinations.

Tests:

    - ValidateVariantsIntegrationTest#*
2014-04-07 16:27:11 -04:00
Ryan Poplin 7d11b4d5f1 Balancing training classes between SNP/Indel and TP/FP.
-- This results in much more consistent distribution of LOD scores for SNPs and Indels.
-- Removing genotype summary stats since they are now produced by default.
-- Added functionality to specify certain subsets of the training data to be used in Tranche file generation, -good:tranche=true set.vcf
2014-04-07 15:23:53 -04:00
MauricioCarneiro 84861fa10a Merge pull request #587 from broadinstitute/eb_actually_fail_on_reduced_bams
Make sure to fail in all cases where the BAM being used was created by ReduceReads.
2014-04-04 17:27:57 -04:00
Laura Gauthier ff25b656e1 Added check to make sure file passed in with sample IDs is valid (used in SelectVariants) -- throws UserException. Corresponding test checks for UserException. 2014-04-04 15:38:50 -04:00
Valentin Ruano-Rubio 18deeec6b0 Addresses issue with strict validation on GVCF files.
More concretelly Picard's strict VCF validation does not like that there is alternative alleles that are not participating in any genotype call across samples.

This is an issue with GVCF in the single-sample pipeline where this is certainly expected with <NON_REF> and other relative unlikely alleles.

To solve this issue we allow the user to exclude some of the strict validations using a new argument --validationTypeToExclude. In order to avoid the validation
issue with GVCF the user needs to add the following to the command line: '--validationTypeToExclude ALLELES'

Story:

    https://www.pivotaltracker.com/story/show/68725164

Changes:

    - Added validateTypeToExclude argument to ValidateVariants walker.
    - Implemented the selective exclusion of validation types.
    - Added new info and improved existing documentation of the ValidateVariants walker.

Tests:

    - ValidateVariantsIntegrationTest#testUnusedAlleleError
    - ValidateVariantsIntegrationTest#testUnusedAlleleFix
2014-04-04 14:37:10 -04:00
Laura Gauthier 06d78ba068 Expanded documentation to include description of which callsets are being compared in what order and more definitions 2014-04-04 10:35:53 -04:00
Eric Banks a3d55b3341 Make sure to fail in all cases where the BAM being used was created by ReduceReads.
In some cases, the program records were being removed from the BAM headers by the GATK engine
before we applied the check for reduced reads (so we did not fail appropriately).  Pushed up the
check to happen before the PG tags are modified and added a unit test to ensure it stays that way.
It turns out that some UG tests still used reduced bams so I switched to use different ones.

Based on reviewer feedback, made it more generic so that it's easy to add new unsupported tools.
2014-04-03 16:52:41 -04:00
Eric Banks 0b73573abc Slightly modifying the way to use the IUPAC ambiguity codes in the FastaAlternateReferenceMaker.
Previously it required you to create a single sample VCF and then to pass that in to the tool, but
Geraldine convinced me that this was a pain for users (because they usually have multi-sample VCFs).
Instead now you can pass in a multi-sample VCF and specify which sample's genotypes should be used
for the IUPAC encoding.  Therefore the argument changed from '--useIUPAC' to '--use_IUPAC_sample NA12878'.
2014-04-02 21:34:25 -04:00
Valentin Ruano-Rubio 84711b8e90 Fixed bug using GraphBased due to infinite likelihoods resulting from the calculation of alignment cost of very long insertion or deletions (done in linear scale)
Stories:

  https://www.pivotaltracker.com/story/show/66263868

Bug:

  The problem was due to the way we were calculating the fix penalty of a large deletion or insertion. In this case we calculate the alignment likelihood of the portion
  or read or haplotype deletion as the penalty of that deletion/insertion without going through the full pair-hmm process. For large events this resulted in a 0 in
  in linear scale computations that ins transformed into an infinity in log scale.

Changes:

  - Change to use log10 scale for calculate those penalties.
  - Minor addition of .gitignore to hide ./public/external-example/target which is generated by the building process.
2014-04-01 16:14:52 -04:00
Joel Thibault 70fe7f72f1 Return a TabixIndexCreator for appropriate file types
[Fixes #68291082]
2014-03-31 16:15:34 -04:00
Joel Thibault ab5634cbac Test that a Tabix index is created for block-compressed output formats
- Replace .idx and .tbi with appropriate constants
2014-03-31 14:36:48 -04:00
Joel Thibault a2d40c84ba Keep the list of zipped suffixes in sync with Variant 2014-03-31 14:36:41 -04:00
Joel Thibault a2cd9703fa Rev Picard 1.110.1773 2014-03-31 14:15:06 -04:00
Joel Thibault 2049eb1658 Rev Picard 1.110.1763
- SamPairUtils migrated in Picard r1737
- Revert IndelRealigner changes made in commit 4f4b85
-- Those changes were based on Picard revision 1722 to net/sf/picard/sam/SamPairUtil.java
-- Picard revision 1723 reverts these changes, so we also revert to match
2014-03-30 09:33:57 -04:00
Ryan Poplin 6566dd6ca9 Fix for dropping of reference sample depth in the DP annotation.
-- In the case of hierarchical merge we can't assume that we have only one genotype.
-- Removed use of deprecated VC annotation access functions.
2014-03-24 14:01:50 -04:00
Ryan Poplin 69eaf7c82d Merge pull request #577 from broadinstitute/eb_minor_fixes_for_fragment_utils
Fixed docs for method and fixed the edge case optimization to properly u...
2014-03-21 14:01:44 -04:00
Eric Banks 0d82a70633 Fixed docs for method and fixed the edge case optimization to properly use equals() on Integers.
Shouldn't affect actual results at all.
2014-03-20 15:55:09 -04:00
Eric Banks 3b1c337401 Have CombineVariants throw a UserError when trying to combine GVCFs from the HaplotypeCaller.
Was previously throwing an IllegalArgumentException (in the wrong place in the code).
Error message tells users to use CombineGVCFs.
2014-03-19 19:11:40 -04:00
David Roazen e549f4a9d2 Fix typo in UtilsUnitTest data provider name
This is currently my leading suspect for the cause of the
intermittent NoSuchElementException errors on master, since
the maven surefire plugin seems unable to handle errors in
TestNG DataProviders without blowing up.
2014-03-18 11:52:29 -04:00
David Roazen 4ba72d43cf Re-enable GATKRunReportUnitTest
This test is not, as I had initially thought, the cause of the
maven errors. Our master branch is failing intermittently
regardless of whether this test is enabled or disabled.

This reverts commit 45fc9ff515eec8d676b64a04fb34fb357492ff84.
2014-03-18 09:53:41 -04:00
David Roazen afa6abe554 Temporarily disable GATKRunReportUnitTest in unstable while maven issues are worked out
This test passes when run individually, as part of the commit tests, or as
part of the package tests. However, when running the unit tests in isolation
it causes maven/surefire to throw a NoSuchElementException.

This is clearly a maven/surefire bug or configuration issue. I will re-enable
this test on a branch as Khalid and I try to work through it.
2014-03-18 01:28:28 -04:00
David Roazen 2d8653f493 Update pom versions to mark the start of GATK 3.2 development 2014-03-18 01:18:59 -04:00
David Roazen a6a41c777c Update pom versions for 3.1 2014-03-18 01:09:29 -04:00
David Roazen d5e38ec39b Move GATKRunReport tests from private to public
-Hide AWS downloader credentials in a private properties file
-Remove references to private ActiveRegion walker

Allows phone home functionality to be tested at release time
when we are running tests on the release jar.
2014-03-17 18:29:40 -04:00
droazen 6b3320f067 Merge pull request #561 from broadinstitute/ks_package_classpath
Updated package-tests classpath, and allowing javac -cp <package>.jar.
2014-03-17 17:38:24 -04:00
Eric Banks 2e34ff7692 Merge pull request #563 from broadinstitute/aw_refactor_tribble
GATK changes to conform to Tribble refactoring as part improving Tabix s...
2014-03-17 13:35:46 -04:00
Eric Banks dabdd0a0fd Remove unused and unnecessary argument 2014-03-17 12:28:27 -04:00
Alec Wysoker 0369f93b24 GATK changes to conform to Tribble refactoring as part improving Tabix support in Tribble (among other things).
1. Enable on-the-fly indexing for vcf.gz.
2. Handle on-the-fly indexing where file to be indexed is not a regular file, thus index should not be created.
3. Add method setProgressLogger to all SAMFileWriter implementations.
4. Revved picard to 1.109.1722
5. IndelRealigner md5s change because the MC tag is added to records now.

Fixed up and signed off by ebanks.
2014-03-17 11:56:22 -04:00
Khalid Shakir 639247ab48 Updated package-tests classpath, and allowing javac -cp <package>.jar.
Package tests now hard coding just the gatk-framework tests jar, to include ONLY BaseTest, until the exclusions may be debugged.
Removing cofoja's annotation service from the package jars, to allow javac -cp <package>.jar.
2014-03-17 05:47:59 -04:00
Valentin Ruano-Rubio 2e964c59b4 Improved criteria to select best haplotypes out from the assembly graph.
Currently the best haplotypes are those that accumulate the largest ABSOLUTE edge *multiplicity* sum across their path in the assembly graph.

The edge *mulitplicity* is equal to the number of reads that expand through that edge, i.e. have a kmer that uniquely map to some vertex up-stream from the edge and the following base calls extend across that edge to vertices downstream from it.

Despite that it is obvious that higher multiplicties correlated with haplotype probability this criterion fails short in some regards of which the most relevant is:

As it is evaluated in condensed seq-graph (as supposed to uncompressed read-threading-graphs) it is bias to haplotypes that have more short-sequence vetices
  ( -> ATGC -> CA -> has worse score than -> A -> T -> G -> C -> C -> A ->). This is partly result of how we modify the edge multiplicities when we merge vertices from a linear chain.

This pull-request addresses the problem by changing to a new scoring schema based in likelihood estimates:

Each haplotype's likelihood can be calculated as the multiplication of the likelihood of "taking" its edges in the assembly graph. The likelihood of "taking" an edge in the assembly
graph is calculated as its multiplicity divide by the sum of multiplicity of edges that share the same source vertex.

This pull-request addresses the following stories:

https://www.pivotaltracker.com/story/show/66691418
https://www.pivotaltracker.com/story/show/64319760

Change Summary:

1. Change to the new scoring schema.
2. Added a graph DOT printing code to KBestHaplotypeFinder in order to diagnose scoring.
3. Graph transformation have been modified in order to generate no 0-multiplicity edges. (Nevertheless the schema above should work with 0 edges assuming that they are in fact 0.5)
2014-03-14 18:37:01 -04:00
David Roazen 1324120c17 Unconditionally include all of commons-httpclient in the GATK/Queue jars
The maven shade plugin was eliminating a necessary class (IgnoreCookiesSpec)
when packaging the GATK/Queue. Work around this by telling maven to
always package all of commons-httpclient.
2014-03-14 10:50:15 -04:00
Eric Banks ffaf92f871 Added new functionality to the FastaAlternateReferenceMaker to have it output IUPAC codes for het sites.
Enable it with the new --useIUPAC argument.
Added both unit and integration tests for the new functionality - and fixed up the
exising tests once I was in there.
2014-03-12 14:31:57 -04:00
droazen 8b53567dc7 Merge pull request #553 from broadinstitute/dr_rename_pipeline_tests
Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests
2014-03-10 21:36:45 -04:00
David Roazen 78562c14bb Rename existing PipelineTests to QueueTests to prepare for upcoming push of new pipeline tests
-These tests are really integration tests for Queue rather than generalized
 pipeline tests, so it makes sense to call them QueueTests.

-Rename test classes and maven build targets, and update shell scripts
 to reflect new naming.
2014-03-10 21:24:03 -04:00
David Roazen 7c34f05082 Merge remote-tracking branch 'origin/master' into intel 2014-03-10 14:07:36 -04:00
Ami Levy-Moonshine 2a6f05a8a1 add an option to randomly (uniformly) split a vcf file/s to more than 2 files.
The old code that allow split to two files (given in the input) is kept to allow uneven splitting between files.
2014-03-10 10:58:44 -04:00
Karthik Gururaj 6e98e9e589 Removed g_haplotype* global variables in native code so that it works
with multi-threading in Java.
Modified VectorLoglessPairHMM.java so that jniInitializeRegion and
jniFinalizeRegion are empty
2014-03-06 22:08:35 -08:00
Karthik Gururaj 3999677c93 Changed to delete[] where applicable 2014-03-06 12:23:08 -08:00
Karthik Gururaj a29777765d Binary library 2014-03-06 11:14:46 -08:00
Karthik Gururaj 7844d956ac Modified delete to delete[] 2014-03-06 11:13:34 -08:00
Karthik Gururaj 27e640d640 Modified SSE4.1 and 4.2 checks with _may_i_use_cpu_feature() 2014-03-06 08:51:11 -08:00
Karthik Gururaj 37f107cb3a Using Mustafa's function _may_i_use_cpu_feature() for AVX check 2014-03-06 08:37:48 -08:00
David Roazen 9df59bd8cc Update pom versions to mark the start of GATK 3.1 development 2014-03-06 00:05:58 -05:00
David Roazen 34edcb8ddf Update pom versions for the 3.0 release 2014-03-05 23:37:21 -05:00
David Roazen a9ddfdb7c0 Remove external-example module from public pom.xml
This module was causing failures during the release
packaging tests. After discussing with Khalid, we've
decided to disable it for now until a fix can be
developed.
2014-03-05 20:25:38 -05:00
Karthik Gururaj ec54528605 Fixed error in Sandbox.java 2014-03-05 09:36:55 -08:00
Karthik Gururaj 8fcbf9272c Merge branch 'intel_pairhmm' of /data/broad/gsa-unstable into intel_pairhmm
Conflicts:
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java
	public/VectorPairHMM/src/main/c++/Sandbox.java
2014-03-05 09:35:50 -08:00
Intel Repocontact d81116eb1d Added vectorized PairHMM implementation by Mohammad and Mustafa into the Maven build of GATK.
C++ code has PAPI calls for reading hardware counters

Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven

Native library part of git repo

1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM

Moved JNI_README

1. Added the catch UnsatisfiedLinkError exception in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING as the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.

1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase

Added public license text to all C++ files

Added license to Makefile

Add package info to Sandbox.java

Conflicts:
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/HaplotypeCaller.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/gatk/walkers/haplotypecaller/PairHMMLikelihoodCalculationEngine.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/DebugJNILoglessPairHMM.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/JNILoglessPairHMM.java
	protected/gatk-protected/src/main/java/org/broadinstitute/sting/utils/pairhmm/VectorLoglessPairHMM.java
	public/VectorPairHMM/src/main/c++/.gitignore
	public/VectorPairHMM/src/main/c++/LoadTimeInitializer.cc
	public/VectorPairHMM/src/main/c++/LoadTimeInitializer.h
	public/VectorPairHMM/src/main/c++/Makefile
	public/VectorPairHMM/src/main/c++/Sandbox.cc
	public/VectorPairHMM/src/main/c++/Sandbox.h
	public/VectorPairHMM/src/main/c++/Sandbox.java
	public/VectorPairHMM/src/main/c++/Sandbox_JNIHaplotypeDataHolderClass.h
	public/VectorPairHMM/src/main/c++/Sandbox_JNIReadDataHolderClass.h
	public/VectorPairHMM/src/main/c++/baseline.cc
	public/VectorPairHMM/src/main/c++/define-double.h
	public/VectorPairHMM/src/main/c++/define-float.h
	public/VectorPairHMM/src/main/c++/define-sse-double.h
	public/VectorPairHMM/src/main/c++/define-sse-float.h
	public/VectorPairHMM/src/main/c++/headers.h
	public/VectorPairHMM/src/main/c++/jnidebug.h
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.cc
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_DebugJNILoglessPairHMM.h
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.cc
	public/VectorPairHMM/src/main/c++/org_broadinstitute_sting_utils_pairhmm_VectorLoglessPairHMM.h
	public/VectorPairHMM/src/main/c++/pairhmm-template-kernel.cc
	public/VectorPairHMM/src/main/c++/pairhmm-template-main.cc
	public/VectorPairHMM/src/main/c++/run.sh
	public/VectorPairHMM/src/main/c++/shift_template.c
	public/VectorPairHMM/src/main/c++/utils.cc
	public/VectorPairHMM/src/main/c++/utils.h
	public/VectorPairHMM/src/main/c++/vector_function_prototypes.h
2014-03-05 09:30:29 -08:00
Laura Gauthier 43fdd38342 Add error handling to CalculateGenotypePosteriors to catch multiallelic variants with wrong number of ACs
-- throws UserException; added tests in PosteriorLikelihoodsUtilsUnitTests
Add error handling to CalculateGenotypePosteriors for cases where MLEAC>AN; add tests in PosteriorLikelihoodsUtilsUnitTests
Add unit tests to confirm that CalculateGenotypePosteriors has the ability to switch genotypes for four cases
2014-03-05 12:03:18 -05:00
Laura Gauthier 7f9f58dbd1 Added hidden flag to GenotypeConcordance to output sites of discordant genotypes (to System.out)
Revised ConcondanceMetrics tests to adapt to change
Added comments to PosteriorLikelihoodsUtils
2014-03-05 12:03:18 -05:00
Joel Thibault 57747ad35e Logger output should go to STDERR instead of STDOUT 2014-03-05 10:01:06 -05:00
Joel Thibault b4dde6a78c Add WARN to the valid log types error message
- order if statements and error message in increasing severity
2014-03-05 10:01:06 -05:00
Valentin Ruano Rubio 243d1bc07a Merge pull request #542 from broadinstitute/vrr_efficient_find_best_haplotypes
Added a more efficient implementation of the KBest haplotype finder code...
2014-03-05 09:44:50 -05:00
David Roazen 58905e8fe0 Disable the intermittently-failing and flawed ProgressMeterDaemonUnitTest
-created a Pivotal ticket to eventually redesign this test
2014-03-05 09:15:26 -05:00
Valentin Ruano-Rubio 69bf2b3247 Added a more efficient implementation of the KBest haplotype finder code (CONT.)
Changes:

  1. Addressed review comments on new K-best haplotype assembly graph finder.
  2. Generalize KBestHaplotypeFinder to deal with multiple source and sink vertices.
  3. Updated test to use KBestHaplotypeFinder instead of KBestPaths
  4. Retired KBestPaths to the archive.
  5. Small improvements to the code and documentation.
2014-03-04 23:22:27 -05:00
Valentin Ruano-Rubio 7acf2eb0e7 Added a more efficient implementation of the KBest haplotype finder code.
Story:

  https://www.pivotaltracker.com/story/show/66238286

Changes:

  1. Created a new k-best haplotype search implementation in class KBestHaplotypeFinder.
  2. Changed HC code to use the new implementation.
  This seems to fix the original problem without causing significant changes in outputs using some empirical data test cases
  3. Moved haplotype's cigar calculation code from Path to CigarUtils; need that in order to gain independence from Path in some parts of the code.
     In any case that seems like a more natural location for that functionality.
2014-03-04 12:22:14 -05:00
Karthik Gururaj a893765ae2 Added license to Makefile 2014-03-03 09:11:02 -08:00
Karthik Gururaj 7cd23543a1 Added public license text to all C++ files 2014-03-03 09:04:00 -08:00
Eric Banks 22ad18b919 Moving Reduce Reads to the archive.
The GATK now fails with a user error if you try to run with a reduced bam.
(I added a unit test for that; everything else here is just the removal of all traces of RR)
2014-03-02 02:03:14 -05:00
Khalid Shakir 387188e5bb Attempting to limit gc during Maven tests, using defaults found in JavaCommandLineFunction 2014-03-01 15:24:45 +08:00
Karthik Gururaj 1b395a871a 1. Changed logger.info to logger.warn in PairHMMLikelihoodCalculationEngine.java
2. Committing the right set of files after rebase
2014-02-28 16:08:28 -08:00
Karthik Gururaj 37526dfad5 1. Added the catch UnsatisfiedLinkError exception in
PairHMMLikelihoodCalculationEngine.java to fall back to LOGLESS_CACHING
in case the native library could not be loaded. Made
VECTOR_LOGLESS_CACHING as the default implementation.
2. Updated the README with Mauricio's comments
3. baseline.cc is used within the library - if the machine supports
neither AVX nor SSE4.1, the native library falls back to un-vectorized
C++ in baseline.cc.
4. pairhmm-1-base.cc: This is not part of the library, but is being
heavily used for debugging/profiling. Can I request that we keep it
there for now? In the next release, we can delete it from the
repository.
5. I agree with Mauricio about the ifdefs. I am sure you already know,
but just to reassure you the debug code is not compiled into the library
(because of the ifdefs) and will not affect performance.
2014-02-28 08:59:55 -08:00
Chris Whelan e61ba8b340 Added command line checks for duplicate files in ROD lists
-- Keep a list of processed files in ArgumentTypeDescriptor.getRodBindingsCollection
  -- Throw user exception if a file name duplicates one that was previously parsed
  -- Throw user exception if the ROD list is empty
  -- Added two unit tests to RodBindingCollectionUnitTest
2014-02-27 13:32:18 -05:00
Karthik Gururaj 2d0ce45bb0 Moved JNI_README 2014-02-27 10:12:23 -08:00
Karthik Gururaj c645725fc3 1. Renamed directory structure from public/c++/VectorPairHMM to
public/VectorPairHMM/src/main/c++ as per Khalid's suggestion
2. Use java.home in public/VectorPairHMM/pom.xml to pass environment
variable JRE_HOME to the make process. This is needed because the
Makefile needs to compile JNI code with the flag -I<JRE_HOME>/../include (among
others). Assuming that the Maven build process uses a JDK (and not just
a JRE), the variable java.home points to the JRE inside maven.
3. Dropped all pretense at cross-platform compatibility. Removed Mac
profile from pom.xml for VectorPairHMM
2014-02-26 15:17:15 -08:00
Karthik Gururaj bd71ba35e5 Moved pom.xml to VectorPairHMM and updated artifactId 2014-02-26 14:01:46 -08:00
Khalid Shakir da587d48ed Using absolute paths in generated diff commands, to ease running them from any directory. 2014-02-27 04:43:39 +08:00
Khalid Shakir c163e6d0d2 Separate failsafe directories for each of the integration test types [#66515572] 2014-02-27 04:43:39 +08:00
Karthik Gururaj b81e2c2948 Native library part of git repo 2014-02-26 11:47:42 -08:00
Karthik Gururaj 0fe843bfd9 Followed Khalid's suggestion for packing libVectorLoglessCaching into
the jar file with Maven
2014-02-26 11:47:42 -08:00
Karthik Gururaj 15fe244e4b Now has PAPI values 2014-02-26 11:47:42 -08:00
Intel Repocontact e32e9e6af6 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2014-02-26 11:47:01 -08:00
Intel Repocontact ff2a972ab5 Merge branch 'master' of github.com:broadinstitute/gsa-unstable
Conflicts:
	.gitignore
2014-02-25 20:56:28 -08:00
Khalid Shakir f02ce6eca7 Added tests for cleaning up scattered .bai files, and using the log directory.
Re-added import java.io.File for BamGatherFunction.
Other cleanup to resolve scala syntax warnings from intellij.
Moved Example UG script to from protected to public.
2014-02-26 02:11:28 +08:00
pdexheimer 0405afeab2 Inherit BamGatherFunction from MergeSamFiles rather than PicardBamFunction
- This change means that BamGatherFunction will now have an @Output field for the BAM index, which will allow the bai to be deleted for intermediate functions

Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-26 02:11:28 +08:00
pdexheimer 504c125c26 Ensure .out files are saved into logDirectory
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-26 02:11:28 +08:00
pdexheimer 51dcd364a5 Added logDirectory argument
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-26 02:11:28 +08:00
Khalid Shakir 7e516b294f Replaced local drmaa and Jama artifacts with versions from maven central.
Removed unused caliper binary from local repo.
2014-02-22 01:21:35 +08:00
Khalid Shakir a75043b207 When git describe fails use "exported" instead of "unknown". 2014-02-22 01:21:35 +08:00
Khalid Shakir 4670c87313 Fixed mvn run for packagetests over external-example. 2014-02-22 01:21:34 +08:00
Khalid Shakir 70ecce2a0f Fixed scope for test-jar depedencies. 2014-02-22 01:21:34 +08:00
Eric Banks 235f0c6fa0 Merge pull request #528 from broadinstitute/eb_fix_cat_variants_usage_message
Fix the usage message for CatVariants to make it accurate.
2014-02-19 22:45:22 -05:00
Eric Banks 341d1bf2dd Fix the usage message for CatVariants to make it accurate.
It just hit a user on our forum...
2014-02-19 20:42:08 -05:00
Valentin Ruano-Rubio c167fb5fdf Fixing GenotypesGVCF.
Bug uncovered by some untrimmed alleles in the single sample pipeline output.

Notice however does not fix the untrimmed alleles in general.

Story:

https://www.pivotaltracker.com/story/show/65481104

Changes:

1. Fixed the bug itself.
2. Fixed non-working tests (sliently skipped due to exception in dataProvider).
2014-02-19 14:20:39 -05:00
Ryan Poplin 43c20264b0 Initial commit of the random forest classifier. 2014-02-17 13:07:27 -05:00
Khalid Shakir a505db79f5 Fixed build bug in ./ant-bridge.sh unittest -Dsingle=..., due to external-example.
pipeline.run property no longer required to be passed by test executor.
2014-02-15 13:52:20 +08:00
droazen 1e82f117ad Merge pull request #518 from broadinstitute/ks_skashin_gatkdocs_arguments
Ks skashin gatkdocs arguments
2014-02-14 13:57:19 -05:00
Eric Banks f6022a944b Merge pull request #513 from broadinstitute/eb_clean_up_genotype_posteriors
Various small fixes for CalculateGenotypePosteriors based on feedback fr...
2014-02-14 13:50:46 -05:00
Eric Banks 3724d4e5f3 Various small fixes for CalculateGenotypePosteriors based on feedback from guys in Ben Neale's group.
Note that this tool is still a work in progress and very experimental, so isn't 100% stable.  Most of
the features are untested (both by people and by unit/integration tests) because Chris Hartl implemented
it right before he left, and we're going to need to add tests at some point soon.  I added a first
integration test in this commit, but it's just a start.

The fixes include:

1. Stop having the genotyping code strip out AD values.  It doesn't make sense that it should do this so
I don't know why it was doing that at all.
Updated GenotypeGVCFs so that it doesn't need to manually recover them anymore.
This also helps CalculateGenotypePosteriors which was losing the AD values.
Updated code in LeftAlignAndTrimVariants to strip out PLs and AD, since it wasn't doing that before.
Updated the integration test for that walker to include such data.

2. Chris was calling Math.pow directly on the normalized posteriors which isn't safe.
Instead, the normalization routine itself can revert back to log scale in a safe manner so let's use it.
Also, renamed the variable to posteriorProbabilities (and not likelihoods).

3. Have CGP update the AC/AF/AN counts after fixing GTs.
2014-02-14 13:48:14 -05:00
kshakir 8b136d53b9 Merge pull request #524 from broadinstitute/ks_symlink_bin_jar
Create symlinks target/GenomeAnalysisTK.jar and target/Queue.jar
2014-02-15 02:32:59 +08:00
Khalid Shakir bc9ac93b6c Adding the external example to the build. 2014-02-15 01:26:07 +08:00
Khalid Shakir 2e99a6ecf8 Create symlinks target/GenomeAnalysisTK.jar and target/Queue.jar during package phase. 2014-02-15 01:12:32 +08:00
Nicholas Clarke 7ae19953f5 Squashed commit of the following:
commit 5e73b94eed3d1fc75c88863c2cf07d5972eb348b
Merge: e12593a d04a585
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Fri Feb 14 09:25:22 2014 +0000

    Merge pull request #1 from broadinstitute/checkpoint

    SimpleTimer passes tests, with formatting

commit d04a58533f1bf5e39b0b43018c9db3302943d985
Author: kshakir <github@kshakir.org>
Date:   Fri Feb 14 14:46:01 2014 +0800

    SimpleTimer passes tests, with formatting

    Fixed getNanoOffset() to offset nano to nano, instead of nano to seconds.
    Updated warning message with comma separated numbers, and exact values of offsets.

commit e12593ae66a5e6f0819316f2a580dbc7ae5896ad
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Wed Feb 12 13:27:07 2014 +0000

    Remove instance of 'Timer'.

commit 47a73e0b123d4257b57cfc926a5bdd75d709fcf9
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Wed Feb 12 12:19:00 2014 +0000

    Revert a couple of changes that survived somehow.

    - CheckpointableTimer,Timer -> SimpleTimer

commit d86d9888ae93400514a8119dc2024e0a101f7170
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Mon Jan 20 14:13:09 2014 +0000

    Revised commits following comments.

    - All utility merged into `SimpleTimer`.
    - All tests merged into `SimpleTimerUnitTest`.
    - Behaviour of `getElapsedTime` should now be consistent with `stop`.
    - Use 'TimeUnit' class for all unit conversions.
    - A bit more tidying.

commit 354ee49b7fc880e944ff9df4343a86e9a5d477c7
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Fri Jan 17 17:04:39 2014 +0000

    Add a new CheckpointableTimerUnitTest.

    Revert SimpleTimerUnitTest to the version before any changes were made.

commit 2ad1b6c87c158399ededd706525c776372bbaf6e
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Tue Jan 14 16:11:18 2014 +0000

    Add test specifically checking behaviour under checkpoint/restart.

    Slight alteration to the checkpointable timer based on observations
    during the testing - it seems that there's a fair amount of drift
    between the sources anyway, so each time we stop we resynchronise the
    offset. Hopefully this should avoid gradual drift building up and
    presenting as checkpoint/restart drift.

commit 1c98881594dc51e4e2365ac95b31d410326d8b53
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Tue Jan 14 14:11:31 2014 +0000

    Should use consistent time units

commit 6f70d42d660b31eee4c2e9d918e74c4129f46036
Author: Nicholas Clarke <nc6@sanger.ac.uk>
Date:   Tue Jan 14 14:01:10 2014 +0000

    Add a new timer supporting checkpoint mechanisms.

    The issue with this is that the current timer is locked to JVM nanoTime. This can be reset after
    a checkpoint/restart and result in negative elapsed times, which causes an error.

    This patch addresses the issue in two ways:
     - Moves the check on timer information in GenomeAnalysisEngine.java to only occur if a time limit has been
    set.
     - Create a new timer (CheckpointableTimer) which keeps track of the relation between system and nano time. If
    this changes drastically, then the assumption is that there has been a JVM restart owing to checkpoint/restart.
    Any time straddling a checkpoint/restart event will not be counted towards total running time.

Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-14 21:45:47 +08:00
Laura Gauthier 29bb3d4dc1 Check for empty BAM lists in command line input 2014-02-14 08:09:47 -05:00
Khalid Shakir 225ee4880b Using new parameters via skashin to run gatkdocs in the maven conventional subdirectory.
Updated path for output gatkdocs in nightly build script.
Removed patch in plugin manager that contained a workaround for gatkdocs running in the top level directory.
2014-02-14 15:57:21 +08:00
skashin 1b3ac95798 Added the following arguments: -settings-dir -destination-dir -forum-key-path
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-02-14 14:28:35 +08:00
Eric Banks 7095a60c8e Merge pull request #516 from broadinstitute/dr_reenable_tests_failing_due_to_java_update
Re-enable tests that were failing post-maven due to changes in Java's Math.pow() implementation
2014-02-13 21:05:18 -05:00
David Roazen 4b4b93ad1b Re-enable tests that were failing post-maven due to changes in Java's Math.pow() implementation
After extensive detective work, Joel determined that these tests were failing
due to changes in the implementation of Math.pow() in newer versions of
Java 1.7.

All GSA members should ensure that they're using a JDK that is at least
as current as the one in the Java-1.7 dotkit on the Broad servers
(build 1.7.0_51-b13).
2014-02-12 16:08:16 -05:00
Eric Banks 5bde7fbf37 Merge pull request #511 from broadinstitute/dr_enable_exclusions_in_maven_package_tests
Exclude all transitive dependencies in maven package-tests
2014-02-12 15:38:39 -05:00
Joel Thibault ef87b051b0 Rev Picard to 1.107.1683 (4 jars) 2014-02-12 15:25:50 -05:00
David Roazen 6f12c8b0dc Exclude all transitive dependencies in maven package-tests
This change should allow us to test that the GATK jar has been
correctly packaged at release time, by ensuring that only the
packaged jar + a few test-related dependencies are on the classpath
when tests are run.

Note that we still need to actually test that this works as intended
before we can make this live in the Bamboo release plan.
2014-02-12 14:59:05 -05:00
David Roazen 95e1402d21 Add ability to run *KnowledgeBaseTests to maven
Run with: mvn verify -Dsting.knowledgebasetests.skipped=false
2014-02-11 14:08:24 -05:00
Khalid Shakir 1666bb7e3a Patched PluginManager to ignore null classes, that will allow gatkdocs to build successfully when running from the source root directory, due to its hardcoded paths. 2014-02-12 00:48:58 +08:00
Karthik Gururaj 316501b32e Fixed denominator in profiling 2014-02-10 10:11:03 -08:00
Karthik Gururaj d081c19178 Minor: added support in C++ sandbox to choose implementation and check
from command line
2014-02-09 18:05:35 -08:00
Ryan Poplin b81494b704 Merge pull request #499 from broadinstitute/eb_fix_ad_updates
Fixed bug in generating AD values when new alleles are present for genot...
2014-02-09 17:55:00 -05:00
Eric Banks abb67cfa5e Fixed bug in generating AD values when new alleles are present for genotpying GVCFs.
This was a dumb mistake that wasn't well tested (but is now).
2014-02-09 15:15:19 -05:00
Khalid Shakir 12bb6fd361 Removed use of picard private.
Updated picard-maven script to tag locally modified builds with -SNAPSHOT.
Removed old picard jars.
2014-02-09 17:08:52 +08:00
Khalid Shakir 4e0f7521f2 Made scala.maxmemory an argument, and defaulted it to 1g. 2014-02-09 09:24:44 +08:00
Karthik Gururaj a03d83579b Matrices in baseline C++ (no vector) implementation of PairHMM are now
allocated on heap using "new". Stack allocation led to program crashes
for large matrix sizes.
2014-02-07 23:22:05 -08:00
Karthik Gururaj 20a46e4098 Check only for SSE 4.1 (rather than SSE 4.2) when trying to use the SSE
implementation of PairHMM
2014-02-07 15:19:55 -08:00
Eric Banks d689f61005 Fixed up some of the genotype-level annotations being propogated in the single sample HC pipeline.
1. AD values now propogate up (they weren't before).
2. MIN_DP gets transferred over to DP and removed.
3. SB gets removed after FS is calculated.

Also, added a bunch of new integration tests for GenotypeGVCFs.
2014-02-07 12:47:54 -05:00
Eric Banks db68d3fa10 Fixing failing unit tests 2014-02-07 12:24:14 -05:00
Eric Banks 2648219c42 Implementation of a hierarchical merger for gVCFs, called CombineGVCFs.
This tool will take any number of gVCFs and create a merged gVCF (as opposed to
GenotypeGVCFs which produces a standard VCF).

Added unit/integration tests and fixed up GATK docs.
2014-02-07 08:49:18 -05:00
mghodrat 7815c30df8 Adding comments to pairhmm-template-kernel 2014-02-06 20:13:06 -08:00
Karthik Gururaj b729fc0136 1. Split main JNI function into initializeTestcases, compute_testcases
and releaseReads
2. FTZ enabled
3. Cleaner profiling code
2014-02-06 14:35:32 -08:00
Karthik Gururaj 166f91d698 Merge branch 'test_branch'
Conflicts:
	public/c++/VectorPairHMM/LoadTimeInitializer.cc
	public/c++/VectorPairHMM/pairhmm-1-base.cc
	public/c++/VectorPairHMM/utils.cc
	public/c++/VectorPairHMM/utils.h

Merged test_branch with intel_pairhmm
2014-02-06 11:18:18 -08:00
Karthik Gururaj fab6f57e97 1. Enabled FTZ in LoadTimeInitializer.cc
2. Added Sandbox.java for testing
3. Moved compute to utils.cc (inside library)
4. Added flag for disabling FTZ in Makefile
2014-02-06 11:01:33 -08:00
Karthik Gururaj 78642944c0 1. Moved break statement in utils.cc to correct position
2. Tested sandbox with regions
3. Lots of profiling code from previous commit exists
2014-02-06 09:32:56 -08:00
Khalid Shakir b21c35482e Packages link private/testdata, so that mvn test -Dsting.serialunittests.skipped=false works. 2014-02-06 08:25:38 -05:00
Khalid Shakir 3848159086 Added a set of serial tests to gatk/queue packages, which runs all tests under their package in one TestNG execution.
New properties to disable regenerating example resources artifact when each parallel test runs under packagetest.
Moved collection of packagetest parameters from shell scripts into maven profiles.
Fixed necessity of test-utils jar by removing incorrect dependenciesToScan element during packagetests.
When building picard libraries, run clean first.
Fixed tools jar dependency in picard pom.
Integration tests properly use the ant-bridge.sh test.debug.port variable, like unit tests.
2014-02-06 08:25:38 -05:00
Valentin Ruano Rubio 988e3b4890 Merge pull request #487 from broadinstitute/vrr_reference_model_with_trimming
Get gVCF to work without --dontTrimActiveRegions
2014-02-05 22:52:17 -05:00
Valentin Ruano-Rubio 98ffcf6833 Get gVCF to work without --dontTrimActiveRegions
Story:

https://www.pivotaltracker.com/story/show/65048706
https://www.pivotaltracker.com/story/show/65116908

Changes:

ActiveRegionTrimmer in now an argument collection and it returns not only the trimmed down active region but also the non-variant containing flanking regions
HaplotypeCaller code has been simplified significantly pushing some functionality two other classes like ActiveRegion and AssemblyResultSet.

Fixed a problem with the way the trimming was done causing some gVCF non-variant records no have conservative 0,0,0 PLs
2014-02-05 22:50:45 -05:00
Karthik Gururaj acda6ca27b 1. Whew, finally debugged the source of performance issues with PairHMM
JNI. See copied text from email below.
2. This commit contains all the code used in profiling, detecting FP
exceptions, dumping intermediate results. All flagged off using ifdefs,
but it's there.
--------------Text from email
As we discussed before, it's the denormal numbers that are causing the
slowdown - the core executes some microcode uops (called FP assists)
when denormal numbers are detected for FP operations (even un-vectorized
code).
The C++ compiler by default enables flush to zero (FTZ) - when set, the
hardware simply converts denormal numbers to 0. The Java binary
(executable provided by Oracle, not the native library) seems to be
compiled without FTZ (sensible choice, they want to be conservative).
Hence, the JNI invocation sees a large slowdown. Disabling FTZ in C++
slows down the C++ sandbox performance to the JNI version (fortunately,
the reverse also holds :)).
Not sure how to show the overhead for these FP assists easily - measured
a couple of counters.
FP_ASSISTS:ANY - shows number of uops executed as part of the FP
assists. When FTZ is enabled, this is 0 (both C++ and JNI), when FTZ is
disabled this value is around 203540557 (both C++ and JNI)
IDQ:MS_UOPS_CYCLES - shows the number of cycles the decoder was issuing
uops when the microcode sequencing engine was busy. When FTZ is enabled,
this is around 1.77M cycles (both C++ and JNI), when FTZ is disabled
this value is around 4.31B cycles (both C++ and JNI). This number is
still small with respect to total cycles (~40B), but it only reflects
the cycles in the decode stage. The total overhead of the microcode
assist ops could be larger.
As suggested by Mustafa, I compared intermediate values (matrices M,X,Y)
and final output of compute_full_prob. The values produced by C++ and
Java are identical to the last bit (as long as both use FTZ or no-FTZ).
Comparing the outputs of compute_full_prob for the cases no-FTZ and FTZ,
there are differences for very small values (denormal numbers).
Examples:
Diff values 1.952970E-33 1.952967E-33
Diff values 1.135071E-32 1.135070E-32
Diff values 1.135071E-32 1.135070E-32
Diff values 1.135071E-32 1.135070E-32
For this test case (low coverage NA12878), all these values would be
recomputed using the double precision version. Enabling FTZ should be
fine.
-------------------End text from email
2014-02-05 17:09:57 -08:00
Ryan Poplin 693bfac341 Bug fix for missing annotations in CombineReferenceCalculationVariants. They were being dropped in the handoff between engines in a couple of places.
-- Updated single sample pipeline test data using Valentin's files and re-enabled CRCV tests
2014-02-05 12:58:48 -05:00
Eric Banks 740b33acbb We were never validating the sequence dictionary of tabix indexed VCFs for some reason. Fixed.
These changes happened in Tribble, but Joel clobbered them with his commit.
We can now change the logging priority on failures to validate the sequence dictionary to WARN.
Thanks to Tim F for indirectly pointing this out.
2014-02-05 10:12:38 -05:00
Eric Banks 9cac24d1e6 Moving logging status of VCF indexing to DEBUG instead of INFO, otherwise it's painful when reading in lots of files 2014-02-05 10:12:37 -05:00
Eric Banks 91bdf069d3 Some updates to CRCV.
1. Throw a user error when the input data for a given genotype does not contain PLs.
2. Add VCF header line for --dbsnp input
3. Need to check that the UG result is not null
4. Don't error out at positions with no gVCFs (which is possible when using a dbSNP rod)
2014-02-05 10:12:37 -05:00
Joel Thibault 7923e786e9 Rev Picard (public) to 1.107.1676
- Rename snappy to snappy-java
- Add maven-metadata-local.xml to .gitignore
2014-02-04 22:04:28 -05:00
Joel Thibault 0025fe190d Exclude sam's older TestNG 2014-02-04 22:04:27 -05:00
Karthik Gururaj 24f8aef344 Contains profiling, exception tracking, PAPI code
Contains Sandbox Java
2014-02-04 16:27:29 -08:00
David Roazen 76086f30b7 Temporarily disable tests that started failing post-maven
Joel is working on these failures in a separate branch. Since
maven (currently! we're working on this..) won't run the whole
test suite to completion if there's a failure early on, we need
to temporarily disable these tests in order to allow group members
to run tests on their branches again.
2014-02-04 15:31:24 -05:00
David Roazen 3b2f07990d Re-break the MWUnitTest for Joel to debug 2014-02-04 15:19:09 -05:00
David Roazen c9032f0b5c Fix failing unit tests 2014-02-04 03:05:30 -05:00
Khalid Shakir a4289711e2 Distinct failsafe summary reports, just like invoker report directories. 2014-02-03 13:50:47 -05:00
Khalid Shakir 857e6e0d6f Bumped version to 2.8-SNAPSHOT, using new update_pom_versions.sh script. 2014-02-03 13:50:46 -05:00
Khalid Shakir 9ca3004fc3 Setting the test-utils' type to test-jar, such that the multi-module build uses testClasses instead of classes as a directory dependency. 2014-02-03 13:50:46 -05:00
Khalid Shakir de13f41fc3 One step closer to a proper test-utils artifact. Using the maven-jar-plugin to create a test classifer, excluding actual tests, until we can properly separate the classes into separate artifacts/modules. 2014-02-03 13:50:46 -05:00
Khalid Shakir 25aee7164e Fixed missing "mvn" command execution in ant-bridge.
Added pom.xml workarounds for duplicate classpath error, due to gatk-framework dependency containing required BaseTest, and jarred *UnitTest/*IntegrationTest classes that also exist as files under target/test-classes.
2014-02-03 13:50:46 -05:00
Khalid Shakir caa76cdac4 Added maven pom.xmls for various artifacts. 2014-02-03 13:50:46 -05:00
Khalid Shakir d1a689af33 Added new utility files used by maven build, including the ant-bridge script. 2014-02-03 13:50:46 -05:00
Khalid Shakir 88150e0166 Switched commited dependency repository from ivy to maven. 2014-02-03 13:50:46 -05:00
Khalid Shakir 1e25a758f5 Moved files to maven directories.
Here are the git moved directories in case other files need to be moved during a merge:
  git-mv private/java/src/        private/gatk-private/src/main/java/
  git-mv private/R/scripts/       private/gatk-private/src/main/resources/
  git-mv private/java/test/       private/gatk-private/src/test/java/
  git-mv private/testdata/        private/gatk-private/src/test/resources/
  git-mv private/scala/qscript/   private/queue-private/src/main/qscripts/
  git-mv private/scala/src/       private/queue-private/src/main/scala/
  git-mv protected/java/src/      protected/gatk-protected/src/main/java/
  git-mv protected/java/test/     protected/gatk-protected/src/test/java/
  git-mv public/java/src/         public/gatk-framework/src/main/java/
  git-mv public/java/test/        public/gatk-framework/src/test/java/
  git-mv public/testdata/         public/gatk-framework/src/test/resources/
  git-mv public/scala/qscript/    public/queue-framework/src/main/qscripts/
  git-mv public/scala/src/        public/queue-framework/src/main/scala/
  git-mv public/scala/test/       public/queue-framework/src/test/scala/
2014-02-03 13:50:44 -05:00
Khalid Shakir faaef236ea Moved gsalib, R and other resources, Queue GATK extensions generator, Queue version java files. 2014-02-03 13:49:21 -05:00
Khalid Shakir eb52dc6a9b Moved build.xml, ivy.xml, ivysettings.xml, ivy properties, public/packages/*.xml into private/archive/ant 2014-02-03 13:49:20 -05:00
Karthik Gururaj 6d4d776633 Includes code for all debug code for obtaining profiling info 2014-01-30 12:08:06 -08:00
Valentin Ruano-Rubio 89c4e57478 gVCF <NON_REF> in all vcf lines including variant ones when –ERC gVCF is requested.
Changes:
-------

  <NON_REF> likelihood in variant sites is calculated as the maximum possible likelihood for an unseen alternative allele: for reach read is calculated as the second best likelihood amongst the reported alleles.

  When –ERC gVCF, stand_conf_emit and stand_conf_call are forcefully set to 0. Also dontGenotype is set to false for consistency sake.

  Integration test MD5 have been changed accordingly.

Additional fix:
--------------

  Specially after adding the <NON_REF> allele, but also happened without that, QUAL values tend to go to 0 (very large integer number in log 10) due to underflow when combining GLs (GenotypingEngine.combineGLs). To fix that combineGLs has been substituted by combineGLsPrecise that uses the log-sum-exp trick.

  In just a few cases this change results in genotype changes in integration tests but after double-checking using unit-test and difference between combineGLs and combineGLsPrecise in the affected integration test, the previous GT calls were either border-line cases and or due to the underflow.
2014-01-30 11:23:33 -05:00
Karthik Gururaj 5c7427e48c Temporary commit containing debug profiling code - commented out 2014-01-29 12:10:29 -08:00
Karthik Gururaj 0c63d6264f 1. Added synchronization block around loadLibrary in
VectorLoglessPairHMM
2. Edited Makefile to use static libraries where possible
2014-01-27 15:34:58 -08:00
Karthik Gururaj a15137a667 Modified run.sh 2014-01-27 14:56:46 -08:00
Karthik Gururaj 2c0d70c863 Moved vector JNI code to public/c++/VectorPairHMM 2014-01-27 14:52:59 -08:00
Karthik Gururaj 85a748860e 1. Added more profiling code
2. Modified JNI_README
2014-01-27 14:32:44 -08:00
Valentin Ruano-Rubio 748d2fdf92 Added Integration test to verify the bugs are not there anymore as reported in pivotracker 2014-01-26 23:29:31 -05:00
Karthik Gururaj 018e9e2c5f 1. Cleaned up code
2. Split into DebugJNILoglessPairHMM and VectorLoglessPairHMM with base
class JNILoglessPairHMM. DebugJNILoglessPairHMM can, in principle,
invoke any other child class of JNILoglessPairHMM.
3. Added more profiling code for Java parts of LoglessPairHMM
2014-01-26 19:18:12 -08:00
Valentin Ruano-Rubio 9e7bf75e89 Fix for the PairHMM transition probability miscalculation.
Problem:

matchToMatch transition calculation was wrong resulting in transition probabilites coming out of the Match state that added more than 1.

Reports:

https://www.pivotaltracker.com/s/projects/793457/stories/62471780
https://www.pivotaltracker.com/s/projects/793457/stories/61082450

Changes:

The transition matrix update code has been moved to a common place in PairHMMModel to dry out its multiple copies.

MatchToMatch transtion calculation has been fixed and implemented in PairHMMModel.

Affected integration test md5 have been updated, there were no differences in GT fields and example differences always implied
small changes in likelihoods that is what is expected.
2014-01-26 16:30:36 -05:00