4355 Commits (c191103326d2f515a4ec08033eb4d0463affafdb)

Author SHA1 Message Date
Phillip Dexheimer 65eeb4a7ab Recast the "Invalid JEXL expression detected" error in SelectVariants from a RuntimeException to a UserException
- PT 68931448
2014-06-20 00:05:23 -04:00
Phillip Dexheimer da5e567b73 Added functionality to CatVariants to process .list files with -V
- Pivotal 70305712
2014-06-19 21:46:13 -04:00
Ryan Poplin da1dab6c32 Merge pull request #661 from broadinstitute/jw_allele_balance_gvcf
Enable AB annotation in reference model pipeline. Incorporates patches f...
2014-06-19 13:10:41 -04:00
Eric Banks 1092dd6e25 From Carlos Borroto: switch outputRoot in SplitSamFile to an empty string instead of null. 2014-06-19 11:06:55 -04:00
Eric Banks 9212edba41 From Carlos Borroto: made 'level' in Picard's CalculateHsMetrics Scala Queue extension an argument. 2014-06-19 11:06:50 -04:00
Ryan Poplin 8b75428a90 Enable AB annotation in reference model pipeline. Incorporates patches from John Wallace to public github account 2014-06-19 09:35:04 -04:00
Nigel Delaney 7570666f2a Merge pull request #655 from broadinstitute/nfd_mathutil_opts
Optimization of function to calculate the logged sum of exponentiated values
2014-06-17 17:07:42 -04:00
Nigel Delaney 5e258bfeff Minor optimization to function to calculate the log of exponentials.
* Avoids calling Math.pow whenever possible (skips -Inf and 0 values),
which leads to better performance.
2014-06-17 15:26:10 -04:00
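The optimization described in the commit above can be illustrated with a minimal sketch. This is not the GATK implementation: the function name `log10_sum_log10` and the choice of the maximum as pivot are assumptions made for illustration. It computes log10 of a sum of values given in log10 space and skips the power call for -Inf entries and for the pivot element itself, whose difference from the pivot is 0.

```python
import math

NEG_INF = float("-inf")


def log10_sum_log10(log10_values):
    """Return log10(sum(10**v)) for values given in log10 space.

    Hypothetical helper for illustration only. The largest value is used
    as a pivot so that every exponent is <= 0, which avoids overflow.
    """
    if not log10_values:
        return NEG_INF
    pivot_index = max(range(len(log10_values)), key=lambda i: log10_values[i])
    pivot = log10_values[pivot_index]
    if pivot == NEG_INF:
        # Every term is 10**-inf == 0, so the sum is 0 and its log is -inf.
        return NEG_INF
    total = 1.0  # the pivot contributes 10**0 == 1 without a power call
    for i, v in enumerate(log10_values):
        if i == pivot_index or v == NEG_INF:
            continue  # skip the pivot (difference 0) and zero-valued terms
        total += 10.0 ** (v - pivot)
    return pivot + math.log10(total)
```

Under these assumptions, `log10_sum_log10([2.0, float("-inf")])` performs no power call at all, since the only non-pivot term is -Inf.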
Chris Whelan ba1d23e535 Created a new tool, SiblingIBD, which finds Identical-By-Descent regions in two siblings.
-When parental genotypes are available, implements an HMM on genotype observations in the quartet.
   -Outputs IBD regions as well as per-site posterior probabilities of being in each IBD state.
   -Includes an experimental heuristic based mode for when parental genotypes are not available.
   -Made a method in MendelianViolation public static to reuse code.
   -Added the mockito library to private/gatk-tools-private/pom.xml
2014-06-13 09:41:37 -04:00
Menachem Fromer a1868e8b82 For XHMM and Depth-of-Coverage Qscripts, add ability for user to input sample renaming file at the GATK level using existing GATK flag (--sample_rename_mapping_file) and custom pre-processing code. For XHMM Qscript, add scatter-gather for Discovery and Genotype stages. 2014-06-09 23:49:54 -04:00
Phillip Dexheimer 4eb9858461 Ensure that output files are specified in a writeable location
-PT 69579780
2014-06-02 21:13:59 -04:00
Valentin Ruano Rubio db96891d4b Merge pull request #638 from broadinstitute/vrr_createTempFile_testfix
Changed File.createTempFile to BaseTest.createTempFile calls Test
2014-05-29 10:15:05 -04:00
Valentin Ruano-Rubio 938172d7f0 Removed redundant override createTempFileFromBase (same code as super class) and added some finals to DepthOfCoverageB36IntegrationTest 2014-05-28 19:02:04 -04:00
Valentin Ruano-Rubio e0c221470c Changed File.createTempFile to BaseTest.createTempFile 2014-05-28 18:59:48 -04:00
EvolvedMicrobe ef7531d4a5 Merge pull request #640 from broadinstitute/IntegerSWImplementation
Change SmithWaterman to use integers instead of doubles.
2014-05-28 15:10:05 -04:00
Nigel Delaney cc45e62e8e Change SmithWaterman to use integers instead of doubles. 2014-05-28 13:13:14 -04:00
Eric Banks ff43b1f298 Merge pull request #636 from broadinstitute/pd_log10_refactor
Replaced the static, fixed MathUtils.log10Cache array with a dynamic Log...
2014-05-28 08:46:49 -04:00
Phillip Dexheimer 6122b2805d Legibility improvements to ProgressMeter
- Fields in the header are delimited with the pipe character
 - Header is now split into two lines to improve spacing
 - Field width in header and progress lines auto-adjusts to length of "processing units" label (sites, active regions, etc)
 - Addresses PT 69725930
2014-05-27 23:52:42 -04:00
Phillip Dexheimer c15e6fcc0e Refactored the static lookup arrays in MathUtils (log10Cache, log10FactorialCache, jacobianLogTable)
-They are now only computed when necessary
 -Log10Cache is dynamically resizable, either by calling get() on an out-of-range value or by calling ensureCacheContains
 -Log10FactorialCache and JacobianLogTable are initialized to a fixed size on first access and are not resizable
 -Addresses PT 69124396
2014-05-27 22:27:57 -04:00
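A lazily computed, resizable cache of the kind the commit above describes might look like the following sketch. The class and method names (`Log10Cache`, `ensure_cache_contains`, `get`) mirror the names in the commit message, but the body is an assumption for illustration and not the MathUtils code.

```python
import math


class Log10Cache:
    """Illustrative cache of log10(i) for non-negative integers i.

    Nothing is computed until the first access, and the cache grows on
    demand when an out-of-range value is requested.
    """

    def __init__(self):
        self._values = []  # filled lazily, index i holds log10(i)

    def ensure_cache_contains(self, n):
        """Grow the cache so that indices 0..n are all available."""
        if n < 0:
            raise ValueError("n must be non-negative")
        for i in range(len(self._values), n + 1):
            # log10(0) is represented as -inf rather than raising an error.
            self._values.append(float("-inf") if i == 0 else math.log10(i))

    def get(self, n):
        """Return log10(n), extending the cache first if n is out of range."""
        self.ensure_cache_contains(n)  # does nothing if n is already cached
        return self._values[n]

    def size(self):
        return len(self._values)
```

A caller that knows its upper bound in advance can call `ensure_cache_contains` once up front, while other callers can simply call `get` and let the cache resize itself.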
David Roazen 74b51c5c7a Improve test suite tmp file cleanup
-Make BaseTest.createTempFile() mark any possible corresponding index files for deletion on exit

-Make WalkerTest mark shadow BCF files and auxiliary for deletion on exit

-Make VariantRecalibrationWalkersIntegrationTest mark PDF files for deletion on exit
2014-05-27 13:41:44 -04:00
Valentin Ruano-Rubio 7c8a1ae892 Fix for SW to make double comparisons with a tolerance
Stories:

  - https://www.pivotaltracker.com/story/show/69577868

Changes:

  - Added an epsilon difference tolerance in weight comparisons.

Tests:

  - Added HaplotypeCallerIntegrationTest#testDifferentIndelLocationsDueToSWExactDoubleComparisonsFix
  - Updated md5 due to minor likelihood changes.
  - Disabled a test for PathUtils.calculateCigar since it does not work and it is unclear what is causing the error (needs original author input)
2014-05-23 01:48:48 -04:00
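The idea of comparing doubles with a tolerance, as in the Smith-Waterman fix above, can be sketched as follows. The function name and the default epsilon of 1e-5 are assumptions chosen for illustration. The commit message does not state the tolerance actually used.

```python
def compare_with_tolerance(a, b, epsilon=1e-5):
    """Three-way comparison that treats values within epsilon as equal.

    Returns 0 if |a - b| <= epsilon, -1 if a is smaller, 1 if a is larger.
    The epsilon default is a hypothetical value, not taken from the source.
    """
    if abs(a - b) <= epsilon:
        return 0
    return -1 if a < b else 1
```

With exact comparison, `0.1 + 0.2 == 0.3` is false in floating point, so two alignment paths with mathematically equal weights could be ranked differently depending on the order of the arithmetic. A tolerance-based comparison treats them as ties.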
Khalid Shakir b7e98bdae9 Fixed GATK docs artifact, moved protected ExampleUG tests. 2014-05-22 21:03:55 -04:00
Karthik Gururaj 972a82d386 Changed 'sting' to 'gatk' in the VectorLoglessPairHMM classes and the
C++ code
2014-05-19 17:36:41 -04:00
Khalid Shakir 3939971d78 After renaming the packages, instead of updating the JNI library used for testing bwa, moving the classes to the archive.
NOTE: The migrated README.md has been added that will allow others to possibly resurrect this code as needed.
2014-05-19 17:36:41 -04:00
Khalid Shakir 2c854e554a Refactored maven directories and java packages replacing "sting" with "gatk".
To reduce merge conflicts, this commit modifies contents of files, while file renamings are in previous commit.
See previous commit message for list of changes.
2014-05-19 17:36:39 -04:00
Khalid Shakir 4e6d43d003 Refactored maven directories and java packages replacing "sting" with "gatk".
To reduce merge conflicts, this commit only renames files, while file modifications are in next commit.
Some updates/fixes here are actually included in the next commit.
= Maven updates
Moved artifacts to new package names:
* private/queue-private -> private/gatk-queue-private
* private/gatk-private -> private/gatk-tools-private
* public/gatk-package -> protected/gatk-package-distribution
* public/queue-package -> protected/gatk-queue-package-distribution
* protected/gatk-protected -> protected/gatk-tools-protected
* public/queue-framework -> public/gatk-queue
* public/gatk-framework -> public/gatk-tools-public
New poms for new artifacts and packages:
* private/gatk-package-internal
* private/gatk-queue-package-internal
* private/gatk-queue-extensions-internal
* protected/gatk-queue-extensions-distribution
* public/gatk-engine
Updated references to StingText.properties to GATKText.properties.
Updated ant-bridge.sh to use gatk.* properties instead of sting.*.
= Engine updates
Renaming files containing engine parts from o.b.gatk.tools to o.b.gatk.engine.
Changed package references from tools to engine for CommandLineGATK, GenomeAnalysisEngine, ReadMetrics, ReadProperties, and WalkerManager.
Changed package reference tools.phonehome to engine.phonehome.
Renamed classes *Sting* to *GATK*, such as ReviewedGATKException.
= Test updates
Moved gatk example resources.
Moved test engine files from tools to engine packages.
Moved resources for phonehome to proper package.
Moved test classes under o.b.gatk into packages:
* o.b.g.utils.{BaseTest,ExampleToCopyUnitTest,GATKTextReporter,MD5DB,MD5Mismatch,TestNGTestTransformer}
* o.b.g.engine.walkers.WalkerTest
Updated package names in DependencyAnalyzerOutputLoaderUnitTest's data.
= Queue updates
Moving queue scripts to location where generated extensions can be used.
Renamed *.q to *.scala, updating licenses previously missed by git hooks.
Moved queue extensions to new artifact gatk-queue-extensions.
Fixed import statements frequently merge-conflicting on FullProcessingPipeline.scala.
= BWA
Added README on how to obtain and include bwa as a library.
Updated libbwa build.
Fixed package names under bwa/java implementation.
Updated contents of BWCAligner native implementation.
= Other fixes
Don't duplicate the resource bundle entries by both unpacking *and* appending.
(partial fix) Staged engine and utils poms to build GATKText.properties, once Utils random generator dependency on GATK engine is fixed.
Re-enabled custom testng listeners/reporters and moved testng dependencies to the gatk-root.
Updated comments referencing Sting with GATK.
Moved a couple untangled classes from gatk-tools-public to gatk-utils and gatk-engine.
2014-05-19 16:43:47 -04:00
Phillip Dexheimer a5abc079dc Revised final Queue status line to display number of jobs in each state when the script fails
* Addresses PT 61552466
* Included a simple scala script in private/testdata that will always fail
2014-05-15 21:30:44 -04:00
jmthibault79 78560212d0 Merge pull request #630 from broadinstitute/pd_blank_lines_in_listfile
Allow blank lines in a (non-BAM) list file
2014-05-14 11:32:44 -04:00
droazen 8297cd1a1a Merge pull request #619 from broadinstitute/pd_intervalmerge_doc
Made IntervalSharder respect the IntervalMergingRule specified on the co...
2014-05-14 11:22:18 -04:00
Phillip Dexheimer 77449961ab Allow blank lines in a (non-BAM) list file
* Addresses PT Bug 67841052
 * Added Unit Test
2014-05-13 23:14:15 -04:00
Khalid Shakir 67e44985b1 Java/Scala imports updated for new package names.
Fourth of four commits for picard/htsjdk package rename.
2014-05-08 19:13:31 +08:00
Khalid Shakir cc3f1f2b96 Revved picard libraries.
Third of four commits for picard/htsjdk package rename.
2014-05-08 19:13:27 +08:00
Khalid Shakir a894a2dddb Updates to GATK classes and POMs that need updating, plus RodSystemValidation md5 updates.
GATK classes accessing package protected htsjdk classes changed to new package names.
POMs updated to support merging of sam/tribble/variant -> htsjdk and changes to picard artifact.
RodSystemValidation outputs changed due to variant codec packages changes, requiring test md5 updates.
Second of four commits for picard/htsjdk package rename.
2014-05-08 19:13:27 +08:00
Khalid Shakir 3ce3e27aa1 Moved GATK classes and POMs that will need updating.
GATK classes accessing package protected htsjdk classes will need new package names.
POMs will merge sam/tribble/variant into htsjdk.
Move only, contents updated in next commit.
First of four commits for picard/htsjdk package rename.
2014-05-08 19:13:27 +08:00
Laura Gauthier bf7b97393e Add ability to output to a file discordant loci and their respective genotypes for each sample 2014-05-07 10:12:45 -04:00
Karthik Gururaj d9c489f928 Removed scary warning messages for VectorPairHMM 2014-05-06 10:59:24 -07:00
Karthik Gururaj f6ea25b4d1 Parallel version of the JNI for the PairHMM
The JNI treats shared memory as critical memory and doesn't allow any
parallel reads or writes to it until the native code finishes. This is
not a problem *per se* (it is the right thing to do), but we need to
enable **-nct** when running the haplotype caller and with it have
multiple native PairHMM running for each map call.

Move to a copy based memory sharing where the JNI simply copies the
memory over to C++ and then has no blocked critical memory when running,
allowing -nct to work.

This version is slightly (almost unnoticeably) slower with -nct 1, but
scales better with -nct 2-4 (we haven't tested anything beyond that
because we know the GATK falls apart with higher levels of parallelism).

* Make VECTOR_LOGLESS_CACHING the default implementation for PairHMM.
* Changed version number in pom.xml under public/VectorPairHMM
* VectorPairHMM can now be compiled using gcc 4.8.x
* Modified define-* to get rid of gcc warnings for extra tokens after #undefs
* Added a Linux kernel version check for AVX - gcc's __builtin_cpu_supports function does not check whether the kernel supports AVX or not.
* Updated PairHMM profiling code to update and print numbers only in single-thread mode
* Edited README.md, pom.xml and Makefile for users to pass path to gcc 4.8.x if necessary
* Moved all cpuid inline assembly to single function
* Changed info message to clog from cinfo
* Modified version in pom.xml in VectorPairHMM from 3.1 to 3.2
* Deleted some unnecessary code
* Modified C++ sandbox to print per interval timing
2014-05-02 19:12:48 -04:00
Valentin Ruano-Rubio d563072282 Fix for CombineGVCFs and GenotypeGVCFs recurrent exception about missing PLs
Story:

  https://www.pivotaltracker.com/story/show/68220438

Changes:

   - PL-less input genotypes are now treated as uncalled, and so as non-variant sites, when combining GVCFs.
   - HC GVCF/BP_RESOLUTION Mode now outputs non-variant sites in sites covered by deletions.
   - Fixed existing tests

Test:

   - HaplotypeCallerGVCFIntegrationTest
   - ReferenceConfidenceModelUnitTest
   - CombineGVCFsIntegrationTest
2014-05-02 09:21:06 -04:00
Phillip Dexheimer 7a2b70a10f Made IntervalSharder respect the IntervalMergingRule specified on the command line
* This addresses PT Bug 69741902
* Added a required IMR argument to FilePointer, BAMScheduler, IntervalSharder, and SAMDataSource
* This rule is used by FilePointer.combine and FilePointer.union
* Added unit and integration tests
2014-04-30 22:07:22 -04:00
Michael McCowan fe3c68cb2d Java 8 compatibility fix: `Reflections` NPE bugfix. 2014-04-29 13:34:03 -04:00
Ryan Poplin 41d3069213 When we subset PLs because Alleles are removed during genotyping we also need to subset AD. 2014-04-28 15:52:26 -04:00
kshakir 10ee35eafa Merge pull request #616 from broadinstitute/ks_cjav_pbsengine_no_default_queue
Removed setting of a default queue in PbsEngineJobRunner.
2014-04-28 14:24:51 -04:00
Ryan Poplin 06dbe74a23 Merge pull request #609 from kcibul/kc_cancersimreads
extended SimulateReadsForVariants to optionally use the AF field to indi...
2014-04-28 13:31:56 -04:00
Carlos Borroto b7a59e01aa Removed setting of a default queue in PbsEngineJobRunner. Discussed here: http://gatkforums.broadinstitute.org/discussion/3959/would-it-be-possible-for-pbsengine-jobrunner-not-to-set-a-default-queue
Signed-off-by: Khalid Shakir <kshakir@broadinstitute.org>
2014-04-29 00:44:12 +08:00
Ami Levy-Moonshine 13dd755468 create a new read transformer that refactors NDN cigar elements to one N element.
story:
https://www.pivotaltracker.com/story/show/69648104

description:
This read transformer will refactor cigar strings that contain N-D-N elements to one N element (with total length of the three refactored elements).
This is intended primarily for users of RNA-Seq data handling programs such as TopHat2.
Currently we consider that the internal N-D-N motif is illegal and we error out when we encounter it. By refactoring the cigar string of
those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset.

edit: address review comments - change the tool's name and change the tool to be a readTransformer instead of read filter
2014-04-28 11:29:00 -04:00
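The N-D-N refactoring described in the commit above can be sketched on plain CIGAR strings. This is an illustrative sketch, not the GATK read transformer: the function names are hypothetical, and collapsing chained motifs such as N-D-N-D-N into a single N is an assumption the commit message does not confirm.

```python
import re

# One CIGAR element is a length followed by an operator from the SAM format.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operator) tuples."""
    return [(int(n), op) for n, op in _CIGAR_ELEMENT.findall(cigar)]


def refactor_ndn(cigar):
    """Replace every N-D-N run with one N whose length is the total.

    Hypothetical helper: after a merge the same position is re-checked,
    so chained motifs collapse too (an assumption, see the lead-in).
    """
    elements = parse_cigar(cigar)
    i = 0
    while i + 2 < len(elements):
        window = elements[i:i + 3]
        if [op for _, op in window] == ["N", "D", "N"]:
            total = sum(length for length, _ in window)
            elements[i:i + 3] = [(total, "N")]  # stay at i and re-check
        else:
            i += 1
    return "".join(f"{length}{op}" for length, op in elements)
```

For example, `refactor_ndn("10M5N2D3N10M")` returns `"10M10N10M"`, while a CIGAR string without the motif is returned unchanged.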
Michael McCowan 8290d3c8ac Allow for non-tab whitespace in sample names when performing on-the-fly sample-renaming. 2014-04-22 11:07:13 -04:00
MauricioCarneiro f03e5ffeb1 Merge pull request #604 from broadinstitute/vrr_hc_omniploidy_general_api
Disentangle UG and HC Genotyper engines.
2014-04-20 07:43:23 -04:00
Valentin Ruano-Rubio 7455ac9796 Addressed revisions 2014-04-19 16:48:48 -04:00
Ryan Poplin a9a48f2459 Merge pull request #607 from broadinstitute/mm_bugfix_raise_mathutils_n_ceiling
Support more samples in math utilities.
2014-04-17 13:32:34 -04:00
Joel Thibault 1ab50f4ba8 CatVariants now handles BCF and Block-Compressed VCF
[Delivers #67461500]
2014-04-17 12:31:38 -04:00