11982 Commits (bd4e4f4ee39f141d8d45f6edd2d242cfd0071633)

Author SHA1 Message Date
Eric Banks bd4e4f4ee3 Merged bug fix from Stable into Unstable 2013-03-04 23:24:44 -05:00
Eric Banks b715218bfe Fix for mismatching indel quals error: need to adjust for softclips just like we do for bases and normal quals. 2013-03-04 23:23:18 -05:00
Mark DePristo 1b7164ccdb Merge pull request #86 from broadinstitute/mc_fix_exception_messages
Just a quick cleanup on the exception messages; no need to wait for bamboo.
2013-03-04 13:55:00 -08:00
Mauricio Carneiro d0c8105387 Cleaning up hilarious exception messages
Too many users (with RNASeq reads) are hitting these exceptions that were never supposed to happen. Let's give them (and us) a better and clearer error message.
2013-03-04 16:52:22 -05:00
Ryan Poplin ce7554e9d6 Merged bug fix from Stable into Unstable 2013-03-04 12:36:04 -05:00
Ryan Poplin 0697594778 Active regions that don't contain any usable reads should just be skipped over instead of throwing an IllegalStateException. 2013-03-04 12:35:40 -05:00
Ryan Poplin b3ecbb011d Merge pull request #81 from broadinstitute/md_hc_bam_writing
Expanded functionality of writing BAMs from the haplotype caller
2013-03-04 06:39:19 -08:00
Mark DePristo 42d3919ca4 Expanded functionality for writing BAMs from HaplotypeCaller
-- The new code includes a new mode to write out a BAM containing reads realigned to the called haplotypes from the HC, which can be easily visualized in IGV.
-- Previous functionality maintained, with bug fixes
-- Haplotype BAM writing code now lives in utils
-- Created a base class that includes most of the functionality of writing reads realigned to haplotypes onto haplotypes.
-- Created two subclasses, one that writes all haplotypes (previous functionality) and a CalledHaplotypeBAMWriter that will only write reads aligned to the actually called haplotypes
-- Extended PerReadAlleleLikelihoodMap.getMostLikelyAllele to optionally restrict set of alleles to consider best
-- Massive increase in unit tests in AlignmentUtils, along with several new powerful functions for manipulating cigars
-- Fixed a bug in SWPairwiseAlignment that produced cigar elements with 0 size; these are now cleaned up with consolidateCigar in AlignmentUtils
-- HaplotypeCaller now tracks the called haplotypes in the GenotypingEngine, and returns this information to the HC for use in visualization.
-- Added extensive docs to HaplotypeCaller on how to use this capability
-- BUGFIX -- don't modify the read bases in GATKSAMRecord in LikelihoodCalculationEngine in the HC
-- Cleaned up SWPairwiseAlignment.  Refactored out the big main and supplementary static methods.  Added a unit test with a bug TODO to fix what seems to be an edge case bug in SW
-- Integration test to make sure we can actually write a BAM for each mode.  This test only ensures that the code runs and doesn't throw an exception.  It doesn't actually enforce any MD5s
-- HaplotypeBAMWriter also left aligns indels in the reads, as SW can return a random placement of a read against the haplotype.  Calls leftAlign to make the alignments more clear, with unit test of real read to cover this case
-- Writes out haplotypes for both all haplotype and called haplotype mode
-- Haplotype writers now get the active region call, regardless of whether an actual call was made.  Only emitting called haplotypes is moved down to CalledHaplotypeBAMWriter
2013-03-03 12:07:29 -05:00
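The zero-size cigar element cleanup described in this commit can be illustrated with a minimal Python sketch. This is not the code of consolidateCigar in AlignmentUtils: the function name `consolidate_cigar` and the `(length, operator)` tuple representation are assumptions made for illustration. The sketch drops zero-length elements and merges adjacent elements that share an operator.

```python
from typing import List, Tuple

# Hypothetical representation: (length, operator), e.g. (10, "M")
CigarElement = Tuple[int, str]


def consolidate_cigar(elements: List[CigarElement]) -> List[CigarElement]:
    """Drop zero-length elements and merge neighbors with the same operator."""
    result: List[CigarElement] = []
    for length, op in elements:
        if length == 0:
            # A zero-size element carries no alignment information.
            continue
        if result and result[-1][1] == op:
            # Same operator as the previous kept element: fold the lengths together.
            prev_length, _ = result[-1]
            result[-1] = (prev_length + length, op)
        else:
            result.append((length, op))
    return result


def cigar_to_string(elements: List[CigarElement]) -> str:
    """Render elements in the usual compact form, e.g. 8M2D."""
    return "".join(f"{length}{op}" for length, op in elements)
```

For example, `consolidate_cigar([(5, "M"), (0, "I"), (3, "M"), (2, "D")])` removes the empty insertion, after which the two match blocks become adjacent and are merged.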
Mark DePristo ec3bf9f362 Adding 1mb of 2x250 bp PCR-free reads to private testdata 2013-03-01 20:44:17 -05:00
Mark DePristo b1ea2f6125 Merge pull request #83 from broadinstitute/dr_gatk_jar_with_private_GSA-803
Ant target to package a GATK jar with private included
2013-03-01 13:15:57 -08:00
David Roazen 2a1a20fc9d Parallel tests: switch working directory from /humgen/gsa-scr1 to /humgen/gsa-hpprojects
Hoping that the higher class of storage will get us down from the current
~40 minutes for a parallel run of the integration tests to the goal of
~20 minutes.
2013-03-01 16:11:29 -05:00
David Roazen a0be74c2ef Ant target to package a GATK jar with private included
Needed before we can start emitting full unstable jars from
Bamboo for our internal use.
2013-03-01 15:33:59 -05:00
David Roazen 3f7d888ea5 run_parallel_tests.sh: further improvements
-accept global timeout as a command-line argument

-kill outstanding jobs when timeout reached

-print job output files to stdout so that they get recorded in bamboo's logs

-periodically print number of jobs outstanding during run

-documentation / comments
2013-03-01 14:59:10 -05:00
Mark DePristo 0cff9b8027 Merge pull request #82 from broadinstitute/dr_split_long_integration_test_classes
Split long-running integration test classes into multiple classes
2013-03-01 11:07:23 -08:00
David Roazen c5c99c8339 Split long-running integration test classes into multiple classes
This is to facilitate the current experiment with class-level test
suite parallelism. It's our hope that with these changes, we can get
the runtime of the integration test suite down to 20 minutes or so.

-UnifiedGenotyper tests: these divided nicely into logical categories
 that also happened to distribute the runtime fairly evenly

-UnifiedGenotyperPloidy: these had to be divided arbitrarily into two
 classes in order to halve the runtime

-HaplotypeCaller: turns out that the tests for complex and symbolic
 variants make up half the runtime here, so merely moving these into
 a separate class was sufficient

-BiasedDownsampling: most of these tests use excessively large intervals
 that likely can't be reduced without defeating the goals of the tests. I'm
 disabling these tests for now until they can either be redesigned to use smaller
 intervals around the variants of interest, or refactored into unit tests
 (creating a JIRA for Yossi for this task)
2013-03-01 13:55:23 -05:00
depristo 6204e6ccc9 Merge pull request #76 from broadinstitute/md_kb_bugfix_GSA-795
Bug fixes and optimizations for NA12878 KB
2013-03-01 10:52:16 -08:00
depristo c05d1352b1 Merge pull request #80 from broadinstitute/eb_cleanup_genomelocsortedset_GSA-775
Fixed the add functionality of GenomeLocSortedSet.
2013-03-01 08:35:20 -08:00
Eric Banks ebd5404124 Fixed the add functionality of GenomeLocSortedSet.
* Fixed GenomeLocSortedSet.add() to ensure that overlapping intervals are detected and an exception is thrown.
* Fixed GenomeLocSortedSet.addRegion() by merging it with the add() method; it now produces sorted inputs in all cases.
* Cleaned up duplicated code throughout the engine to create a list of intervals over all contigs.
* Added more unit tests for add functionality of GLSS.
* Resolves GSA-775.
2013-02-28 23:31:00 -05:00
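The add behavior described in this commit (keep the set sorted, reject overlapping intervals) can be sketched in Python as follows. This is an illustration under stated assumptions, not the GenomeLocSortedSet implementation: the class name `SortedIntervalSet`, the `(contig, start, stop)` tuples with inclusive coordinates, and the use of `ValueError` are all hypothetical.

```python
import bisect
from typing import List, Tuple

# Hypothetical representation: (contig, start, stop), inclusive coordinates
Interval = Tuple[str, int, int]


class SortedIntervalSet:
    """Keeps non-overlapping intervals sorted by contig order, then start."""

    def __init__(self, contig_order: List[str]) -> None:
        self._contigs = list(contig_order)
        self._contig_index = {name: i for i, name in enumerate(self._contigs)}
        self._keys: List[Tuple[int, int, int]] = []

    @staticmethod
    def _overlaps(a: Tuple[int, int, int], b: Tuple[int, int, int]) -> bool:
        return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

    def add(self, interval: Interval) -> None:
        contig, start, stop = interval
        if start > stop:
            raise ValueError(f"start > stop in {interval}")
        key = (self._contig_index[contig], start, stop)
        pos = bisect.bisect_left(self._keys, key)
        # Stored intervals are sorted and disjoint, so only the immediate
        # neighbors of the insertion point need to be checked for overlap.
        for neighbor in self._keys[max(pos - 1, 0):pos + 1]:
            if self._overlaps(neighbor, key):
                raise ValueError(f"{interval} overlaps an existing interval")
        self._keys.insert(pos, key)

    def intervals(self) -> List[Interval]:
        return [(self._contigs[c], start, stop) for c, start, stop in self._keys]
```

Intervals may be added in any order; the set stays sorted, and an interval that touches an existing one on the same contig is rejected instead of being silently stored.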
David Roazen 6a77eee5f4 parallel tests script: pass in bamboo build number to make globally unique working directories for each run 2013-02-28 18:06:18 -05:00
David Roazen 2a7f55ae45 Further run_parallel_tests.sh quick fixes
-Apparently the version of "basename" on gsa4 lacks the -s option...
2013-02-28 17:40:20 -05:00
David Roazen 394e8889f1 Fix silly typo in run_parallel_tests.sh script 2013-02-28 17:15:32 -05:00
MauricioCarneiro e5fa1672c1 Merge pull request #79 from broadinstitute/dr_parallel_tests_prototype
fingers crossed!
2013-02-28 14:12:37 -08:00
David Roazen e6ac94fd75 Experimental script to run tests using class-level parallelism on the farm
-script to dispatch one farm job per test class and monitor jobs until completion

-new ant target to run tests without doing ANY compilation or extra steps at all
 allows multiple instances of the test suite to share the same working directory
2013-02-28 16:51:58 -05:00
droazen ca42be9788 Merge pull request #78 from broadinstitute/dr_pdfgen_bamboo_script_GSA-794
Trivial shell script for bamboo to trigger the website pdfgen script
2013-02-28 11:48:41 -08:00
David Roazen b050d16b22 Trivial shell script for bamboo to trigger the website pdfgen script 2013-02-28 14:45:25 -05:00
Mark DePristo 0931afab39 NA12878 KB performance improvement
-- updateConsensus now doesn't call remove when it's updating the entire db from scratch.  This radically improves performance when you are simply dropping the entire consensus and rebuilding from scratch, as the server does upon start up
2013-02-28 10:51:59 -05:00
Mark DePristo 4095a9ef32 Bugfixes for AssessNA12878
-- Refactor initialization routine into BadSitesWriter.  This now adds the GQ and DP genotype header lines which are necessary if the input VCF doesn't have proper headers
-- GATKVariantContextUtils subset to biallelics now tolerates samples with bad GL values for multi-allelics, where it just removes the PLs and issues a warning.
2013-02-28 10:35:06 -05:00
depristo 92d6a4f441 Merge pull request #75 from broadinstitute/eb_missing_rg_error_GSA-407
Added better error message for BAMs with bad read groups.
2013-02-28 05:20:39 -08:00
depristo cac3f80c64 Merge pull request #73 from broadinstitute/eb_remove_nested_hashmap_GSA-732
Replace uses of NestedHashMap with NestedIntegerArray.
2013-02-28 05:19:56 -08:00
Eric Banks 12fc198b80 Added better error message for BAMs with bad read groups.
* Split the cases into reads that don't have a RG at all vs. those with a RG that's not defined in the header.
* Added integration tests to make sure that the correct error is thrown.
* Resolved GSA-407.
2013-02-27 16:02:56 -05:00
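The split between the two error cases in this commit can be sketched in Python. The exception classes, the function name `validate_read_group`, and the message wording are hypothetical and only illustrate the distinction; they are not the GATK error types.

```python
from typing import Optional, Set


class MissingReadGroupError(Exception):
    """The read carries no RG tag at all."""


class UndefinedReadGroupError(Exception):
    """The read's RG tag names a group that the header does not define."""


def validate_read_group(read_name: str,
                        read_group: Optional[str],
                        header_read_groups: Set[str]) -> None:
    """Raise a distinct error for each of the two bad read group cases."""
    if read_group is None:
        raise MissingReadGroupError(
            f"Read {read_name} has no read group (RG) tag")
    if read_group not in header_read_groups:
        raise UndefinedReadGroupError(
            f"Read {read_name} uses read group {read_group}, "
            "which is not defined in the BAM header")
```

Keeping the two cases separate lets the message tell the user whether the reads need an RG tag added or the header needs the missing group defined.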
Eric Banks 45fc0ed261 Merge pull request #74 from broadinstitute/eb_update_rtc_docs_GSA-716
Update docs for RTC.
2013-02-27 11:58:09 -08:00
Eric Banks d2904cb636 Update docs for RTC. 2013-02-27 14:56:44 -05:00
MauricioCarneiro 97b332943b Merge pull request #64 from broadinstitute/md_agbt 2013-02-27 11:41:04 -08:00
Eric Banks 69b8173535 Replace uses of NestedHashMap with NestedIntegerArray.
* Removed from codebase NestedHashMap since it is unused and untested.
* Integration tests change because the BQSR CSV is now sorted automatically.
* Resolves GSA-732
2013-02-27 14:03:39 -05:00
Eric Banks 4b1071a815 Merge pull request #68 from broadinstitute/aw_reduce_reads_perf
Eliminate 7-element arrays in BaseCounts and BaseAndQualsCount and repla...
2013-02-27 10:03:36 -08:00
Alec Wysoker c8368ae2a5 Eliminate 7-element arrays in BaseCounts and BaseAndQualsCount and replace with in-line primitive attributes. This is ugly but reduces heap overhead, and changes are localized. When used in conjunction with Mauricio's FastUtil changes it saves an additional 9% or so of execution time. 2013-02-27 12:49:56 -05:00
Ryan Poplin 69f6d53494 Merge pull request #72 from broadinstitute/md_profile_hc_vs_ug_GSA-749
GATKPerformanceOverTime now includes a mode to compare HC & UG performance
2013-02-27 08:14:00 -08:00
Mark DePristo b987df5d8d GATKPerformanceOverTime now includes a mode to compare HC & UG performance
-- Compares HC and UG performance on single deep genomes and 140 WEx bams over small intervals, as well as deep WGS reduced data and 1000G 4x data
-- Added mode to GATKPerformanceOverTime to include lots of versions, so we can make beautiful graphs of the cost of tools over many versions as well
-- Marginally better plots for multiple iterations in GATKPeformanceOverTime.R
2013-02-27 10:58:34 -05:00
David Roazen 752f4335a5 Merged bug fix from Stable into Unstable 2013-02-27 05:20:41 -05:00
David Roazen 2a7af43164 Fix improper dependencies in QScripts used by pipeline tests, and attempt to fix the flawed MisencodedBaseQualityUnitTest
-Some QScripts used by public pipeline tests unnecessarily used the (now protected) UnifiedGenotyper.
 Changed them to use PrintReads instead.

-Moved ExampleUnifiedGenotyperPipelineTest to protected

-Attempt to fix the flawed and sporadically failing MisencodedBaseQualityUnitTest:

   After looking at this class a bit, I think the problem was the use of global arrays for the quals
   shared across all reads in all tests (BAMRecord class definitely does not make a separate copy for
   each read!). One test (testFixBadQuals) modifies the bad quals array, and if this happens to run
   before the testBadQualsThrowsError test the bad quals array will have been "fixed" and no exception
   will be thrown.
2013-02-27 04:45:53 -05:00
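The order-dependent failure described in this commit (one test mutating an array that every read shares) can be reproduced with a minimal Python sketch. The names `Read`, `fix_bad_quals`, `has_bad_quals`, and the constants are hypothetical stand-ins, not the classes or values used by MisencodedBaseQualityUnitTest.

```python
from typing import List

MAX_VALID_QUAL = 60   # hypothetical upper bound for a sane quality score
QUAL_OFFSET = 31      # hypothetical correction applied to misencoded quals


class Read:
    def __init__(self, quals: List[int]) -> None:
        # Keeps a reference to the caller's list; no defensive copy is made.
        self.quals = quals


def has_bad_quals(read: Read) -> bool:
    return any(q > MAX_VALID_QUAL for q in read.quals)


def fix_bad_quals(read: Read) -> None:
    # Mutates the list in place, so every read sharing it is affected.
    for i, q in enumerate(read.quals):
        read.quals[i] = q - QUAL_OFFSET


def make_bad_quals() -> List[int]:
    """Return a fresh list on every call so tests cannot interfere."""
    return [65, 70, 80]
```

If two tests build their reads from one module-level list, running the fixing test first silently repairs the data that the second test expects to be bad. Building each read from `make_bad_quals()` (or from a copy of the shared list) makes the tests independent of execution order.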
David Roazen 6466463d5a Merged bug fix from Stable into Unstable 2013-02-26 21:54:54 -05:00
David Roazen 12a3d7ecad Fix licenses on files modified in 2.4-1 2013-02-26 21:53:17 -05:00
David Roazen a53b4a7521 Merged bug fix from Stable into Unstable 2013-02-26 21:41:13 -05:00
David Roazen 65d31ba4ad Fix runtime public -> protected dependencies in the test suite
-replace unnecessary uses of the UnifiedGenotyper by public integration tests
 with PrintReads

-move NanoSchedulerIntegrationTest to protected, since it's completely dependent
 on the UnifiedGenotyper
2013-02-26 21:19:12 -05:00
droazen dd338bebd0 Merge pull request #70 from broadinstitute/dr_nightly_build_script_adjustments
Nightly build script improvements
2013-02-26 14:46:09 -08:00
David Roazen d2f4626bdd Nightly build script improvements
-Include the word "nightly" in the version

-Add a ".tar.bz2" extension to the symlinks for the current build
2013-02-26 17:43:19 -05:00
depristo 7c3f8d384b Merge pull request #69 from broadinstitute/dr_nightly_build_script_GSATDG-78
Shell script to release GATK nightly builds
2013-02-26 13:58:39 -08:00
David Roazen 3680879926 Shell script to release GATK nightly builds
-publishes GATK jar + accompanying GATKDocs archive to a new
 nightly build directory

-nightly builds are versioned by date rather than tag
2013-02-26 16:53:42 -05:00
depristo 93205154b5 Merge pull request #63 from broadinstitute/eb_fix_pairhmm_unittest_GSA-776
Eb fix pairhmm unittest gsa 776
2013-02-26 11:56:58 -08:00
Eric Banks 734353e9df Merge pull request #60 from broadinstitute/mc_fastutil_GSATDG-83
Brought all of ReduceReads to fastutils
2013-02-26 11:56:41 -08:00