Commit Graph

11963 Commits (ebd540412474e91ac4b153045bf08a2949e22fa2)

Author SHA1 Message Date
Eric Banks ebd5404124 Fixed the add functionality of GenomeLocSortedSet.
* Fixed GenomeLocSortedSet.add() to ensure that overlapping intervals are detected and an exception is thrown.
 * Fixed GenomeLocSortedSet.addRegion() by merging it with the add() method; it now produces sorted inputs in all cases.
 * Cleaned up duplicated code throughout the engine to create a list of intervals over all contigs.
 * Added more unit tests for add functionality of GLSS.
 * Resolves GSA-775.
2013-02-28 23:31:00 -05:00
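The overlap check described in this commit can be illustrated with a minimal sketch. It is not the GenomeLocSortedSet code: `IntervalSketch`, `SortedIntervalSetSketch`, and the choice of `IllegalArgumentException` are assumptions made for illustration. Because the stored intervals are sorted and never overlap, only the two neighbours of a new interval need to be checked.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a genomic interval (inclusive coordinates); not GATK's GenomeLoc.
final class IntervalSketch {
    final int contig;
    final int start;
    final int stop;

    IntervalSketch(int contig, int start, int stop) {
        this.contig = contig;
        this.start = start;
        this.stop = stop;
    }

    boolean overlaps(IntervalSketch other) {
        return contig == other.contig && start <= other.stop && other.start <= stop;
    }
}

// Hypothetical sorted set that rejects overlapping intervals on add().
final class SortedIntervalSetSketch {
    private final TreeSet<IntervalSketch> intervals = new TreeSet<>(
            Comparator.comparingInt((IntervalSketch i) -> i.contig)
                    .thenComparingInt(i -> i.start)
                    .thenComparingInt(i -> i.stop));

    void add(IntervalSketch loc) {
        // Only the nearest neighbours on either side can overlap the new interval.
        IntervalSketch before = intervals.floor(loc);
        IntervalSketch after = intervals.ceiling(loc);
        if ((before != null && before.overlaps(loc)) || (after != null && after.overlaps(loc))) {
            throw new IllegalArgumentException(String.format(
                    "Interval %d:%d-%d overlaps an existing interval", loc.contig, loc.start, loc.stop));
        }
        intervals.add(loc);
    }

    // Intervals come back sorted by contig, then start, whatever the insertion order.
    List<IntervalSketch> toList() {
        return new ArrayList<>(intervals);
    }
}
```

With this sketch, intervals added out of order are returned sorted, and adding an interval that overlaps a stored one fails immediately instead of silently corrupting the set.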
David Roazen 6a77eee5f4 parallel tests script: pass in bamboo build number to make globally unique working directories for each run 2013-02-28 18:06:18 -05:00
David Roazen 2a7f55ae45 Further run_parallel_tests.sh quick fixes
-Apparently the version of "basename" on gsa4 lacks the -s option...
2013-02-28 17:40:20 -05:00
David Roazen 394e8889f1 Fix silly typo in run_parallel_tests.sh script 2013-02-28 17:15:32 -05:00
MauricioCarneiro e5fa1672c1 Merge pull request #79 from broadinstitute/dr_parallel_tests_prototype
fingers crossed!
2013-02-28 14:12:37 -08:00
David Roazen e6ac94fd75 Experimental script to run tests using class-level parallelism on the farm
-script to dispatch one farm job per test class and monitor jobs until completion

-new ant target to run tests without doing ANY compilation or extra steps at all
 allows multiple instances of the test suite to share the same working directory
2013-02-28 16:51:58 -05:00
droazen ca42be9788 Merge pull request #78 from broadinstitute/dr_pdfgen_bamboo_script_GSA-794
Trivial shell script for bamboo to trigger the website pdfgen script
2013-02-28 11:48:41 -08:00
David Roazen b050d16b22 Trivial shell script for bamboo to trigger the website pdfgen script 2013-02-28 14:45:25 -05:00
depristo 92d6a4f441 Merge pull request #75 from broadinstitute/eb_missing_rg_error_GSA-407
Added better error message for BAMs with bad read groups.
2013-02-28 05:20:39 -08:00
depristo cac3f80c64 Merge pull request #73 from broadinstitute/eb_remove_nested_hashmap_GSA-732
Replace uses of NestedHashMap with NestedIntegerArray.
2013-02-28 05:19:56 -08:00
Eric Banks 12fc198b80 Added better error message for BAMs with bad read groups.
* Split the cases into reads that don't have a RG at all vs. those with a RG that's not defined in the header.
  * Added integration tests to make sure that the correct error is thrown.
  * Resolved GSA-407.
2013-02-27 16:02:56 -05:00
Eric Banks 45fc0ed261 Merge pull request #74 from broadinstitute/eb_update_rtc_docs_GSA-716
Update docs for RTC.
2013-02-27 11:58:09 -08:00
Eric Banks d2904cb636 Update docs for RTC. 2013-02-27 14:56:44 -05:00
MauricioCarneiro 97b332943b Merge pull request #64 from broadinstitute/md_agbt 2013-02-27 11:41:04 -08:00
Eric Banks 69b8173535 Replace uses of NestedHashMap with NestedIntegerArray.
* Removed from codebase NestedHashMap since it is unused and untested.
 * Integration tests change because the BQSR CSV is now sorted automatically.
 * Resolves GSA-732
2013-02-27 14:03:39 -05:00
Eric Banks 4b1071a815 Merge pull request #68 from broadinstitute/aw_reduce_reads_perf
Eliminate 7-element arrays in BaseCounts and BaseAndQualsCount and repla...
2013-02-27 10:03:36 -08:00
Alec Wysoker c8368ae2a5 Eliminate 7-element arrays in BaseCounts and BaseAndQualsCount and replace with in-line primitive attributes. This is ugly but reduces heap overhead, and changes are localized. When used in conjunction with Mauricio's FastUtil changes it saves an additional 9% or so of execution time. 2013-02-27 12:49:56 -05:00
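As a rough illustration of the trade-off this commit describes, the sketch below keeps seven counters as primitive fields instead of a 7-element array, which avoids a separate array object per instance. The class name `InlineCountsSketch` and the seven categories (A, C, G, T, D, I, N) are assumptions, not taken from the BaseCounts source.

```java
// Hypothetical counter class: seven primitive fields replace an int[7],
// so each instance carries no separate array object on the heap.
final class InlineCountsSketch {
    private int countA, countC, countG, countT, countD, countI, countN;

    void increment(char base) {
        switch (base) {
            case 'A': countA++; break;
            case 'C': countC++; break;
            case 'G': countG++; break;
            case 'T': countT++; break;
            case 'D': countD++; break;
            case 'I': countI++; break;
            case 'N': countN++; break;
            default: throw new IllegalArgumentException("Unknown base: " + base);
        }
    }

    int get(char base) {
        switch (base) {
            case 'A': return countA;
            case 'C': return countC;
            case 'G': return countG;
            case 'T': return countT;
            case 'D': return countD;
            case 'I': return countI;
            case 'N': return countN;
            default: throw new IllegalArgumentException("Unknown base: " + base);
        }
    }

    int total() {
        return countA + countC + countG + countT + countD + countI + countN;
    }
}
```

The repeated switch statements are the "ugly" part: every accessor has to enumerate the fields by hand, which is the price paid for dropping the array.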
Ryan Poplin 69f6d53494 Merge pull request #72 from broadinstitute/md_profile_hc_vs_ug_GSA-749
GATKPerformanceOverTime now includes a mode to compare HC & UG performance
2013-02-27 08:14:00 -08:00
Mark DePristo b987df5d8d GATKPerformanceOverTime now includes a mode to compare HC & UG performance
-- Compares HC and UG performance on single deep genomes and 140 WEx bams over small intervals, as well as deep WGS reduced data and 1000G 4x data
-- Added mode to GATKPerformanceOverTime to include lots of versions, so we can make beautiful graphs of the cost of tools over many versions as well
-- Marginally better plots for multiple iterations in GATKPerformanceOverTime.R
2013-02-27 10:58:34 -05:00
David Roazen 752f4335a5 Merged bug fix from Stable into Unstable 2013-02-27 05:20:41 -05:00
David Roazen 2a7af43164 Fix improper dependencies in QScripts used by pipeline tests, and attempt to fix the flawed MisencodedBaseQualityUnitTest
-Some QScripts used by public pipeline tests unnecessarily used the (now protected) UnifiedGenotyper.
 Changed them to use PrintReads instead.

-Moved ExampleUnifiedGenotyperPipelineTest to protected

-Attempt to fix the flawed and sporadically failing MisencodedBaseQualityUnitTest:

   After looking at this class a bit, I think the problem was the use of global arrays for the quals
   shared across all reads in all tests (BAMRecord class definitely does not make a separate copy for
   each read!). One test (testFixBadQuals) modifies the bad quals array, and if this happens to run
   before the testBadQualsThrowsError test the bad quals array will have been "fixed" and no exception
   will be thrown.
2013-02-27 04:45:53 -05:00
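The order-dependent failure described in this commit can be reproduced in miniature. The sketch below is illustrative only: `ReadSketch` and `SharedQualsSketch` are hypothetical, and the in-place "fix" of subtracting 31 from each quality is an assumption standing in for whatever the real test exercises. The point is that a record which stores the caller's array by reference lets one test mutate data another test relies on.

```java
import java.util.Arrays;

// Hypothetical read record that keeps a reference to the caller's array
// instead of making its own copy.
final class ReadSketch {
    private byte[] quals;

    void setBaseQualities(byte[] quals) {
        this.quals = quals; // no defensive copy
    }

    byte[] getBaseQualities() {
        return quals;
    }
}

final class SharedQualsSketch {
    // Assumed "fix": shift each quality down by 31, modifying the array in place.
    static void fixQuals(ReadSketch read) {
        byte[] quals = read.getBaseQualities();
        for (int i = 0; i < quals.length; i++) {
            quals[i] -= 31;
        }
    }

    // Flawed setup: every read built this way aliases the same array.
    static ReadSketch readSharing(byte[] quals) {
        ReadSketch read = new ReadSketch();
        read.setBaseQualities(quals);
        return read;
    }

    // Safer setup: each read gets a private copy, so tests cannot affect each other.
    static ReadSketch readCopying(byte[] quals) {
        ReadSketch read = new ReadSketch();
        read.setBaseQualities(Arrays.copyOf(quals, quals.length));
        return read;
    }
}
```

With `readSharing`, running the fix in one test changes the "bad" array seen by every later test, so a test that expects an error on bad qualities passes or fails depending on execution order. With `readCopying`, the shared fixture stays untouched.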
David Roazen 6466463d5a Merged bug fix from Stable into Unstable 2013-02-26 21:54:54 -05:00
David Roazen 12a3d7ecad Fix licenses on files modified in 2.4-1 2013-02-26 21:53:17 -05:00
David Roazen a53b4a7521 Merged bug fix from Stable into Unstable 2013-02-26 21:41:13 -05:00
David Roazen 65d31ba4ad Fix runtime public -> protected dependencies in the test suite
-replace unnecessary uses of the UnifiedGenotyper by public integration tests
 with PrintReads

-move NanoSchedulerIntegrationTest to protected, since it's completely dependent
 on the UnifiedGenotyper
2013-02-26 21:19:12 -05:00
droazen dd338bebd0 Merge pull request #70 from broadinstitute/dr_nightly_build_script_adjustments
Nightly build script improvements
2013-02-26 14:46:09 -08:00
David Roazen d2f4626bdd Nightly build script improvements
-Include the word "nightly" in the version

-Add a ".tar.bz2" extension to the symlinks for the current build
2013-02-26 17:43:19 -05:00
depristo 7c3f8d384b Merge pull request #69 from broadinstitute/dr_nightly_build_script_GSATDG-78
Shell script to release GATK nightly builds
2013-02-26 13:58:39 -08:00
David Roazen 3680879926 Shell script to release GATK nightly builds
-publishes GATK jar + accompanying GATKDocs archive to a new
 nightly build directory

-nightly builds are versioned by date rather than tag
2013-02-26 16:53:42 -05:00
depristo 93205154b5 Merge pull request #63 from broadinstitute/eb_fix_pairhmm_unittest_GSA-776
Eb fix pairhmm unittest gsa 776
2013-02-26 11:56:58 -08:00
Eric Banks 734353e9df Merge pull request #60 from broadinstitute/mc_fastutil_GSATDG-83
Brought all of ReduceReads to fastutils
2013-02-26 11:56:41 -08:00
Eric Banks 3ce0a32da7 Merge remote-tracking branch 'unstable/master' 2013-02-26 14:48:39 -05:00
Eric Banks 7a7adb79f1 Merge pull request #67 from broadinstitute/dr_release_script_disable_validation
Temporarily disable paranoid validation in the release scripts
2013-02-26 11:25:01 -08:00
Eric Banks 2cf0dc9939 Merge pull request #66 from broadinstitute/mc_retire_coveragebysample_walker_GSATDG-90
Archiving CoverageBySample
2013-02-26 11:19:09 -08:00
David Roazen 2b13af042d Temporarily disable paranoid validation in the release scripts
These validation steps are not strictly necessary, and would fail
with the protected repo right now, as it currently lacks a master
branch.
2013-02-26 14:17:39 -05:00
Mauricio Carneiro 711cbd3b5a Archiving CoverageBySample
This walker has not been updated since 2009, and users were getting wrong answers when running it with ReduceReads. I don't want to deal with this because DiagnoseTargets does everything this walker does.
2013-02-26 13:49:00 -05:00
Ryan Poplin 357a05683d Merge pull request #65 from broadinstitute/dr_change_haplotypecaller_downsampling_settings_GSA-699
Change default downsampling coverage target for the HaplotypeCaller to 2...
2013-02-26 10:33:19 -08:00
David Roazen 8b29030467 Change default downsampling coverage target for the HaplotypeCaller to 250
-was previously set to 30, which seems far too aggressive given that with
 ActiveRegionWalkers, as with LocusWalkers, this limits the depth of any
 pileup returned by LIBS

-250 is a more conservative default used by the UG

-can adjust down/up later based on further experiments (GSA-699 will
 remain open)

-verified with Ryan that all integration test differences are either
 innocent or represent an improvement

GSA-699
2013-02-26 09:33:25 -05:00
depristo 51d618de97 Merge pull request #62 from broadinstitute/rp_increase_max_kmer_in_assembly
The maximum kmer length is derived from the reads.
2013-02-26 05:37:02 -08:00
Mark DePristo 79d1050457 AGBT analysis scripts
-- Simple scripts to realign BAMs around indels and run HC scatter gathered
-- AGBT analysis R script
2013-02-25 16:01:46 -05:00
depristo ed5aff3702 Merge pull request #55 from broadinstitute/dr_fix_sequence_dictionary_validation_GSA-768
Sequence dictionary validation: detect problematic contig indexing differences
2013-02-25 12:39:56 -08:00
Eric Banks 396b7e0933 Fixed the intermittent PairHMM unit test failure.
The issue here is that the OptimizedLikelihoodTestProvider uses the same basic underlying class as the
BasicLikelihoodTestProvider and we were using the BasicTestProvider functionality to pull out tests of
that class; so if the optimized tests were run first we were unintentionally running those same tests
again with the basic ones (but expecting different results).
2013-02-25 15:05:13 -05:00
Eric Banks 7519484a38 Refactored PairHMM.initialize to first take haplotype max length and then the read max length so that it is consistent with other PairHMM methods. 2013-02-25 15:04:23 -05:00
Ryan Poplin 89e2943dd1 The maximum kmer length is derived from the reads.
-- This is done to take advantage of longer reads which can produce less ambiguous haplotypes
-- Integration tests change for HC and BiasedDownsampling
2013-02-25 14:40:25 -05:00
MauricioCarneiro bd9875aff5 Merge pull request #61 from broadinstitute/dr_update_release_scripts
1. removed all directives related to gatklite (we're getting rid of this distribution)
2. adapting scripts to the new gsa-protected repository
2013-02-25 10:37:59 -08:00
Mauricio Carneiro 0ff3343282 Addressing Eric's comments
-- added @param docs to the new variables
-- made all variables final
-- switched to string builder instead of String for performance.

GSATDG-83
2013-02-25 13:33:47 -05:00
David Roazen 3645ea9bb6 Sequence dictionary validation: detect problematic contig indexing differences
The GATK engine does not behave correctly when contigs are indexed
differently in the reads sequence dictionaries vs. the reference
sequence dictionary, and the inconsistently-indexed contigs are included
in the user's intervals. For example, given the dictionaries:

Reference dictionary = { chrM, chr1, chr2, ... }
BAM dictionary       = { chr1, chr2, ... }

and the interval "-L chr1", the engine would fail to correctly retrieve
the reads from chr1, since chr1 has a different index in the two dictionaries.

With this patch, we throw an exception if there are contig index differences
between the dictionaries for reads and reference, AND the user's intervals
include at least one of the mismatching contigs.

The user can disable this exception via -U ALLOW_SEQ_DICT_INCOMPATIBILITY

In all other cases, dictionary validation behaves as before.

I also added comprehensive unit tests for the (previously-untested)
SequenceDictionaryUtils class.

GSA-768 #resolve
2013-02-25 11:14:22 -05:00
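The validation rule in this commit can be sketched as follows. This is a simplified illustration, not the SequenceDictionaryUtils implementation: dictionaries are modelled as ordered lists of contig names, and `ContigIndexCheckSketch`, its method names, the boolean override parameter, and the exception type are all assumptions.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ContigIndexCheckSketch {
    // Contigs present in both dictionaries whose positions (indices) differ.
    static Set<String> misindexedContigs(List<String> reference, List<String> reads) {
        Map<String, Integer> referenceIndex = new HashMap<>();
        for (int i = 0; i < reference.size(); i++) {
            referenceIndex.put(reference.get(i), i);
        }
        Set<String> mismatched = new LinkedHashSet<>();
        for (int i = 0; i < reads.size(); i++) {
            Integer indexInReference = referenceIndex.get(reads.get(i));
            if (indexInReference != null && indexInReference != i) {
                mismatched.add(reads.get(i));
            }
        }
        return mismatched;
    }

    // Fails only when a mismatching contig is also covered by the user's intervals,
    // unless the caller explicitly allows the incompatibility.
    static void validate(List<String> reference, List<String> reads,
                         Set<String> intervalContigs, boolean allowIncompatibility) {
        Set<String> offending = misindexedContigs(reference, reads);
        offending.retainAll(intervalContigs);
        if (!offending.isEmpty() && !allowIncompatibility) {
            throw new IllegalStateException(
                    "Contigs indexed differently in reads and reference: " + offending);
        }
    }
}
```

For the dictionaries in the commit message, chr1 has index 1 in the reference and index 0 in the BAM, so an interval on chr1 triggers the exception, while an interval on chrM alone does not, since chrM is absent from the BAM dictionary and so has no index to disagree with.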
David Roazen baa3b15207 Update release scripts in preparation for open-sourcing protected 2013-02-25 10:17:16 -05:00
Eric Banks f62dd84869 Merge pull request #57 from broadinstitute/rp_bubble_traversal_merge_GSA-680
Rp bubble traversal merge gsa 680
2013-02-24 05:08:05 -08:00
Mauricio Carneiro 9e5a31b595 Brought all of ReduceReads to fastutils
-- Added unit tests to ReduceReads name compression
-- Updated reduce reads walker for unit testing

GSATDG-83
2013-02-23 22:53:23 -05:00