Commit Graph

9673 Commits (edf08ca9283d083d421bdc3db9d34f2a9ceca072)

Author SHA1 Message Date
Mauricio Carneiro edf08ca928 Updates to ReduceReads
* Turn off the post-downsampler by default until the new implementation is ready.
* Optimized read clipping in consensus regions: only clip once instead of every time the window slides.
* Extracted HeaderElement into its own class.
* Moved RFA to archive.
* Unified the context size of indels and mismatches.
2012-06-08 14:57:34 -04:00
Eric Banks afa9b2718a Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-08 13:54:48 -04:00
Eric Banks 92280b4068 BQSR optimization: cache the BitSetUtils.bitSetFrom() calls since they are called over and over again with the same values. Another 10% reduction in runtime. 2012-06-08 13:54:37 -04:00
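The caching idea in the commit above can be sketched as a small memoization wrapper. This is only an illustration: the `bitSetFrom(long)` signature, the `BitSetCache` class, and the `conversions` counter are assumptions made for the example and are not taken from the GATK source.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: memoize a conversion that is called repeatedly with the same values.
class BitSetCache {
    // Counts how often the underlying conversion really runs (for demonstration only).
    static int conversions = 0;

    // Stand-in for the expensive conversion; the real BitSetUtils.bitSetFrom() may differ.
    static BitSet bitSetFrom(long value) {
        conversions++;
        return BitSet.valueOf(new long[] { value });
    }

    private static final Map<Long, BitSet> CACHE = new HashMap<>();

    // Returns a shared instance, so callers must treat the result as read-only.
    static BitSet cachedBitSetFrom(long value) {
        return CACHE.computeIfAbsent(value, BitSetCache::bitSetFrom);
    }

    public static void main(String[] args) {
        cachedBitSetFrom(5L);
        cachedBitSetFrom(5L); // second call is served from the cache
        System.out.println("conversions performed: " + conversions);
    }
}
```

The plain `HashMap` here is not thread-safe; a multi-threaded caller would need a `ConcurrentHashMap` or a per-thread cache.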
Eric Banks 898a0e6161 Minor optimizations 2012-06-08 12:07:58 -04:00
Ryan Poplin 0a37e19998 Bug fix in VQSR so that the VCF index will be created for the recalFile. 2012-06-08 11:51:28 -04:00
Eric Banks d463ab2cbf BQSR optimization: String manipulation is extremely expensive in Java (accounts for 8% of BQSR runtime). Instead use byte[] and StringBuilder when possible. 2012-06-08 10:42:42 -04:00
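A minimal sketch of the pattern this commit describes: scan raw `byte[]` data instead of creating per-character `String` objects, and assemble text with a single `StringBuilder`. The `ByteBases` class and both method names are hypothetical and do not reflect the actual BQSR code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: avoid String churn in hot loops.
class ByteBases {
    // Counts a base by comparing bytes directly; no substring or String allocation per position.
    static int countBase(byte[] bases, byte base) {
        int count = 0;
        for (byte b : bases) {
            if (b == base) {
                count++;
            }
        }
        return count;
    }

    // Builds a delimited key with one StringBuilder instead of repeated String concatenation.
    static String joinKey(String[] parts, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bases = "ACGTTA".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countBase(bases, (byte) 'T'));
        System.out.println(joinKey(new String[] { "rg1", "30", "AC" }, ':'));
    }
}
```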
Eric Banks 2bd48a7351 Bad comments made it into the previous commit 2012-06-07 23:12:56 -04:00
Eric Banks 31c3a6be48 BQSR optimization: getRequiredCovariates() and getOptionalCovariates() were creating a new List every time they were being called, and unfortunately getRequiredCovariates().size() is used as the stop condition in for-loops throughout the code. Just maintaining the original list of covariates results in a 15% reduction in runtime for BQSR. 2012-06-07 20:04:10 -04:00
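The fix described above amounts to building the list once and returning the same instance from the getter, so that a `size()` call in a loop condition no longer allocates. A sketch under that assumption follows; `CovariateRegistry` and its use of `String` elements are illustrative and not the actual GATK types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: keep one list instead of rebuilding it on every getter call.
class CovariateRegistry {
    private final List<String> required;

    CovariateRegistry(List<String> required) {
        // Defensive copy made once; the unmodifiable view lets callers share it safely.
        this.required = Collections.unmodifiableList(new ArrayList<>(required));
    }

    // Returns the same instance every time, so calling it in a loop condition is cheap.
    List<String> getRequiredCovariates() {
        return required;
    }

    public static void main(String[] args) {
        CovariateRegistry registry = new CovariateRegistry(Arrays.asList("ReadGroup", "QualityScore"));
        for (int i = 0; i < registry.getRequiredCovariates().size(); i++) {
            System.out.println(registry.getRequiredCovariates().get(i));
        }
    }
}
```

Hoisting the size into a local variable before the loop would also avoid the repeated call, but fixing the getter helps every call site at once.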
Eric Banks 0fb9179f76 BQSR optimization: don't clone the original quals for each read, we can just overwrite the original array 2012-06-07 19:41:03 -04:00
Joel Thibault 32e4fe5c87 Enable zero-sample testing 2012-06-07 14:43:13 -04:00
Joel Thibault fe9dc0cc3f Explicitly set intervals in new test
mem limit 4g
2012-06-07 14:43:13 -04:00
Ryan Poplin 0ac4ba9ad3 unintended change crept in 2012-06-07 10:59:04 -04:00
Ryan Poplin d449f169d3 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-06-07 10:56:55 -04:00
Ryan Poplin 0b4281fdd0 misc minor update to HC debug output for when there are a lot of samples 2012-06-07 10:56:41 -04:00
Eric Banks bad50a1b05 Fix docs 2012-06-06 22:45:38 -04:00
Eric Banks b093ba9dcc Stabilized NGSPlatform code: don't assume all reads have read groups (e.g. artificial SAM records) 2012-06-06 15:17:30 -04:00
Eric Banks 54f682a99c Unify to NGSPlatform framework. TechnologyComposition annotation now generalizes to Illumina and not just SLX. 2012-06-06 11:44:37 -04:00
Eric Banks dd46d843fb IR should skip Ion reads just like it does with 454 reads; Tim has confirmed the official platform name for Ion. 2012-06-06 11:04:55 -04:00
David Roazen b6a7c3f780 Merged bug fix from Stable into Unstable
Resolved merge conflict

Conflicts:
	public/java/test/org/broadinstitute/sting/BaseTest.java
2012-06-05 17:44:44 -04:00
David Roazen 3b2fab9a37 Update stable for shptmp -> hptmp migration 2012-06-05 17:38:06 -04:00
Guillermo del Angel 2cbd6e5f90 Merged bug fix from Stable into Unstable 2012-06-05 15:58:23 -04:00
Guillermo del Angel ce4dc2128d Adding minor clarification to -mbq argument documentation 2012-06-05 15:17:56 -04:00
Eric Banks e02ec8c8b6 Don't update the record ID unless we are actually going to emit the record 2012-06-04 14:58:50 -04:00
Eric Banks 8405156ae1 Refactored VariantsToTable so that 1) genotype-level fields can be specified (stabilized and supported code) and 2) the --moltenize argument could be supported to produce molten output of the data. Added tests that cover these capabilities. 2012-06-04 14:28:32 -04:00
Ryan Poplin f11e7ebc3a Fixing the previous fix related to clipping. Adding extra reference padding in the HaplotypeCaller to get those larger alleles during GGA. 2012-06-04 12:49:36 -04:00
Ryan Poplin 320956ee4b Bug fix in clipping function in ReadUtils for when the read ends at exactly the clipping boundary. Bug fixes in HaplotypeCaller GGA mode for when Smith-Waterman produces a different allele than what was given in the input alleles VCF. GGA mode now works with multiallelic records. Adding min pruning factor argument which is combined with the pruning factor that is determined dynamically by the coverage. 2012-06-04 10:55:36 -04:00
Guillermo del Angel 7a54baf08c Merged bug fix from Stable into Unstable 2012-06-03 08:42:08 -04:00
Guillermo del Angel 47df7bbc14 Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable 2012-06-03 08:38:54 -04:00
Guillermo del Angel 2ddbdee3bc Fixed broken VariantEval stratifications VariantType and IndelSize - integration tests to follow 2012-06-03 08:38:38 -04:00
Mauricio Carneiro 12a8c54f9a Fixing VCF header for filter elements (thanks Eric) 2012-06-01 15:45:15 -04:00
David Roazen 3b20b4615d ant fasttest: for when you just can't wait
Cuts major corners for speed. Tests start in SECONDS instead of minutes.
SIGNIFICANT limitations (see below!)

Usage: ant fasttest -Dsingle=TestClass

The idea is that you do a regular "ant test -Dsingle=TestClass" (or "ant committests")
FIRST, then do "ant fasttest -Dsingle=TestClass" for all subsequent runs until
satisfied.

LIMITATIONS:

-REQUIRES that a full test build has already been done (using one of the
 test targets like committests, or a manual "ant test.compile").
-Java only
-Single test class only
-No contracts
-Build jars in dist/ not updated, only classes in build/
-Version number output at runtime may be incorrect
2012-06-01 15:17:18 -04:00
David Roazen 2e3af1a739 Enable contracts for the release jar tests (whoops...) 2012-06-01 13:08:46 -04:00
Guillermo del Angel fdf607c3a9 One-off walker but techdev may find it useful and expand on it: given a BAM, output series of intervals to file (as interval_list [default] or BED [if -bed used]) where there is coverage. Additionally, print out to stdout at the end total covered bases and total covered bases including soft-clips 2012-05-31 20:47:37 -04:00
Eric Banks 3a15ba2102 Malformed VCF headers should be User Errors 2012-05-31 16:05:53 -04:00
David Roazen d1822c926c Temporary fake release directories for Bamboo testing purposes 2012-05-31 16:04:07 -04:00
Joel Thibault 1aa9742e55 Decrease block size from 1Mb to 100Kb 2012-05-31 12:17:46 -04:00
Khalid Shakir c4f7df4dce When an underlying exception occurs because of a user error and the exception instance does not include a message, tell the user "because <exception class name>" instead of "because null". 2012-05-30 16:39:06 -04:00
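The fallback described in this commit can be sketched in a few lines. The `UserErrorMessages` class and `describeCause` method are hypothetical names, and the real GATK message format may differ; only the null-message fallback to the class name is taken from the commit message.

```java
// Hypothetical sketch: never show "because null" to the user.
class UserErrorMessages {
    // Uses the exception's message when present, otherwise its class name.
    static String describeCause(Throwable cause) {
        String message = cause.getMessage();
        String reason = (message == null) ? cause.getClass().getName() : message;
        return "because " + reason;
    }

    public static void main(String[] args) {
        System.out.println(describeCause(new IllegalStateException()));
        System.out.println(describeCause(new IllegalStateException("file is truncated")));
    }
}
```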
Ryan Poplin 421d0d1435 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-30 15:21:35 -04:00
Ryan Poplin 9be4f6894c Adding GGA integration test to HaplotypeCaller. 2012-05-30 15:21:18 -04:00
Ryan Poplin 5dd811f84a Adding genotype given alleles mode to the HaplotypeCaller. 2012-05-30 15:07:01 -04:00
Eric Banks d09b8d5584 Fixing docs 2012-05-30 13:24:08 -04:00
Mauricio Carneiro d6e1205310 Updating default values for DiagnoseTargets 2012-05-30 12:43:07 -04:00
Ryan Poplin f987d5487e Adding RMSE value to the plots produced by CalibrateGenotypeLikelihoods. 2012-05-30 11:27:53 -04:00
Ryan Poplin 12da794116 Smooth geom plotted on the main CalibrateGenotypeLikelihoods plots wasn't using the weight of the data points for fitting. 2012-05-30 09:44:39 -04:00
David Roazen 1fa88fd389 Fixing some problems with the binary release tests
-Classpaths to test the release jars were being constructed prematurely,
 before all needed properties had been defined
-Added reportng as a TestNG dependency for testing purposes
2012-05-29 16:29:22 -04:00
Ryan Poplin 2908fb77e0 Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-05-29 15:52:57 -04:00
Ryan Poplin ebc3ff680f Bug fix for CalibrateGenotypeLikelihoods. There is no overall read group when in external likelihoods mode. Added integration test for external likelihoods mode. 2012-05-29 15:50:00 -04:00
David Roazen 5e76c811b9 Build system round 2: finished preparing the packaging system for dual protected/lite releases
* Targets to package and release lite/protected versions of the GATK/Queue
* Still TODO: -determine the actual directories where the protected releases should go
              -update the Bamboo release plan
              -fix a bug in the binary release test targets
2012-05-29 13:54:51 -04:00
Khalid Shakir c3c7f17d90 Updated hard limit MathUtils.MAXN number of samples from 11,000 to 50,000.
Instead of creating a supposed network temporary directory locally, which then fails when remote nodes try to access the non-existent dir, now checking to see if the network directory is available and throwing a SkipException to bypass the test when it cannot be run.
TODO: Throw similar SkipExceptions when fastas are not available. Right now, instead of skipping the test or failing fast, REQUIRE_NETWORK_CONNECTION=false means that the errors pop up later when the networked fastas aren't found.
2012-05-29 11:18:22 -04:00
Roger Zurawicki b8b139841d DiagnoseTargets with working Q1,Median,Q3
- Merged Roger's metrics with Mauricio's optimizations
 - Added Stats for DiagnoseTargets
     - now has functions to find the median depth, and upper/lower quartile
     - the REF_N callable status is implemented
 - The walker now runs efficiently
 - Diagnose Targets accepts overlapping intervals
 - Diagnose Targets now checks for bad mates
 - The read mates are checked in a memory efficient manner
 - The statistics thresholds have been consolidated and moved outside of the statistics classes and into the walker.
 - Fixed some bugs
 - Removed rod binding

Added more Unit tests

 - Test callable statuses on the locus level
 - Test bad mates

 - Changed NO_COVERAGE -> COVERAGE_GAPS to avoid confusion

Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-05-29 10:16:45 -04:00
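The DiagnoseTargets commit above mentions functions for the median depth and the upper and lower quartiles. The commit does not say which quartile definition is used, so the sketch below assumes the "median of halves" convention (the middle element is excluded from both halves when the count is odd). `DepthQuartiles` and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch: Q1, median, and Q3 of per-locus depths using the median-of-halves rule.
class DepthQuartiles {
    // Median of the sorted values in the half-open range [from, to); the range must be non-empty.
    static double median(int[] sorted, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        return (n % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Returns {Q1, median, Q3}; the input array is not modified.
    static double[] quartiles(int[] depths) {
        if (depths.length == 0) {
            throw new IllegalArgumentException("no depths given");
        }
        int[] sorted = depths.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        if (n == 1) {
            return new double[] { sorted[0], sorted[0], sorted[0] };
        }
        int half = n / 2; // size of each half; the middle element is skipped when n is odd
        double q1 = median(sorted, 0, half);
        double q3 = median(sorted, n - half, n);
        return new double[] { q1, median(sorted, 0, n), q3 };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(quartiles(new int[] { 5, 1, 4, 2, 3 })));
    }
}
```

Other quartile conventions (for example, interpolated percentiles) give slightly different values on small inputs, so thresholds tuned against one convention should not be assumed to carry over to another.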