Eric Banks
1da3e43679
Wow, apparently it's way, way less efficient to iterate over Java Lists than native arrays. With this change and the bit fiddling, Ryan's 10-day test case now runs in 1 day. More to come.
2012-06-12 13:32:56 -04:00
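The List-versus-array observation in the commit above can be illustrated with a minimal sketch. It is not the BQSR code: the class and method names are hypothetical, and it only shows where the extra work comes from (an iterator plus unboxing on each element for a `List<Integer>`, plain indexed reads for an `int[]`). It makes no claim about the size of the speedup.

```java
import java.util.List;

// Hypothetical illustration; none of these names come from the GATK source.
final class IterationSketch {
    private IterationSketch() {}

    /** Sums a boxed list: each step goes through an Iterator and unboxes an Integer. */
    static long sumList(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }

    /** Sums a primitive array: each step is a bounds-checked array read. */
    static long sumArray(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** One-time conversion so hot loops can run over the primitive array. */
    static int[] toArray(List<Integer> values) {
        int[] out = new int[values.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = values.get(i);
        }
        return out;
    }
}
```

Both methods return the same total; the point of the conversion is to pay the boxing cost once rather than on every pass.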
Eric Banks
a96c5da884
Oops, forgot to push the unit tests
2012-06-12 11:38:30 -04:00
Eric Banks
fec0bd5e11
Fixing UG argument docs
2012-06-12 09:46:16 -04:00
Eric Banks
a4defdfb29
Adding a GT header line to SomaticIndelDetector output
2012-06-12 09:39:17 -04:00
Eric Banks
891ce51908
Refactoring of BQSRv2 to use longs (and standard bit fiddling techniques) instead of Java BitSets for performance improvements.
2012-06-12 09:19:36 -04:00
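As a rough sketch of the "longs and standard bit fiddling" idea in the commit above, several small values can be packed into one `long` with shifts and masks instead of being stored in a `BitSet`. The class name, method names, and field widths here are assumptions for illustration only; the actual BQSRv2 key layout is not shown in this log.

```java
// Hypothetical sketch; the real BQSRv2 key layout is not described in the commit log.
final class PackedKey {
    private PackedKey() {}

    /**
     * Shifts the existing key left by `bits` and stores `value` in the freed low bits.
     * Assumes 0 < bits < 64; bits of `value` above the field width are masked off.
     */
    static long append(long key, long value, int bits) {
        long mask = (1L << bits) - 1;
        return (key << bits) | (value & mask);
    }

    /** Reads the field of width `bits` whose lowest bit sits at `offset` (0 = least significant). */
    static long extract(long key, int offset, int bits) {
        long mask = (1L << bits) - 1;
        return (key >>> offset) & mask;
    }
}
```

A packed `long` can be compared, hashed, and used as an array index or map key without allocating an object, which is the usual motivation for this kind of change.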
Eric Banks
ff5749599d
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-11 15:46:17 -04:00
Eric Banks
fea625632f
Don't use asList because it maintains an iterator to the original list and then the result can't be used to create a new one
2012-06-11 15:45:58 -04:00
Ryan Poplin
e4d371dc80
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-11 10:38:50 -04:00
Ryan Poplin
683d4b508e
Bug fix in fragment utils: the read name wasn't being set in the merged read. Misc minor updates to the HaplotypeCaller.
2012-06-11 10:38:35 -04:00
Mauricio Carneiro
4aad7e23ef
New ReduceReads v2 with unclipped variant regions and soft-clipped bases
...
* Re-wrote the sliding window approach to allow the variant region not to clip the reads that overlap it.
* Updated consensus to include only reads that were not passed on by the variant region; header counts are updated on the fly to avoid recomputation
* Added soft clipped bases to ReduceReads analysis by unclipping high quality soft-clips then re-clipping after reduce reads
* Updated all integration tests
2012-06-08 14:58:31 -04:00
Eric Banks
afa9b2718a
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-08 13:54:48 -04:00
Eric Banks
92280b4068
BQSR optimization: cache the BitSetUtils.bitSetFrom() calls since they are called over and over again with the same values. Another 10% reduction in runtime.
2012-06-08 13:54:37 -04:00
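The caching described in the commit above amounts to memoization. The sketch below assumes a `long`-to-`BitSet` conversion; the signature of the real `BitSetUtils.bitSetFrom()` is not given in this log, so the method here is a hypothetical stand-in built on the standard `BitSet.valueOf(long[])`.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the cached conversion; not the GATK implementation.
final class BitSetCache {
    private final Map<Long, BitSet> cache = new HashMap<>();

    /** Counts how often the conversion actually ran (for illustration only). */
    int computations = 0;

    /**
     * Returns the BitSet for `value`, computing it only on the first request.
     * Callers share the cached instance, so they must not mutate it.
     */
    BitSet bitSetFrom(long value) {
        return cache.computeIfAbsent(value, v -> {
            computations++;
            return BitSet.valueOf(new long[] { v });
        });
    }
}
```

Sharing cached instances is only safe if no caller modifies them, since `BitSet` is mutable; returning a clone would avoid that risk but give back part of the saving. This class is also not thread-safe as written.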
Eric Banks
898a0e6161
Minor optimizations
2012-06-08 12:07:58 -04:00
Ryan Poplin
0a37e19998
Bug fix in VQSR so that the VCF index will be created for the recalFile.
2012-06-08 11:51:28 -04:00
Eric Banks
d463ab2cbf
BQSR optimization: String manipulation is extremely expensive in Java (accounts for 8% of BQSR runtime). Instead use byte[] and StringBuilder when possible.
2012-06-08 10:42:42 -04:00
Eric Banks
2bd48a7351
Bad comments made it into the previous commit
2012-06-07 23:12:56 -04:00
Eric Banks
31c3a6be48
BQSR optimization: getRequiredCovariates() and getOptionalCovariates() were creating a new List every time they were being called, and unfortunately getRequiredCovariates().size() is used as the stop condition in for-loops throughout the code. Just maintaining the original list of covariates results in a 15% reduction in runtime for BQSR.
2012-06-07 20:04:10 -04:00
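The cost pattern described in the commit above can be sketched as follows. The class and its members are hypothetical, not the GATK code: one getter copies the list on every call, so using it in a for-loop stop condition allocates once per iteration, while the other returns the same read-only list each time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the getter pattern; not the GATK class.
final class CovariateHolder {
    private final List<String> required;

    /** Counts defensive copies made (for illustration only). */
    int allocations = 0;

    CovariateHolder(List<String> required) {
        this.required = Collections.unmodifiableList(new ArrayList<>(required));
    }

    /** Costly variant: builds a fresh list on every call. */
    List<String> getRequiredCopy() {
        allocations++;
        return new ArrayList<>(required);
    }

    /** Cheap variant: always returns the same unmodifiable list. */
    List<String> getRequired() {
        return required;
    }
}
```

With three covariates, a loop written as `for (int i = 0; i < holder.getRequiredCopy().size(); i++)` evaluates its condition four times and therefore makes four copies; the same loop over `getRequired()` makes none.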
Eric Banks
0fb9179f76
BQSR optimization: don't clone the original quals for each read, we can just overwrite the original array
2012-06-07 19:41:03 -04:00
Ryan Poplin
d449f169d3
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-07 10:56:55 -04:00
Ryan Poplin
0b4281fdd0
misc minor update to HC debug output for when there are a lot of samples
2012-06-07 10:56:41 -04:00
Eric Banks
bad50a1b05
Fix docs
2012-06-06 22:45:38 -04:00
Eric Banks
b093ba9dcc
Stabilized NGSPlatform code: don't assume all reads have read groups (e.g. artificial SAM records)
2012-06-06 15:17:30 -04:00
Eric Banks
54f682a99c
Unify to NGSPlatform framework. TechnologyComposition annotation now generalizes to Illumina and not just SLX.
2012-06-06 11:44:37 -04:00
Eric Banks
dd46d843fb
IR should skip Ion reads just as it does 454 reads; Tim has confirmed the official platform name for Ion.
2012-06-06 11:04:55 -04:00
Guillermo del Angel
2cbd6e5f90
Merged bug fix from Stable into Unstable
2012-06-05 15:58:23 -04:00
Guillermo del Angel
ce4dc2128d
Adding minor clarification to -mbq argument documentation
2012-06-05 15:17:56 -04:00
Eric Banks
e02ec8c8b6
Don't update the record ID unless we are actually going to emit the record
2012-06-04 14:58:50 -04:00
Eric Banks
8405156ae1
Refactored VariantsToTable so that 1) genotype-level fields can be specified (stabilized and supported code) and 2) the --moltenize argument could be supported to produce molten output of the data. Added tests that cover these capabilities.
2012-06-04 14:28:32 -04:00
Ryan Poplin
f11e7ebc3a
Fixing the previous fix related to clipping. Adding extra reference padding in the HaplotypeCaller to get those larger alleles during GGA.
2012-06-04 12:49:36 -04:00
Ryan Poplin
320956ee4b
Bug fix in clipping function in ReadUtils for when the read ends at exactly the clipping boundary. Bug fixes in HaplotypeCaller GGA mode for when Smith-Waterman produces a different allele than what was given in the input alleles VCF. GGA mode now works with multiallelic records. Adding min pruning factor argument which is combined with the pruning factor that is determined dynamically by the coverage.
2012-06-04 10:55:36 -04:00
Guillermo del Angel
7a54baf08c
Merged bug fix from Stable into Unstable
2012-06-03 08:42:08 -04:00
Guillermo del Angel
47df7bbc14
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/stable
2012-06-03 08:38:54 -04:00
Guillermo del Angel
2ddbdee3bc
Fixed broken VariantEval stratifications VariantType and IndelSize - integration tests to follow
2012-06-03 08:38:38 -04:00
Mauricio Carneiro
12a8c54f9a
Fixing VCF header for filter elements (thanks Eric)
2012-06-01 15:45:15 -04:00
Eric Banks
3a15ba2102
Malformed VCF headers should be User Errors
2012-05-31 16:05:53 -04:00
David Roazen
d1822c926c
Temporary fake release directories for Bamboo testing purposes
2012-05-31 16:04:07 -04:00
Khalid Shakir
c4f7df4dce
When an underlying exception occurs because of a user error and the exception instance does not include a message, tell the user "because <exception class name>" instead of "because null".
2012-05-30 16:39:06 -04:00
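The fallback described in the commit above can be sketched in a few lines. The class and method names are hypothetical, and the real GATK error formatting may differ; the sketch relies only on `Throwable.getMessage()` returning `null` when no message was supplied.

```java
// Hypothetical helper; the actual GATK error-reporting code is not shown in the log.
final class ErrorText {
    private ErrorText() {}

    /** Uses the exception's message when present, otherwise its class name. */
    static String because(Throwable t) {
        String message = t.getMessage();
        String detail = (message != null) ? message : t.getClass().getName();
        return "because " + detail;
    }
}
```

For example, `ErrorText.because(new IllegalStateException())` yields `because java.lang.IllegalStateException` rather than `because null`.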
Ryan Poplin
421d0d1435
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-05-30 15:21:35 -04:00
Ryan Poplin
5dd811f84a
Adding genotype given alleles mode to the HaplotypeCaller.
2012-05-30 15:07:01 -04:00
Eric Banks
d09b8d5584
Fixing docs
2012-05-30 13:24:08 -04:00
Mauricio Carneiro
d6e1205310
Updating default values for DiagnoseTargets
2012-05-30 12:43:07 -04:00
David Roazen
5e76c811b9
Build system round 2: finished preparing the packaging system for dual protected/lite releases
...
* Targets to package and release lite/protected versions of the GATK/Queue
* Still TODO:
  - determine the actual directories where the protected releases should go
  - update the Bamboo release plan
  - fix a bug in the binary release test targets
2012-05-29 13:54:51 -04:00
Khalid Shakir
c3c7f17d90
Updated hard limit MathUtils.MAXN number of samples from 11,000 to 50,000.
...
Instead of creating a supposed network temporary directory locally, which then fails when remote nodes try to access the nonexistent dir, now checking whether the network directory is available and throwing a SkipException to bypass the test when it cannot be run.
TODO: Throw similar SkipExceptions when fastas are not available. Right now, instead of skipping the test or failing fast, REQUIRE_NETWORK_CONNECTION=false means that the errors pop up later when the networked fastas aren't found.
2012-05-29 11:18:22 -04:00
Roger Zurawicki
b8b139841d
DiagnoseTargets with working Q1,Median,Q3
...
- Merged Roger's metrics with Mauricio's optimizations
- Added Stats for DiagnoseTargets
- now has functions to find the median depth, and upper/lower quartile
- the REF_N callable status is implemented
- The walker now runs efficiently
- Diagnose Targets accepts overlapping intervals
- Diagnose Targets now checks for bad mates
- The read mates are checked in a memory efficient manner
- The statistics thresholds have been consolidated and moved outside of the statistics classes and into the walker.
- Fixed some bugs
- Removed rod binding
Added more Unit tests
- Test callable statuses on the locus level
- Test bad mates
- Changed NO_COVERAGE -> COVERAGE_GAPS to avoid confusion
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-05-29 10:16:45 -04:00
Eric Banks
50031b63c5
Fix possible NPE from NBaseCount annotation module
2012-05-29 09:46:00 -04:00
Mark DePristo
08de4dfd96
Missed one integration test
2012-05-29 07:23:24 -04:00
Mark DePristo
454c8e63e6
Made GQ an int, not a float. Updated VC code and lots of corresponding MD5s
...
-- VCFWriter / codec now passes the same rigorous UnitTest as the BCF2 writer / codec. As part of this we now can only test doubles for equivalence in VCFs to 1e-2 (not exactly impressive)
2012-05-28 20:20:05 -04:00
Mark DePristo
7ce24a96f1
PBT now uses getGenotypeLikelihoodString to avoid NPE when there are no PLs present
2012-05-28 20:18:16 -04:00
Mark DePristo
1818c29371
Fixed long-standing bug in beagle codec that was passing on the header record for decoding
2012-05-28 20:17:26 -04:00
Mark DePristo
06b02e1b9b
Update MD5s to reflect new limited output of DiffObjectsWalkers
...
-- Also updated GQ change in VCFIntegrationTest
2012-05-27 11:20:47 -04:00