Mark DePristo
b0ea14ef0f
VCFHeader getMetaData returns 4.1 version not 4.0
2012-06-14 16:42:22 -04:00
Mark DePristo
5fda16bea9
Enable shadow BCF2
2012-06-14 16:42:22 -04:00
Mauricio Carneiro
7d12429917
First step towards indel qualities in RR
Let the BI's and BD's pass through the reduce reads machinery
2012-06-14 15:37:39 -04:00
Mauricio Carneiro
e68038c5d8
Refactor post-processing downsampling using David's generic downsampler interface
2012-06-14 15:37:32 -04:00
Eric Banks
0398ae9695
I hate these disabled unit tests, #2
2012-06-14 15:19:27 -04:00
Eric Banks
676a57de7b
I hate these disabled unit tests
2012-06-14 14:03:58 -04:00
Eric Banks
de5508fcea
Bug fixes for cycle and context covariates
2012-06-14 13:01:14 -04:00
Eric Banks
5c3c6cbc40
Long -> long conversions in BQSR
2012-06-14 09:07:02 -04:00
Eric Banks
29a74908bb
The next round of BQSR optimizations: no more Long[] array creation
2012-06-14 00:05:42 -04:00
Guillermo del Angel
cd2074b1dc
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-13 20:59:30 -04:00
Guillermo del Angel
92669a0468
Second intermediate commit for indel pool caller - now works (more or less) in reference sample-free mode. Still needs a lot of cleanup and more tests, and the refactoring is not quite done yet
2012-06-13 20:59:17 -04:00
David Roazen
0550b27799
Make downsampler classes themselves generic (instead of just the Downsampler interface)
This is in response to a request from Mauricio to make it easier
to use the downsamplers with GATKSAMRecords (as opposed to SAMRecords)
without having to do any cumbersome typecasting. Sadly, Java
language limitations make this sort of solution the best choice.
Thanks to Khalid for his feedback on this issue.
Also:
-added a unit test to verify GATKSAMRecord support with no typecasting required
-added some unit tests for the FractionalDownsampler that Mauricio will/might be using
-moved classes from private to public to better sync up with my local development
branch for engine integration
2012-06-13 16:43:39 -04:00
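A minimal sketch of the idea in the commit above (0550b27799), assuming hypothetical stand-in types: when the downsampler class itself carries the type parameter, a caller that submits a read subtype gets that same subtype back without casting. `BaseRead`, `ExtendedRead`, `EveryNthDownsampler`, and the `submit`/`drain` method names are all illustrative assumptions, not the GATK classes or API; a deterministic every-Nth rule is used here in place of fractional sampling so the result is reproducible.

```java
import java.util.ArrayList;
import java.util.List;

public class GenericDownsamplerSketch {
    // Hypothetical stand-ins for SAMRecord and GATKSAMRecord.
    static class BaseRead {
        final String name;
        BaseRead(String name) { this.name = name; }
    }

    static class ExtendedRead extends BaseRead {
        ExtendedRead(String name) { super(name); }
        String describe() { return "extended:" + name; }
    }

    // Hypothetical interface; the real Downsampler methods may differ.
    interface Downsampler<T> {
        void submit(T item);
        List<T> drain();
    }

    // The class itself is generic, so the element type chosen by the
    // caller flows through submit() and back out of drain().
    static class EveryNthDownsampler<T extends BaseRead> implements Downsampler<T> {
        private final int n;
        private int seen = 0;
        private final List<T> kept = new ArrayList<>();

        EveryNthDownsampler(int n) {
            if (n < 1) throw new IllegalArgumentException("n must be >= 1");
            this.n = n;
        }

        @Override
        public void submit(T item) {
            // Keep items 0, n, 2n, ... and drop the rest.
            if (seen % n == 0) kept.add(item);
            seen++;
        }

        @Override
        public List<T> drain() {
            List<T> out = new ArrayList<>(kept);
            kept.clear();
            return out;
        }
    }

    static List<String> demo() {
        EveryNthDownsampler<ExtendedRead> d = new EveryNthDownsampler<>(2);
        for (int i = 0; i < 4; i++) d.submit(new ExtendedRead("r" + i));
        List<String> out = new ArrayList<>();
        // No cast needed: drain() already returns List<ExtendedRead>.
        for (ExtendedRead r : d.drain()) out.add(r.describe());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```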
Guillermo del Angel
67c0569f9c
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-13 11:50:00 -04:00
Eric Banks
81993b08e2
Don't put null entries into the key array
2012-06-13 11:43:44 -04:00
Roger Zurawicki
bdf5945dcc
Fixed bugs in DiagnoseTargets
DT would not report bad mates; that has been fixed.
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-06-13 11:15:26 -04:00
Roger Zurawicki
538cdf9210
Created the FindCoveredIntervals
Moved some stuff in the DiagnoseTargets walker to the more general ThresHolder class
Minor tweaks
FindCoveredIntervals supports Gathering
FindCoveredIntervals outputs an interval list instead of GATKReport
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-06-13 11:15:25 -04:00
Guillermo del Angel
aee66ab157
Big UG refactoring and intermediate commit to support indels in pool caller (not done yet). Lots of code pulled out of long spaghetti-like functions and modularized to be easily shareable. Add functionality in ErrorModel to count indel matches/mismatches (but left part disabled so as not to change integration tests in this commit), add computation of pool genotype likelihoods for indels (not fully working yet in more realistic cases, only working in artificial nice pools). Lots of TBDs still, but existing UG and pool SNP functionality should be intact
2012-06-13 11:14:44 -04:00
Eric Banks
bb77aa88c3
Drat, forgot the unit tests again
2012-06-12 19:00:47 -04:00
Eric Banks
37f56ce8fd
A couple of minor updates to BQSR
2012-06-12 16:12:13 -04:00
Eric Banks
277493dd83
Yet more instances of Lists changed over to native arrays
2012-06-12 15:56:09 -04:00
Eric Banks
613badc835
Very minor optimizations for the context covariate
2012-06-12 15:47:32 -04:00
Eric Banks
0f79adb2aa
Changing more Java Lists to native arrays in BQSR for performance optimization.
2012-06-12 15:41:01 -04:00
Eric Banks
1da3e43679
Wow, apparently it's way, way less efficient to iterate over Java Lists than native arrays. With this change and the bit fiddling, Ryan's 10-day test case now runs in 1 day. More to come.
2012-06-12 13:32:56 -04:00
Eric Banks
a96c5da884
Oops, forgot to push the unit tests
2012-06-12 11:38:30 -04:00
Eric Banks
fec0bd5e11
Fixing UG argument docs
2012-06-12 09:46:16 -04:00
Eric Banks
a4defdfb29
Adding a GT header line to SomaticIndelDetector output
2012-06-12 09:39:17 -04:00
Eric Banks
891ce51908
Refactoring of BQSRv2 to use longs (and standard bit fiddling techniques) instead of Java BitSets for performance improvements.
2012-06-12 09:19:36 -04:00
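For readers unfamiliar with the technique named in the commit above (891ce51908), here is a minimal sketch of packing a key into a primitive long with shifts and masks instead of a BitSet. The 2-bits-per-base encoding and the `encode`/`decode` helpers are assumptions chosen for illustration; the log does not say how BQSRv2 actually lays out its keys.

```java
public class LongKeySketch {
    private static final String BASES = "ACGT";

    // Illustrative encoding: A=0, C=1, G=2, T=3 (2 bits per base).
    static int baseToBits(char base) {
        int bits = BASES.indexOf(base);
        if (bits < 0) throw new IllegalArgumentException("unsupported base: " + base);
        return bits;
    }

    // Packs up to 32 bases into one long, first base in the highest used bits.
    static long encode(String bases) {
        if (bases.length() > 32) throw new IllegalArgumentException("at most 32 bases fit in a long");
        long key = 0L;
        for (int i = 0; i < bases.length(); i++) {
            key = (key << 2) | baseToBits(bases.charAt(i));
        }
        return key;
    }

    // Reverses encode(); the length must be supplied because leading 'A's are zero bits.
    static String decode(long key, int length) {
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = BASES.charAt((int) (key & 3L));
            key >>>= 2; // unsigned shift so the sign bit never smears
        }
        return new String(out);
    }

    public static void main(String[] args) {
        long key = encode("ACGT");
        System.out.println(key + " -> " + decode(key, 4));
    }
}
```

A primitive long key needs no object allocation and compares with `==`, which is one plausible reason such a change would help performance; the log itself gives no measurements for this commit.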
Eric Banks
ff5749599d
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-11 15:46:17 -04:00
Eric Banks
fea625632f
Don't use asList because it maintains an iterator to the original list and then the result can't be used to create a new one
2012-06-11 15:45:58 -04:00
Ryan Poplin
e4d371dc80
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-11 10:38:50 -04:00
Ryan Poplin
683d4b508e
Bug fix in fragment utils: the read name wasn't being set in the merged read. Misc minor updates to the HaplotypeCaller.
2012-06-11 10:38:35 -04:00
Mauricio Carneiro
4aad7e23ef
New ReduceReads v2 with unclipped variant regions and soft-clipped bases
* Re-wrote the sliding window approach to allow the variant region not to clip the reads that overlap it.
* Updated consensus to include only reads that were not passed on by the variant region, header counts are updated on the fly to avoid recompute
* Added soft clipped bases to ReduceReads analysis by unclipping high quality soft-clips then re-clipping after reduce reads
* Updated all integration tests
2012-06-08 14:58:31 -04:00
Eric Banks
afa9b2718a
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-08 13:54:48 -04:00
Eric Banks
92280b4068
BQSR optimization: cache the BitSetUtils.bitSetFrom() calls since they are called over and over again with the same values. Another 10% reduction in runtime.
2012-06-08 13:54:37 -04:00
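The caching described in the commit above (92280b4068) can be sketched as a simple memo table, assuming the conversion is a pure function of its input. The `convert` method below is a hypothetical stand-in built on `java.util.BitSet.valueOf`; it is not the actual `BitSetUtils.bitSetFrom()` code, whose signature is not shown in this log.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class BitSetCacheSketch {
    private final Map<Long, BitSet> cache = new HashMap<>();
    private int conversions = 0;

    // Hypothetical stand-in for the expensive conversion.
    private BitSet convert(long value) {
        conversions++;
        return BitSet.valueOf(new long[] { value });
    }

    // Returns the cached BitSet for a value, converting only on first use.
    // Callers must treat the result as read-only because it is shared.
    public BitSet bitSetFrom(long value) {
        BitSet cached = cache.get(value);
        if (cached == null) {
            cached = convert(value);
            cache.put(value, cached);
        }
        return cached;
    }

    public int getConversions() {
        return conversions;
    }

    public static void main(String[] args) {
        BitSetCacheSketch sketch = new BitSetCacheSketch();
        for (int i = 0; i < 1000; i++) {
            sketch.bitSetFrom(i % 4); // only four distinct inputs
        }
        System.out.println("conversions performed: " + sketch.getConversions());
    }
}
```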
Eric Banks
898a0e6161
Minor optimizations
2012-06-08 12:07:58 -04:00
Ryan Poplin
0a37e19998
Bug fix in VQSR so that the VCF index will be created for the recalFile.
2012-06-08 11:51:28 -04:00
Eric Banks
d463ab2cbf
BQSR optimization: String manipulation is extremely expensive in Java (accounts for 8% of BQSR runtime). Instead use byte[] and StringBuilder when possible.
2012-06-08 10:42:42 -04:00
Eric Banks
2bd48a7351
Bad comments made it into the previous commit
2012-06-07 23:12:56 -04:00
Eric Banks
31c3a6be48
BQSR optimization: getRequiredCovariates() and getOptionalCovariates() were creating a new List every time they were being called, and unfortunately getRequiredCovariates().size() is used as the stop condition in for-loops throughout the code. Just maintaining the original list of covariates results in a 15% reduction in runtime for BQSR.
2012-06-07 20:04:10 -04:00
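The pattern behind the commit above (31c3a6be48) can be illustrated with a small sketch: a getter that builds a fresh list on every call is re-evaluated each time a for-loop tests its stop condition. The class, the `getRequiredCovariatesCopying` name, and the placeholder covariate names are assumptions for illustration, not the BQSR code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CovariateListSketch {
    // Placeholder covariate names; the real objects are not strings.
    private final List<String> required =
            new ArrayList<>(Arrays.asList("cycle", "context"));
    private int copiesMade = 0;

    // Before: every call allocates and fills a new list.
    public List<String> getRequiredCovariatesCopying() {
        copiesMade++;
        return new ArrayList<>(required);
    }

    // After: hand back the original list; callers must not modify it.
    public List<String> getRequiredCovariates() {
        return required;
    }

    public int getCopiesMade() {
        return copiesMade;
    }

    public static void main(String[] args) {
        CovariateListSketch sketch = new CovariateListSketch();
        int visited = 0;
        // The stop condition calls the getter once per check, including the
        // final failing check, so n elements cost n + 1 list copies.
        for (int i = 0; i < sketch.getRequiredCovariatesCopying().size(); i++) {
            visited++;
        }
        System.out.println(visited + " iterations, " + sketch.getCopiesMade() + " lists built");
    }
}
```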
Eric Banks
0fb9179f76
BQSR optimization: don't clone the original quals for each read, we can just overwrite the original array
2012-06-07 19:41:03 -04:00
Ryan Poplin
d449f169d3
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-07 10:56:55 -04:00
Ryan Poplin
0b4281fdd0
misc minor update to HC debug output for when there are a lot of samples
2012-06-07 10:56:41 -04:00
Eric Banks
bad50a1b05
Fix docs
2012-06-06 22:45:38 -04:00
Eric Banks
b093ba9dcc
Stabilized NGSPlatform code: don't assume all reads have read groups (e.g. artificial SAM records)
2012-06-06 15:17:30 -04:00
Eric Banks
54f682a99c
Unify to NGSPlatform framework. TechnologyComposition annotation now generalizes to Illumina and not just SLX.
2012-06-06 11:44:37 -04:00
Eric Banks
dd46d843fb
IR should skip Ion reads just like it does with 454 reads; Tim has confirmed the official platform name for Ion.
2012-06-06 11:04:55 -04:00
Guillermo del Angel
2cbd6e5f90
Merged bug fix from Stable into Unstable
2012-06-05 15:58:23 -04:00
Guillermo del Angel
ce4dc2128d
Adding minor clarification to -mbq argument documentation
2012-06-05 15:17:56 -04:00
Eric Banks
e02ec8c8b6
Don't update the record ID unless we are actually going to emit the record
2012-06-04 14:58:50 -04:00
Eric Banks
8405156ae1
Refactored VariantsToTable so that 1) genotype-level fields can be specified (stabilized and supported code) and 2) the --moltenize argument could be supported to produce molten output of the data. Added tests that cover these capabilities.
2012-06-04 14:28:32 -04:00