Mauricio Carneiro
7d12429917
First step towards indel qualities in RR
...
Let the BIs and BDs pass through the reduce reads machinery
2012-06-14 15:37:39 -04:00
Mauricio Carneiro
e68038c5d8
Refactor post-processing downsampling using David's generic downsampler interface
2012-06-14 15:37:32 -04:00
Eric Banks
0398ae9695
I hate these disabled unit tests, #2
2012-06-14 15:19:27 -04:00
Eric Banks
676a57de7b
I hate these disabled unit tests
2012-06-14 14:03:58 -04:00
Eric Banks
065fe9a7b9
Updating md5 for bug fixes
2012-06-14 13:03:51 -04:00
Eric Banks
de5508fcea
Bug fixes for cycle and context covariates
2012-06-14 13:01:14 -04:00
Eric Banks
5c3c6cbc40
Long -> long conversions in BQSR
2012-06-14 09:07:02 -04:00
Eric Banks
29a74908bb
The next round of BQSR optimizations: no more Long[] array creation
2012-06-14 00:05:42 -04:00
Guillermo del Angel
cd2074b1dc
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-13 20:59:30 -04:00
Guillermo del Angel
92669a0468
Second intermediate commit for indel pool caller - now works (more or less) in reference sample-free mode. Still needs a lot of cleanup and more tests, and the refactoring is not quite done yet
2012-06-13 20:59:17 -04:00
David Roazen
0550b27799
Make downsampler classes themselves generic (instead of just the Downsampler interface)
...
This is in response to a request from Mauricio to make it easier
to use the downsamplers with GATKSAMRecords (as opposed to SAMRecords)
without having to do any cumbersome typecasting. Sadly, Java
language limitations make this sort of solution the best choice.
Thanks to Khalid for his feedback on this issue.
Also:
-added a unit test to verify GATKSAMRecord support with no typecasting required
-added some unit tests for the FractionalDownsampler that Mauricio will/might be using
-moved classes from private to public to better sync up with my local development
branch for engine integration
2012-06-13 16:43:39 -04:00
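A minimal sketch of what making the downsampler classes themselves generic buys the caller, assuming nothing about the real GATK code. `Read` and `ExtendedRead` are stand-ins for SAMRecord and GATKSAMRecord, and the interface methods `submit`/`drain` and the `EveryNthDownsampler` class are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public final class GenericDownsamplerSketch {
    private GenericDownsamplerSketch() {}

    /** Stand-in for SAMRecord (hypothetical). */
    public static class Read {}

    /** Stand-in for GATKSAMRecord (hypothetical). */
    public static class ExtendedRead extends Read {
        public String extra() { return "extra"; }
    }

    /** Hypothetical generic downsampler interface. */
    public interface Downsampler<T extends Read> {
        void submit(T item);
        List<T> drain();
    }

    /**
     * Deterministic toy downsampler: keeps every n-th submitted item.
     * Because the class carries the type parameter T, callers get back
     * a List<T> and never need to cast.
     */
    public static class EveryNthDownsampler<T extends Read> implements Downsampler<T> {
        private final int n;
        private int seen = 0;
        private List<T> kept = new ArrayList<>();

        public EveryNthDownsampler(int n) {
            if (n < 1) throw new IllegalArgumentException("n must be >= 1");
            this.n = n;
        }

        @Override
        public void submit(T item) {
            seen++;
            if (seen % n == 0) kept.add(item);
        }

        @Override
        public List<T> drain() {
            List<T> out = kept;
            kept = new ArrayList<>();
            return out;
        }
    }
}
```

With a non-generic class that is fixed to `Read`, the caller would have to cast each returned element back to `ExtendedRead` before calling `extra()`.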
Joel Thibault
fc36177004
Accidentally removed the scattering from this test
2012-06-13 14:18:32 -04:00
Guillermo del Angel
67c0569f9c
Merge branch 'master' of ssh://gsa4.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-13 11:50:00 -04:00
Guillermo del Angel
9a4c39357a
Fixed bugs in previous commit
2012-06-13 11:49:20 -04:00
Eric Banks
81993b08e2
Don't put null entries into the key array
2012-06-13 11:43:44 -04:00
Roger Zurawicki
bdf5945dcc
Fixed bugs in DiagnoseTargets
...
DT would not report bad mates!
That has been fixed.
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-06-13 11:15:26 -04:00
Roger Zurawicki
538cdf9210
Created the FindCoveredIntervals
...
Moved some stuff in the DiagnoseTargets walker to the more general ThresHolder class
Minor tweaks
FindCoveredIntervals supports Gathering
FindCoveredIntervals outputs an interval list instead of GATKReport
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2012-06-13 11:15:25 -04:00
Guillermo del Angel
aee66ab157
Big UG refactoring and intermediate commit to support indels in pool caller (not done yet). Lots of code pulled out of long spaghetti-like functions and modularized to be easily shareable. Added functionality in ErrorModel to count indel matches/mismatches (but left part disabled so as not to change integration tests in this commit), and added computation of pool genotype likelihoods for indels (not fully working yet in more realistic cases, only working in artificial, nice pools). Lots of TBDs still, but existing UG and pool SNP functionality should be intact
2012-06-13 11:14:44 -04:00
Eric Banks
bb77aa88c3
Drat, forgot the unit tests again
2012-06-12 19:00:47 -04:00
Eric Banks
bacd25d1a4
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-12 16:15:36 -04:00
Eric Banks
37f56ce8fd
A couple of minor updates to BQSR
2012-06-12 16:12:13 -04:00
Eric Banks
277493dd83
Yet more instances of Lists changed over to native arrays
2012-06-12 15:56:09 -04:00
Eric Banks
613badc835
Very minor optimizations for the context covariate
2012-06-12 15:47:32 -04:00
Joel Thibault
c8cda209b5
Make intersection easy to change
...
Write each client's output to a different file
2012-06-12 15:46:34 -04:00
Eric Banks
0f79adb2aa
Changing more Java Lists to native arrays in BQSR for performance optimization.
2012-06-12 15:41:01 -04:00
Eric Banks
1da3e43679
Wow, apparently it's way, way less efficient to iterate over Java Lists than native arrays. With this change and the bit fiddling, Ryan's 10-day test case now runs in 1 day. More to come.
2012-06-12 13:32:56 -04:00
Eric Banks
a96c5da884
Oops, forgot to push the unit tests
2012-06-12 11:38:30 -04:00
Eric Banks
a057cf31b3
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-12 11:02:50 -04:00
Eric Banks
fec0bd5e11
Fixing UG argument docs
2012-06-12 09:46:16 -04:00
Eric Banks
a4defdfb29
Adding a GT header line to SomaticIndelDetector output
2012-06-12 09:39:17 -04:00
Eric Banks
891ce51908
Refactoring of BQSRv2 to use longs (and standard bit fiddling techniques) instead of Java BitSets for performance improvements.
2012-06-12 09:19:36 -04:00
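For readers unfamiliar with the technique named above, here is a minimal sketch of packing several small integer fields into one primitive `long` with shifts and masks instead of a `java.util.BitSet`. The class name `PackedKey`, its methods, and the field widths are hypothetical and are not taken from the BQSR code.

```java
public final class PackedKey {
    private PackedKey() {}

    /** Number of bits needed to represent any value in [0, maxValue]. */
    public static int bitsFor(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** ORs value into key at the given bit offset; the field must be zero beforehand. */
    public static long pack(long key, long value, int offset) {
        return key | (value << offset);
    }

    /** Extracts the field of the given width (1..63 bits) that starts at offset. */
    public static long unpack(long key, int offset, int width) {
        long mask = (1L << width) - 1;
        return (key >>> offset) & mask;
    }
}
```

A packed key is a primitive, so it can be compared, hashed, and stored in arrays without allocating an object per key, which is the usual motivation for this kind of change.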
Mauricio Carneiro
0c98c34f5f
Fixing bugs caught by Eric
...
* If a read starts with an insertion, overlaps both the consensus and the variant region, and the insertion is the first base of the read, don't try to remove it.
* Keep the last header element in a variant region to prevent adjacent reads starting with insertions from breaking when trying to remove the header element before the insertion.
2012-06-11 17:32:49 -04:00
Mauricio Carneiro
e0fbffac3a
Fixing bugs caught by Eric
...
* If a read starts with an insertion, overlaps both the consensus and the variant region, and the insertion is the first base of the read, don't try to remove it.
2012-06-11 17:32:49 -04:00
Eric Banks
ff5749599d
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-11 15:46:17 -04:00
Eric Banks
fea625632f
Don't use asList because it maintains an iterator to the original list and then the result can't be used to create a new one
2012-06-11 15:45:58 -04:00
Eric Banks
41ef22fe7c
Re-enabling BQSR integration tests
2012-06-11 15:44:55 -04:00
Ryan Poplin
e4d371dc80
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-11 10:38:50 -04:00
Ryan Poplin
683d4b508e
Bug fix in fragment utils: the read name wasn't being set in the merged read. Misc minor updates to the HaplotypeCaller.
2012-06-11 10:38:35 -04:00
Mauricio Carneiro
4aad7e23ef
New ReduceReads v2 with unclipped variant regions and soft-clipped bases
...
* Re-wrote the sliding window approach to allow the variant region not to clip the reads that overlap it.
* Updated consensus to include only reads that were not passed on by the variant region; header counts are updated on the fly to avoid recomputation
* Added soft clipped bases to ReduceReads analysis by unclipping high quality soft-clips then re-clipping after reduce reads
* Updated all integration tests
2012-06-08 14:58:31 -04:00
Mauricio Carneiro
edf08ca928
Updates to ReduceReads
...
* Turn off the post-downsampler by default until new implementation.
* Optimized read clipping in consensus regions -- only clip once instead of every time the window slides.
* Extracted HeaderElement into its own class.
* Moved RFA to archive
* Unified the context size of indels and mismatches
2012-06-08 14:57:34 -04:00
Eric Banks
afa9b2718a
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-06-08 13:54:48 -04:00
Eric Banks
92280b4068
BQSR optimization: cache the BitSetUtils.bitSetFrom() calls since they are called over and over again with the same values. Another 10% reduction in runtime.
2012-06-08 13:54:37 -04:00
Eric Banks
898a0e6161
Minor optimizations
2012-06-08 12:07:58 -04:00
Ryan Poplin
0a37e19998
Bug fix in VQSR so that the VCF index will be created for the recalFile.
2012-06-08 11:51:28 -04:00
Eric Banks
d463ab2cbf
BQSR optimization: String manipulation is extremely expensive in Java (accounts for 8% of BQSR runtime). Instead use byte[] and StringBuilder when possible.
2012-06-08 10:42:42 -04:00
Eric Banks
2bd48a7351
Bad comments made it into the previous commit
2012-06-07 23:12:56 -04:00
Eric Banks
31c3a6be48
BQSR optimization: getRequiredCovariates() and getOptionalCovariates() were creating a new List every time they were being called, and unfortunately getRequiredCovariates().size() is used as the stop condition in for-loops throughout the code. Just maintaining the original list of covariates results in a 15% reduction in runtime for BQSR.
2012-06-07 20:04:10 -04:00
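The pattern described above can be illustrated with a small sketch. The class `CovariateHolder` and its methods are hypothetical stand-ins, not the actual BQSR classes; the `copies` counter exists only to make the hidden allocations visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class CovariateHolder {
    private final List<String> required;

    /** Counts how many lists the slow accessor has allocated. */
    public int copies = 0;

    public CovariateHolder(List<String> required) {
        this.required = Collections.unmodifiableList(new ArrayList<>(required));
    }

    /** Slow pattern: allocates and fills a fresh list on every call. */
    public List<String> getRequiredCopy() {
        copies++;
        return new ArrayList<>(required);
    }

    /** Fast pattern: hands back the same unmodifiable list each time. */
    public List<String> getRequired() {
        return required;
    }
}
```

When the slow accessor is used as a loop stop condition, as in `for (int i = 0; i < holder.getRequiredCopy().size(); i++)`, the condition is evaluated once per iteration plus once more to exit, so a list of n covariates triggers n + 1 allocations per loop.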
Eric Banks
0fb9179f76
BQSR optimization: don't clone the original quals for each read, we can just overwrite the original array
2012-06-07 19:41:03 -04:00
Joel Thibault
32e4fe5c87
Enable zero-sample testing
2012-06-07 14:43:13 -04:00
Joel Thibault
fe9dc0cc3f
Explicitly set intervals in new test
...
mem limit 4g
2012-06-07 14:43:13 -04:00