Mauricio Carneiro
94791a2a75
Add support for reads starting with insertion
...
* Modified cleanCigarShift to allow insertions at the beginning and end of the read
* Allowed cigars starting/ending in insertions in the systematic ReadClipper tests
* Updated all ReadClipper unit tests
* ReduceReads does not hard clip leading insertions by default anymore
* SlidingWindow adjusts start location if read starts with insertion
* SlidingWindow creates an empty element with insertions to the right
* Fixed all potential divide-by-zero errors with totalCount() (from BaseCounts)
* Updated all Integration tests
* Added new integration test for multiple interval reducing
2012-01-03 09:29:45 -05:00
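The divide-by-zero fix mentioned in the commit above can be illustrated with a minimal sketch. The commit does not show the actual BaseCounts code, so everything here except the method name totalCount() is an assumption: the class name BaseCountsSketch, the Base enum, and the proportion() helper are hypothetical and only demonstrate the guard pattern.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for a per-locus base counter; not the GATK BaseCounts class.
public class BaseCountsSketch {
    public enum Base { A, C, G, T }

    private final Map<Base, Integer> counts = new EnumMap<>(Base.class);

    // Record one observation of the given base.
    public void increment(Base base) {
        counts.merge(base, 1, Integer::sum);
    }

    // Sum of all observed bases; zero for an empty element.
    public int totalCount() {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    // Fraction of observations that are the given base.
    // Without the guard, an empty counter would produce NaN here
    // (or an ArithmeticException if the division were done on ints).
    public double proportion(Base base) {
        int total = totalCount();
        if (total == 0) {
            return 0.0; // assumed convention for "no data"
        }
        return (double) counts.getOrDefault(base, 0) / total;
    }
}
```

Returning 0.0 for an empty counter is one possible convention; a caller that must distinguish "no data" from "zero fraction" might prefer to check totalCount() first.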
Mark DePristo
3ecb9a0bf7
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-02 13:56:55 -05:00
Mark DePristo
b3e613647a
GATKPerformanceOverTime bug fixes
...
-- Don't try to do nt 16, it's just too painful as the threading doesn't work well and it consumes a large chunk of our available slots on gsa4
-- bugfix: only do multi-threaded test for each iteration, not expanding by subiterations, so we no longer try to do 3x3 nt 16 runs
2012-01-02 13:56:44 -05:00
Mark DePristo
188bd48139
runGATKReport only archives and shows errors for the last day's runs
2012-01-02 10:39:05 -05:00
Mark DePristo
d05f0c2318
GATKPerformanceOverTime script update
...
-- Automatic detection of most recent version of GATK release (just tell the script now to use 1.2, 1.3, and 1.4)
-- Uses 1.4 now
-- By default we do 9 runs of each non-parallel test
-- In PathUtils added convenience utility to find most recent release GATK jar with a specific release number
2012-01-02 09:58:46 -05:00
Mauricio Carneiro
a837970ea2
Merged bug fix from Stable into Unstable
2012-01-01 22:20:53 -05:00
Mauricio Carneiro
1b6d52817e
fixing adaptor clipping effect on recalibration integration test
2012-01-01 22:20:06 -05:00
Ryan Poplin
e45ca8bfa2
Protect against too many alternate alleles in the haplotype caller.
2012-01-01 19:12:48 -05:00
Eric Banks
b0d68eb0e3
Merge remote-tracking branch 'unstable/master'
2011-12-31 20:26:44 -05:00
Mauricio Carneiro
55cfa76cf3
Updated integration tests for the new adaptor clipping fix.
2011-12-30 18:47:14 -05:00
Mauricio Carneiro
c7d0a9ebee
Forgot to test for inter-chromosomal mates in the adaptor clipping
...
* Fixing bug caught by Eric (and Kristian)
2011-12-30 00:19:53 -05:00
Matt Hanna
a259bfefd4
First commit addressing problems running RTC in parallel.
...
Turns out that because the RTC is the first walker to 'correctly' tree reduce according to functional programming
standards, the RTC has revealed a few problems with the tree reducer holding on to too much data. This is the first
and smaller of two commits to reduce memory consumption. The second commit will likely be pushed after GATK1.4 is
released.
2011-12-29 16:22:14 -05:00
Matt Hanna
e6e80e8d3f
Update Picard to fix a bug Mauricio found in Picard where Picard unnecessarily depends on Snappy during some usages of SortingCollection.
2011-12-29 14:35:02 -05:00
Roger Zurawicki
efe33a0a1b
BUG FIX: Output is now correct
...
The output reported zero coverage because the pileup was filtered using the wrong method
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2011-12-28 23:05:43 -05:00
Roger Zurawicki
5672688a73
Optimized CoverageByRG and Added GCContent
...
- CoverageByRG now uses a hashmap for its value instead of a list. It runs about 4 times faster.
- Cleaned up some of the code
- CoverageByRG now calculates GCContent
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2011-12-28 15:25:07 -05:00
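The switch from a list to a hashmap described in the commit above can be sketched as follows. The walker's real code is not shown in this log, so the class name CoverageByReadGroupSketch and both method names are hypothetical, and reads are reduced to plain read-group ID strings for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the CoverageByRG walker itself.
public class CoverageByReadGroupSketch {

    // Count how many reads from each read group cover a locus.
    // A map keyed by read-group ID gives a constant-time update per read,
    // whereas a list would have to be scanned to find the matching entry.
    public static Map<String, Integer> countByReadGroup(List<String> readGroupIds) {
        Map<String, Integer> coverage = new HashMap<>();
        for (String rg : readGroupIds) {
            coverage.merge(rg, 1, Integer::sum);
        }
        return coverage;
    }

    // Fraction of G and C bases in a sequence; 0.0 for an empty sequence.
    public static double gcContent(String bases) {
        if (bases.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (char b : bases.toUpperCase().toCharArray()) {
            if (b == 'G' || b == 'C') {
                gc++;
            }
        }
        return (double) gc / bases.length();
    }
}
```

Whether this accounts for the reported speedup depends on how the original list was used, which the commit message does not say.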
Roger Zurawicki
0c05998c4c
Added CoverageByRG LocusWalker
...
Will take any number of input BAMs and intervals
Returns a ReportTable with Average Coverage of each Read Group per Interval
Signed-off-by: Mauricio Carneiro <carneiro@broadinstitute.org>
2011-12-28 15:25:07 -05:00
Mauricio Carneiro
f692911903
GATKSAMRecord emptyRead static constructor
...
* Creates an empty GATKSAMRecord with empty (not null) Cigar, bases and quals. Allows empty reads to be probed without breaking.
* All ReadClipper utilities now emit empty reads for fully clipped reads
2011-12-27 17:01:17 -05:00
Mauricio Carneiro
8259c748f2
No more Filtered Reads tag.
...
All synthetic reads are marked with the reduced read tag.
2011-12-27 17:01:17 -05:00
Ryan Poplin
ef31b2f0a7
fixing merge conflicts.
2011-12-27 14:26:36 -05:00
Ryan Poplin
4f09a95221
Updating HaplotypeCaller for the new contracts in the adapter clipping.
2011-12-27 14:25:03 -05:00
Mauricio Carneiro
17bfe48d5e
Made all class methods private in the ReadClipper
...
* ReadClipperUnitTest now uses static methods
* Haplotype caller now uses static methods
* Exon Junction Genotyper now uses static methods
2011-12-27 02:11:32 -05:00
Mauricio Carneiro
ce493bf257
Added adaptor clipping to ReduceReads
...
* made all clipping steps optional with arguments.
2011-12-27 01:19:06 -05:00
Mauricio Carneiro
f7a5752025
Let this one slip through my commits.
2011-12-26 21:55:02 -05:00
Mauricio Carneiro
c1eaf7cf81
ReduceReads now allows different context sizes for different events
...
* Rename contextSize to contextSizeMismatches
* Indel context size is now different from mismatches context size
2011-12-26 21:17:29 -05:00
Mauricio Carneiro
4633637af6
Moved ReduceReads to static ReadClipper
...
* all clipping done in ReduceReads is done using the static methods of the ReadClipper now.
2011-12-26 21:14:40 -05:00
Mauricio Carneiro
9aa1c0c6e5
Better documentation and contracts for ReduceReads
...
* added javadoc to all methods
* added GATKDocs style documentation to the ReduceReadsWalker
* revised contracts and made explicit in the documentation
2011-12-26 21:12:23 -05:00
Mauricio Carneiro
3051cdf9c5
fixed reduced reads integration tests
2011-12-26 21:12:22 -05:00
Mauricio Carneiro
256a7d8bd2
fixing the arguments for RRead script
2011-12-26 21:12:22 -05:00
Eric Banks
dd990061f6
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-26 14:45:35 -05:00
Eric Banks
2130b39f33
Found the bug in the engine: RodLocusView was using the wrong seek method so that it would only move to the first locus of a shard (and with multi-locus shards, this meant that we never processed RODs from the other positions). In fact, because the seek(Shard) method is extremely misleading and now no longer used, I think it's safer to delete it and make everyone use the much more transparent seek(GenomeLoc). Note that I have not re-enabled my improvements to the intervals accumulation of ReferenceDataSource because that inefficiency is still present downstream in RodLocusView; need to discuss those changes with Matt.
2011-12-26 14:45:19 -05:00
Mauricio Carneiro
02495a5fd5
renaming script, once more
2011-12-23 20:01:25 -05:00
Mauricio Carneiro
afc58b81b2
changing permissions on the scala script
2011-12-23 19:47:48 -05:00
Mauricio Carneiro
5198f3a287
Making -e optional and renaming script
...
* Expanding intervals should be optional, not mandatory
2011-12-23 19:36:57 -05:00
Mauricio Carneiro
35c41409a1
Better contracts and docs for the ReadClipper
...
* Described the ReadClipper contract in the top of the class
* Added contracts where applicable
* Added descriptive information to all tools in the read clipper
* Organized public members and static methods together with the same javadoc
2011-12-23 19:36:57 -05:00
David Roazen
506c0e9c97
Disabling SnpEff support in the GATK and SnpEff annotation in the HybridSelectionPipeline
...
SnpEff support will remain disabled until SnpEff 2.0.4 has been officially released
and we've verified the quality of its annotations.
2011-12-23 19:12:57 -05:00
Eric Banks
24c84da60d
'Fixing' the changes in ReferenceDataSource so that a shard properly contains a list of GenomeLocs instead of a single merged one. However, that uncovered a probable bug in the engine, so instead of letting this code fester unfixed in the build (affecting everyone in the group) I've decided to revert to the previous (slow, but working) version and fix the engine in my own branch.
2011-12-23 15:39:12 -05:00
Eric Banks
8762313a0d
Better TODO message
2011-12-22 20:54:35 -05:00
Eric Banks
a815e875a8
Removing debugging output
2011-12-22 15:49:11 -05:00
Eric Banks
deef542a38
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-22 15:44:58 -05:00
Eric Banks
6d260ec6ae
Start printing traversal stats after 30 seconds. I can't stand waiting 2 minutes.
2011-12-22 15:40:59 -05:00
David Roazen
510c71158c
Merged bug fix from Stable into Unstable
2011-12-22 10:49:52 -05:00
David Roazen
32cdef9682
Rename *PerformanceTest test classes to *LargeScaleTest
...
This is in preparation for the installation of the new performance test suite in Bamboo.
Note that "ant performancetest" is now "ant largescaletest"
2011-12-22 10:38:49 -05:00
Mauricio Carneiro
473af102c1
Added 'expand intervals' option to reduce reads scala script
...
This allows generating reduce reads with off-target regions. Default is not to use it.
2011-12-21 15:15:45 -05:00
Mauricio Carneiro
3358c132a8
Updating the MD5s
...
Clipping adaptor boundaries changed the results of CountCovariates, which affected the PPP output.
A few more loci were visible to locus walkers.
2011-12-21 15:14:05 -05:00
Mauricio Carneiro
a333144aaf
more verbose output for updateSampleList.lua
2011-12-21 13:30:35 -05:00
Mauricio Carneiro
2e232e26da
New name for ReduceReads scala script
2011-12-21 13:13:07 -05:00
Mauricio Carneiro
98f4cdecc8
Renaming ReduceReads script
...
name was confusing with the walker
2011-12-21 13:11:34 -05:00
Matt Hanna
4d65aefc7b
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-12-20 21:43:57 -05:00
Matt Hanna
d50f9b98bb
Make sure that the temporary ReadWalker performance improvement hack
...
works well in the binary release, just in case GATK 1.4 arrives before I get
a Picard patch.
2011-12-20 21:42:30 -05:00
Mauricio Carneiro
731a463415
Updated IntegrationTests with new adaptor clipper
...
phew!
2011-12-20 17:48:52 -05:00