Mauricio Carneiro
0fa50056eb
Reorganizing package structure for compression tools (incl. Reduce Reads)
...
ReduceReads is no longer the only compression tool. Reorganizing the package structure to reflect that.
2011-11-22 15:03:31 -05:00
Mauricio Carneiro
1a50d54c03
A walker that generates the distribution of quality scores
...
Outputs a GATKReportTable that will be used as input to the Quantization walker. Eventually this functionality may be merged into ReduceReads or CountCovariates to avoid another traversal.
2011-11-22 14:49:28 -05:00
Mauricio Carneiro
1614ca1115
Force use of LinkedList
...
Disambiguating which collection I need to use for mapped reads.
2011-11-22 14:49:28 -05:00
Guillermo del Angel
38a90da92c
Fixed merge conflict to Unstable
2011-11-22 14:39:45 -05:00
Guillermo del Angel
32a77a8a56
Prevent out-of-bounds error in case read span > reference context + indel length. Can happen in RNAseq reads with long N CIGAR operators in the middle.
2011-11-22 13:57:24 -05:00
Eric Banks
5821c11fad
For BAM and Reviewed errors we now check the error message to see if it's actually a 'too many open files' problem and, if so, we generate a User Error instead.
2011-11-22 10:50:22 -05:00
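A minimal sketch of the message check described in commit 5821c11fad, assuming the underlying failure surfaces as an exception whose message (or a cause's message) contains the text "too many open files". The class `OpenFilesCheck` and the `UserError` type are hypothetical stand-ins, not the GATK's actual exception classes.

```java
import java.util.Locale;

// Hypothetical helper; names are illustrative and not taken from the GATK code base.
class OpenFilesCheck {
    /** Stand-in for a user-facing error type (assumption, not the real GATK class). */
    static class UserError extends RuntimeException {
        UserError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Returns true if any exception in the cause chain mentions "too many open files". */
    static boolean isTooManyOpenFiles(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.toLowerCase(Locale.ROOT).contains("too many open files")) {
                return true;
            }
        }
        return false;
    }

    /** Returns a UserError when the failure is really a file-handle limit, else the original. */
    static RuntimeException translate(RuntimeException e) {
        return isTooManyOpenFiles(e)
                ? new UserError("Too many open files; the file handle limit may need raising", e)
                : e;
    }
}
```

A caller would wrap the failing operation in a try/catch and `throw OpenFilesCheck.translate(e);` so that other errors pass through unchanged.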
Eric Banks
5e7b9ae119
The NONE option is not supported yet (but the error message should mention this)
2011-11-22 10:48:38 -05:00
Mauricio Carneiro
8c7430e6ff
Bugfix: isOriginalRead had the wrong check for WGS
...
The original read has to be contained in the current interval and the first read in the list. In WGS all reads are 'original'.
2011-11-21 17:12:11 -05:00
Mauricio Carneiro
756695ee3f
Better Unit tests for Synthetic Reads
2011-11-21 17:12:11 -05:00
Mauricio Carneiro
5ad3dfcd62
BugFix: byte overflow in SyntheticRead compressed base counts
...
* fixed and added unit test
2011-11-21 17:11:50 -05:00
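Commit 5ad3dfcd62 does not say how the overflow was fixed. As an illustration only, the sketch below shows how narrowing a count to a Java `byte` wraps around above 127, and one possible guard (clamping to the byte range). The class and method names are hypothetical.

```java
// Hypothetical illustration of byte overflow; not the SyntheticRead implementation.
class CountBytes {
    /** Naive narrowing cast: values above 127 wrap around to negative numbers. */
    static byte unsafeToByte(int count) {
        return (byte) count;
    }

    /** Saturating conversion: clamps the count into the range a signed byte can hold. */
    static byte saturatingToByte(int count) {
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, count));
    }
}
```

With these definitions `unsafeToByte(128)` yields -128, while `saturatingToByte(128)` yields 127.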
Eric Banks
44554b2bfd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-21 15:01:45 -05:00
Eric Banks
022832bd74
Very bad use of the == operator with Strings was ensuring that validating GenomeLocs was very inefficient. This fix resulted in a significant speedup for a simple RodWalker.
2011-11-21 14:49:47 -05:00
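The offending code from commit 022832bd74 is not shown here, so the following is only a generic illustration of the pitfall it names: in Java, `==` on Strings compares object identity, not contents, so a check meant to recognize an already-seen contig name can fail even when the names match. The class name is hypothetical.

```java
// Generic illustration of == versus equals() on Strings; not the GenomeLoc code.
class ContigCompare {
    /** Identity comparison: true only if both references point to the same object. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    /** Value comparison: true when the character contents match. */
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }
}
```

Two separately constructed Strings with the same characters, for example two `new String("chr1")` instances, fail `sameReference` but pass `sameContent`.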
David Roazen
1296dd41be
Removing the legacy -L "interval1;interval2" syntax
...
This syntax predates the ability to have multiple -L arguments, is
inconsistent with the syntax of all other GATK arguments, requires
quoting to avoid interpretation by the shell, and was causing
problems in Queue.
A UserException is now thrown if someone tries to use this syntax.
2011-11-21 13:18:53 -05:00
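A minimal sketch of the kind of check commit 1296dd41be describes, assuming the legacy syntax can be recognized by a semicolon inside a single -L value. `IntervalArgCheck` is a hypothetical name, and `IllegalArgumentException` stands in for the GATK's UserException.

```java
// Hypothetical check; IllegalArgumentException stands in for the GATK UserException.
class IntervalArgCheck {
    /** Rejects a single -L value that packs several intervals together with ';'. */
    static void rejectLegacySyntax(String intervalArg) {
        if (intervalArg.indexOf(';') >= 0) {
            throw new IllegalArgumentException(
                    "The -L \"interval1;interval2\" syntax is no longer supported; "
                    + "pass each interval with its own -L argument: " + intervalArg);
        }
    }
}
```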
Mauricio Carneiro
b5de182014
isEmpty now checks if mReadBases is null
...
Since newly created reads have mReadBases == null. This is an effort to centralize the place to check for empty GATKSAMRecords.
2011-11-18 18:34:05 -05:00
Mauricio Carneiro
8ab3ee9c65
Merge remote-tracking branch 'unstable/master' into rr
2011-11-18 16:50:25 -05:00
Mauricio Carneiro
333e5de812
returning read instead of GATKSAMRecord
...
Do not create new GATKSAMRecord when read has been fully clipped, because it is essentially the same as returning the currently fully clipped read.
2011-11-18 16:49:59 -05:00
Mauricio Carneiro
3f141d3c32
slightly clearer context to finalizeAndAdd()
...
now it can handle "both" synthetic and running consensus at once. This should avoid forgetting to close one or the other in the future.
2011-11-18 16:35:14 -05:00
Mauricio Carneiro
e08b070a6a
Bug fix: Variant region starting after synthetic read didn't close it properly
...
If a synthetic read preceded a variant region, the variant region was not closing the synthetic read before moving the sliding window. Fixed.
2011-11-18 16:35:14 -05:00
Mauricio Carneiro
74eeb32d74
adapting reduce reads script to do WGS
2011-11-18 16:35:14 -05:00
Matt Hanna
8bb4d4dca3
First pass of the asynchronous block loader.
...
Block loads are only triggered on queue empty at this point. Disabled by
default (enable with nt:io=?).
2011-11-18 15:02:59 -05:00
Eric Banks
6459784351
Merged bug fix from Stable into Unstable
2011-11-18 12:34:57 -05:00
Eric Banks
c62082ba1b
Making this class public again as per request from Cancer folks
2011-11-18 12:34:27 -05:00
Eric Banks
8710673a97
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-18 12:29:33 -05:00
Eric Banks
768b27322b
I figured out why we were getting tons of hom var genotype calls with Mauricio's low quality (synthetic) reduced reads: the RR implementation in the UG was not capping the base quality by the mapping quality, so all the low quality reads were used to generate GLs. Fixed.
2011-11-18 12:29:15 -05:00
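The capping that commit 768b27322b refers to can be sketched as taking the minimum of the two qualities, so that a base on a poorly mapped read never contributes more confidence than the read's mapping quality allows. The class and method names are hypothetical, not the UnifiedGenotyper's.

```java
// Hypothetical sketch of capping base quality by mapping quality.
class QualityCap {
    /** Returns the base quality, limited to at most the read's mapping quality. */
    static byte capBaseQuality(byte baseQual, int mappingQual) {
        // The result never exceeds baseQual, so the narrowing cast cannot overflow.
        return (byte) Math.min(baseQual, mappingQual);
    }
}
```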
Guillermo del Angel
dbc1d53e7a
Simple qscript to select 2000 random multiallelic indels from VQSR indel release, for array validation
2011-11-18 10:52:09 -05:00
Guillermo del Angel
77ef2be9b8
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-18 07:53:13 -05:00
Guillermo del Angel
99ed64933f
OK, I had it with the validation site selector running for 60 hours (due to speed of genotype reading/parsing) in gsa3 only to fail in OnTraversalDone() because of some silly operator issue. Break up the validation site selection process by chromosome, pick # of sites in each chromosome proportional to chr length (taking care of roundoff issues to ensure precisely the requested number of sites is kept), and then CombineVariants in the end. This also makes the selector run comfortably under 2 GB and thus can be easily LSF'ed.
2011-11-18 07:52:52 -05:00
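Commit 99ed64933f does not show how the roundoff is handled. One common way to split a fixed number of sites proportionally while keeping the total exact is the largest-remainder method, sketched below with integer arithmetic. `SiteAllocator` and its method are hypothetical names, and the qscript may use a different scheme.

```java
import java.util.Arrays;

// Hypothetical sketch using the largest-remainder method; the actual script may differ.
class SiteAllocator {
    /** Splits totalSites across chromosomes in proportion to length; the result sums to totalSites. */
    static int[] allocate(long[] chrLengths, int totalSites) {
        long totalLength = 0;
        for (long len : chrLengths) {
            totalLength += len;
        }
        if (totalLength <= 0) {
            throw new IllegalArgumentException("total chromosome length must be positive");
        }
        int n = chrLengths.length;
        int[] sites = new int[n];
        long[] remainder = new long[n];
        int assigned = 0;
        for (int i = 0; i < n; i++) {
            long product = (long) totalSites * chrLengths[i];
            sites[i] = (int) (product / totalLength);   // floor of the exact share
            remainder[i] = product % totalLength;        // fractional part, scaled
            assigned += sites[i];
        }
        // Hand the leftover sites to the chromosomes with the largest remainders.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(remainder[b], remainder[a]));
        for (int k = 0; assigned < totalSites; k++, assigned++) {
            sites[order[k]]++;
        }
        return sites;
    }
}
```

For lengths 100, 50, 50 and 10 requested sites, the exact shares are 5, 2.5 and 2.5; the floors sum to 9, and the one leftover site goes to the first of the two tied chromosomes.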
Roger Zurawicki
f48d4cfa79
Bug fix: fully clipping GATKSAMRecords and flushing ops
...
Reads that are emptied after clipping become new GATKSAMRecords.
When applying ClippingOps, the ops are cleared after the clipping
2011-11-18 00:24:39 -05:00
David Roazen
68b2a0968c
Updating the HybridSelectionPipeline for SnpEff 2.0.4 RC3
...
This will have to be done again when the 2.0.4 release becomes official,
but it's necessary to do now in order to re-enable the pipeline tests.
2011-11-17 14:46:12 -05:00
Khalid Shakir
c50274e02e
During flanking interval creation, merge overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
...
Moved flanking interval utilities to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
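The merging step from commit c50274e02e can be sketched as a standard sort-and-sweep over intervals on one contig. This is an assumption about the approach; `FlankMerger` and `Interval` are hypothetical types, not the IntervalUtils API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of merging overlapping flanks; not the IntervalUtils code.
class FlankMerger {
    /** Closed interval [start, stop] on a single contig. */
    static final class Interval {
        final int start;
        final int stop;

        Interval(int start, int stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Returns the intervals sorted by start, with overlapping ones merged. */
    static List<Interval> mergeOverlapping(List<Interval> intervals) {
        List<Interval> sorted = new ArrayList<>(intervals);
        sorted.sort(Comparator.comparingInt((Interval iv) -> iv.start));
        List<Interval> merged = new ArrayList<>();
        for (Interval cur : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && cur.start <= merged.get(lastIndex).stop) {
                // Overlap: extend the previous interval instead of adding a new one.
                Interval last = merged.get(lastIndex);
                merged.set(lastIndex, new Interval(last.start, Math.max(last.stop, cur.stop)));
            } else {
                merged.add(cur);
            }
        }
        return merged;
    }
}
```

Flanks [90, 100] and [95, 110] would collapse into [90, 110], so no position appears in two scatter pieces.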
Eric Banks
bad19779b9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-17 13:29:43 -05:00
Eric Banks
16a021992b
Updated header description for the INFO and FORMAT DP fields to be more accurate.
2011-11-17 13:17:53 -05:00
Eric Banks
e7d41d8d33
Minor cleanup
2011-11-17 12:00:28 -05:00
Mauricio Carneiro
72f00e2883
Merging Roger's Unit tests for Reduce Reads from RR repository
2011-11-16 17:26:49 -05:00
Eric Banks
f250b47228
Someone broke this for SNPs when adding support for indels
2011-11-16 10:49:27 -05:00
Matt Hanna
eb8e031f75
Merged bug fix from Stable into Unstable
2011-11-16 09:57:37 -05:00
Matt Hanna
6a5d5e7ac9
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable
2011-11-16 09:57:13 -05:00
Matt Hanna
7ac5cf8430
Getting rid of unsupported CountReadPairs walker in stable. Removal of
...
remainder of pairs processing framework to follow in unstable.
2011-11-16 09:53:59 -05:00
Eric Banks
c2ebe58712
Merge remote-tracking branch 'Laurent/master'
2011-11-16 09:34:47 -05:00
Laurent Francioli
7d77fc51f5
Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type
2011-11-16 03:32:43 -05:00
David Roazen
0d163e3f52
SnpEff 2.0.4 support
...
-Modified the SnpEff parser to work with the SnpEff 2.0.4 VCF output format
-Assigning functional classes and effect impacts now handled directly
by SnpEff rather than the GATK
-Removed support for SnpEff 2.0.2, as we no longer trust the output of that
version since it doesn't exclude effects associated with certain nonsensical
transcripts. These effects are excluded as of 2.0.4.
-Updated unit and integration tests
This support is based on a *release-candidate* of SnpEff 2.0.4, and so is subject
to change between now and the next GATK release.
2011-11-15 18:36:22 -05:00
Laurent Francioli
fb685f88ec
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-15 16:23:53 -05:00
Eric Banks
7fada320a9
The right fix for this test is just to delete it.
2011-11-15 14:53:27 -05:00
Mauricio Carneiro
231b8e9f74
Do not output deletion only synthetic reads
...
If a synthetic read is composed exclusively of deletions, do not output it.
2011-11-15 13:24:43 -05:00
Eric Banks
b45d10e6f1
The DP in the FORMAT field (per sample) must also use the representative count or else it's always 1 for reduced reads.
2011-11-15 10:23:59 -05:00
Eric Banks
b66556f4a0
Update error message so that it's clear ReadPair Walkers are exceptions
2011-11-15 09:22:57 -05:00
Roger Zurawicki
284430d61d
Added more basic UnitTests for ReadClipper
...
hardClipByReadCoordinatesWorks
hardClipLowQualTailsWorks
2011-11-15 00:13:52 -05:00
Roger Zurawicki
8e91e19229
Merge branch 'master' of ssh://nickel/humgen/gsa-scr1/gsa-engineering/git/unstable
2011-11-15 00:13:37 -05:00
Mauricio Carneiro
cde829899d
compress Reduce Read counts bytes by offset
...
Compressing the representation of the reduce reads counts by offset results in a 17% average compression in final BAM file size.
Example compression -->
from: 10, 10, 11, 11, 12, 12, 12, 11, 10
to: 10, 0, 1, 1, 2, 2, 2, 1, 0
2011-11-14 18:30:24 -05:00
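Judging from the example in commit cde829899d, the first count is stored as-is and every later count is stored as its offset from the first. A minimal sketch under that reading follows; `CountOffsets` is a hypothetical name and the real encoding may handle edge cases differently.

```java
// Hypothetical sketch of offset encoding, inferred from the example in the commit message.
class CountOffsets {
    /** Keeps the first count and replaces each later count with (count - first). */
    static byte[] encode(byte[] counts) {
        byte[] out = new byte[counts.length];
        if (counts.length == 0) {
            return out;
        }
        out[0] = counts[0];
        for (int i = 1; i < counts.length; i++) {
            out[i] = (byte) (counts[i] - counts[0]);
        }
        return out;
    }

    /** Inverse of encode: adds the first value back to every later offset. */
    static byte[] decode(byte[] encoded) {
        byte[] out = new byte[encoded.length];
        if (encoded.length == 0) {
            return out;
        }
        out[0] = encoded[0];
        for (int i = 1; i < encoded.length; i++) {
            out[i] = (byte) (encoded[i] + encoded[0]);
        }
        return out;
    }
}
```

Encoding 10, 10, 11, 11, 12, 12, 12, 11, 10 gives 10, 0, 1, 1, 2, 2, 2, 1, 0, matching the commit's example. The offsets are small and repetitive, which presumably is what lets the BAM's block compression shrink them further.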
Mauricio Carneiro
a1ce3d8141
Not reporting counts to reduced deletions (temporary patch)
...
Deletions will not have counts represented in the reduced form. This may change in the future with a ReadBackedPileup refactor.
2011-11-14 18:30:24 -05:00