
8140 Commits (b5de1820145f3b2ca4156454dfff79164e4dde72)

Author SHA1 Message Date
Mauricio Carneiro b5de182014 isEmpty now checks if mReadBases is null
Since newly created reads have mReadBases == null. This is an effort to centralize the place to check for empty GATKSAMRecords.
2011-11-18 18:34:05 -05:00
Mauricio Carneiro 8ab3ee9c65 Merge remote-tracking branch 'unstable/master' into rr 2011-11-18 16:50:25 -05:00
Mauricio Carneiro 333e5de812 returning read instead of GATKSAMRecord
Do not create new GATKSAMRecord when read has been fully clipped, because it is essentially the same as returning the currently fully clipped read.
2011-11-18 16:49:59 -05:00
Mauricio Carneiro 3f141d3c32 slightly clearer context to finalizeAndAdd()
now it can handle "both" synthetic and running consensus at once. This should avoid forgetting to close one or the other in the future.
2011-11-18 16:35:14 -05:00
Mauricio Carneiro e08b070a6a Bug fix: Variant region starting after synthetic read didn't close it properly
If a synthetic read preceded a variant region, the variant region was not closing the synthetic read before moving the sliding window. Fixed.
2011-11-18 16:35:14 -05:00
Mauricio Carneiro 74eeb32d74 adapting reduce reads script to do WGS 2011-11-18 16:35:14 -05:00
Matt Hanna 8bb4d4dca3 First pass of the asynchronous block loader.
Block loads are only triggered on queue empty at this point.  Disabled by
default (enable with nt:io=?).
2011-11-18 15:02:59 -05:00
Eric Banks 6459784351 Merged bug fix from Stable into Unstable 2011-11-18 12:34:57 -05:00
Eric Banks c62082ba1b Making this class public again as per request from Cancer folks 2011-11-18 12:34:27 -05:00
Eric Banks 8710673a97 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-18 12:29:33 -05:00
Eric Banks 768b27322b I figured out why we were getting tons of hom var genotype calls with Mauricio's low quality (synthetic) reduced reads: the RR implementation in the UG was not capping the base quality by the mapping quality, so all the low quality reads were used to generate GLs. Fixed. 2011-11-18 12:29:15 -05:00
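The fix described in the commit above amounts to never trusting a base more than the alignment of the read it came from. A minimal sketch of that cap follows, assuming integer Phred-scaled qualities; the function name `cap_base_qualities` is hypothetical and is not taken from the GATK source.

```python
def cap_base_qualities(base_quals, mapping_qual):
    """Return the base qualities, each capped at the read's mapping quality.

    Illustrative only: the name and signature are assumptions, not GATK API.
    """
    return [min(q, mapping_qual) for q in base_quals]


# A read with mapping quality 20: bases above 20 are pulled down to 20.
print(cap_base_qualities([30, 12, 40], 20))  # prints [20, 12, 20]
```

With the cap in place, a poorly mapped read cannot contribute high-confidence bases to the genotype likelihoods.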
Guillermo del Angel dbc1d53e7a Simple qscript to select 2000 random multiallelic indels from VQSR indel release, for array validation 2011-11-18 10:52:09 -05:00
Guillermo del Angel 77ef2be9b8 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-18 07:53:13 -05:00
Guillermo del Angel 99ed64933f OK, I had it with the validation site selector running for 60 hours (due to speed of genotype reading/parsing) in gsa3 only to fail in OnTraversalDone() because of some silly operator issue. Break up validation site selection process by chromosome, pick # of sites in each chromosome proportional to chr length, (taking care of roundoff issues to ensure precisely requested number of sites is kept), and then CombineVariants in the end. This also makes the selector run comfortably under 2Gb and thus can be easily LSF'ed 2011-11-18 07:52:52 -05:00
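The per-chromosome allocation in the commit above can be sketched as follows. The commit does not say how the roundoff is handled, so this sketch assumes a largest-remainder rule; the function name `allocate_sites` and the dictionary layout are hypothetical.

```python
def allocate_sites(chrom_lengths, total_sites):
    """Split total_sites across chromosomes in proportion to their length.

    Assumption: leftover sites from rounding down go to the chromosomes
    with the largest remainders, so the result sums to exactly total_sites.
    """
    total_length = sum(chrom_lengths.values())
    quotas = {}
    remainders = {}
    for chrom, length in chrom_lengths.items():
        # Integer arithmetic avoids floating-point roundoff entirely.
        quotas[chrom], remainders[chrom] = divmod(total_sites * length, total_length)

    leftover = total_sites - sum(quotas.values())
    # sorted() is stable, so ties keep the original chromosome order.
    for chrom in sorted(remainders, key=lambda c: -remainders[c])[:leftover]:
        quotas[chrom] += 1
    return quotas


sites = allocate_sites({"chr1": 500, "chr2": 300, "chr3": 200}, 2000)
print(sites)  # prints {'chr1': 1000, 'chr2': 600, 'chr3': 400}
```

Each chromosome can then be processed as an independent job and the per-chromosome outputs combined at the end, as the commit message describes.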
Roger Zurawicki f48d4cfa79 Bug fix: fully clipping GATKSAMRecords and flushing ops
Reads that are emptied after clipping become new GATKSAMRecords.
When applying ClippingOps, the ops are cleared after the clipping.
2011-11-18 00:24:39 -05:00
David Roazen 68b2a0968c Updating the HybridSelectionPipeline for SnpEff 2.0.4 RC3
This will have to be done again when the 2.0.4 release becomes official,
but it's necessary to do now in order to re-enable the pipeline tests.
2011-11-17 14:46:12 -05:00
Khalid Shakir c50274e02e During flanking interval creation, merge overlapping flanks so that on scatter the list doesn't accidentally genotype the same site twice.
Moved flanking interval utilities to IntervalUtils with UnitTests.
2011-11-17 13:56:42 -05:00
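The merge step in the commit above can be sketched as a standard sort-and-sweep over intervals. This is an illustration only: the tuple layout `(contig, start, stop)` with inclusive coordinates and the name `merge_overlapping` are assumptions, not the IntervalUtils API, and only strictly overlapping intervals are merged.

```python
def merge_overlapping(intervals):
    """Merge overlapping (contig, start, stop) intervals, inclusive coordinates.

    Assumption: intervals that merely abut (stop + 1 == next start) stay separate.
    """
    merged = []
    for contig, start, stop in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            # Overlaps the previous interval on the same contig: extend it.
            prev_contig, prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_contig, prev_start, max(prev_stop, stop))
        else:
            merged.append((contig, start, stop))
    return merged


flanks = [("1", 100, 200), ("1", 150, 250), ("1", 300, 400)]
print(merge_overlapping(flanks))  # prints [('1', 100, 250), ('1', 300, 400)]
```

Because no position appears in two output intervals, scattering the merged list cannot assign the same site to two jobs.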
Eric Banks bad19779b9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-17 13:29:43 -05:00
Eric Banks 16a021992b Updated header description for the INFO and FORMAT DP fields to be more accurate. 2011-11-17 13:17:53 -05:00
Eric Banks e7d41d8d33 Minor cleanup 2011-11-17 12:00:28 -05:00
Mauricio Carneiro 72f00e2883 Merging Roger's Unit tests for Reduce Reads from RR repository 2011-11-16 17:26:49 -05:00
Eric Banks f250b47228 Someone broke this for SNPs when adding support for indels 2011-11-16 10:49:27 -05:00
Matt Hanna eb8e031f75 Merged bug fix from Stable into Unstable 2011-11-16 09:57:37 -05:00
Matt Hanna 6a5d5e7ac9 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/stable 2011-11-16 09:57:13 -05:00
Matt Hanna 7ac5cf8430 Getting rid of unsupported CountReadPairs walker in stable. Removal of
remainder of pairs processing framework to follow in unstable.
2011-11-16 09:53:59 -05:00
Eric Banks c2ebe58712 Merge remote-tracking branch 'Laurent/master' 2011-11-16 09:34:47 -05:00
Laurent Francioli 7d77fc51f5 Corrected bug causing PhaseByTransmission to crash in case of new Genotype.Type 2011-11-16 03:32:43 -05:00
David Roazen 0d163e3f52 SnpEff 2.0.4 support
-Modified the SnpEff parser to work with the SnpEff 2.0.4 VCF output format
-Assigning functional classes and effect impacts now handled directly
 by SnpEff rather than the GATK
-Removed support for SnpEff 2.0.2, as we no longer trust the output of that
 version since it doesn't exclude effects associated with certain nonsensical
 transcripts. These effects are excluded as of 2.0.4.
-Updated unit and integration tests

This support is based on a *release-candidate* of SnpEff 2.0.4, and so is subject
to change between now and the next GATK release.
2011-11-15 18:36:22 -05:00
Laurent Francioli fb685f88ec Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-15 16:23:53 -05:00
Eric Banks 7fada320a9 The right fix for this test is just to delete it. 2011-11-15 14:53:27 -05:00
Mauricio Carneiro 231b8e9f74 Do not output deletion only synthetic reads
If a synthetic read is composed exclusively of deletions, do not output it.
2011-11-15 13:24:43 -05:00
Eric Banks b45d10e6f1 The DP in the FORMAT field (per sample) must also use the representative count or else it's always 1 for reduced reads. 2011-11-15 10:23:59 -05:00
Eric Banks b66556f4a0 Update error message so that it's clear ReadPair Walkers are exceptions 2011-11-15 09:22:57 -05:00
Roger Zurawicki 284430d61d Added more basic UnitTests for ReadClipper
hardClipByReadCoordinatesWorks
hardClipLowQualTailsWorks
2011-11-15 00:13:52 -05:00
Roger Zurawicki 8e91e19229 Merge branch 'master' of ssh://nickel/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-15 00:13:37 -05:00
Mauricio Carneiro cde829899d compress Reduce Read counts bytes by offset
Compressing the representation of the reduce reads counts by offset results in 17% average compression in final BAM file size.

Example compression -->

from: 10, 10, 11, 11, 12, 12, 12, 11, 10
to:   10, 0, 1, 1, 2, 2, 2, 1, 0
2011-11-14 18:30:24 -05:00
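The example in the commit above keeps the first count as-is and stores every later count as its offset from that first value. A minimal sketch of the encoding and its inverse follows; the function names are hypothetical, and the real implementation stores the values as bytes in the BAM record, which is not modeled here.

```python
def compress_counts(counts):
    """Keep the first count; store the rest as offsets from the first."""
    if not counts:
        return []
    base = counts[0]
    return [base] + [c - base for c in counts[1:]]


def decompress_counts(encoded):
    """Invert compress_counts by adding the first value back to each offset."""
    if not encoded:
        return []
    base = encoded[0]
    return [base] + [offset + base for offset in encoded[1:]]


original = [10, 10, 11, 11, 12, 12, 12, 11, 10]
encoded = compress_counts(original)
print(encoded)  # prints [10, 0, 1, 1, 2, 2, 2, 1, 0]
assert decompress_counts(encoded) == original
```

Neighboring counts tend to be close to each other, so the offsets are small, repetitive values, which is presumably why the downstream BAM compression does better on them.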
Mauricio Carneiro a1ce3d8141 Not reporting counts to reduced deletions (temporary patch)
Deletions will not have counts represented in the reduced form. This may change in the future with a ReadBackedPileup refactor.
2011-11-14 18:30:24 -05:00
David Roazen ab0ee9b847 Perform only necessary validation in VariantContext modify methods 2011-11-14 16:49:59 -05:00
Guillermo del Angel 5c38a9cfd6 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-14 15:00:03 -05:00
Guillermo del Angel f1db31f072 Attempt to reduce memory footprint of ValidationSiteSelector (if this doesn't work then a radical rewrite of the walker to make it two-pass will be necessary): don't log any attributes of original VCF, if we need chr counts later we can reannotate from original inputs. As things stand, we can't select SNP's genomewide due to memory usage. 2011-11-14 14:56:09 -05:00
Eric Banks 4dc9dbe890 One quick fix to previous commit 2011-11-14 14:42:12 -05:00
Eric Banks b3313e1445 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2011-11-14 14:31:38 -05:00
Eric Banks 7b2a7cfbe7 Transfer headers from the resource VCF when possible when using expressions. While there, VA was modified so that it didn't assume that the ID field was present in the VC's info map in preparation for Mark's upcoming changes. 2011-11-14 14:31:27 -05:00
Guillermo del Angel 509ecc62cc Another bug fix for when no samples are specified in ValidationSiteSelectionWalker 2011-11-14 13:02:51 -05:00
Eric Banks 7aee80cd3b Fix to deal with reduced reads containing a deletion 2011-11-14 12:23:46 -05:00
Eric Banks 3d2970453b Misc minor cleanup 2011-11-14 09:41:54 -05:00
Laurent Francioli 1347beef40 Merge branch 'PhaseByTransmission' 2011-11-14 11:31:28 +01:00
Laurent Francioli 6881d4800c Added Integration tests for Phasing by Transmission 2011-11-14 10:47:51 +01:00
Laurent Francioli 34acf8b978 Added Unit tests for new methods in GenotypeLikelihoods 2011-11-14 10:47:02 +01:00
Roger Zurawicki 1202a809cb Added Basic Unit Tests for ReadClipper
Tests some but not all functions
Some tests have been disabled because they are not working
2011-11-13 22:27:49 -05:00