Eric Banks
dcd31e654d
Turn off RR tests while I debug
2012-09-21 17:26:00 -04:00
Eric Banks
21251c29c2
Off-by-one error in sliding window manifests itself at end of a coverage region dropping the last covered base.
2012-09-21 17:22:30 -04:00
Mauricio Carneiro
2c3dc291c0
Added positive/negative strand to the synthetic reads
2012-09-21 10:00:48 -04:00
Mauricio Carneiro
51cb5098e4
Fixed the alignment issues with reads that started with empty consensus headers
2012-09-21 10:00:47 -04:00
Mauricio Carneiro
aa1d2f3a5b
Not every consensus is well aligned. Need to check more, but starting position has been fixed.
2012-09-21 10:00:45 -04:00
Mauricio Carneiro
97874b92d1
Program runs, but the consensus reads are all out of place and need more tags
2012-09-21 10:00:44 -04:00
Mauricio Carneiro
3494a52ddc
another intermediate commit to update changes from stable
2012-09-21 10:00:43 -04:00
Mauricio Carneiro
a89ff7b5dd
Intermediate commit to resolve conflicts coming from stable
2012-09-21 10:00:41 -04:00
Eric Banks
1316b579f0
Bad news folks: BQSR scatter-gather was totally busted; you absolutely cannot trust any BQSR table that was a product of SG (for any version of BQSR). I fixed BQSR-gathering, rewrote (and enabled) the unit test, and confirmed that outputs are now identical whether or not SG is used to create the table.
2012-09-20 14:14:34 -04:00
Eric Banks
4b7edc72d1
Fixing edge case bug in the Exact model (both standard and generalized) where we could abort prematurely in the special case of multiple polymorphic alleles and samples with widely different depths of coverage (e.g. exome and low-pass). In these cases it was possible to call the site bi-allelic when in fact it was multi-allelic (but it wouldn't cause it to create a monomorphic call).
2012-09-20 10:59:42 -04:00
Mauricio Carneiro
ee31a54a03
Merged bug fix from Stable into Unstable
2012-09-19 16:09:45 -04:00
Mauricio Carneiro
7cf9911924
Fixed ReduceReads bug where variant regions were missing.
...
This affected variant regions with more than 100 reads and less than 250 reads. Only bams reduced with GATK v2 and 2.1 were affected.
2012-09-19 16:09:08 -04:00
Ryan Poplin
26e35e5ee2
updating BQSR integration tests
2012-09-19 14:10:34 -04:00
Ryan Poplin
b99099f05c
The BaseRecalibrator and DelocalizedBaseRecalibrator have gotten out of sync. Fixing.
2012-09-19 12:30:26 -04:00
Ryan Poplin
7a7103a757
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-19 10:39:18 -04:00
Guillermo del Angel
bebd5c14b8
Update general ploidy md5's due to a bad merge of md5's in the previous commit; also, the new shortened interval definition for EMIT_ALL_CONFIDENT_SITES was buggy
2012-09-18 20:12:15 -04:00
Guillermo del Angel
ca010160a9
Merge fix
2012-09-14 14:05:21 -04:00
Guillermo del Angel
6b37350bc0
Two hairy bugs in pool caller, plus a new integration test: a) Site error model wasn't counting errors in insertions correctly - Alleles passed in had padded ref byte, but event base in PileupElement doesn't have it. As a result, mismatch rate was grossly overestimated with insertions and we missed several calls we should have made. Integration test reflects changes. b) Adding a ref GL to the exact model is correct mathematically but AFResult wasn't filled properly. As a result, QUAL was junk in pure ref sites, and in all other sites the last ref GL introduced wasn't properly updating Pr(AF>0). c) Added integration test that covers -out_mode EMIT_ALL_CONFIDENT_SITES. Not fully sure if the math is 100% correct (for both diploid and generalized case) but at least now diploid and non-diploid cases behave similarly. The md5 of this new test will fail for now: the test takes a long time to run, so I'll update it from the Bamboo output shortly
2012-09-14 13:13:22 -04:00
Eric Banks
0206e09a6a
Merge branch 'master' of ssh://gsa2/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 15:18:27 -04:00
Eric Banks
d94d0d15c2
Complete overhaul of previous commits to make it all work with scatter-gather. Now tracks output files correctly and can print to stdout.
2012-09-12 15:15:40 -04:00
Ryan Poplin
c9111bb23e
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-12 14:46:50 -04:00
Ryan Poplin
849a2b8839
Adding HC integration test for _structural_ insertions and deletions.
2012-09-12 12:23:00 -04:00
Eric Banks
994a4ff387
Track all outputs from BQSR (.table, .csv, and .pdf) as @Output arguments. Updated integration tests because we no longer have command-line options not to generate plots (now just don't provide a pdf) or to keep the intermediate csv (now, just provide a filename on the command-line). This is currently busted because we can't access the original filenames from the Engine's storage/stub system and therefore cannot call out to the Rscript with the executor (which requires filename strings).
2012-09-12 11:24:53 -04:00
Mark DePristo
bfbf1686cd
Fixed nasty bug with defaulting to diploid no-call genotypes
...
-- For the pooled caller we were writing diploid no-calls even when other samples were haploid. Changed maxPloidy function to return a defaultPloidy, rather than 0, in the case where all samples are missing.
-- VCF/BCF Writers now create missing genotypes with the ploidy of other samples, or 2 if none are available at all.
-- Updating integration tests for general ploidy, as previously we wrote ./. even when other calls were 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/1/1/1/1/1, but now we write ./././././././././././././././././././././././. (ugly but correct)
2012-09-12 07:08:03 -04:00
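The no-call behavior described in the commit above (bfbf1686cd) can be sketched roughly as follows. This is an illustrative Python sketch, not the GATK code: the function names `max_ploidy` and `format_genotypes` and the list-of-alleles representation are assumptions made for the example. Only the rule itself comes from the commit message: use the ploidy of the other samples, or 2 if none are available.

```python
def max_ploidy(genotypes, default_ploidy=2):
    """Largest ploidy among called samples, or default_ploidy if all are missing.

    Hypothetical stand-in for the maxPloidy function named in the commit;
    a missing genotype is represented here as None.
    """
    ploidies = [len(g) for g in genotypes if g is not None]
    return max(ploidies) if ploidies else default_ploidy


def format_genotypes(genotypes, default_ploidy=2):
    """Render genotypes as VCF-style GT strings, padding no-calls to the site ploidy."""
    ploidy = max_ploidy(genotypes, default_ploidy)
    no_call = "/".join(["."] * ploidy)
    return [
        "/".join(str(allele) for allele in g) if g is not None else no_call
        for g in genotypes
    ]
```

With one tetraploid call and one missing sample, the missing sample is written with four dots rather than the diploid `./.`; when every sample is missing, the default ploidy of 2 applies.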
Ryan Poplin
35d15278af
Merge branch 'master' of ssh://gsa2.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-11 14:34:17 -04:00
Guillermo del Angel
13831106d5
Fix GSA-535: storing likelihoods in allele map was busted when running HaplotypeCaller, only the last likelihood of a haplotype was being stored, as opposed to the max likelihood of all haplotypes mapping to an allele
2012-09-11 11:01:26 -04:00
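The GSA-535 fix above (13831106d5) keeps the maximum likelihood over all haplotypes mapping to an allele, instead of whichever haplotype happened to be processed last. A minimal Python sketch of that idea, assuming a plain dict of haplotype log-likelihoods and a haplotype-to-allele mapping (both hypothetical structures, not the HaplotypeCaller's actual allele map):

```python
def allele_likelihoods(haplotype_likelihoods, haplotype_to_allele):
    """Collapse per-haplotype likelihoods to per-allele likelihoods.

    For each allele, keep the maximum likelihood among the haplotypes that
    map to it. Overwriting unconditionally would keep only the last one,
    which is the behavior the commit describes as the bug.
    """
    best = {}
    for haplotype, likelihood in haplotype_likelihoods.items():
        allele = haplotype_to_allele[haplotype]
        if allele not in best or likelihood > best[allele]:
            best[allele] = likelihood
    return best
```

For example, if haplotypes `h1` and `h2` both map to allele `A` with log-likelihoods -1.0 and -5.0, the stored value for `A` is -1.0 regardless of iteration order.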
Ryan Poplin
aa9829b55c
fixing typo
2012-09-10 13:36:37 -04:00
Guillermo del Angel
10c720cbba
Merge branch 'master' of ssh://gsa4/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-09-10 09:56:47 -04:00
Guillermo del Angel
2d4b00833b
Bug fix for logging likelihoods in new read allele map: reads which were filtered out were being excluded from map, but they should be included in annotations
2012-09-09 20:35:45 -04:00
Ryan Poplin
36913706c0
Bug fix in HC GenotypingEngine to ensure that all the merged complex events get properly added to the priority list used by VariantContextUtils when combining multiallelic events.
2012-09-09 13:47:54 -04:00
Ryan Poplin
688fc9fb56
Bug fix in HC GenotypingEngine to ensure that all the merged complex events get properly added to the priority list used by VariantContextUtils when combining multiallelic events.
2012-09-09 10:36:09 -04:00
David Roazen
cb84a6473f
Downsampling: experimental engine integration
...
-Off by default; engine fork isolates new code paths from old code paths,
so no integration tests change yet
-Experimental implementation is currently BROKEN due to a serious issue
involving file spans. No one can/should use the experimental features
until I've patched this issue.
-There are temporarily two independent versions of LocusIteratorByState.
Anyone changing one version should port the change to the other (if possible),
and anyone adding unit tests for one version should add the same unit tests
for the other (again, if possible). This situation will hopefully be extremely
temporary, and last only until the experimental implementation is proven.
2012-09-06 15:03:27 -04:00
Yossi Farjoun
d6884e705a
Revert "fixed a typo in StringText.properties"
...
This reverts commit b74c1c17e748f75e59d23545084b983e2a8d2fa6.
2012-09-05 15:21:00 -04:00
Yossi Farjoun
f4b39a7545
Merge branch 'master' of ssh://gsa4/humgen/gsa-scr1/gsa-engineering/git/unstable
...
merging trivially after a commit
2012-09-05 14:33:39 -04:00
Yossi Farjoun
6e517df5d9
fixed a typo in StringText.properties
2012-09-05 14:33:08 -04:00
Ryan Poplin
9cc1a9931b
Resolving merge conflicts.
2012-09-04 10:47:38 -04:00
Ryan Poplin
c9944d81ef
Skip array needs to also be used in the updateDataForRead function of the delocalized BQSR.
2012-09-04 10:33:37 -04:00
Mark DePristo
1b0ce511a6
Updating BQSR tests due to my change to reset BQSR calibration data
2012-08-31 19:51:09 -04:00
Mark DePristo
817ece37a2
General infrastructure for ReadTransformers
...
-- These are like read filters but can be applied either on input, on output, or handled by the walker
-- Previous example of BAQ now uses the general framework
-- Resulted in massive conceptual cleanup of SAMDataSource and ReadProperties! Yeah!
-- BQSR now uses this framework. We can now do BQSR on input, on output, or within a walker
-- PrintReads now handles all read transformers in the walker in map, enabling us to parallelize PrintReads with BAQ and BQSR
-- Currently BQSR is excepting in parallel, which a subsequent commit will fix
-- Removed global variable setting in GenomeAnalysisEngine for BAQ, as command line parameters are cleanly handled by ReadTransformer infrastructure
-- In principle ReadFilters are just a special kind of ReadTransformer, but this refactoring is larger than I can do. It's a JIRA entry
-- Many files touched simply due to the refactoring and renaming of classes
2012-08-31 13:42:41 -04:00
Mark DePristo
1200848bbf
Part II of GSA-462: Consistent RODBinding access across Ref and Read trackers
...
-- Deleted ReadMetaDataTracker
-- Added function to ReadShard to give us the span from the left most position of the reads in the shard to the right most, which is needed for the new view
2012-08-30 10:15:10 -04:00
Ryan Poplin
57d997f06f
Fixing bug from when FragmentUtils merging function moved over to the soft clipped start instead of the unclipped start
2012-08-30 10:10:43 -04:00
Ryan Poplin
35baf0b155
This along with Mauricio's previous commit (thanks!) fixes GSA-522. There are no longer any modifications to reads in the map calls of ActiveRegion walkers. Added the bam which identified this error as a new integration test.
2012-08-30 09:07:36 -04:00
Ryan Poplin
e12ae65d33
Changing the commenting style in the BQSR
2012-08-29 11:27:45 -04:00
Ryan Poplin
18eca3544e
Initial commit of the delocalized BQSR written as a read walker.
2012-08-28 15:24:20 -04:00
Mark DePristo
0f4acaae1b
Update MD5s with new FS score
2012-08-28 08:06:47 -04:00
Mark DePristo
b3fd74f0c4
HaplotypeCaller forbids BAQ
2012-08-24 13:25:05 -04:00
Ryan Poplin
fe3069b278
Merged bug fix from Stable into Unstable
2012-08-22 14:40:34 -04:00
Ryan Poplin
e5cfdb4811
Bug fix for popular _Duplicate allele added to VariantContext_ error reported on the forum. It seems to be due to lower case bases in the reference being treated as reference mismatches. We would try to turn these mismatches into SNP events, for example c/C. We now uppercase the result from IndexedFastaSequenceFile.getSubsequenceAt()
2012-08-22 14:39:35 -04:00
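The lower-case reference problem described in the commit above (e5cfdb4811) can be illustrated with a small Python sketch. The function `find_snps` and its string inputs are assumptions made for the example; the actual fix uppercases the result of IndexedFastaSequenceFile.getSubsequenceAt() in the Java code.

```python
def find_snps(ref_bases, hap_bases):
    """Return (position, ref, alt) for each mismatch between reference and haplotype.

    The reference is uppercased first, so soft-masked (lower-case) bases
    such as 'c' are not reported as spurious c/C SNP events.
    """
    ref = ref_bases.upper()
    return [
        (i, r, h)
        for i, (r, h) in enumerate(zip(ref, hap_bases))
        if r != h
    ]
```

Without the `upper()` call, comparing the reference `acgT` against the haplotype `ACGT` would report three mismatches that differ only in case.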
Ryan Poplin
63213e8eb5
Expanding the HaplotypeCaller integration tests to cover a wider range of data
2012-08-22 14:18:44 -04:00
Guillermo del Angel
901f47d8af
Final step (for now) in VA refactoring: update MD5's because, a) since it's not guaranteed that we'll iterate through reads/pileups in the same order, the rank sum dithering will change annotations, b) FS uses new generic threshold to distinguish uninformative reads (it used to use ad-hoc thresholds), c) AD definition changed and throws away uninformative reads, d) shortened general ploidy integration tests for quicker debugging. May have missed some MD5's in the update so there may be lingering test failures still
2012-08-22 11:38:51 -04:00