Eric Banks
bd944ab04f
Another test where we no longer print out 'NaN' for the AF.
2012-02-27 15:19:08 -05:00
Eric Banks
52871187d7
Adding integration test for file with no GTs. Also updated md5 for one other test (since we no longer print out 'NaN' for the AF).
2012-02-27 15:09:56 -05:00
Eric Banks
998ed8fff3
Bug fix to deal with VCF records that don't have GTs. While in there, optimized a bunch of related functions (including removing a copy of the method calculateChromosomeCounts(); why did we have 2 copies? very dangerous).
2012-02-27 14:56:10 -05:00
Eric Banks
1ea34058c2
Updating integration tests now that standard annotations support multiple alleles
2012-02-27 11:32:26 -05:00
Eric Banks
64754e7870
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-27 11:31:41 -05:00
Eric Banks
850c5d0db2
Enabling Rank Sum Tests for multi-allelics: use ref vs any alt allele.
2012-02-27 09:59:36 -05:00
Eric Banks
dfdf4f989b
Enabling Fisher Strand for multi-allelics: use the alt allele with max AC. Added minor optimization to the method in the VC.
2012-02-27 09:50:09 -05:00
Guillermo del Angel
16122bea8d
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-25 13:57:54 -05:00
Guillermo del Angel
dea35943d1
a) Bug fix in calling new functions that give indel bases and length from regular pileup in LocusIteratorByState, b) Added unit test to cover these.
2012-02-25 13:57:28 -05:00
Mark DePristo
c8a06e53c1
DoC now properly handles reference N bases + misc. additional cleanups
...
-- DoC now by default ignores bases with reference Ns, so these are not included in the coverage calculations at any stage.
-- Added option --includeRefNSites that will include them in the calculation
-- Added integration tests that ensure the per-base tables (and so all subsequent calculations) work with and without reference N bases included
-- Reorganized command line options, tagging advanced options with @Advanced
2012-02-25 11:32:50 -05:00
Mark DePristo
50de1a3eab
Fixing bad VCFIntegration tests
...
-- Left disabled a test that should have been enabled
-- Didn't add the md5 to the test I actually added
-- Now VCFIntegrationTests should be working!
2012-02-25 11:26:36 -05:00
Guillermo del Angel
c9a4c74f7a
a) Bug fixes for the last commit related to PileupElements (unit tests are forthcoming). b) Changes needed to make the pool caller work in GENOTYPE_GIVEN_ALLELES mode. c) Bug fix (yet again) for UG when GENOTYPE_GIVEN_ALLELES and EMIT_ALL_SITES are on, there's no coverage at the site, and the input VCF has genotypes: the output VCF would still inherit genotypes from the input VCF. Now we build the VC from scratch instead of initializing it from the input VC, taking only the location and alleles from it.
2012-02-24 10:27:59 -05:00
Mauricio Carneiro
ee9a56ad27
Fix subtle bug in the ReduceReads stash reported by Adam
...
* The tailSet generated every time we flush the reads stash is still affected by subsequent clears, because it is just a view backed by the original TreeSet rather than an independent copy. This is dangerous: under some conditions a later clear also empties it.
* Fix by creating a new set from the tailSet instead of trying to do magic with just the pointer.
2012-02-23 18:35:25 -05:00
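The tailSet behaviour described in the commit above can be reproduced with plain java.util classes. This is a minimal sketch, not the ReduceReads code: the class name, method names, and the integer contents of the stash are all made up for illustration.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical demo class; none of these names come from the GATK source.
final class TailSetSnapshotDemo {

    // tailSet() returns a live view: clearing the parent set empties the view too.
    static int viewSizeAfterClear() {
        TreeSet<Integer> stash = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        SortedSet<Integer> view = stash.tailSet(3); // backed by stash
        stash.clear();
        return view.size();
    }

    // Copying the view into a new TreeSet detaches it from the parent set.
    static int copySizeAfterClear() {
        TreeSet<Integer> stash = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        SortedSet<Integer> copy = new TreeSet<>(stash.tailSet(3)); // independent set
        stash.clear();
        return copy.size();
    }
}
```

The first method returns 0 and the second returns 3, which is the difference between keeping the view and creating a new set from it.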
Mark DePristo
e0c189909f
Added support for breakpoint alleles
...
-- See https://getsatisfaction.com/gsa/topics/support_vcf_4_1_structural_variation_breakend_alleles?utm_content=topic_link&utm_medium=email&utm_source=new_topic
-- Added integration test to ensure that we can parse and write out the breakpoint example
2012-02-23 12:14:48 -05:00
Guillermo del Angel
6866a41914
Added functionality in pileups to not only determine whether there's an insertion or deletion following the current position, but to also get the indel length and involved bases - definitely needed for extended event removal, and needed for pool caller indel functionality.
2012-02-23 09:45:47 -05:00
Eric Banks
d34f07dba0
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-22 20:41:03 -05:00
Ryan Poplin
2b6c0939ab
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-22 19:00:38 -05:00
Ryan Poplin
8695738400
Bug fix in HaplotypeCaller's GENOTYPE_GIVEN_ALLELES mode for insertions greater than length 1. The allele being genotyped was off by one base pair.
2012-02-22 19:00:04 -05:00
Christopher Hartl
2c1b14d35e
Mostly small changes to my own scala scripts: .vcf.gz compatibility for output files, smarter beagle generation, simple script to scatter-gather combine variants. Whole genome indel calling now uses the gold standard indel set.
2012-02-22 17:20:04 -05:00
Mauricio Carneiro
75783af6fc
int <-> BitSet conversion utils for MathUtils
...
* added unit tests.
2012-02-21 14:10:36 -05:00
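For readers unfamiliar with this kind of utility, an int <-> BitSet round trip can be sketched as follows. The method names and the bit ordering (bit i of the int maps to index i of the BitSet) are assumptions made for this example; the actual MathUtils signatures are not shown in the log.

```java
import java.util.BitSet;

// Hypothetical sketch; not the MathUtils implementation.
final class IntBitSetSketch {

    // Set index i in the BitSet for every bit i that is set in the int.
    static BitSet bitSetFrom(int value) {
        BitSet bits = new BitSet(Integer.SIZE);
        for (int i = 0; i < Integer.SIZE; i++) {
            if ((value & (1 << i)) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Rebuild the int from the set indices, ignoring any index >= 32.
    static int intFrom(BitSet bits) {
        int value = 0;
        for (int i = bits.nextSetBit(0); i >= 0 && i < Integer.SIZE; i = bits.nextSetBit(i + 1)) {
            value |= 1 << i;
        }
        return value;
    }
}
```

Because all 32 bits are copied, negative values survive the round trip unchanged.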
Guillermo del Angel
0f5674b95e
Redid fix for corner case when forming consensus with reads that start/end with insertions and that don't agree with each other in inserted bases: since I can't iterate over the elements of a HashMap because keys might change during iteration, and since I can't use ConcurrentHashMaps, the code now copies the structure of (bases, number of times seen) into an ArrayList, which can be addressed by element index in order to iterate over it.
2012-02-20 09:12:51 -05:00
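The copy-into-ArrayList approach from the commit above can be illustrated generically. This is a sketch under assumptions: the map of (bases, count), the suffix-based re-keying, and all names are invented for the example and do not reflect the actual consensus code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of iterating over a snapshot while the map's keys change.
final class SnapshotRekeySketch {

    // Appends a suffix to every key, summing counts if two keys end up equal.
    static void appendToKeys(Map<String, Integer> counts, String suffix) {
        // Copy (bases, count) pairs so the loop below never touches a live iterator.
        List<Map.Entry<String, Integer>> snapshot = new ArrayList<>(counts.size());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            snapshot.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        // The snapshot holds all the data, so the map can be rebuilt from it.
        counts.clear();
        for (int i = 0; i < snapshot.size(); i++) {
            Map.Entry<String, Integer> e = snapshot.get(i);
            counts.merge(e.getKey() + suffix, e.getValue(), Integer::sum);
        }
    }
}
```

Changing keys inside a for-each loop over counts.entrySet() would risk a ConcurrentModificationException; walking the snapshot by index avoids that.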
Ryan Poplin
3d9eee4942
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-18 10:55:29 -05:00
Ryan Poplin
a8be96f63d
This caching in the BQSR seems to be too slow now that there are so many keys
2012-02-18 10:54:39 -05:00
Ryan Poplin
78718b8d6a
Adding Genotype Given Alleles mode to the HaplotypeCaller. It constructs the possible haplotypes via assembly and then injects the desired allele to be genotyped.
2012-02-18 10:31:26 -05:00
Guillermo del Angel
e724c63f2b
Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a key from a HashMap that's being iterated on
2012-02-17 17:18:43 -05:00
Guillermo del Angel
f2ef8d1d23
Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a key from a HashMap that's being iterated on
2012-02-17 17:15:53 -05:00
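On the question raised in the two revert messages above, the standard-library way to remove an entry from a HashMap during iteration is Iterator.remove(). The sketch below uses an invented map and threshold; it illustrates the general technique and is not the fix that was eventually committed.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch; names and the minimum-count rule are illustrative only.
final class IteratorRemoveSketch {

    // Removes every entry whose count is below minCount; returns how many were removed.
    static int removeBelow(Map<String, Integer> counts, int minCount) {
        int removed = 0;
        Iterator<Map.Entry<String, Integer>> it = counts.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < minCount) {
                it.remove(); // safe: removal goes through the iterator itself
                removed++;
            }
        }
        return removed;
    }
}
```

Calling counts.remove(key) inside the same loop instead of it.remove() is what typically triggers a ConcurrentModificationException.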
Guillermo del Angel
3e031a540f
Solve merge conflict
2012-02-17 10:56:03 -05:00
Guillermo del Angel
cd352f502d
Corner case bug fix: if a read starts with an insertion, then when computing the consensus allele for calling, the insertion was only added to the last element in the consensus key hash map. Now, an insertion that partially overlaps several candidate alleles increases the count for all of them.
2012-02-17 10:21:37 -05:00
Eric Banks
2f33c57060
No reason to restrict HaplotypeScore to bi-allelic SNPs when the plumbing for multi-allelic events is already present.
2012-02-16 13:58:00 -05:00
Guillermo del Angel
2f08846d82
Merged bug fix from Stable into Unstable
2012-02-14 21:26:25 -05:00
Guillermo del Angel
7dc6f73399
Bug fix for validation site selector: records with AC=0 in them were always being thrown out if input vcf was sites-only, even when -ignorePolymorphicStatus flag was set
2012-02-14 21:11:24 -05:00
Ryan Poplin
30085781cf
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-14 14:01:20 -05:00
Ryan Poplin
ae5b42c884
Put base insertion and base deletions in the SAMRecord as a string of quality scores instead of an array of bytes. Start of a proper genotype given alleles mode in HaplotypeCaller
2012-02-14 14:01:04 -05:00
David Roazen
85d31f80a2
Merged bug fix from Stable into Unstable
2012-02-13 16:37:11 -05:00
David Roazen
03e5184741
Fix serious engine bug that could cause reads to be dropped under certain circumstances
...
When aggregating raw BAM file spans into shards, the IntervalSharder tries to combine
file spans when it can. Unfortunately, the method that combines two BAM file
spans was seriously flawed, and would produce a truncated union if the file spans
overlapped in certain ways. This could cause entire regions of the BAM file containing
reads within the requested intervals to be dropped.
Modified GATKBAMFileSpan.union() to correct this problem, and added unit tests
to verify that the correct union is produced regardless of how the file spans
happen to overlap.
Thanks to Khalid, who did at least as much work on this bug as I did.
2012-02-13 16:25:21 -05:00
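To make the failure mode in the commit above concrete, here is a sketch of a union over two lists of chunks that stays correct when one chunk fully contains another. It is not the GATKBAMFileSpan.union() code: representing a chunk as a {start, end} pair of plain long offsets, and every name below, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; a chunk is modelled as long[]{start, end}.
final class SpanUnionSketch {

    // Returns the merged, sorted union of two chunk lists without modifying the inputs.
    static List<long[]> union(List<long[]> a, List<long[]> b) {
        List<long[]> all = new ArrayList<>(a);
        all.addAll(b);
        all.sort(Comparator.comparingLong((long[] c) -> c[0]));

        List<long[]> merged = new ArrayList<>();
        for (long[] chunk : all) {
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && chunk[0] <= last[1]) {
                // Overlapping or touching: keep the larger end so a contained
                // chunk can never truncate the one that contains it.
                last[1] = Math.max(last[1], chunk[1]);
            } else {
                merged.add(new long[] {chunk[0], chunk[1]}); // copy, inputs stay untouched
            }
        }
        return merged;
    }
}
```

Taking the maximum of the two ends is the step that guards against a truncated union: a version that always adopts the later chunk's end would shrink {0, 100} to {0, 20} when merged with {10, 20}.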
Eric Banks
ad90af94ed
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-13 15:10:10 -05:00
Eric Banks
0920a1921e
Minor fixes to splitting multi-allelic records (as regards printing indel alleles correctly); minor code refactoring; adding integration tests to cover +/- splitting multi-allelics.
2012-02-13 15:09:53 -05:00
Eric Banks
14981bed10
Cleaning up VariantsToTable: added docs for supported fields; removed one-off hidden arguments for multi-allelics; default behavior is now to include multi-allelics in one record; added option to split multi-allelics into separate records.
2012-02-13 14:32:03 -05:00
Ryan Poplin
e9338e2c20
Context covariate needs to look in the reverse direction for negative stranded reads.
2012-02-13 13:40:41 -05:00
Ryan Poplin
41ffd08d53
On the fly base quality score recalibration now happens up front in a SAMIterator on input instead of in a lazy-loading fashion if the BQSR table is provided as an engine argument. On the fly recalibration is now completely hooked up and live.
2012-02-13 12:35:09 -05:00
Ryan Poplin
3caa1b83bb
Updating HC integration tests
2012-02-11 11:48:32 -05:00
Ryan Poplin
9b8fd4c2ff
Updating the half of the code that makes use of the recalibration information to work with the new refactoring of the bqsr. Reverting the covariate interface change in the original bqsr because the error model enum was moved to a different class and didn't make sense any more.
2012-02-11 10:57:20 -05:00
Eric Banks
f52f1f659f
Multiallelic implementation of the TDT should be a pairwise list of values as per Mark Daly. Integration tests change because the count in the header is now A instead of 1.
2012-02-10 14:15:59 -05:00
Mauricio Carneiro
1fb19a0f98
Moving the covariates and shared functionality to public
...
so Ryan can work on the recalibration on the fly without breaking the build. Supposedly all the secret sauce is in the BQSR walker, which sits in private.
2012-02-10 11:44:01 -05:00
Eric Banks
5e18020a5f
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-10 11:08:33 -05:00
Eric Banks
f53cd3de1b
Based on Ryan's suggestion, there's a new contract for genotyping multiple alleles. Now the requester submits alleles in any arbitrary order - rankings aren't needed. If the Exact model decides that it needs to subset the alleles because too many were requested, it does so based on PL mass (in other words, I moved this code from the SNPGenotypeLikelihoodsCalculationModel to the Exact model). Now subsetting alleles is consistent.
2012-02-10 11:07:32 -05:00
Mauricio Carneiro
5af373a3a1
BQSR with indels integrated!
...
* added support for the base before a deletion in the pileup
* refactored covariates to operate on mismatches, insertions and deletions at the same time
* all code is in private so original BQSR is still working as usual in public
* outputs a molten CSV with mismatches, insertions and deletions, time to play!
* barely tested, passes my very simple tests... haven't tested edge cases.
2012-02-09 18:46:45 -05:00
Eric Banks
7a937dd1eb
Several bug fixes to new genotyping strategy. Update integration tests for multi-allelic indels accordingly.
2012-02-09 16:14:22 -05:00
Eric Banks
0f728a0604
The Exact model now subsets the VC to the first N alleles when the VC contains more than the maximum number of alleles (instead of throwing it out completely as it did previously). [Perhaps the culling should be done by the UG engine? But theoretically the Exact model can be called outside of the UG and we'd still want the context subsetted.]
2012-02-09 14:02:34 -05:00
Matt Hanna
aa097a83d5
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-09 11:26:48 -05:00