8976 Commits (92bbb9bbdd5cd29b1d71df86b1c87923aa5e5d0b)

Author SHA1 Message Date
Guillermo del Angel c9a4c74f7a a) Bug fixes for last commit related to PileupElements (unit tests are forthcoming). b) Changes needed to make pool caller work in GENOTYPE_GIVEN_ALLELES mode. c) Bug fix (yet again) for UG when GENOTYPE_GIVEN_ALLELES and EMIT_ALL_SITES are on, there is no coverage at the site, and the input vcf has genotypes: the output vcf would still inherit genotypes from the input vcf. Now we build the vc from scratch instead of initializing it from the input vc, taking only the location and alleles from the input. 2012-02-24 10:27:59 -05:00
Mauricio Carneiro 470375db58 added integration test for the ReduceReadsStash bug reported by Adam 2012-02-23 18:59:27 -05:00
Mauricio Carneiro ee9a56ad27 Fix subtle bug in the ReduceReads stash reported by Adam
* The tailSet generated every time we flush the reads stash is still affected by subsequent clears because it is just a pointer to the parent element in the original TreeSet. This is dangerous, and there is a weird condition where the clear will affect it.
   * Fix by creating a new set from the tailSet instead of trying to do magic with just the pointer.
2012-02-23 18:35:25 -05:00
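The view-versus-copy behaviour described in this commit can be reproduced with plain `java.util.TreeSet`. The following is a minimal sketch, not the ReduceReads code: the class and method names (`TailSetCopySketch`, `tailView`, `tailCopy`) and the integer elements are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from the GATK source.
final class TailSetCopySketch {
    // Live view: backed by the parent set, so clearing the parent empties it too.
    static SortedSet<Integer> tailView(TreeSet<Integer> stash, int from) {
        return stash.tailSet(from);
    }

    // Independent copy: unaffected by later changes to the parent set.
    static SortedSet<Integer> tailCopy(TreeSet<Integer> stash, int from) {
        return new TreeSet<>(stash.tailSet(from));
    }

    public static void main(String[] args) {
        TreeSet<Integer> stash = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        SortedSet<Integer> view = tailView(stash, 3);
        SortedSet<Integer> copy = tailCopy(stash, 3);
        stash.clear();
        // The view follows the parent; the copy keeps 3, 4 and 5.
        System.out.println(view.size() + " " + copy.size()); // prints "0 3"
    }
}
```

Copying costs one extra pass over the tail elements, which is the price of decoupling the flushed reads from later clears of the stash.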
Mark DePristo e0c189909f Added support for breakpoint alleles
-- See https://getsatisfaction.com/gsa/topics/support_vcf_4_1_structural_variation_breakend_alleles?utm_content=topic_link&utm_medium=email&utm_source=new_topic
-- Added integrationtest to ensure that we can parse and write out breakpoint example
2012-02-23 12:14:48 -05:00
Menachem Fromer 522ace6d57 CNV discovery is also a long-running job (depending on the number of samples) 2012-02-23 11:28:22 -05:00
Guillermo del Angel 6866a41914 Added functionality in pileups to not only determine whether there's an insertion or deletion following the current position, but to also get the indel length and involved bases - definitely needed for extended event removal, and needed for pool caller indel functionality. 2012-02-23 09:45:47 -05:00
Eric Banks d34f07dba0 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-22 20:41:03 -05:00
Ryan Poplin 2b6c0939ab Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-22 19:00:38 -05:00
Ryan Poplin 8695738400 Bug fix in HaplotypeCaller's GENOTYPE_GIVEN_ALLELES mode for insertions greater than length 1. The allele being genotyped was off by one base pair. 2012-02-22 19:00:04 -05:00
Christopher Hartl 2c1b14d35e Mostly small changes to my own scala scripts: .vcf.gz compatibility for output files, smarter beagle generation, simple script to scatter-gather combine variants. Whole genome indel calling now uses the gold standard indel set. 2012-02-22 17:20:04 -05:00
Christopher Hartl 9b61a398b3 Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2012-02-22 17:18:10 -05:00
Ryan Poplin ca7b5e068f updating HaplotypeCaller integration tests after change to separate insertion and deletion GOP. 2012-02-22 15:23:24 -05:00
Ryan Poplin e39638323b Misc cleanup in HaplotypeCaller's HMM code now that we have separate GOP for insertions and deletions 2012-02-22 12:24:43 -05:00
Ryan Poplin a611f86558 CalibrateGenotypeLikelihoods now accepts any number of external likelihood VCFs. We decided in the dev group to have the assigned name be a combination of the sample name provided in the VCF and the name provided to the rod binding. 2012-02-22 12:23:45 -05:00
Mauricio Carneiro 75783af6fc int <-> BitSet conversion utils for MathUtils
* added unit tests.
2012-02-21 14:10:36 -05:00
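One way such an int <-> BitSet round trip could look is sketched below. This is an assumption-laden illustration using only `java.util.BitSet`; the names `IntBitSetSketch`, `bitSetFrom` and `intFrom` are hypothetical and are not the MathUtils methods added in this commit.

```java
import java.util.BitSet;

// Hypothetical illustration; not the actual MathUtils API.
final class IntBitSetSketch {
    // Sets bit i of the result when bit i of n is set (bit 0 = least significant).
    static BitSet bitSetFrom(int n) {
        BitSet bits = new BitSet(Integer.SIZE);
        for (int i = 0; n != 0; i++, n >>>= 1) {
            if ((n & 1) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Rebuilds the int; rejects bit sets that do not fit in 32 bits.
    static int intFrom(BitSet bits) {
        int n = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            if (i >= Integer.SIZE) {
                throw new IllegalArgumentException("bit " + i + " does not fit in an int");
            }
            n |= 1 << i;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(bitSetFrom(5));          // prints "{0, 2}"
        System.out.println(intFrom(bitSetFrom(5))); // prints "5"
    }
}
```

The unsigned shift `>>>` keeps the loop finite for negative inputs, so the sign bit simply becomes bit 31 of the set.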
Christopher Hartl 685bcaced2 Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2012-02-21 13:53:37 -05:00
Guillermo del Angel 0f5674b95e Redid fix for corner case when forming consensus with reads that start/end with insertions and that don't agree with each other in inserted bases: since I can't iterate over the elements of a HashMap because keys might change during iteration, and since I can't use ConcurrentHashMaps, the code now copies structure of (bases, number of times seen) into ArrayList, which can be addressed by element index in order to iterate on it. 2012-02-20 09:12:51 -05:00
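The approach in this commit (copy the (bases, count) pairs into an ArrayList, then walk it by index while changing the map) can be sketched as follows. The merge rule shown here, folding counts of shorter prefixes into a longer insertion, is a hypothetical stand-in chosen only to force key changes during the walk; it is not the consensus logic from the commit, and `ConsensusSnapshotSketch` and `recordInsertion` are made-up names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the merge rule is an assumption, not the GATK logic.
final class ConsensusSnapshotSketch {
    // Records one observation of `bases`, folding in the counts of any
    // existing keys that are proper prefixes of it (and removing those keys).
    static void recordInsertion(Map<String, Integer> counts, String bases) {
        // Snapshot the entries so the map can be modified safely below.
        List<Map.Entry<String, Integer>> snapshot = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            snapshot.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        int total = 1;
        for (int i = 0; i < snapshot.size(); i++) {
            String key = snapshot.get(i).getKey();
            if (!key.equals(bases) && bases.startsWith(key)) {
                counts.remove(key); // safe: we iterate the snapshot, not the map
                total += snapshot.get(i).getValue();
            }
        }
        counts.merge(bases, total, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("A", 2);
        counts.put("AC", 1);
        counts.put("T", 4);
        recordInsertion(counts, "ACG");
        System.out.println(counts.get("ACG") + " " + counts.get("T")); // prints "4 4"
    }
}
```

Removing keys inside a for-each loop over `counts.entrySet()` would instead risk a `ConcurrentModificationException`; when only removals are needed, `Iterator.remove()` is the usual alternative to a snapshot.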
Ryan Poplin fe102a5d47 Fix for my renaming of the BQSR walker 2012-02-18 11:13:20 -05:00
Ryan Poplin 3d9eee4942 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-18 10:55:29 -05:00
Ryan Poplin a8be96f63d This caching in the BQSR seems to be too slow now that there are so many keys 2012-02-18 10:54:39 -05:00
Ryan Poplin 78718b8d6a Adding Genotype Given Alleles mode to the HaplotypeCaller. It constructs the possible haplotypes via assembly and then injects the desired allele to be genotyped. 2012-02-18 10:31:26 -05:00
Guillermo del Angel e724c63f2b Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a key from a HashMap that's being iterated on 2012-02-17 17:18:43 -05:00
Guillermo del Angel f2ef8d1d23 Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a key from a HashMap that's being iterated on 2012-02-17 17:15:53 -05:00
Guillermo del Angel 3e031a540f Solve merge conflict 2012-02-17 10:56:03 -05:00
Guillermo del Angel cd352f502d Corner case bug fix: if a read starts with an insertion, then when computing the consensus allele for calling, the insertion was only added to the last element in the consensus key hash map. Now, an insertion that partially overlaps with several candidate alleles will increase the count for each of them 2012-02-17 10:21:37 -05:00
Eric Banks 2f33c57060 No reason to restrict HaplotypeScore to bi-allelic SNPs when the plumbing for multi-allelic events is already present. 2012-02-16 13:58:00 -05:00
Guillermo del Angel 2f08846d82 Merged bug fix from Stable into Unstable 2012-02-14 21:26:25 -05:00
Guillermo del Angel 7dc6f73399 Bug fix for validation site selector: records with AC=0 in them were always being thrown out if input vcf was sites-only, even when -ignorePolymorphicStatus flag was set 2012-02-14 21:11:24 -05:00
Ryan Poplin 30085781cf Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-14 14:01:20 -05:00
Ryan Poplin ae5b42c884 Put base insertions and base deletions in the SAMRecord as a string of quality scores instead of an array of bytes. Start of a proper genotype given alleles mode in HaplotypeCaller 2012-02-14 14:01:04 -05:00
David Roazen 8f7587048c Update the expected novel TiTv in the HybridSelectionPipelineTest
The expected novel TiTv has changed for this set of variants now that
multi-allelic mode is on by default.
2012-02-13 20:25:52 -05:00
David Roazen dfcdf92afa Revert "Disable HaplotypeCaller integration tests in Stable"
These tests should remain enabled in Unstable.

This reverts commit 15c5b7aee1327f9dc012d2168f127a4700fe5064.
2012-02-13 16:37:31 -05:00
David Roazen 85d31f80a2 Merged bug fix from Stable into Unstable 2012-02-13 16:37:11 -05:00
David Roazen d5fce22d78 Disable HaplotypeCaller integration tests in Stable
These tests use out-of-date files that no longer exist, and only
need to be enabled in Unstable for now.
2012-02-13 16:28:19 -05:00
David Roazen 03e5184741 Fix serious engine bug that could cause reads to be dropped under certain circumstances
When aggregating raw BAM file spans into shards, the IntervalSharder tries to combine
file spans when it can. Unfortunately, the method that combines two BAM file
spans was seriously flawed, and would produce a truncated union if the file spans
overlapped in certain ways. This could cause entire regions of the BAM file containing
reads within the requested intervals to be dropped.

Modified GATKBAMFileSpan.union() to correct this problem, and added unit tests
to verify that the correct union is produced regardless of how the file spans
happen to overlap.

Thanks to Khalid, who did at least as much work on this bug as I did.
2012-02-13 16:25:21 -05:00
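The property this fix restores, a union that never truncates however the spans overlap, can be illustrated with a generic sort-and-merge over half-open `[start, end)` ranges. This is a simplified sketch under stated assumptions: real BAM file spans use chunked virtual file offsets, and `SpanUnionSketch` and its `union` method are hypothetical names, not `GATKBAMFileSpan.union()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration on plain long ranges; not the GATK implementation.
final class SpanUnionSketch {
    // Each span is a two-element array {start, end}, end exclusive.
    static List<long[]> union(List<long[]> a, List<long[]> b) {
        List<long[]> all = new ArrayList<>();
        for (long[] s : a) all.add(s.clone()); // clone so inputs are not mutated
        for (long[] s : b) all.add(s.clone());
        all.sort(Comparator.comparingLong((long[] s) -> s[0]));

        List<long[]> merged = new ArrayList<>();
        for (long[] s : all) {
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && s[0] <= last[1]) {
                // Overlapping or touching: extend, never shrink. Taking the max
                // matters when one span lies entirely inside another.
                last[1] = Math.max(last[1], s[1]);
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<long[]> a = new ArrayList<>();
        a.add(new long[] {0, 100});
        List<long[]> b = new ArrayList<>();
        b.add(new long[] {10, 20});
        long[] only = union(a, b).get(0);
        System.out.println(only[0] + "-" + only[1]); // prints "0-100"
    }
}
```

The contained-span case in `main` is the kind of overlap where using the later span's end instead of the maximum would cut the union short.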
Ryan Poplin 8742f5e36c Updating BQSR scala script to take any number of known sites files and to use the scatter count input argument. 2012-02-13 15:44:30 -05:00
Eric Banks ad90af94ed Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-13 15:10:10 -05:00
Eric Banks 0920a1921e Minor fixes to splitting multi-allelic records (as regards printing indel alleles correctly); minor code refactoring; adding integration tests to cover +/- splitting multi-allelics. 2012-02-13 15:09:53 -05:00
Eric Banks 14981bed10 Cleaning up VariantsToTable: added docs for supported fields; removed one-off hidden arguments for multi-allelics; default behavior is now to include multi-allelics in one record; added option to split multi-allelics into separate records. 2012-02-13 14:32:03 -05:00
Ryan Poplin e9338e2c20 Context covariate needs to look in the reverse direction for negative stranded reads. 2012-02-13 13:40:41 -05:00
Ryan Poplin 41ffd08d53 On the fly base quality score recalibration now happens up front in a SAMIterator on input instead of in a lazy-loading fashion if the BQSR table is provided as an engine argument. On the fly recalibration is now completely hooked up and live. 2012-02-13 12:35:09 -05:00
Eric Banks c8c06c7753 Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-02-11 23:02:19 -05:00
Eric Banks ac9250b12b Don't assume chrom20, just pull from the file list 2012-02-11 23:02:05 -05:00
Ryan Poplin 3caa1b83bb Updating HC integration tests 2012-02-11 11:48:32 -05:00
Ryan Poplin 9b8fd4c2ff Updating the half of the code that makes use of the recalibration information to work with the new refactoring of the bqsr. Reverting the covariate interface change in the original bqsr because the error model enum was moved to a different class and didn't make sense any more. 2012-02-11 10:57:20 -05:00
Eric Banks f52f1f659f Multiallelic implementation of the TDT should be a pairwise list of values as per Mark Daly. Integration tests change because the count in the header is now A instead of 1. 2012-02-10 14:15:59 -05:00
Mauricio Carneiro f1990981fc A little BQSR scala script to use with scatter/gather 2012-02-10 14:00:53 -05:00
Mauricio Carneiro a7c6f255e9 Adding the old gatherer to BQSR
for now, the old gatherer will still work for us to scatter/gather our tests.
2012-02-10 13:33:57 -05:00
Mauricio Carneiro 1fb19a0f98 Moving the covariates and shared functionality to public
so Ryan can work on the recalibration on the fly without breaking the build. Supposedly all the secret sauce is in the BQSR walker, which sits in private.
2012-02-10 11:44:01 -05:00
Mark DePristo 48cc4b913a bugfix for incremental refresh in gsafolkLSFlogs 2012-02-10 11:30:51 -05:00