Mark DePristo
c8a06e53c1
DoC now properly handles reference N bases + misc. additional cleanups
...
-- DoC now by default ignores bases with reference Ns, so these are not included in the coverage calculations at any stage.
-- Added option --includeRefNSites that will include them in the calculation
-- Added integration tests that ensure the per-base tables (and so all subsequent calculations) work with and without reference N bases included
-- Reorganized command line options, tagging advanced options with @Advanced
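A minimal sketch of the effect described above, assuming a per-site depth array and a reference base array. The class and method names are hypothetical and this is not the DoC walker code; only the option name includeRefNSites comes from this commit.

```java
final class RefNCoverageDemo {
    // Hypothetical helper: mean depth over sites, skipping reference N sites
    // unless includeRefNSites is set (mirrors the new default described above).
    static double meanDepth(byte[] refBases, int[] depths, boolean includeRefNSites) {
        long total = 0;
        int sites = 0;
        for (int i = 0; i < refBases.length; i++) {
            boolean isRefN = refBases[i] == 'N' || refBases[i] == 'n';
            if (isRefN && !includeRefNSites) {
                continue; // site is left out of the calculation entirely
            }
            total += depths[i];
            sites++;
        }
        return sites == 0 ? 0.0 : (double) total / sites;
    }
}
```

With reference N sites excluded, both the numerator and the denominator shrink, so uncovered N stretches no longer drag the mean down.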
2012-02-25 11:32:50 -05:00
Mark DePristo
50de1a3eab
Fixing bad VCFIntegration tests
...
-- Left disabled a test that should have been enabled
-- Didn't add the md5 to the test I actually added
-- Now VCFIntegrationTests should be working!
2012-02-25 11:26:36 -05:00
Mark DePristo
9bad51877e
Generalized gsafolkLSFLogs.py to gsafolkLogsForTableau.py
...
-- Now updates both LSF logs and filesystem sizes
-- New Tableau emails will include both LSF and FS info!
2012-02-24 15:58:24 -05:00
Mark DePristo
80b5c7ad21
Fix gitVersionNumbers script to not print git status messages to our file
2012-02-24 15:58:22 -05:00
Mark DePristo
747e1a728f
Script to recreate entire GATKLog db from scratch
...
Useful primarily as a reference. Sometimes necessary when low-level changes are made to the scripts, requiring all of the data to be reprocessed
2012-02-24 15:58:21 -05:00
Mark DePristo
e94a534076
Added dry run and verbose options to gsafolkLSFLogs
2012-02-24 15:58:20 -05:00
Mark DePristo
253bb46bcd
Add support to analyzeRunReports to tag xml logs with git version numbers
2012-02-24 15:58:19 -05:00
Mauricio Carneiro
470375db58
added integration test for the ReduceReadsStash bug reported by Adam
2012-02-23 18:59:27 -05:00
Mauricio Carneiro
ee9a56ad27
Fix subtle bug in the ReduceReads stash reported by Adam
...
* The tailSet generated every time we flush the reads stash is still being affected by subsequent clears because it is just a view backed by the original TreeSet, not an independent copy. This is dangerous, and there is a weird condition where the clear will affect it.
* Fix by creating a new set from the tailSet instead of trying to do magic with just the view.
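A small illustration of the behavior behind this fix, using only java.util.TreeSet. The class name, method names, and the integer contents are made up for the example; the real stash holds reads.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

final class TailSetViewDemo {
    // tailSet() returns a view backed by the parent set, so clearing the
    // parent empties the view as well.
    static int viewSizeAfterClear() {
        TreeSet<Integer> stash = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        SortedSet<Integer> tail = stash.tailSet(3); // view of {3, 4, 5}
        stash.clear();
        return tail.size();
    }

    // Copying the view into a new TreeSet detaches it from the parent.
    static int copySizeAfterClear() {
        TreeSet<Integer> stash = new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5));
        SortedSet<Integer> tail = new TreeSet<>(stash.tailSet(3)); // independent copy
        stash.clear();
        return tail.size();
    }
}
```

The copy costs one extra pass over the tail elements, which is the price of making later clears on the stash harmless.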
2012-02-23 18:35:25 -05:00
Mark DePristo
e0c189909f
Added support for breakpoint alleles
...
-- See https://getsatisfaction.com/gsa/topics/support_vcf_4_1_structural_variation_breakend_alleles?utm_content=topic_link&utm_medium=email&utm_source=new_topic
-- Added integration test to ensure that we can parse and write out the breakpoint example
2012-02-23 12:14:48 -05:00
Menachem Fromer
522ace6d57
CNV discovery is also a long-running job (depending on the number of samples)
2012-02-23 11:28:22 -05:00
Ryan Poplin
2b6c0939ab
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-22 19:00:38 -05:00
Ryan Poplin
8695738400
Bug fix in HaplotypeCaller's GENOTYPE_GIVEN_ALLELES mode for insertions greater than length 1. The allele being genotyped was off by one base pair.
2012-02-22 19:00:04 -05:00
Christopher Hartl
2c1b14d35e
Mostly small changes to my own scala scripts: .vcf.gz compatibility for output files, smarter beagle generation, simple script to scatter-gather combine variants. Whole genome indel calling now uses the gold standard indel set.
2012-02-22 17:20:04 -05:00
Christopher Hartl
9b61a398b3
Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable
2012-02-22 17:18:10 -05:00
Ryan Poplin
ca7b5e068f
updating HaplotypeCaller integration tests after change to separate insertion and deletion GOP.
2012-02-22 15:23:24 -05:00
Ryan Poplin
e39638323b
Misc cleanup in HaplotypeCaller's HMM code now that we have separate GOP for insertions and deletions
2012-02-22 12:24:43 -05:00
Ryan Poplin
a611f86558
CalibrateGenotypeLikelihoods now accepts any number of external likelihood VCFs. We decided in the dev group to have the assigned name be a combination of the sample name provided in the VCF and the name provided to the rod binding.
2012-02-22 12:23:45 -05:00
Mauricio Carneiro
75783af6fc
int <-> BitSet conversion utils for MathUtils
...
* added unit tests.
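For reference, one way such a conversion can look. This is a sketch with hypothetical method names, not the MathUtils code from this commit; bit i of the int maps to bit i of the BitSet.

```java
import java.util.BitSet;

final class IntBitSetDemo {
    // Hypothetical helper: set bit i of the BitSet when bit i of n is set.
    static BitSet bitSetFrom(int n) {
        BitSet bits = new BitSet(32);
        for (int i = 0; i < 32; i++) {
            if ((n & (1 << i)) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Hypothetical inverse: rebuild the int from the low 32 bits of the BitSet.
    static int intFrom(BitSet bits) {
        int n = 0;
        for (int i = bits.nextSetBit(0); i >= 0 && i < 32; i = bits.nextSetBit(i + 1)) {
            n |= 1 << i;
        }
        return n;
    }
}
```

Round-tripping works for negative values too, since the sign bit is simply bit 31.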
2012-02-21 14:10:36 -05:00
Christopher Hartl
685bcaced2
Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable
2012-02-21 13:53:37 -05:00
Guillermo del Angel
0f5674b95e
Redid fix for corner case when forming consensus with reads that start/end with insertions and that don't agree with each other in inserted bases. Since I can't iterate over the elements of a HashMap because keys might change during iteration, and since I can't use ConcurrentHashMaps, the code now copies the structure of (bases, number of times seen) into an ArrayList, which can be addressed by element index in order to iterate over it.
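The snapshot approach can be sketched as follows. The class name, method name, and the prefix-matching rule are assumptions made for illustration, not the actual consensus logic; the point is only that the loop walks an ArrayList copy by index while the HashMap itself is re-keyed.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ConsensusSnapshotDemo {
    // Hypothetical rule: candidates that extend `bases` get +1, candidates that
    // are a proper prefix of `bases` are re-keyed to `bases`, and `bases` itself
    // is counted once at the end.
    static void addInsertion(Map<String, Integer> counts, String bases) {
        // Copy (bases, times seen) pairs so the map can change while we loop.
        List<Map.Entry<String, Integer>> snapshot = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            snapshot.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        for (int i = 0; i < snapshot.size(); i++) {
            String key = snapshot.get(i).getKey();
            int seen = snapshot.get(i).getValue();
            if (key.equals(bases)) {
                continue; // counted once after the loop
            }
            if (key.startsWith(bases)) {
                counts.merge(key, 1, Integer::sum);     // longer candidate agrees
            } else if (bases.startsWith(key)) {
                counts.remove(key);                      // key changes: safe here,
                counts.merge(bases, seen, Integer::sum); // we iterate the snapshot
            }
        }
        counts.merge(bases, 1, Integer::sum);
    }
}
```

Removing or re-keying entries inside a loop over counts.entrySet() would throw ConcurrentModificationException, which is what the snapshot avoids.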
2012-02-20 09:12:51 -05:00
Ryan Poplin
fe102a5d47
Fix for my renaming of the BQSR walker
2012-02-18 11:13:20 -05:00
Ryan Poplin
3d9eee4942
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-18 10:55:29 -05:00
Ryan Poplin
a8be96f63d
This caching in the BQSR seems to be too slow now that there are so many keys
2012-02-18 10:54:39 -05:00
Ryan Poplin
78718b8d6a
Adding Genotype Given Alleles mode to the HaplotypeCaller. It constructs the possible haplotypes via assembly and then injects the desired allele to be genotyped.
2012-02-18 10:31:26 -05:00
Guillermo del Angel
e724c63f2b
Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a key from a HashMap that's being iterated on
2012-02-17 17:18:43 -05:00
Guillermo del Angel
f2ef8d1d23
Reverting last commit until I learn how to effectively replicate and debug pipeline test failures, and until I also learn how to effectively remove a key from a HashMap that's being iterated on
2012-02-17 17:15:53 -05:00
Guillermo del Angel
3e031a540f
Solve merge conflict
2012-02-17 10:56:03 -05:00
Guillermo del Angel
cd352f502d
Corner case bug fix: if a read starts with an insertion, when computing the consensus allele for calling, the insertion was only added to the last element in the consensus key hash map. Now, an insertion that partially overlaps with several candidate alleles will increase the count for all of them
2012-02-17 10:21:37 -05:00
Guillermo del Angel
2f08846d82
Merged bug fix from Stable into Unstable
2012-02-14 21:26:25 -05:00
Guillermo del Angel
7dc6f73399
Bug fix for validation site selector: records with AC=0 were always being thrown out if the input VCF was sites-only, even when the -ignorePolymorphicStatus flag was set
2012-02-14 21:11:24 -05:00
Ryan Poplin
30085781cf
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-14 14:01:20 -05:00
Ryan Poplin
ae5b42c884
Put base insertions and base deletions in the SAMRecord as a string of quality scores instead of an array of bytes. Start of a proper genotype given alleles mode in HaplotypeCaller
2012-02-14 14:01:04 -05:00
David Roazen
8f7587048c
Update the expected novel TiTv in the HybridSelectionPipelineTest
...
The expected novel TiTv has changed for this set of variants now that
multi-allelic mode is on by default.
2012-02-13 20:25:52 -05:00
David Roazen
dfcdf92afa
Revert "Disable HaplotypeCaller integration tests in Stable"
...
These tests should remain enabled in Unstable.
This reverts commit 15c5b7aee1327f9dc012d2168f127a4700fe5064.
2012-02-13 16:37:31 -05:00
David Roazen
85d31f80a2
Merged bug fix from Stable into Unstable
2012-02-13 16:37:11 -05:00
David Roazen
d5fce22d78
Disable HaplotypeCaller integration tests in Stable
...
These tests use out-of-date files that no longer exist, and only
need to be enabled in Unstable for now.
2012-02-13 16:28:19 -05:00
David Roazen
03e5184741
Fix serious engine bug that could cause reads to be dropped under certain circumstances
...
When aggregating raw BAM file spans into shards, the IntervalSharder tries to combine
file spans when it can. Unfortunately, the method that combines two BAM file
spans was seriously flawed, and would produce a truncated union if the file spans
overlapped in certain ways. This could cause entire regions of the BAM file containing
reads within the requested intervals to be dropped.
Modified GATKBAMFileSpan.union() to correct this problem, and added unit tests
to verify that the correct union is produced regardless of how the file spans
happen to overlap.
Thanks to Khalid, who did at least as much work on this bug as I did.
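To illustrate the kind of overlap handling involved, here is a generic interval-union sketch. It is not the GATKBAMFileSpan.union() implementation: the Chunk class is a hypothetical stand-in that uses plain long offsets in half-open [start, end) ranges, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SpanUnionDemo {
    // Hypothetical stand-in for a file chunk: half-open range [start, end).
    static final class Chunk {
        final long start;
        final long end;
        Chunk(long start, long end) { this.start = start; this.end = end; }
    }

    // Union of two chunk lists: sort by start, then merge overlapping or
    // adjacent chunks into one.
    static List<Chunk> union(List<Chunk> a, List<Chunk> b) {
        List<Chunk> all = new ArrayList<>(a);
        all.addAll(b);
        all.sort(Comparator.comparingLong((Chunk c) -> c.start));
        List<Chunk> merged = new ArrayList<>();
        for (Chunk c : all) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && c.start <= merged.get(lastIndex).end) {
                Chunk last = merged.get(lastIndex);
                // Math.max matters: taking c.end alone would truncate the
                // union when c is nested inside the previous chunk.
                merged.set(lastIndex, new Chunk(last.start, Math.max(last.end, c.end)));
            } else {
                merged.add(c);
            }
        }
        return merged;
    }
}
```

The nested case (one span fully inside another) is the easy one to get wrong, so it is worth covering explicitly in unit tests alongside partial overlaps and disjoint spans.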
2012-02-13 16:25:21 -05:00
Ryan Poplin
8742f5e36c
Updating BQSR scala script to take any number of known sites files and to use the scatter count input argument.
2012-02-13 15:44:30 -05:00
Eric Banks
ad90af94ed
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-13 15:10:10 -05:00
Eric Banks
0920a1921e
Minor fixes to splitting multi-allelic records (as regards printing indel alleles correctly); minor code refactoring; adding integration tests to cover +/- splitting multi-allelics.
2012-02-13 15:09:53 -05:00
Eric Banks
14981bed10
Cleaning up VariantsToTable: added docs for supported fields; removed one-off hidden arguments for multi-allelics; default behavior is now to include multi-allelics in one record; added option to split multi-allelics into separate records.
2012-02-13 14:32:03 -05:00
Ryan Poplin
e9338e2c20
Context covariate needs to look in the reverse direction for negative stranded reads.
2012-02-13 13:40:41 -05:00
Ryan Poplin
41ffd08d53
On the fly base quality score recalibration now happens up front in a SAMIterator on input instead of in a lazy-loading fashion if the BQSR table is provided as an engine argument. On the fly recalibration is now completely hooked up and live.
2012-02-13 12:35:09 -05:00
Eric Banks
c8c06c7753
Merge branch 'master' of ssh://gsa1.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-02-11 23:02:19 -05:00
Eric Banks
ac9250b12b
Don't assume chrom20, just pull from the file list
2012-02-11 23:02:05 -05:00
Ryan Poplin
3caa1b83bb
Updating HC integration tests
2012-02-11 11:48:32 -05:00
Ryan Poplin
9b8fd4c2ff
Updating the half of the code that makes use of the recalibration information to work with the new refactoring of the BQSR. Reverting the covariate interface change in the original BQSR because the error model enum was moved to a different class and didn't make sense any more.
2012-02-11 10:57:20 -05:00
Eric Banks
f52f1f659f
Multiallelic implementation of the TDT should be a pairwise list of values as per Mark Daly. Integration tests change because the count in the header is now A instead of 1.
2012-02-10 14:15:59 -05:00
Mauricio Carneiro
f1990981fc
A little BQSR scala script to use with scatter/gather
2012-02-10 14:00:53 -05:00