Christopher Hartl
98f8431b07
Right. Forgot the = true. If only there were some way to silently commit this OH WAIT
2012-01-19 12:36:30 -05:00
Christopher Hartl
7f3ad25b01
Adding a mode to VariantFiltration to invalidate previously-applied filters to allow complete re-filtering of a VCF.
...
T2D VQSR: re-calling now done with appropriate quality settings and using BAQ.
2012-01-19 10:54:48 -05:00
Ryan Poplin
ecdd07b748
updating HaplotypeCaller integration test
2012-01-19 09:31:22 -05:00
Ryan Poplin
7e082c7750
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-19 09:11:23 -05:00
Christopher Hartl
d1c8c38541
A QScript to generate a VQSR of union sites for T2D, using a broad set and a union site set as input.
2012-01-19 02:04:04 -05:00
Christopher Hartl
39e6df5aa9
Fix edge case for very small VCFs
2012-01-19 00:51:28 -05:00
Christopher Hartl
1e037a0ecf
Ensure second-to-last line printed
2012-01-19 00:33:08 -05:00
Christopher Hartl
9946853039
Remove duplicated line
2012-01-19 00:25:22 -05:00
Christopher Hartl
cf9b1d350a
Some minor changes to in-process functions that nobody else uses. CGL now properly ignores no-calls for external VCFs.
2012-01-19 00:20:49 -05:00
Eric Banks
ab8f499bc3
Annotate with FS even for filtered sites
2012-01-18 22:04:51 -05:00
Mauricio Carneiro
b0b0cd9aef
Conforming to the guru's recommendation on library usage ;-)
...
thanks Khalid.
2012-01-18 21:19:16 -05:00
Guillermo del Angel
b123416c4c
Resolve stale merge changes
2012-01-18 20:56:36 -05:00
Guillermo del Angel
2eb45340e1
Initial, raw, mostly untested version of new pool caller that also does allele discovery. Still needs debugging/refining. The main modification is a new operation mode, set by the argument -ALLELE_DISCOVERY_MODE, which if true determines the optimal alt allele at each computable site and computes the AC distribution on it. The current implementation does not yet work with more than one pool, and it only outputs biallelic sites; there is no support for true multi-allelics yet.
2012-01-18 20:54:10 -05:00
Ryan Poplin
0133d1a901
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-18 09:53:42 -05:00
Ryan Poplin
0268da7560
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-18 09:53:00 -05:00
Ryan Poplin
60024e0d7b
updating TDT integration test
2012-01-18 09:52:50 -05:00
David Roazen
b7c65cb089
Merged bug fix from Stable into Unstable
2012-01-18 09:52:47 -05:00
Ryan Poplin
11982b5a34
We no longer calculate the population-level TDT statistic if there are fewer than 5 trios with full genotype likelihood information. When there is a high degree of missingness the results are skewed or in the worst case come out as NaN.
2012-01-18 09:42:41 -05:00
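The guard described in commit 11982b5a34 above can be sketched roughly as follows. This is an illustration in Python, not the GATK code: the trio representation (a tuple with `None` for missing likelihoods), the function names, and the `compute_statistic` callback are all assumptions made for the example. Only the threshold of 5 trios comes from the commit message.

```python
# Threshold stated in the commit message: fewer than 5 complete trios -> skip.
MIN_COMPLETE_TRIOS = 5


def has_full_likelihoods(trio):
    """Return True if every member of the trio has genotype likelihoods.

    `trio` is a hypothetical (mother, father, child) tuple in which a
    member without likelihood information is represented by None.
    """
    return all(member is not None for member in trio)


def population_tdt(trios, compute_statistic):
    """Compute the population-level statistic only when enough data exists.

    `compute_statistic` is a placeholder for the real TDT calculation,
    which is not reproduced here. Returns None when the statistic is
    skipped, so callers never see a skewed or NaN value.
    """
    complete = [trio for trio in trios if has_full_likelihoods(trio)]
    if len(complete) < MIN_COMPLETE_TRIOS:
        return None
    return compute_statistic(complete)
```

Returning `None` rather than a sentinel number keeps "not computed" distinguishable from any legitimate value of the statistic.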
Mark DePristo
ca11f68303
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-18 08:29:03 -05:00
Mark DePristo
9e77facda5
More analyses for random forest test script forest.R
2012-01-18 08:28:47 -05:00
Mark DePristo
5bd1a45879
Usability improvements to analyzeRunReports
...
-- Print out the name / db of the SQL server, not a Python connection object
-- Print out the ID, not a Python object, of the XML record that fails to convert
2012-01-18 08:27:15 -05:00
Mark DePristo
b52db51599
Don't try to write log to a non-existent file
2012-01-18 08:26:49 -05:00
Mark DePristo
763c81d520
No longer enforce MAX_ALLELE_SIZE in VCF codec
...
-- Instead issue a warning when a large (>1MB) record is encountered
-- Optimized ref.getBytes()[i] => (byte)ref.charAt(i), which avoids an implicit O(n) allocation each iteration through computeReverseClipping()
2012-01-18 07:35:11 -05:00
Mark DePristo
0c7865fdb5
UnitTest for reverseAlleleClipping
...
-- No code modified yet, just implementing a unit test to ensure correctness of the existing code
2012-01-18 07:35:11 -05:00
David Roazen
d5199db8ec
Be explicit about setting the snpEff -onlyCoding option in the pipeline
...
When run without an explicit -onlyCoding option, as we've been doing up to
now, snpEff automatically sets -onlyCoding to "true" provided that there is
at least one transcript marked as "protein_coding", which will always be the
case for us in practice (and indeed, all pipeline runs so far with snpEff
2.0.5 have run with -onlyCoding auto-set to "true").
However, given the disastrous effect that setting "-onlyCoding false" has on
annotation quality, we wish to be explicit with this option
rather than relying on snpEff's auto-detection logic.
2012-01-17 20:04:27 -05:00
Christopher Hartl
9770250b72
Fix for Amy W - evidently binding defaults are not null but an unbound object, which caused the wrong branch to be entered.
2012-01-17 17:28:58 -05:00
Mark DePristo
b0560f9440
Rev. tribble to fix BED codec bug in tribble 51
2012-01-17 16:40:26 -05:00
Mark DePristo
62801e430a
Bugfix for unnecessary optimization
...
-- don't cache the ref bytes
2012-01-17 16:40:26 -05:00
Mark DePristo
f2b0575dee
Detect unreasonably large allele strings (>2^16) and throw an error
...
-- samtools can emit alleles where the ref is 42M Ns and this caused the GATK (via tribble) to hang in several places.
-- Tribble was updated so we actually could read the line properly (rev. to 51 here).
-- Still, the parsing algorithms in the GATK aren't happy with such a long allele. Instead of optimizing the code around an improper use case, I put in a limit of 2^16 bp for any allele and throw a meaningful exception when it is exceeded.
2012-01-17 16:40:26 -05:00
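The size check described in commit f2b0575dee above might look something like the sketch below. It is written in Python for illustration only; the real check lives in the GATK's Java code. The exception class, the function name, and the `position` argument are hypothetical. The 2^16 limit comes from the commit message, and the constant name follows the `MAX_ALLELE_SIZE` mentioned in a later commit.

```python
# Limit from the commit message: alleles longer than 2^16 bp are rejected.
MAX_ALLELE_SIZE = 2 ** 16


class AlleleTooLongError(ValueError):
    """Hypothetical exception type for an over-long allele string."""


def check_allele(allele, position):
    """Return the allele unchanged, or raise if it exceeds the limit.

    `position` is an illustrative free-form label (e.g. "chr1:12345")
    included so the error message tells the user where the record is.
    """
    if len(allele) > MAX_ALLELE_SIZE:
        raise AlleleTooLongError(
            "Allele of length %d at %s exceeds the maximum of %d bp"
            % (len(allele), position, MAX_ALLELE_SIZE)
        )
    return allele
```

Failing fast with a descriptive message avoids the hang described in the commit, at the cost of rejecting such records outright.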
Menachem Fromer
816dcf9616
Finally got around to adding support for Eric's fix to permit annotation exclusion by VariantAnnotator
2012-01-17 16:35:16 -05:00
Ryan Poplin
8b0ddf0aaf
Adding notes to CountCovariates docs about using interval lists as database of known variation
2012-01-17 16:13:13 -05:00
Mauricio Carneiro
ff2fc514ae
Updated plots to CGL walker
...
a few updates on the CalibrateGenotypeLikelihoods walker output
* Fixed ggplot2 issue with dataset with poor coverage
* Added jitter as default geometry
* Dropped the cut by technology from the graphs
2012-01-17 15:14:47 -05:00
Ryan Poplin
56761297dd
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-17 15:03:32 -05:00
Ryan Poplin
75f87db468
Replacing Mills file with new gold standard indel set in the resource bundle for release with v1.5
2012-01-17 15:02:45 -05:00
Matt Hanna
40ebc17437
Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-17 14:49:17 -05:00
Matt Hanna
41d70abe4e
At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings.
2012-01-17 14:47:53 -05:00
Mark DePristo
2390449f0f
Local and S3 archiving scripts now push data to MySQL as well
2012-01-17 14:42:48 -05:00
Ryan Poplin
ae259f81cc
Bug fixing for merging of read fragments when one fragment contained an indel
2012-01-17 14:39:27 -05:00
Menachem Fromer
80a1ae254b
Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-17 14:25:40 -05:00
Menachem Fromer
284a8e9ddc
Fixed to match recent minor updates by Khalid and Eric
2012-01-17 14:24:41 -05:00
Christopher Hartl
cde224746f
Bait Redesign supports baits that overlap, by picking only the start of intervals.
...
CalibrateGenotypeLikelihoods supports using an external VCF as input for genotype likelihoods. Currently can be a per-sample VCF, but has un-implemented methods for allowing a read-group VCF to be used.
Removed the old constrained genotyping code from UGE -- the trellis calculated is exactly the same as that done in the MLE AC estimate; so we should just re-use that one.
2012-01-17 13:51:05 -05:00
Ryan Poplin
8e23c98dd9
Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable
2012-01-17 13:46:28 -05:00
Matt Hanna
32ccde374b
Merged bug fix from Stable into Unstable
2012-01-17 11:08:35 -05:00
Matt Hanna
3ba918aff1
Error message cleanup in BAM indexing code.
2012-01-17 11:05:42 -05:00
Mark DePristo
aa8a885a5b
Generalizing forest.R analysis script
...
-- Support for N tree analyses
-- Testing of NA omit and roughfix options
-- Misc. analyses and refactoring
2012-01-16 09:33:41 -05:00
Mark DePristo
8ddac9a06f
Don't show individual jobs in queueStatus for gsaadm, just count
2012-01-16 09:33:05 -05:00
Mark DePristo
61f82f138f
Extract a high-level GATK version from the SVN / GIT full version numbers in analyzeRunReports
...
-- Maps SVN versions, for example, 1.0.5988 to 0.5, 1.0.6134 to 0.6, etc.
-- Maps GIT versions 1.x-XXX to 1.x
Used in tableau analyses
2012-01-16 09:30:48 -05:00
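The version mapping described in commit 61f82f138f above could be sketched along these lines. The commit message gives only two SVN examples and does not state the rule behind them, so the lookup table below is an assumption: it treats each listed revision as the first revision of a release, purely so that the two quoted examples come out right. The function and constant names are likewise invented for illustration.

```python
import bisect
import re

# ASSUMPTION: lower-bound SVN revisions per release. Only the two pairs
# quoted in the commit message are known; the real script's rule and
# table are not shown in the log.
SVN_RELEASE_STARTS = [(5988, "0.5"), (6134, "0.6")]

SVN_PATTERN = re.compile(r"^1\.0\.(\d+)$")   # e.g. 1.0.5988
GIT_PATTERN = re.compile(r"^(\d+\.\d+)-")    # e.g. 1.4-123-gabcdef


def high_level_version(full_version):
    """Map a full version string to a high-level version, or None."""
    git_match = GIT_PATTERN.match(full_version)
    if git_match:
        # GIT versions 1.x-XXX map to 1.x.
        return git_match.group(1)

    svn_match = SVN_PATTERN.match(full_version)
    if svn_match:
        revision = int(svn_match.group(1))
        starts = [start for start, _ in SVN_RELEASE_STARTS]
        # Find the last release whose starting revision is <= revision.
        index = bisect.bisect_right(starts, revision) - 1
        if index >= 0:
            return SVN_RELEASE_STARTS[index][1]
    return None
```

For example, `high_level_version("1.0.5988")` yields `"0.5"` and `high_level_version("1.4-123-gabcdef")` yields `"1.4"`; unrecognized strings yield `None`.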
Mauricio Carneiro
8272c8bd26
Added exceptions to CGL walker
...
* Assert that a user provided a VCF, not some other type of ROD
* Assert that the VCF has samples
* Assert that the samples in the BAM exist in the VCF
* Warn the user if not all samples in the BAM are present in the VCF
2012-01-14 14:10:19 -05:00
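The sample checks listed in commit 8272c8bd26 above can be illustrated with a small Python sketch. It is not the walker's code: the function name and the use of plain sets of sample names are assumptions, and the reading that "exist in the VCF" means "at least one BAM sample is in the VCF" is an interpretation made to reconcile the error and warning bullets. The check that the input is a VCF rather than another ROD type is omitted.

```python
import warnings


def validate_samples(bam_samples, vcf_samples):
    """Return the samples shared by the BAM and the VCF.

    Raises ValueError if the VCF has no samples or shares none with the
    BAM; warns if only some of the BAM samples are present in the VCF.
    """
    bam_samples = set(bam_samples)
    vcf_samples = set(vcf_samples)

    if not vcf_samples:
        raise ValueError("The VCF contains no samples")

    shared = bam_samples & vcf_samples
    if not shared:
        raise ValueError("None of the BAM samples are present in the VCF")

    missing = bam_samples - vcf_samples
    if missing:
        warnings.warn(
            "BAM samples not present in the VCF: %s"
            % ", ".join(sorted(missing))
        )
    return shared
```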
Mauricio Carneiro
cec7107762
Better location for the downsampling of reads in PrintReads
...
* using the filter() instead of map() makes for a cleaner walker.
* renaming the unit tests to make more sense with the other unit and integration tests
2012-01-14 14:06:09 -05:00
Mauricio Carneiro
3a9d9789ae
Removing old scripts for genotype accuracy
2012-01-13 16:57:05 -05:00