8620 Commits (98f8431b0774d8aa4d848c890a9e1f082a2a86da)

Author SHA1 Message Date
Christopher Hartl 98f8431b07 Right. Forgot the = true. If only there were some way to silently commit this OH WAIT 2012-01-19 12:36:30 -05:00
Christopher Hartl 7f3ad25b01 Adding a mode to VariantFiltration to invalidate previously-applied filters to allow complete re-filtering of a VCF.
T2D VQSR: re-calling now done with appropriate quality settings and using BAQ.
2012-01-19 10:54:48 -05:00
Ryan Poplin ecdd07b748 updating HaplotypeCaller integration test 2012-01-19 09:31:22 -05:00
Ryan Poplin 7e082c7750 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-19 09:11:23 -05:00
Christopher Hartl d1c8c38541 A QScript to generate a VQSR of union sites for T2D, using a broad set and a union site set as input. 2012-01-19 02:04:04 -05:00
Christopher Hartl 39e6df5aa9 Fix edge case for very small VCFs 2012-01-19 00:51:28 -05:00
Christopher Hartl 1e037a0ecf Ensure second-to-last line printed 2012-01-19 00:33:08 -05:00
Christopher Hartl 9946853039 Remove duplicated line 2012-01-19 00:25:22 -05:00
Christopher Hartl cf9b1d350a Some minor changes to in-process functions that nobody else uses. CGL now properly ignores no-calls for external VCFs. 2012-01-19 00:20:49 -05:00
Eric Banks ab8f499bc3 Annotate with FS even for filtered sites 2012-01-18 22:04:51 -05:00
Mauricio Carneiro b0b0cd9aef Conforming to the guru's recommendation on library usage ;-)
thanks Khalid.
2012-01-18 21:19:16 -05:00
Guillermo del Angel b123416c4c Resolve stale merge changes 2012-01-18 20:56:36 -05:00
Guillermo del Angel 2eb45340e1 Initial, raw, mostly untested version of new pool caller that also does allele discovery. Still needs debugging/refining. Main modification is that there is a new operation mode, set by argument -ALLELE_DISCOVERY_MODE, which if true will determine optimal alt allele at each computable site and will compute AC distribution on it. Current implementation is not working yet if there's more than one pool and it will only output biallelic sites, no functionality for true multi-allelics yet 2012-01-18 20:54:10 -05:00
Ryan Poplin 0133d1a901 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-18 09:53:42 -05:00
Ryan Poplin 0268da7560 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-18 09:53:00 -05:00
Ryan Poplin 60024e0d7b updating TDT integration test 2012-01-18 09:52:50 -05:00
David Roazen b7c65cb089 Merged bug fix from Stable into Unstable 2012-01-18 09:52:47 -05:00
Ryan Poplin 11982b5a34 We no longer calculate the population-level TDT statistic if there are fewer than 5 trios with full genotype likelihood information. When there is a high degree of missingness the results are skewed or in the worst case come out as NaN. 2012-01-18 09:42:41 -05:00
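The guard described in commit 11982b5a34 can be sketched as follows. This is a minimal Python illustration, not the GATK code (which is Java). The names `MIN_COMPLETE_TRIOS`, `Trio`, and `population_tdt` are hypothetical, and the statistic is passed in as a callable because the commit message does not say how it is computed.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

MIN_COMPLETE_TRIOS = 5  # threshold stated in the commit message


@dataclass
class Trio:
    # Genotype likelihoods per family member; None marks missing data.
    # This layout is an assumption made for the sketch.
    mother: Optional[Sequence[float]]
    father: Optional[Sequence[float]]
    child: Optional[Sequence[float]]

    def is_complete(self) -> bool:
        # A trio counts only if all three members have likelihoods.
        return all(m is not None for m in (self.mother, self.father, self.child))


def population_tdt(
    trios: Sequence[Trio],
    statistic: Callable[[Sequence[Trio]], float],
) -> Optional[float]:
    """Return the statistic over complete trios, or None if there are too few."""
    complete = [t for t in trios if t.is_complete()]
    if len(complete) < MIN_COMPLETE_TRIOS:
        # Too much missingness: skip rather than report a skewed or NaN value.
        return None
    return statistic(complete)
```

Returning `None` instead of a number lets the caller omit the annotation entirely, which matches the stated intent of not reporting a value at all.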
Mark DePristo ca11f68303 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-18 08:29:03 -05:00
Mark DePristo 9e77facda5 More analyses for random forest test script forest.R 2012-01-18 08:28:47 -05:00
Mark DePristo 5bd1a45879 Usability improvements to analyzeRunReports
-- Print out the name / db of the SQL server, not a Python connection object
-- Print out the ID, not a Python object, of the XML record that fails to convert
2012-01-18 08:27:15 -05:00
Mark DePristo b52db51599 Don't try to write log to a nonexistent file 2012-01-18 08:26:49 -05:00
Mark DePristo 763c81d520 No longer enforce MAX_ALLELE_SIZE in VCF codec
-- Instead issue a warning when a large (>1MB) record is encountered
-- Optimized ref.getBytes()[i] => (byte)ref.charAt(i), which avoids an implicit O(n) allocation each iteration through computeReverseClipping()
2012-01-18 07:35:11 -05:00
Mark DePristo 0c7865fdb5 UnitTest for reverseAlleleClipping
-- No code modified yet, just implementing a unit test to ensure correctness of the existing code
2012-01-18 07:35:11 -05:00
David Roazen d5199db8ec Be explicit about setting the snpEff -onlyCoding option in the pipeline
When run without an explicit -onlyCoding option, as we've been doing up to
now, snpEff automatically sets -onlyCoding to "true" provided that there is
at least one transcript marked as "protein_coding", which will always be the
case for us in practice (and indeed, all pipeline runs so far with snpEff
2.0.5 have run with -onlyCoding auto-set to "true").

However, given the disastrous effect that setting "-onlyCoding false"
has on annotation quality, we wish to be explicit with this option
rather than relying on snpEff's auto-detection logic.
2012-01-17 20:04:27 -05:00
Christopher Hartl 9770250b72 Fix for Amy W - evidently binding defaults are not null but an unbound object, which caused the improper branch to be entered into. 2012-01-17 17:28:58 -05:00
Mark DePristo b0560f9440 Rev. tribble to fix BED codec bug in tribble 51 2012-01-17 16:40:26 -05:00
Mark DePristo 62801e430a Bugfix for unnecessary optimization
-- don't cache the ref bytes
2012-01-17 16:40:26 -05:00
Mark DePristo f2b0575dee Detect unreasonably large allele strings (>2^16) and throw an error
-- samtools can emit alleles where the ref is 42M Ns and this caused the GATK (via tribble) to hang in several places.
-- Tribble was updated so we actually could read the line properly (rev. to 51 here).
-- Still the parsing algorithms in the GATK aren't happy with such a long allele.  Instead of optimizing the code around an improper use case I put in a limit of 2^16 bp for any allele, and throw a meaningful exception when encountered.
2012-01-17 16:40:26 -05:00
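The size limit described in commit f2b0575dee can be sketched as follows. This is a minimal Python illustration of the idea only, not the GATK code (which is Java). `MAX_ALLELE_SIZE` is named in a later commit in this log, while `AlleleTooLongError` and `check_allele_size` are hypothetical names introduced for the sketch.

```python
MAX_ALLELE_SIZE = 2 ** 16  # 65,536 bp, the limit given in the commit message


class AlleleTooLongError(ValueError):
    """Raised when an allele string exceeds MAX_ALLELE_SIZE."""


def check_allele_size(allele: str, line_number: int) -> str:
    """Return the allele unchanged, or raise a descriptive error if it is too long."""
    if len(allele) > MAX_ALLELE_SIZE:
        # Fail fast with a meaningful message instead of letting
        # downstream parsing hang on a pathological record.
        raise AlleleTooLongError(
            f"Allele of length {len(allele)} at line {line_number} exceeds "
            f"the maximum of {MAX_ALLELE_SIZE} bp"
        )
    return allele
```

A later commit in this log (763c81d520) dropped the hard limit in favor of a warning for records larger than 1 MB.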
Menachem Fromer 816dcf9616 Finally got around to adding support for Eric's fix to permit annotation exclusion by VariantAnnotator 2012-01-17 16:35:16 -05:00
Ryan Poplin 8b0ddf0aaf Adding notes to CountCovariates docs about using interval lists as database of known variation 2012-01-17 16:13:13 -05:00
Mauricio Carneiro ff2fc514ae Updated plots to CGL walker
a few updates on the CalibrateGenotypeLikelihoods walker output

   * Fixed ggplot2 issue with dataset with poor coverage
   * Added jitter as default geometry
   * Dropped the cut by technology from the graphs
2012-01-17 15:14:47 -05:00
Ryan Poplin 56761297dd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 15:03:32 -05:00
Ryan Poplin 75f87db468 Replacing Mills file with new gold standard indel set in the resource bundle for release with v1.5 2012-01-17 15:02:45 -05:00
Matt Hanna 40ebc17437 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 14:49:17 -05:00
Matt Hanna 41d70abe4e At chartl's request, add the bwa aln -N and bwa aln -m parameters to the bindings. 2012-01-17 14:47:53 -05:00
Mark DePristo 2390449f0f Local and S3 archiving scripts now push data to MySQL as well 2012-01-17 14:42:48 -05:00
Ryan Poplin ae259f81cc Bug fixing for merging of read fragments when one fragment contained an indel 2012-01-17 14:39:27 -05:00
Menachem Fromer 80a1ae254b Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 14:25:40 -05:00
Menachem Fromer 284a8e9ddc Fixed to match recent minor updates by Khalid and Eric 2012-01-17 14:24:41 -05:00
Christopher Hartl cde224746f Bait Redesign supports baits that overlap, by picking only the start of intervals.
CalibrateGenotypeLikelihoods supports using an external VCF as input for genotype likelihoods. Currently can be a per-sample VCF, but has un-implemented methods for allowing a read-group VCF to be used.

Removed the old constrained genotyping code from UGE -- the trellis calculated is exactly the same as that done in the MLE AC estimate; so we should just re-use that one.
2012-01-17 13:51:05 -05:00
Ryan Poplin 8e23c98dd9 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-17 13:46:28 -05:00
Matt Hanna 32ccde374b Merged bug fix from Stable into Unstable 2012-01-17 11:08:35 -05:00
Matt Hanna 3ba918aff1 Error message cleanup in BAM indexing code. 2012-01-17 11:05:42 -05:00
Mark DePristo aa8a885a5b Generalizing forest.R analysis script
-- Support for N tree analyses
-- Testing of NA omit and roughfix options
-- Misc. analyses and refactoring
2012-01-16 09:33:41 -05:00
Mark DePristo 8ddac9a06f Don't show individual jobs in queueStatus for gsaadm, just count 2012-01-16 09:33:05 -05:00
Mark DePristo 61f82f138f Extract a high-level GATK version from the SVN / GIT full version numbers in analyzeRunReports
-- Maps SVN versions 1.0.5988 for example to 0.5, 1.0.6134 to 0.6, etc
-- Maps GIT versions 1.x-XXX to 1.x

Used in tableau analyses
2012-01-16 09:30:48 -05:00
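The version mapping described in commit 61f82f138f can be sketched as follows. This is a hedged Python illustration, not the code from analyzeRunReports. The function name `high_level_version` is hypothetical, and the SVN rule (take the thousands digit of the revision number) is an assumption inferred from the two examples in the commit message.

```python
import re
from typing import Optional


def high_level_version(full_version: str) -> Optional[str]:
    """Collapse a full GATK version string to a coarse release label."""
    # Git-style versions: "1.x-XXX" maps to "1.x".
    git_match = re.fullmatch(r"(\d+\.\d+)-.+", full_version)
    if git_match:
        return git_match.group(1)

    # SVN-style versions: "1.0.<revision>".
    # ASSUMPTION: the minor number is revision // 1000, which fits
    # 1.0.5988 -> 0.5 and 1.0.6134 -> 0.6 but is not confirmed by the source.
    svn_match = re.fullmatch(r"1\.0\.(\d+)", full_version)
    if svn_match:
        return f"0.{int(svn_match.group(1)) // 1000}"

    # Unrecognized format: leave it to the caller to decide.
    return None
```

For example, `high_level_version("1.0.5988")` gives `"0.5"`, and a Git-style string such as `"1.4-123"` gives `"1.4"`.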
Mauricio Carneiro 8272c8bd26 Added exceptions to CGL walker
   * Assert that a user provided a VCF not some other type of ROD
   * Assert that the VCF has samples
   * Assert that the samples in the BAM exist in the VCF
   * Warn the user if not all samples in the BAM are present in the VCF
2012-01-14 14:10:19 -05:00
Mauricio Carneiro cec7107762 Better location for the downsampling of reads in PrintReads
   * using the filter() instead of map() makes for a cleaner walker.
   * renaming the unit tests to make more sense with the other unit and integration tests
2012-01-14 14:06:09 -05:00
Mauricio Carneiro 3a9d9789ae Removing old scripts for genotype accuracy 2012-01-13 16:57:05 -05:00