Commit Graph

8748 Commits (45da892ecc9d5da28dd28e7c9d6de1b8aab8cb9d)

Author SHA1 Message Date
Christopher Hartl 15c0c294c1 Adding in this walker to try to debug the 0-byte ref bases 2012-01-23 14:51:24 -05:00
Mark DePristo 02450e4b12 Merged bug fix from Stable into Unstable 2012-01-23 12:08:39 -05:00
Christopher Hartl 798596257b Enable the Genotype Phasing Evaluator. Because it didn't have the same argument structure as the base class, update2 of VariantEvaluator was being called, rather than update2 of the actual module. 2012-01-23 10:50:16 -05:00
Mark DePristo 80a4ce0edf Bugfix for incorrect error messages for missing BAMs and VCFs
-- Missing BAMs were appearing as StingExceptions
-- Missing VCFs were showing up as CommandLineErrors, but it's clearer for them to be CouldNotReadInputFile exceptions
-- Added integration tests to ensure missing BAMs, VCFs, and -L files are properly thrown as CouldNotReadInputFile exceptions
-- Added path to standard b37 BAM to BaseTest
-- Cleaned up code in SAMDataSource, removing my parallel loading code as this just didn't prove to be useful.
2012-01-23 09:52:07 -05:00
Guillermo del Angel 31d2f04368 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-23 09:23:03 -05:00
Guillermo del Angel 966387ca0b Next intermediate commit in the pool caller. Lots of bug fixes, and we can now emit true VCFs with calls in discovery mode (still of unknown quality). The old validation mode is temporarily broken and will be fixed in the next refactoring. 2012-01-23 09:22:31 -05:00
Christopher Hartl 4a08e8ca6e Minor tweaks to T2D-related qscripts. Replacing old md5s from the BeagleIntegrationTest. All differences boiled down either to the accounting of genotypes changed (./. --> 0/0 is no longer a "changed" genotype, and original genotypes that were ./. are represented as OG=. rather than OG=./. .)
This is somewhat of an arbitrary decision, and is negotiable. I could see treating

GT:PL   ./.:.

differently from

GT:PL   .:0,3,6

but am not sure it is worth doing so.
2012-01-23 08:25:34 -05:00
Ryan Poplin 4d6312d4ea HaplotypeCaller is now an ActiveRegionWalker. 2012-01-22 14:31:01 -05:00
Christopher Hartl 3b1aad4f17 After a minor and abject freakout, alter the T2D script to seek out truth sensitivities between 80 and 100, rather than between 0.8 and 1. Also, don't consider a genotype "changed by beagle" if the initial genotype is a no-call. 2012-01-20 23:43:51 -05:00
Christopher Hartl 9b4f6afa21 Alterations to scripts for better performance. Grid search now expands the sens/spec tradeoff (90 was far too aggressive against hapmap chr20), and 20 max gaussians was too many and caused errors. For consensus genotypes: remember to gunzip the beagle outputs before converting to VCF. Also, beagle can in fact create 'null' alleles in certain circumstances. I'm not sure what exactly those circumstances are, but those sites should be ignored. When it does, all alleles appear to be set to null, so this should not affect the actual phasing in the output VCF. 2012-01-20 23:07:59 -05:00
Christopher Hartl f3564bbf43 Ugh. Darn IntelliJ not telling me I was missing an import statement. 2012-01-20 13:25:11 -05:00
Christopher Hartl b902d778ca . 2012-01-20 13:22:46 -05:00
Christopher Hartl 7c6a9471e8 After ensuring MultiplyLikelihoods does what I want it to do, add a quick and simple integration test to ensure I don't break it. 2012-01-20 13:20:13 -05:00
Christopher Hartl e245cde47f A new beagle script for generating a reference panel from lowpass, exome, and chip data. This is for T2D, but potentially useful. 2012-01-20 12:48:32 -05:00
Christopher Hartl a91dd5d137 Merge branch 'master' of ssh://tin.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2012-01-20 12:45:16 -05:00
Christopher Hartl 3fe73f155c Merge branch 'master' of ssh://chartl@tin.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-20 12:44:22 -05:00
Ryan Poplin 4b18786b5d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-19 22:05:20 -05:00
Ryan Poplin ace9333068 Active region walkers can now see the reads in a buffer around their active regions. This buffer size is specified as a walker annotation. Intervals are internally extended by this buffer size so that the extra reads make their way through the traversal engine, but the walker author only needs to see the original interval. Also, several corner-case bug fixes in active region traversal. 2012-01-19 22:05:08 -05:00
Christopher Hartl cd38110b7b GQs are not always purged with this method of modifying attributes. To drop them, create the Genotype anew. 2012-01-19 20:11:20 -05:00
Christopher Hartl b9f7103d09 Fix edge case where DP annotations (format) were creeping in 2012-01-19 19:41:43 -05:00
Christopher Hartl 72cd0a2450 And do it conditional on having likelihoods in the first place 2012-01-19 18:52:06 -05:00
Christopher Hartl ed5302667b Oops. Let's actually retain the genotype likelihoods. 2012-01-19 18:44:39 -05:00
Christopher Hartl 0644b75089 Remove attribute data from VariantContext and genotypes. 2012-01-19 18:30:32 -05:00
Menachem Fromer fda29ebcbd Merge branch 'master' of ssh://copper.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-19 18:22:04 -05:00
Menachem Fromer 253d6483e1 Updated Batch-merge to retain ALL sites in input (SNPs, indels, regardless of their filtering status), and also optionally go back to the BAMs to perform VariantAnnotation 2012-01-19 18:21:22 -05:00
Menachem Fromer 066da80a3d Added KEEP_UNCONDTIONAL option which permits even sites with only filtered records to be included as unfiltered sites in the output 2012-01-19 18:19:58 -05:00
Christopher Hartl 6e30d715cf Minor changes to T2D VQSR. Adding in a small walker for multiplying likelihoods for generation of a consensus panel. 2012-01-19 18:00:07 -05:00
Aaron McKenna ced6775de3 Changes to allow for external tests
Changes to the build script that allow the external directory to have tests.
This means groups like CGA don't have to reinvent the wheel on testing, and
can instead use the GATK's unit and integration tests.

Signed-off-by: David Roazen <droazen@broadinstitute.org>
2012-01-19 13:04:24 -05:00
Christopher Hartl 98f8431b07 Right. Forgot the = true. If only there were some way to silently commit this OH WAIT 2012-01-19 12:36:30 -05:00
Christopher Hartl 7f3ad25b01 Adding a mode to VariantFiltration to invalidate previously-applied filters to allow complete re-filtering of a VCF.
T2D VQSR: re-calling now done with appropriate quality settings and using BAQ.
2012-01-19 10:54:48 -05:00
Ryan Poplin ecdd07b748 updating HaplotypeCaller integration test 2012-01-19 09:31:22 -05:00
Ryan Poplin 7e082c7750 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-19 09:11:23 -05:00
Christopher Hartl d1c8c38541 A QScript to generate a VQSR of union sites for T2D, using a broad set and a union site set as input. 2012-01-19 02:04:04 -05:00
Christopher Hartl 39e6df5aa9 Fix edge case for very small VCFs 2012-01-19 00:51:28 -05:00
Christopher Hartl 1e037a0ecf Ensure second-to-last line printed 2012-01-19 00:33:08 -05:00
Christopher Hartl 9946853039 Remove duplicated line 2012-01-19 00:25:22 -05:00
Christopher Hartl cf9b1d350a Some minor changes to in-process functions that nobody else uses. CGL now properly ignores no-calls for external VCFs. 2012-01-19 00:20:49 -05:00
Eric Banks ab8f499bc3 Annotate with FS even for filtered sites 2012-01-18 22:04:51 -05:00
Mauricio Carneiro b0b0cd9aef Conforming to the guru's recommendation on library usage ;-)
thanks Khalid.
2012-01-18 21:19:16 -05:00
Guillermo del Angel b123416c4c Resolve stale merge changes 2012-01-18 20:56:36 -05:00
Guillermo del Angel 2eb45340e1 Initial, raw, mostly untested version of new pool caller that also does allele discovery. Still needs debugging/refining. Main modification is that there is a new operation mode, set by argument -ALLELE_DISCOVERY_MODE, which if true will determine the optimal alt allele at each computable site and will compute the AC distribution on it. The current implementation does not work yet if there's more than one pool, and it will only output biallelic sites; there is no functionality for true multi-allelics yet. 2012-01-18 20:54:10 -05:00
Ryan Poplin 0133d1a901 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-18 09:53:42 -05:00
Ryan Poplin 0268da7560 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-18 09:53:00 -05:00
Ryan Poplin 60024e0d7b updating TDT integration test 2012-01-18 09:52:50 -05:00
David Roazen b7c65cb089 Merged bug fix from Stable into Unstable 2012-01-18 09:52:47 -05:00
Ryan Poplin 11982b5a34 We no longer calculate the population-level TDT statistic if there are fewer than 5 trios with full genotype likelihood information. When there is a high degree of missingness the results are skewed or in the worst case come out as NaN. 2012-01-18 09:42:41 -05:00
Mark DePristo ca11f68303 Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-01-18 08:29:03 -05:00
Mark DePristo 9e77facda5 More analyses for random forest test script forest.R 2012-01-18 08:28:47 -05:00
Mark DePristo 5bd1a45879 Usability improvements to analyzeRunReports
-- Print out the name / db of SQL server, not a python connection object
-- Print out the ID, not a Python object, of the XML record that fails to convert
2012-01-18 08:27:15 -05:00
Mark DePristo b52db51599 Don't try to write log to a non-existent file 2012-01-18 08:26:49 -05:00