gatk-3.8

Commit Graph

Author	SHA1	Message	Date
Eric Banks	b6839b3049	Added checking in the GATK for mis-encoded quality scores. The check is performed by a Read Transformer that samples (currently set to once every 1000 reads so that we don't hurt overall GATK performance) from the input reads and checks to make sure that none of the base quals is too high (> Q60). If we encounter such a base then we fail with a User Error. * Can be over-ridden with --allow_potentially_misencoded_quality_scores. * Also, the user can choose to fix his quals on the fly (presumably using PrintReads to write out a fixed bam) with the --fix_misencoded_quality_scores argument. Added unit tests.	2012-12-03 11:18:41 -05:00
Eric Banks	6f523a1ea0	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-12-03 08:41:21 -05:00
Eric Banks	59fc7456cf	Updated expectations for novel TiTv in HSP after Mark's fixes to the exact model	2012-12-03 08:41:13 -05:00
Mark DePristo	f0a4710247	Callset summary now includes a table for the consensus itself	2012-12-02 16:40:12 -05:00
Mark DePristo	ce9a323c04	NA12878 knowledge base automatically filters duplicate records out in the SiteIterator -- Now it doesn't matter if there are duplicate records (all fields equal up to the date) in the knowledge base	2012-12-02 14:21:29 -05:00
Mark DePristo	1828d33a5a	Bugfix to AssessNA12878 -- Wasn't handling indel overlaps correctly in SiteIterator.getSitesBefore, causing it to incorrectly skip variants underlying indels (the getSitesBefore was considering both start and stop [not the correct behavior]) causing it to only get sites up to the first record whose stop overlapped the requested start.	2012-12-02 11:09:15 -05:00
Eric Banks	d7b951b6f3	Finished up my reviews for megabase chr20:10M-11M. Fixed out of order record from earlier.	2012-12-01 23:35:21 -05:00
Mark DePristo	2849889af5	Updating md5 for UG	2012-12-01 14:24:19 -05:00
depristo	3105f13df3	Merge pull request #4 from jsilter/master Remove validate, add note to put it back in when public gatk catches up	2012-11-30 13:24:44 -08:00
Mark DePristo	1100f0733b	Reviews for all unique omni poly sites on chr20 Updated setup script to include these and ebanks reviews as well. Eric -- your file is currently not sorted, fyi	2012-11-30 16:23:27 -05:00
Jacob Silterra	02e98fa516	Remove validate, add note to put it back in when public gatk catches up	2012-11-30 16:08:00 -05:00
Mark DePristo	8020ba14db	Minor cleanup of SAMDataSource as part of my system review -- Changed a few functions from public to protected, as they are only used by the package contents, to simplify the SAMDataSource interface	2012-11-30 15:04:41 -05:00
Mark DePristo	66bbe46e5b	MongoDBManager prints out meaningful information with toString	2012-11-30 15:04:41 -05:00
Mark DePristo	3248ca3f91	Validate MongoVariantContext on creation	2012-11-30 15:04:40 -05:00
Mark DePristo	79dbcc205c	Minor cleanup for working version of igv	2012-11-30 15:04:40 -05:00
Mark DePristo	6b6a14cc6d	Moving ConsensusSummarizer to its appropriate home in core of NA12878KB	2012-11-30 15:04:40 -05:00
Mauricio Carneiro	db2a045321	Useful walker to establish minimum depth necessary for confident calling of different types of variants	2012-11-30 00:42:05 -05:00
Mauricio Carneiro	fc7fab5f3b	Fixed ReadBackedPileup downsampling Downsampling in the PerSampleReadBackedPileup was broken: it didn't downsample anything, always returning a copy of the original pileup.	2012-11-30 00:42:05 -05:00
Eric Banks	0e1287a843	Adding reviews for 1st 400kb of my target megabase (10-11) on chr20	2012-11-29 16:15:45 -05:00
Joel Thibault	97d29f203e	Add walltime changes to LSF - Check whether the specified attribute is available - Add pipeline test (disabled due to missing attribute)	2012-11-29 15:23:37 -05:00
Johan Dahlberg	daf6269b65	Setting the walltime Signed-off-by: Joel Thibault <thibault@broadinstitute.org>	2012-11-29 15:23:36 -05:00
Mark DePristo	f837e6ced7	Refactored entire NA12878KB to allow us to easily build a na12878kb.jar for IGV integration -- Just separated infrastructure into core package, away from the walkers themselves. -- Added na12878kb.jar target that builds a jar that can run a test main function (see testNA12878kbJar.csh)	2012-11-29 14:38:09 -05:00
Mark DePristo	52a6df4f1a	Add SummarizeConsensus walker that spits out information about the callsets in the KB -- Added summary to update consensus as well, so you can see what's been added as well	2012-11-29 13:07:46 -05:00
depristo	ed7a89c0c7	Merge pull request #3 from jsilter/master Fix NA12878DBArgumentCollectionUnitTest	2012-11-29 08:52:38 -08:00
Jacob Silterra	d9e8a414ef	Fix NA12878DBArgumentCollectionUnitTest so it uses testng, and testCompareLocalRemoteLocators compare the right things	2012-11-29 11:03:21 -05:00
David Roazen	df2c26b554	Rename NA12878DBArgumentCollectionTest to NA12878DBArgumentCollectionUnitTest Otherwise this test won't get run as part of the test suite...	2012-11-28 22:57:04 -05:00
David Roazen	b06e71cedf	Use build jars in test classpaths by default -Allows packaged resource files to be accessed within tests -Guards against packaging errors in dist/ jars by testing the jars that actually get run rather than unpackaged class files. Previously we were only protected against packaging errors in the monolithic jars posted to our website, not the dist/ jars used in everyday runs. -"ant fasttest" still uses the unpackaged class files for speed (don't want to have to rebuild the jars in fasttest). Relies on dubious methods to get at the resource files that would end up in the jars. -Eliminated the stupid separate "test" ivy config. Now we only invoke ivy ONCE during an ant build that includes tests.	2012-11-28 22:57:04 -05:00
Eric Banks	add1ab5d0e	Fix status of largeScaleValidationPools for NA12878-KB	2012-11-28 20:34:13 -05:00
Mark DePristo	b9be8850e2	Bugfixes to NA12878DBArgumentCollection and JSON and the GATK argument value injection system -- Functions that depend on the value of variables that have GATK injection values must be initialized lazily, not at object creation time. Previous version broke dbToUse and useLocal arguments. Fixed	2012-11-28 19:02:07 -05:00
Mark DePristo	7b74bf6677	Excluding large scale validation callsets from KB until further reviewed, rebuilding production server now	2012-11-28 18:41:49 -05:00
Mark DePristo	4729f0858d	ExtractConsensusSites -include and -exclude callsets now works on supporting callsets not the actual name -- Allows you to include / exclude callsets that appear in other callsets (as one would expect)	2012-11-28 18:41:16 -05:00
Mark DePristo	65357d26bc	New walker ExtractConsensusSites that extracts a VCF from the NA12878 Knowledge Base meeting criteria -- See @link http://gatkforums.broadinstitute.org/discussion/1848/using-the-na12878-knowledge-base for more information	2012-11-28 18:13:07 -05:00
Mark DePristo	de7049463c	New walker ExtractConsensusSites that extracts a VCF from the NA12878 Knowledge Base meeting criteria -- See @link http://gatkforums.broadinstitute.org/discussion/1848/using-the-na12878-knowledge-base for more information	2012-11-28 17:19:22 -05:00
Eric Banks	ff8b3904e2	Added many new resources to the NA12878 KB truth set	2012-11-28 17:18:24 -05:00
David Roazen	b2e699169c	Update GATK packaging settings to package arbitrary resources With the newly-added support for packaging arbitrary resources, the resources were getting packaged in a normal build but not when creating a standalone GATK jar. This corrects this oversight.	2012-11-28 15:26:05 -05:00
droazen	a359d43f54	Merge pull request #2 from jsilter/master Abstract connection to MongoDB so we can specify it through JSON file	2012-11-28 12:05:47 -08:00
David Roazen	26d9c41615	Allow arbitrary resources to be packaged in the GATK jar, selecting among public/private/protected appropriately -Resources must be in a "resources" or "templates" subdirectory within the Java package hierarchy -Remove direct inclusion of private resources from the main jar packaging target added in Jacob's patch: this would break builds where the private directory was absent, and did not respect build settings (include.private, etc.)	2012-11-28 14:53:08 -05:00
Joel Thibault	c76c808268	Reads are required to be sorted - Remove the extended_only case because it's outside intervals	2012-11-28 13:59:58 -05:00
Joel Thibault	198923b597	Add ActiveRegionReadState handling	2012-11-28 13:59:57 -05:00
Ryan Poplin	f0395b457a	Adding the work-in-progress, experimental RepeatLengthCovariate to the BQSR so Chris can continue the development.	2012-11-28 13:56:32 -05:00
Eric Banks	3463774f2a	Merged bug fix from Stable into Unstable	2012-11-28 13:26:52 -05:00
Eric Banks	6030605242	Added quick check for creation of bad BAQ values associated with badly encoded base qualities; hopefully this can help us debug the non-reproducible issue seen by many users.	2012-11-28 13:26:31 -05:00
Mark DePristo	c676853731	Merged bug fix from Stable into Unstable. Updating md5s Conflicts: protected/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java	2012-11-28 12:54:36 -05:00
Mark DePristo	a1d6461121	Critical bugfix to AFCalcResult affecting UG/HC quality score emission thresholds As reported by Menachem Fromer: a critical bug in AFCalcResult: Specifically, the implementation: public boolean isPolymorphic(final Allele allele, final double log10minPNonRef) { return getLog10PosteriorOfAFGt0ForAllele(allele) >= log10minPNonRef; } seems incorrect and should probably be: getLog10PosteriorOfAFEq0ForAllele(allele) <= log10minPNonRef The issue here is that the 30 represents a Phred-scaled probability of error and it's currently being compared to a log probability of non-error. Instead, we need to require that our probability of error be less than the error threshold. This bug has only a minor impact on the calls -- hardly any sites change -- which is good. But the inverted logic affects multi-allelic sites significantly. Basically you only hit this logic with multiple alleles, and in that case it's including extra alt alleles incorrectly, and throwing out good ones. Change was to create a new function that properly handles thresholds that are PhredScaled quality scores: /** * Same as #isPolymorphic but takes a phred-scaled quality score as input */ public boolean isPolymorphicPhredScaledQual(final Allele allele, final double minPNonRefPhredScaledQual) { if ( minPNonRefPhredScaledQual < 0 ) throw new IllegalArgumentException("phredScaledQual " + minPNonRefPhredScaledQual + " < 0 "); final double log10Threshold = Math.log10(QualityUtils.qualToProb(minPNonRefPhredScaledQual)); return isPolymorphic(allele, log10Threshold); }	2012-11-28 12:08:02 -05:00
Mark DePristo	fb1e525e0f	Merge branch 'master' of github.com:broadinstitute/gsa-unstable	2012-11-28 07:57:05 -05:00
Mark DePristo	9fd1a3cb9c	Remove echo from startNA12878kbServer.csh	2012-11-28 07:56:50 -05:00
Menachem Fromer	79bc878e6a	Allow debugging to be set from the command line	2012-11-27 22:37:41 -05:00
Jacob Silterra	1cc0b48caa	Abstract connection to MongoDB so we can specify it through JSON file. Include 2 JSON spec files in GenomeAnalysisTK.jar Create MongoDBManager, which keeps track of connections based on Locator class. Locators can be instantiated directly, or read from JSON files (NA12878DBArgumentCollection uses the GSon library)	2012-11-27 17:44:55 -05:00
Menachem Fromer	31069ffced	Add HC pruning parameter option, as per Ryan's advice	2012-11-27 17:21:22 -05:00
depristo	6f1eb65ec8	Merge pull request #1 from jsilter/master Modifications to NA12878KB classes so they can more easily be used as a library	2012-11-27 12:35:18 -08:00
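
The sampled quality check described in commit b6839b3049 can be sketched roughly as follows. This is a minimal illustration, not the GATK Read Transformer itself: the class name `MisencodedQualCheckSketch`, the `process` method, and the use of `IllegalStateException` in place of a GATK User Error are all assumptions made for this example. Only the sampling rate (once every 1000 reads) and the Q60 ceiling come from the commit message.

```java
public final class MisencodedQualCheckSketch {
    // Values taken from the commit message: sample once every 1000 reads, reject quals > Q60.
    public static final int SAMPLING_FREQUENCY = 1000;
    public static final byte MAX_REASONABLE_QUAL = 60;

    // Number of reads passed to process() so far.
    private long readsSeen = 0;

    /**
     * Checks the base qualities of a read if it falls on the sampling interval.
     *
     * @param baseQuals Phred-scaled base qualities of one read (hypothetical input shape)
     * @return true if this read was sampled and passed the check, false if it was skipped
     * @throws IllegalStateException if a sampled read has a quality above Q60
     *         (a stand-in for the User Error mentioned in the commit message)
     */
    public boolean process(final byte[] baseQuals) {
        // Only every SAMPLING_FREQUENCY-th read is inspected, starting with the first.
        if (readsSeen++ % SAMPLING_FREQUENCY != 0) {
            return false;
        }
        for (final byte qual : baseQuals) {
            if (qual > MAX_REASONABLE_QUAL) {
                throw new IllegalStateException(
                        "Base quality " + qual + " exceeds Q" + MAX_REASONABLE_QUAL
                                + "; quality scores may be mis-encoded");
            }
        }
        return true;
    }
}
```

Because only one read in 1000 is inspected, a mis-encoded file is detected at the first sampled read with an out-of-range quality rather than at the first bad read; that is the trade-off the commit message accepts to avoid slowing down normal runs.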
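
The threshold fix in commit a1d6461121 comes down to converting a Phred-scaled quality into log10 probability space before comparing it with a log10 posterior. The sketch below illustrates that conversion in isolation. It is not the AFCalcResult code: the class name `PhredThresholdSketch` is hypothetical, the methods take a log10 posterior directly instead of an `Allele`, and the local `qualToProb` is a stand-in that assumes `QualityUtils.qualToProb` returns the probability that a call is correct, 1 - 10^(-Q/10).

```java
public final class PhredThresholdSketch {
    private PhredThresholdSketch() {}

    /**
     * Stand-in for QualityUtils.qualToProb (assumed semantics):
     * probability that a call is correct, 1 - 10^(-Q/10).
     */
    public static double qualToProb(final double phredQual) {
        return 1.0 - Math.pow(10.0, -phredQual / 10.0);
    }

    /** Log-space comparison: both arguments are log10 probabilities. */
    public static boolean isPolymorphic(final double log10PosteriorOfAFGt0,
                                        final double log10MinPNonRef) {
        return log10PosteriorOfAFGt0 >= log10MinPNonRef;
    }

    /** Same comparison, but the threshold is given as a Phred-scaled quality score. */
    public static boolean isPolymorphicPhredScaledQual(final double log10PosteriorOfAFGt0,
                                                       final double minPNonRefPhredScaledQual) {
        if (minPNonRefPhredScaledQual < 0) {
            throw new IllegalArgumentException(
                    "phredScaledQual " + minPNonRefPhredScaledQual + " < 0");
        }
        // Q30 -> P(correct) = 0.999 -> log10 threshold of about -0.000434.
        final double log10Threshold = Math.log10(qualToProb(minPNonRefPhredScaledQual));
        return isPolymorphic(log10PosteriorOfAFGt0, log10Threshold);
    }

    public static void main(final String[] args) {
        // P(AF > 0) = 0.9999 clears a Q30 threshold; P(AF > 0) = 0.99 does not.
        System.out.println(isPolymorphicPhredScaledQual(Math.log10(0.9999), 30.0)); // true
        System.out.println(isPolymorphicPhredScaledQual(Math.log10(0.99), 30.0));   // false
    }
}
```

Passing the raw value 30 as the log10 threshold, as the buggy call path did, compares numbers on two different scales; converting first keeps both sides of the comparison in log10 probability space.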


11215 Commits (b6839b30496daab74ea2d2b08690ff9ca4100508)