Commit Graph

11188 Commits (add1ab5d0e7726e80b62aa5989c4c130c75364f8)

Author SHA1 Message Date
Eric Banks add1ab5d0e Fix status of largeScaleValidationPools for NA12878-KB 2012-11-28 20:34:13 -05:00
Mark DePristo b9be8850e2 Bugfixes to NA12878DBArgumentCollection and JSON and the GATK argument value injection system
-- Functions that depend on the value of variables that have GATK injection values must be initialized lazily, not at object creation time.  Previous version broke the dbToUse and useLocal arguments.  Fixed.
2012-11-28 19:02:07 -05:00
Mark DePristo 7b74bf6677 Excluding large scale validation callsets from KB until further reviewed, rebuilding production server now 2012-11-28 18:41:49 -05:00
Mark DePristo 4729f0858d ExtractConsensusSites -include and -exclude callsets now work on supporting callsets, not the actual name
-- Allows you to include / exclude callsets that appear in other callsets (as one would expect)
2012-11-28 18:41:16 -05:00
Mark DePristo 65357d26bc New walker ExtractConsensusSites that extracts a VCF from the NA12878 Knowledge Base meeting criteria
-- See @link http://gatkforums.broadinstitute.org/discussion/1848/using-the-na12878-knowledge-base for more information
2012-11-28 18:13:07 -05:00
Mark DePristo de7049463c New walker ExtractConsensusSites that extracts a VCF from the NA12878 Knowledge Base meeting criteria
-- See @link http://gatkforums.broadinstitute.org/discussion/1848/using-the-na12878-knowledge-base for more information
2012-11-28 17:19:22 -05:00
Eric Banks ff8b3904e2 Added many new resources to the NA12878 KB truth set 2012-11-28 17:18:24 -05:00
David Roazen b2e699169c Update GATK packaging settings to package arbitrary resources
With the newly-added support for packaging arbitrary resources, the
resources were getting packaged in a normal build but not when
creating a standalone GATK jar. This corrects this oversight.
2012-11-28 15:26:05 -05:00
droazen a359d43f54 Merge pull request #2 from jsilter/master
Abstract connection to MongoDB so we can specify it through JSON file
2012-11-28 12:05:47 -08:00
David Roazen 26d9c41615 Allow arbitrary resources to be packaged in the GATK jar, selecting among public/private/protected appropriately
-Resources must be in a "resources" or "templates" subdirectory within the Java package hierarchy

-Remove direct inclusion of private resources from the main jar packaging target added in Jacob's
patch: this would break builds where the private directory was absent, and did not respect build
settings (include.private, etc.)
2012-11-28 14:53:08 -05:00
Joel Thibault c76c808268 Reads are required to be sorted
- Remove the extended_only case because it's outside intervals
2012-11-28 13:59:58 -05:00
Joel Thibault 198923b597 Add ActiveRegionReadState handling 2012-11-28 13:59:57 -05:00
Ryan Poplin f0395b457a Adding the work-in-progress, experimental RepeatLengthCovariate to the BQSR so Chris can continue the development. 2012-11-28 13:56:32 -05:00
Eric Banks 3463774f2a Merged bug fix from Stable into Unstable 2012-11-28 13:26:52 -05:00
Eric Banks 6030605242 Added quick check for creation of bad BAQ values associated with badly encoded base qualities; hopefully this can help us debug the non-reproducible issue seen by many users. 2012-11-28 13:26:31 -05:00
Mark DePristo c676853731 Merged bug fix from Stable into Unstable. Updating md5s
Conflicts:
	protected/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java
2012-11-28 12:54:36 -05:00
Mark DePristo a1d6461121 Critical bugfix to AFCalcResult affecting UG/HC quality score emission thresholds
As reported by Menachem Fromer: a critical bug in AFCalcResult:

Specifically, the implementation:
    public boolean isPolymorphic(final Allele allele, final double log10minPNonRef) {
        return getLog10PosteriorOfAFGt0ForAllele(allele) >= log10minPNonRef;
    }

seems incorrect and should probably be:

getLog10PosteriorOfAFEq0ForAllele(allele) <= log10minPNonRef

The issue here is that the 30 represents a Phred-scaled probability of *error* and it's currently being compared to a log probability of *non-error*.

Instead, we need to require that our probability of error be less than the error threshold.
This bug has only a minor impact on the calls -- hardly any sites change -- which is good.  But the inverted logic affects multi-allelic sites significantly.  Basically you only hit this logic with multiple alleles, and in that case it's including extra alt alleles incorrectly, and throwing out good ones.

Change was to create a new function that properly handles thresholds that are PhredScaled quality scores:

    /**
     * Same as #isPolymorphic but takes a phred-scaled quality score as input
     */
    public boolean isPolymorphicPhredScaledQual(final Allele allele, final double minPNonRefPhredScaledQual) {
        if ( minPNonRefPhredScaledQual < 0 ) throw new IllegalArgumentException("phredScaledQual " + minPNonRefPhredScaledQual + " < 0 ");
        final double log10Threshold = Math.log10(QualityUtils.qualToProb(minPNonRefPhredScaledQual));
        return isPolymorphic(allele, log10Threshold);
    }
2012-11-28 12:08:02 -05:00
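To make the threshold arithmetic in the commit above concrete, here is a minimal standalone sketch. It assumes the conventional Phred definition (error probability 10^(-Q/10)) and that the log10 threshold is the log of the corresponding non-error probability. The class and helper names are hypothetical and are not the GATK QualityUtils or AFCalcResult API.

```java
public class PhredThresholdSketch {
    // Hypothetical helper: Phred-scaled quality -> probability of error, 10^(-Q/10).
    static double phredToErrorProb(double qual) {
        if (qual < 0) throw new IllegalArgumentException("qual " + qual + " < 0");
        return Math.pow(10.0, -qual / 10.0);
    }

    // Hypothetical helper: log10 of the probability of NOT being in error.
    static double phredToLog10NonErrorProb(double qual) {
        return Math.log10(1.0 - phredToErrorProb(qual));
    }

    // Call an allele polymorphic when log10 P(AF > 0) is at least the
    // non-error threshold, i.e. when the error probability is small enough.
    static boolean isPolymorphicPhred(double log10PosteriorAFGt0, double minQual) {
        return log10PosteriorAFGt0 >= phredToLog10NonErrorProb(minQual);
    }

    public static void main(String[] args) {
        // Q30 corresponds to an error probability of 0.001, so the posterior
        // of AF > 0 must be at least 0.999 to pass.
        System.out.println(isPolymorphicPhred(Math.log10(0.9995), 30.0)); // true
        System.out.println(isPolymorphicPhred(Math.log10(0.99), 30.0));   // false
    }
}
```

Both sides of the comparison are log10 non-error probabilities, which is the unit mismatch the commit message describes fixing.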
Mark DePristo fb1e525e0f Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-28 07:57:05 -05:00
Mark DePristo 9fd1a3cb9c Remove echo from startNA12878kbServer.csh 2012-11-28 07:56:50 -05:00
Menachem Fromer 79bc878e6a Allow debugging to be set from the command line 2012-11-27 22:37:41 -05:00
Jacob Silterra 1cc0b48caa Abstract connection to MongoDB so we can specify it through JSON file. Include 2 JSON spec files in GenomeAnalysisTK.jar
Create MongoDBManager, which keeps track of connections based on Locator class. Locators can be instantiated directly, or read from JSON files (NA12878DBArgumentCollection uses the GSon library)
2012-11-27 17:44:55 -05:00
Menachem Fromer 31069ffced Add HC pruning parameter option, as per Ryan's advice 2012-11-27 17:21:22 -05:00
depristo 6f1eb65ec8 Merge pull request #1 from jsilter/master
Modifications to NA12878KB classes so they can more easily be used as a library
2012-11-27 12:35:18 -08:00
Jacob Silterra b15edd9eb3 Modifications so these classes can more easily be used as a library. In particular:
0. Add additional create method to MongoVariantContext as convenience, if we want a custom TruthStatus (and change "type" to less ambiguous "truthStatus")
1. Have NA12878KnowledgeBase return WriteResults from insert methods, so caller can know if there's an error
2. Provide constructors for NA12878DBArgumentCollection, since we need to be able to create this class for NA12878DBKnowledgeBase
2012-11-27 14:49:56 -05:00
Mark DePristo d10b858e0b Finalizing setupNA12878kb script for use in cron 2012-11-27 14:44:08 -05:00
Eric Banks b40d3eb8aa Merged bug fix from Stable into Unstable 2012-11-27 14:41:07 -05:00
Eric Banks 01abcc3e0f Tests didn't like my note to Geraldine in the output logs; apparently it's tested in integration tests 2012-11-27 14:40:49 -05:00
Mark DePristo ffb232bdf0 NA12878 Knowledge Base modules use DEV by default if they modify the database, while accessors use PRODUCTION
-- Added script that starts KB server
2012-11-27 14:26:23 -05:00
Mark DePristo 7e4b9c9e6e Fix failing unit tests for VariantContextUtilsUnitTest
-- Previous version was adding multiple samples with the same name to the variant context
2012-11-27 14:26:23 -05:00
Mark DePristo 4281498c2c Improvements to NA12878KnowledgeBase system
-- Cleaned up code for SiteIterator.
-- Added a generic error handling system for the SiteIterator.  Created approaches to simply throw errors when invalid records are found, to log them, and to remove them from the sites collection.
-- By default getCalls() produces a SiteIterator that removes incorrectly formatted records from the DB
-- Created NA12878KnowledgeBaseServer GATK walker that (1) continually finds newly added records to the sites database and rebuilds the consensus as needed and (2) archives the reviewed sites to a VCF file upon server termination
-- More, better unit tests everywhere
-- Adding infrastructure to find only newly added sites to the NA12878KnowledgeBase.  Uses Mongo's ordering of _id to obtain the records (and the sites) of variants newly added to the sites collection.  This is essential infrastructure to write an NA12878KnowledgeBase server that continually keeps the consensus records updated as new sites are added to the database
2012-11-27 14:26:23 -05:00
Joel Thibault 9bfe39411e Equal overlap should match right/later region 2012-11-27 13:03:13 -05:00
Joel Thibault d83ad906ef Add profile range contract 2012-11-27 13:03:13 -05:00
Joel Thibault cc550b4145 Add a read and interval on a different contig 2012-11-27 13:03:13 -05:00
Eric Banks 9531e58445 Merged bug fix from Stable into Unstable 2012-11-27 11:00:50 -05:00
Eric Banks 4543ece088 Fixing parsing of genomelocs that contain colons in the contig names (which is allowed by the spec) as reported on the forum. Added unit test for this case. 2012-11-27 11:00:33 -05:00
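As an illustration of the parsing problem in the commit above, one plausible approach is to split an interval string at its last colon instead of its first, so that colons inside the contig name are preserved. This is a sketch under that assumption, not the code from the commit. The class name, method name, and the simple `contig:start-stop` grammar are all hypothetical.

```java
public class IntervalParseSketch {
    // Hypothetical helper: returns {contig, start, stop}; start and stop are
    // null when the string carries no position suffix.
    static String[] parse(String s) {
        int colon = s.lastIndexOf(':');
        // Treat the text after the last colon as a position only if it looks
        // like "N" or "N-M"; otherwise the whole string is the contig name.
        if (colon < 0 || !s.substring(colon + 1).matches("\\d+(-\\d+)?")) {
            return new String[] { s, null, null };
        }
        String contig = s.substring(0, colon);
        String[] range = s.substring(colon + 1).split("-");
        String start = range[0];
        String stop = range.length > 1 ? range[1] : range[0];
        return new String[] { contig, start, stop };
    }

    public static void main(String[] args) {
        String[] loc = parse("HLA-A*01:01:100-200");
        System.out.println(loc[0] + " " + loc[1] + " " + loc[2]);
    }
}
```

A bare contig name whose final colon-separated field is purely numeric remains ambiguous under this scheme. Resolving that would need a lookup against the known contig names, which the sketch leaves out.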
Eric Banks a82ec7ad80 Merged bug fix from Stable into Unstable 2012-11-27 10:27:08 -05:00
Eric Banks e199562c25 I have pulled out all of the documentation URLs and put them into the HelpUtils class as static variables; this way, Appistry can change links as needed to point commercial users to their own internal forum without having to muck things up all over our source. Added some TODOs for Geraldine to update links in the GATK docs that still point to the old wiki. Sorry that I am pushing into stable, but that's what Appistry is pulling from for their release next week (and unstable has been failing forever). 2012-11-27 10:26:17 -05:00
Mauricio Carneiro 97fd5de260 Merging latest CMI updates with UNSTABLE 2012-11-27 09:08:00 -05:00
Eric Banks b1969a66bd Update docs 2012-11-27 08:24:41 -05:00
Eric Banks cc72aaefeb Minor efficiency: use >= instead of > in test 2012-11-27 01:11:23 -05:00
Eric Banks 405f3c675d Fix for GSA-649: GenomeLocSortedSet.overlaps is crazy slow. Also improved GenomeLocSortedSet.sizeBeforeLoc. 2012-11-27 01:07:00 -05:00
Ryan Poplin e27d677c13 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-26 12:20:32 -05:00
Ryan Poplin 59cef880d1 Updating HC integration tests because experimental, HC-specific annotations have been removed. 2012-11-26 12:20:07 -05:00
Ryan Poplin c3b7dd1374 Misc cleanup in the HaplotypeCaller. Cleaning up unused arguments after recent changes to HC-GenotypingEngine 2012-11-26 12:19:11 -05:00
Eric Banks 4f7fa3009a I forget why I thought that the VariantAnnotator couldn't run multi-threaded because it works just fine. Now you can specify -nt with VA. 2012-11-26 11:34:59 -05:00
Mauricio Carneiro c0261f75ce Merging master and develop together
(because I forgot to do so when I merged on Nov 14th, now develop has a few extra commits not present in master).
2012-11-26 11:31:47 -05:00
Mauricio Carneiro a3f5932501 Fixed null pointer exception in Integration Tests
When running Utils.setupWriter with NO_PG_TAG set, the writer was attempting to create a program record with the null pointer. Fixed.
2012-11-26 11:12:27 -05:00
Eric Banks b15b62157a Use correct path in imports 2012-11-26 10:09:13 -05:00
Menachem Fromer 3784bb5258 Fixes to process all SNPs and indels simultaneously (even those at same site) 2012-11-26 03:59:36 -05:00
Ryan Poplin fedc4fde6c Merged bug fix from Stable into Unstable 2012-11-25 21:55:55 -05:00