Commit Graph

11215 Commits (b6839b30496daab74ea2d2b08690ff9ca4100508)

Author SHA1 Message Date
Eric Banks b6839b3049 Added checking in the GATK for mis-encoded quality scores.
The check is performed by a Read Transformer that samples (currently set to once
every 1000 reads so that we don't hurt overall GATK performance) from the input
reads and checks to make sure that none of the base quals is too high (> Q60). If
we encounter such a base then we fail with a User Error.

* Can be overridden with --allow_potentially_misencoded_quality_scores.
* Also, the user can choose to fix their quals on the fly (presumably using PrintReads
  to write out a fixed bam) with the --fix_misencoded_quality_scores argument.

Added unit tests.
2012-12-03 11:18:41 -05:00
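
A minimal sketch of the sampling check described in b6839b3049, assuming the first read and every 1000th read after it are the ones inspected. The class name, method name, and exception type here are hypothetical stand-ins for illustration and are not taken from the GATK Read Transformer itself; only the sampling interval (1000) and the Q60 limit come from the commit message.

```java
/** Hypothetical sketch of a sampled base-quality sanity check; not the GATK code. */
final class MisencodedQualSampler {
    static final int SAMPLING_FREQUENCY = 1000; // inspect one read in every 1000 (from the commit message)
    static final int MAX_REASONABLE_QUAL = 60;  // quals above Q60 are treated as mis-encoded (from the commit message)

    private long readsSeen = 0;

    /**
     * Returns true if this read was sampled and passed the check, false if it was skipped.
     * Throws if a sampled read carries a base quality above Q60. The commit describes a
     * "User Error"; IllegalStateException is an assumed stand-in for that.
     */
    boolean check(final byte[] baseQuals) {
        // Assumption: sample read 0, 1000, 2000, ... and skip everything in between.
        if (readsSeen++ % SAMPLING_FREQUENCY != 0) {
            return false;
        }
        for (final byte qual : baseQuals) {
            if (qual > MAX_REASONABLE_QUAL) {
                throw new IllegalStateException(
                        "Base quality " + qual + " > Q" + MAX_REASONABLE_QUAL
                                + "; quality scores may be mis-encoded");
            }
        }
        return true;
    }
}
```

Because only sampled reads are inspected, a mis-encoded read that falls between two sampling points goes unnoticed until the next sampled read, which is the trade-off the commit accepts to protect overall performance.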
Eric Banks 6f523a1ea0 Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-12-03 08:41:21 -05:00
Eric Banks 59fc7456cf Updated expectations for novel TiTv in HSP after Mark's fixes to the exact model 2012-12-03 08:41:13 -05:00
Mark DePristo f0a4710247 Callset summary now includes a table for the consensus itself 2012-12-02 16:40:12 -05:00
Mark DePristo ce9a323c04 NA12878 knowledge base automatically filters duplicate records out in the SiteIterator
-- Now it doesn't matter if there are duplicate records (all fields equal up to the date) in the knowledge base
2012-12-02 14:21:29 -05:00
Mark DePristo 1828d33a5a Bugfix to AssessNA12878
-- SiteIterator.getSitesBefore wasn't handling indel overlaps correctly: it considered both start and stop (not the correct behavior), so it only returned sites up to the first record whose stop overlapped the requested start and incorrectly skipped variants underlying indels.
2012-12-02 11:09:15 -05:00
Eric Banks d7b951b6f3 Finished up my reviews for megabase chr20:10M-11M. Fixed out of order record from earlier. 2012-12-01 23:35:21 -05:00
Mark DePristo 2849889af5 Updating md5 for UG 2012-12-01 14:24:19 -05:00
depristo 3105f13df3 Merge pull request #4 from jsilter/master
Remove validate, add note to put it back in when public gatk catches up
2012-11-30 13:24:44 -08:00
Mark DePristo 1100f0733b Reviews for all unique omni poly sites on chr20
Updated setup script to include these and ebanks' reviews as well.  Eric -- your file is currently not sorted, fyi
2012-11-30 16:23:27 -05:00
Jacob Silterra 02e98fa516 Remove validate, add note to put it back in when public gatk catches up 2012-11-30 16:08:00 -05:00
Mark DePristo 8020ba14db Minor cleanup of SAMDataSource as part of my system review
-- Changed a few functions from public to protected, as they are only used by the package contents, to simplify the SAMDataSource interface
2012-11-30 15:04:41 -05:00
Mark DePristo 66bbe46e5b MongoDBManager prints out meaningful information with toString 2012-11-30 15:04:41 -05:00
Mark DePristo 3248ca3f91 Validate MongoVariantContext on creation 2012-11-30 15:04:40 -05:00
Mark DePristo 79dbcc205c Minor cleanup for working version of igv 2012-11-30 15:04:40 -05:00
Mark DePristo 6b6a14cc6d Moving ConsensusSummarizer to its appropriate home in core of NA12878KB 2012-11-30 15:04:40 -05:00
Mauricio Carneiro db2a045321 Useful walker to establish minimum depth necessary for confident calling of different types of variants 2012-11-30 00:42:05 -05:00
Mauricio Carneiro fc7fab5f3b Fixed ReadBackedPileup downsampling
Downsampling in the PerSampleReadBackedPileup was broken: it didn't downsample anything, always returning a copy of the original pileup.
2012-11-30 00:42:05 -05:00
Eric Banks 0e1287a843 Adding reviews for 1st 400kb of my target megabase (10-11) on chr20 2012-11-29 16:15:45 -05:00
Joel Thibault 97d29f203e Add walltime changes to LSF
- Check whether the specified attribute is available
- Add pipeline test (disabled due to missing attribute)
2012-11-29 15:23:37 -05:00
Johan Dahlberg daf6269b65 Setting the walltime
Signed-off-by: Joel Thibault <thibault@broadinstitute.org>
2012-11-29 15:23:36 -05:00
Mark DePristo f837e6ced7 Refactored entire NA12878KB to allow us to easily build a na12878kb.jar for IGV integration
-- Just separated infrastructure into core package, away from the walkers themselves.
-- Added na12878kb.jar target that builds a jar that can run a test main function (see testNA12878kbJar.csh)
2012-11-29 14:38:09 -05:00
Mark DePristo 52a6df4f1a Add SummarizeConsensus walker that spits out information about the callsets in the KB
-- Added summary to update consensus too, so you can see what's been added
2012-11-29 13:07:46 -05:00
depristo ed7a89c0c7 Merge pull request #3 from jsilter/master
Fix NA12878DBArgumentCollectionUnitTest
2012-11-29 08:52:38 -08:00
Jacob Silterra d9e8a414ef Fix NA12878DBArgumentCollectionUnitTest so it uses testng, and testCompareLocalRemoteLocators compare the right things 2012-11-29 11:03:21 -05:00
David Roazen df2c26b554 Rename NA12878DBArgumentCollectionTest to NA12878DBArgumentCollectionUnitTest
Otherwise this test won't get run as part of the test suite...
2012-11-28 22:57:04 -05:00
David Roazen b06e71cedf Use build jars in test classpaths by default
-Allows packaged resource files to be accessed within tests

-Guards against packaging errors in dist/ jars by testing the
jars that actually get run rather than unpackaged class files.
Previously we were only protected against packaging errors in the
monolithic jars posted to our website, not the dist/ jars used in
everyday runs.

-"ant fasttest" still uses the unpackaged class files for speed
(don't want to have to rebuild the jars in fasttest). Relies on
dubious methods to get at the resource files that would end up
in the jars.

-Eliminated the stupid separate "test" ivy config. Now we only
invoke ivy ONCE during an ant build that includes tests.
2012-11-28 22:57:04 -05:00
Eric Banks add1ab5d0e Fix status of largeScaleValidationPools for NA12878-KB 2012-11-28 20:34:13 -05:00
Mark DePristo b9be8850e2 Bugfixes to NA12878DBArgumentCollection and JSON and the GATK argument value injection system
-- Functions that depend on the value of variables that have GATK injection values must be initialized lazily, not at object creation time.  The previous version broke the dbToUse and useLocal arguments.  Fixed
2012-11-28 19:02:07 -05:00
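
The lazy-initialization point in b9be8850e2 can be illustrated with a small sketch. Everything here is hypothetical except the names useLocal and dbToUse, which the commit message mentions: the class, the "local"/"remote" values, and the way the field gets assigned are assumptions standing in for the GATK argument injection system, which sets argument fields after the object has been constructed.

```java
/** Hypothetical sketch of eager vs. lazy evaluation of an injected argument; not the GATK code. */
final class LazyArgsSketch {
    // Stand-in for an argument field that a framework assigns AFTER construction.
    boolean useLocal = false;

    // Eager: computed during construction, so it only ever sees the default value of useLocal.
    private final String eagerDb = pickDb();

    // Lazy: left unset until first use, by which time injection has happened.
    private String lazyDb;

    private String pickDb() {
        return useLocal ? "local" : "remote"; // assumed values, for illustration only
    }

    /** Broken variant: reflects the pre-injection default. */
    String eagerDbToUse() {
        return eagerDb;
    }

    /** Fixed variant: evaluated on first call, after injection. */
    String dbToUse() {
        if (lazyDb == null) {
            lazyDb = pickDb();
        }
        return lazyDb;
    }
}
```

With useLocal set to true after construction, eagerDbToUse() still reports the default while dbToUse() reflects the injected value, which is the kind of breakage the commit describes.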
Mark DePristo 7b74bf6677 Excluding large scale validation callsets from KB until further reviewed, rebuilding production server now 2012-11-28 18:41:49 -05:00
Mark DePristo 4729f0858d ExtractConsensusSites -include and -exclude callsets now work on supporting callsets, not the actual name
-- Allows you to include / exclude callsets that appear in other callsets (as one would expect)
2012-11-28 18:41:16 -05:00
Mark DePristo 65357d26bc New walker ExtractConsensusSites that extracts a VCF from the NA12878 Knowledge Base meeting criteria
-- See @link http://gatkforums.broadinstitute.org/discussion/1848/using-the-na12878-knowledge-base for more information
2012-11-28 18:13:07 -05:00
Mark DePristo de7049463c New walker ExtractConsensusSites that extracts a VCF from the NA12878 Knowledge Base meeting criteria
-- See @link http://gatkforums.broadinstitute.org/discussion/1848/using-the-na12878-knowledge-base for more information
2012-11-28 17:19:22 -05:00
Eric Banks ff8b3904e2 Added many new resources to the NA12878 KB truth set 2012-11-28 17:18:24 -05:00
David Roazen b2e699169c Update GATK packaging settings to package arbitrary resources
With the newly-added support for packaging arbitrary resources, the
resources were getting packaged in a normal build but not when
creating a standalone GATK jar. This corrects this oversight.
2012-11-28 15:26:05 -05:00
droazen a359d43f54 Merge pull request #2 from jsilter/master
Abstract connection to MongoDB so we can specify it through JSON file
2012-11-28 12:05:47 -08:00
David Roazen 26d9c41615 Allow arbitrary resources to be packaged in the GATK jar, selecting among public/private/protected appropriately
-Resources must be in a "resources" or "templates" subdirectory within the Java package hierarchy

-Remove direct inclusion of private resources from the main jar packaging target added in Jacob's
patch: this would break builds where the private directory was absent, and did not respect build
settings (include.private, etc.)
2012-11-28 14:53:08 -05:00
Joel Thibault c76c808268 Reads are required to be sorted
- Remove the extended_only case because it's outside intervals
2012-11-28 13:59:58 -05:00
Joel Thibault 198923b597 Add ActiveRegionReadState handling 2012-11-28 13:59:57 -05:00
Ryan Poplin f0395b457a Adding the work-in-progress, experimental RepeatLengthCovariate to the BQSR so Chris can continue the development. 2012-11-28 13:56:32 -05:00
Eric Banks 3463774f2a Merged bug fix from Stable into Unstable 2012-11-28 13:26:52 -05:00
Eric Banks 6030605242 Added quick check for creation of bad BAQ values associated with badly encoded base qualities; hopefully this can help us debug the non-reproducible issue seen by many users. 2012-11-28 13:26:31 -05:00
Mark DePristo c676853731 Merged bug fix from Stable into Unstable. Updating md5s
Conflicts:
	protected/java/test/org/broadinstitute/sting/gatk/walkers/genotyper/UnifiedGenotyperIntegrationTest.java
2012-11-28 12:54:36 -05:00
Mark DePristo a1d6461121 Critical bugfix to AFCalcResult affecting UG/HC quality score emission thresholds
As reported by Menachem Fromer: a critical bug in AFCalcResult:

Specifically, the implementation:
    public boolean isPolymorphic(final Allele allele, final double log10minPNonRef) {
        return getLog10PosteriorOfAFGt0ForAllele(allele) >= log10minPNonRef;
    }

seems incorrect and should probably be:

getLog10PosteriorOfAFEq0ForAllele(allele) <= log10minPNonRef

The issue here is that the 30 represents a Phred-scaled probability of *error* and it's currently being compared to a log probability of *non-error*.

Instead, we need to require that our probability of error be less than the error threshold.
This bug has only a minor impact on the calls -- hardly any sites change -- which is good.  But the inverted logic affects multi-allelic sites significantly.  Basically you only hit this logic with multiple alleles, and in that case it's including extra alt alleles incorrectly, and throwing out good ones.

Change was to create a new function that properly handles thresholds that are PhredScaled quality scores:

    /**
     * Same as #isPolymorphic but takes a phred-scaled quality score as input
     */
    public boolean isPolymorphicPhredScaledQual(final Allele allele, final double minPNonRefPhredScaledQual) {
        if ( minPNonRefPhredScaledQual < 0 ) throw new IllegalArgumentException("phredScaledQual " + minPNonRefPhredScaledQual + " < 0 ");
        final double log10Threshold = Math.log10(QualityUtils.qualToProb(minPNonRefPhredScaledQual));
        return isPolymorphic(allele, log10Threshold);
    }
2012-11-28 12:08:02 -05:00
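
A worked sketch of the threshold conversion described in a1d6461121. It assumes that QualityUtils.qualToProb maps a Phred-scaled quality Q to the probability of being correct, 1 - 10^(-Q/10); the local qualToProb below is a stand-in written for this example, and the class and method names are hypothetical. Under that assumption, a Q30 threshold becomes log10(0.999), so the posterior of AF > 0 must be at least 0.999 for an allele to count as polymorphic.

```java
/** Hypothetical sketch of Phred-to-log10 threshold conversion; not the GATK code. */
final class PhredThresholdSketch {
    /** Assumed behavior of qualToProb: probability of being correct, 1 - 10^(-Q/10). */
    static double qualToProb(final double qual) {
        return 1.0 - Math.pow(10.0, -qual / 10.0);
    }

    /** Converts a Phred-scaled error threshold into a log10 probability of non-error. */
    static double log10Threshold(final double phredScaledQual) {
        if (phredScaledQual < 0) {
            throw new IllegalArgumentException("phredScaledQual " + phredScaledQual + " < 0");
        }
        return Math.log10(qualToProb(phredScaledQual));
    }

    /** True when the log10 posterior of AF > 0 clears the converted threshold. */
    static boolean isPolymorphic(final double log10PosteriorOfAFGt0, final double phredScaledQual) {
        return log10PosteriorOfAFGt0 >= log10Threshold(phredScaledQual);
    }

    public static void main(final String[] args) {
        // Q30 means an error probability of 0.001, i.e. a non-error probability of 0.999.
        System.out.println("log10 threshold for Q30: " + log10Threshold(30.0));
        System.out.println("posterior 0.9999 passes: " + isPolymorphic(Math.log10(0.9999), 30.0));
        System.out.println("posterior 0.99 passes:   " + isPolymorphic(Math.log10(0.99), 30.0));
    }
}
```

Comparing the log10 posterior directly against the raw value 30 would mix a Phred-scaled error probability with a log probability of non-error, which is the mismatch the commit message calls out.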
Mark DePristo fb1e525e0f Merge branch 'master' of github.com:broadinstitute/gsa-unstable 2012-11-28 07:57:05 -05:00
Mark DePristo 9fd1a3cb9c Remove echo from startNA12878kbServer.csh 2012-11-28 07:56:50 -05:00
Menachem Fromer 79bc878e6a Allow debugging to be set from the command line 2012-11-27 22:37:41 -05:00
Jacob Silterra 1cc0b48caa Abstract connection to MongoDB so we can specify it through JSON file. Include 2 JSON spec files in GenomeAnalysisTK.jar
Create MongoDBManager, which keeps track of connections based on Locator class. Locators can be instantiated directly, or read from JSON files (NA12878DBArgumentCollection uses the GSon library)
2012-11-27 17:44:55 -05:00
Menachem Fromer 31069ffced Add HC pruning parameter option, as per Ryan's advice 2012-11-27 17:21:22 -05:00
depristo 6f1eb65ec8 Merge pull request #1 from jsilter/master
Modifications to NA12878KB classes so they can more easily be used as a library
2012-11-27 12:35:18 -08:00