Commit Graph

8989 Commits (359090c4b79f814f36b5004ec253cda0d2dfb830)

Author SHA1 Message Date
Eric Banks 359090c4b7 Updating dbsnp to v135 2012-03-12 13:17:58 -04:00
Eric Banks 7e9a535c4d Updated the bundle to use the official filtered (final) indel calls 2012-03-12 12:12:24 -04:00
Eric Banks 05ef5863cf Don't assume files have .bai and .bas associated with them 2012-03-12 11:47:48 -04:00
Mark DePristo 6bc92d2bbf Bugfix for analyzeRunReports.py: now handles case where hostname is missing 2012-03-12 09:46:29 -04:00
Mark DePristo a63d1f58b6 analyzeRunReports cleanup for new minimal GATKRunReport structure
-- No more command lines or working directories
-- Added failing and successful gatkrunreports to public/testdata for testing
2012-03-12 09:46:26 -04:00
Ryan Poplin 03223029e3 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-12 09:42:37 -04:00
Eric Banks 04cafffaa7 Merge remote-tracking branch 'unstable/master' 2012-03-12 08:43:43 -04:00
Eric Banks b4749757f8 Fixes for SLOD: 1) didn't work properly for multi-allelics (randomly chose an allele, possibly one that wasn't genotyped in the full context); 2) in cases when there were more alt alleles than the max allowed and the user is calculating SB, we would recompute the best alt allele(s); 3) for some reason, we were recomputing the LOD for the full context when we'd already done that. Given that this passes integration tests on my end, this should be the last commit before the release. 2012-03-12 01:07:07 -04:00
Ryan Poplin 191a5860cf updating HaplotypeCaller integration tests 2012-03-11 16:23:25 -04:00
Ryan Poplin 2836c161ee Moving trimToVariableRegion out of reduced reads and into a public static ReadClipper function. HaplotypeCaller clips reads to the active region boundaries before passing to the HMM. The philosophy of the HC is moving towards genotyping the entire haplotype sequence contained within the active region as a single allele. 2012-03-11 14:45:59 -04:00
Ryan Poplin 8db11eb781 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-10 21:00:55 -05:00
Ryan Poplin 0cadd4d732 Banding approach for PairHMM utility class. Adds bands to a work queue when distinct likelihood masses are detected in the HMM matrices. Works over a wide range of haplotype sizes and GOP values. 2012-03-10 20:59:35 -05:00
Mark DePristo 1ee46e5c06 Collect only the bare essentials in the GATKRunReport
Now looks like:
<GATK-run-report>
   <id>D7D31ULwTSxlAwnEOSmW6Z4PawXwMxEz</id>
   <start-time>2012/03/10 20.21.19</start-time>
   <end-time>2012/03/10 20.21.19</end-time>
   <run-time>0</run-time>
   <walker-name>CountReads</walker-name>
   <svn-version>1.4-483-g63ecdb2</svn-version>
   <total-memory>85000192</total-memory>
   <max-memory>129957888</max-memory>
   <user-name>depristo</user-name>
   <host-name>10.0.1.10</host-name>
   <java>Apple Inc.-1.6.0_26</java>
   <machine>Mac OS X-x86_64</machine>
   <iterations>105</iterations>
</GATK-run-report>

No longer capturing command line or directory information, to minimize people's concerns with phone home and privacy
2012-03-10 20:27:14 -05:00
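A report in this minimal form could be read along the following lines. This is only a sketch using Python's standard `xml.etree.ElementTree`; the helper name `parse_run_report` and the choice to map absent fields to `None` are illustrative assumptions, not the behavior of analyzeRunReports.py.

```python
import xml.etree.ElementTree as ET

# Field names copied from the sample report in the commit message above.
FIELDS = ["id", "start-time", "end-time", "run-time", "walker-name",
          "svn-version", "total-memory", "max-memory", "user-name",
          "host-name", "java", "machine", "iterations"]


def parse_run_report(xml_text):
    """Hypothetical helper: return a dict of report fields.

    Fields that are absent from the report (for example a missing
    host-name) map to None instead of raising.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "GATK-run-report":
        raise ValueError("unexpected root element: " + root.tag)
    # findtext returns None when the child element does not exist.
    return {name: root.findtext(name) for name in FIELDS}


# Abbreviated sample with no <host-name> element.
SAMPLE = """<GATK-run-report>
   <id>D7D31ULwTSxlAwnEOSmW6Z4PawXwMxEz</id>
   <run-time>0</run-time>
   <walker-name>CountReads</walker-name>
   <iterations>105</iterations>
</GATK-run-report>"""

report = parse_run_report(SAMPLE)
print(report["walker-name"], report["host-name"])
```

All values come back as strings; converting run-time or iterations to integers is left to the caller.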
Ryan Poplin 92bbb9bbdd Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-10 10:09:57 -05:00
Mark DePristo bd883031a4 Final version of QualQuantizer
-- Docs everywhere
-- Contracts everywhere
-- More unit tests
-- Better error checking
-- Marginally nicer interface to QuantizeQualsWalker
2012-03-09 16:00:08 -05:00
Mark DePristo 4b404cae48 Final evaluation script for quantizing quality scores 2012-03-09 16:00:08 -05:00
Mark DePristo e2c62572f9 Further upgrades to CalibrateGenotypeLikelihoods.R
- Uses modified Yates correction of (e + 1) / (n + 2) to estimate error rates
- Now shows ALL and per read group information
- Better limits on diff plots so we can see more information
2012-03-09 16:00:08 -05:00
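The corrected estimate mentioned in this commit, (e + 1) / (n + 2) for e errors in n observations, can be illustrated as follows. This is a sketch in Python rather than the R used by CalibrateGenotypeLikelihoods.R, and the function names are hypothetical.

```python
import math


def corrected_error_rate(errors, observations):
    """Estimate an error rate as (e + 1) / (n + 2).

    The result is never exactly 0 or 1, so it can always be
    converted to a finite Phred-scaled quality.
    """
    return (errors + 1) / (observations + 2)


def phred(error_rate):
    """Phred-scaled quality: Q = -10 * log10(p)."""
    return -10 * math.log10(error_rate)


print(corrected_error_rate(0, 0))    # 0.5 when there is no data at all
print(corrected_error_rate(0, 998))  # 0.001
print(round(phred(corrected_error_rate(0, 998)), 6))
```

With no observations the estimate falls back to 0.5, and it approaches the raw ratio e / n as n grows.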
Mark DePristo fceb2bf25b Updating CalibrateGenotypeLikelihoods.R to display Q93 values rather than filter them out 2012-03-09 16:00:07 -05:00
Mark DePristo 3ba2e5667c CalibrateGenotypeLikelihoods now includes pOfDGivenD 2012-03-09 16:00:07 -05:00
Mark DePristo 1011f3862b CalibrateGenotypeLikelihoods now emits the position of the variant for debugging
-- Refactored some duplicated code (FYI, code duplication = root of all evil) into shared functions
-- Added long-missing integrationtests
-- CHRIS/RYAN -- it would be very good to add an integration test covering external VCF files as I believe we rely on this functionality and it's not tested at all
2012-03-09 16:00:07 -05:00
Mark DePristo 8158348e01 Prints xlim = 30 and xlim = 99 in CalibrateGenotypeLikelihoods.R 2012-03-09 16:00:07 -05:00
David Roazen 91d10431d3 BAMScheduler: detect contigs from the interval list that are not in the merged BAM header's sequence dictionary
This is a quick-and-dirty patch for the null pointer error Mauricio reported earlier.

Later on we might want to address in a more general way the fact that we validate user intervals
against the reference but not against the merged BAM header produced by the engine at runtime.
2012-03-09 15:20:16 -05:00
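The check described in this commit amounts to comparing the contigs named in the interval list with those in the merged header's sequence dictionary. A minimal sketch, assuming both are available as plain lists of contig names; the function name is hypothetical and this is not the BAMScheduler code itself.

```python
def contigs_missing_from_header(interval_contigs, header_contigs):
    """Return interval contigs absent from the header, in first-seen order."""
    known = set(header_contigs)
    missing = []
    for contig in interval_contigs:
        if contig not in known and contig not in missing:
            missing.append(contig)
    return missing


missing = contigs_missing_from_header(
    ["chr1", "chrM", "chr1", "chrUn"],  # contigs from the user's intervals
    ["chr1", "chr2"],                   # contigs in the merged BAM header
)
print(missing)  # ['chrM', 'chrUn']
```

Reporting the missing names up front gives the user something actionable, in contrast to a null pointer error later in the run.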
David Roazen bc65f6326f Detect incomplete reads from BAM schedule file in BAMSchedule before they become buffer underflows
This fix is similar to, but distinct from, the earlier fix to GATKBAMIndex. If we fail to read in
a complete 3-integer bin header from the BAM schedule file that the engine has written, throw a
ReviewedStingException (since this is our problem, not the user's) rather than allowing a
cryptic buffer underflow error to occur.

Note that this change does not fix the underlying problem in the engine, if there is one
(there may be an as-yet-undetected bug in the code that writes the bam schedule). It will
just make it easier for us to identify what's going wrong in the future.
2012-03-09 12:33:48 -05:00
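The idea of detecting an incomplete 3-integer header before it turns into a buffer underflow can be sketched as below. The byte layout (three little-endian 32-bit integers), the exception class, and the decision to treat a zero-byte read as a clean end of file are all assumptions made for illustration; the actual change throws a ReviewedStingException from Java code.

```python
import io
import struct

# Assumed layout: three little-endian signed 32-bit integers.
BIN_HEADER = struct.Struct("<3i")


class TruncatedScheduleError(Exception):
    """Hypothetical stand-in for the engine-side exception."""


def read_bin_header(stream):
    """Read one 3-integer bin header, or return None at a clean EOF."""
    data = stream.read(BIN_HEADER.size)
    if len(data) == 0:
        return None  # nothing left to read
    if len(data) < BIN_HEADER.size:
        # Fail with a specific message instead of a cryptic unpack error.
        raise TruncatedScheduleError(
            "expected %d bytes for a bin header, got %d"
            % (BIN_HEADER.size, len(data)))
    return BIN_HEADER.unpack(data)


complete = io.BytesIO(struct.pack("<3i", 1, 2, 3))
print(read_bin_header(complete))  # (1, 2, 3)

truncated = io.BytesIO(struct.pack("<3i", 1, 2, 3)[:7])
try:
    read_bin_header(truncated)
except TruncatedScheduleError as err:
    print("detected:", err)
```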
David Roazen 32dee7ed9b Avoid buffer underflow in GATKBAMIndex by detecting premature EOF in BAM indices
GATKBAMIndex would allow an extremely confusing BufferUnderflowException to be
thrown when a BAM index file was truncated or corrupt. Now, a UserException is
thrown in this situation instructing the user to re-index the BAM.

Added a unit test for this case as well.
2012-03-08 15:30:44 -05:00
Guillermo del Angel c04853eae6 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-08 12:30:04 -05:00
Guillermo del Angel 858acf8616 Hidden mode in ValidationAmplicons to support ILMN output format (same as Sequenom, with just shuffled columns) 2012-03-08 12:29:44 -05:00
Andrey Sivachenko 56f074b520 docs updated 2012-03-07 18:47:15 -05:00
Andrey Sivachenko 117ea605ac Merge branch 'master' of ssh://gsa1/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-07 18:35:07 -05:00
Andrey Sivachenko 497a1b059e transition to JEXL completed, old parameters setting individual cutoffs now deprecated 2012-03-07 18:34:11 -05:00
Andrey Sivachenko fbd2f04a04 JEXL support added; intermediate commit, not yet functional 2012-03-07 17:29:42 -05:00
Mark DePristo 20d10dfa35 EvalQuantizedQuals now tests the impact on reduced reads as well 2012-03-07 13:10:08 -05:00
Mark DePristo 0376d73ece Improved, public version of ErrorRateByCycle
-- A cleaner table output (molten).  For those interested in seeing how this can be done with GATKReports look here for a nice clean example
-- Integration tests
-- Minor improvements to GATKReportTable with methods to getPrimaryKeys
2012-03-07 13:10:08 -05:00
Christopher Hartl a6a8fc0521 Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2012-03-07 10:05:43 -05:00
Eric Banks c4824a77f5 Some to-do items for the reduced reads calling script 2012-03-07 10:03:10 -05:00
Christopher Hartl 155839e901 Commit of VQSRV3 with Random Forest Bridge and Decision Tree engines. Lots of code duplication with the variant recalibrator in public, but also some subtle changes (i.e. to the engines and data manager). Code worked when it overwrote the stuff in public, but couldn't commit that. Will push if it works for private as well. 2012-03-07 09:46:43 -05:00
Mark DePristo 26dcec08d5 Bugfix for QualQuantizerUnitTest
-- Enabled failing provider
-- Fixed incorrect expectation in unit test
2012-03-07 09:30:03 -05:00
Mark DePristo 8ef654aa77 Minor improvements to QuantizeQuals
-- Commenting out excessive debugging in the walker
-- Scala script to quantize BAM, run calibrate genotype likelihoods, call snps, and compare them to the full bam call set for 1, 2, 4, 8, 16, 32, and 64 quantization levels
2012-03-06 16:56:59 -05:00
Mark DePristo 569be953b9 Bugfix for VariantEval
-- We weren't properly handling the case where a site had both a SNP and indel in both eval and comp.  These would naturally pair off as SNP x SNP and INDEL x INDEL in eval, but we'd still invoke update2 with (null, SNP) and (null, INDEL), resulting most conspicuously in incorrect false negatives in the validation report.
-- Updating misc. integrationtests, as the counting of comps (in particular for dbSNP) was inflated because of this effect.
2012-03-06 16:56:59 -05:00
Mark DePristo 5f35f5d338 QualQuantizer scales the penalty by the log of the two error rates
-- Old equation was |E1 - E*| * N1.  New equation is |log10(E1) - log10(E2)| * N1, which is equivalent to |log10(E1 / E2)| * N1
2012-03-06 16:56:58 -05:00
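The difference between a linear penalty and the log-scaled penalty |log10(E1) - log10(E2)| * N1 can be seen in a small sketch. The function names are hypothetical and the numbers are made up for illustration; this is not the QualQuantizer code.

```python
import math


def linear_penalty(e1, e2, n1):
    """Penalty proportional to the absolute difference in error rates."""
    return abs(e1 - e2) * n1


def log_penalty(e1, e2, n1):
    """Penalty proportional to the difference in log10 error rates."""
    return abs(math.log10(e1) - math.log10(e2)) * n1


# Two pairs of error rates, each a factor of 10 apart, with 100 observations.
for e1, e2 in [(0.001, 0.01), (0.01, 0.1)]:
    print(e1, e2,
          round(linear_penalty(e1, e2, 100), 6),
          round(log_penalty(e1, e2, 100), 6))
```

The linear form penalizes the second pair ten times more than the first, while the log form treats both tenfold changes equally, since it depends only on the ratio E1 / E2.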
Mark DePristo 8d2db3f249 Emit and visualize quality histogram in QualQuantizer 2012-03-06 16:56:58 -05:00
Mark DePristo b7089a3b05 Improvements to QualQuantizer; Walker to quantize quals in BAM file
-- QualQuantizer now tracks merge order and level in the QualInterval for debugging / visualization
-- Write out QualIntervals tree for visualization
-- visualizeQuantizedQuals.R R script for basic visualization of the quality score quantization
2012-03-06 16:56:58 -05:00
David Roazen 811f871f78 Do not fail tests that require the GATK private key if the user does not have permission to read it
Several of the unit tests for the new key authorization feature require
read access to the GATK master private key file. Since this file is only
readable by members of the group gsagit, this makes it hard for people
outside the group to run the test suite.

Now, we skip tests that require the master private key if the private
key exists (since not existing would be a true error) but is not readable
by the user running the test suite.

Bamboo, of course, will always be able to run these tests.
2012-03-06 15:57:02 -05:00
Christopher Hartl 67def6acc8 Merge branch 'master' of ssh://ni.broadinstitute.org/humgen/gsa-scr1/chartl/dev/unstable 2012-03-06 14:23:14 -05:00
Christopher Hartl 20c1fbaf0f Fixing a merge (turning off downsampling on DoC) 2012-03-06 14:22:45 -05:00
Ryan Poplin 46b470cc69 Minor misc updates 2012-03-06 10:14:45 -05:00
David Roazen 0702ee1587 Public-key authorization scheme to restrict use of NO_ET
-Running the GATK with the -et NO_ET or -et STDOUT options now
 requires a key issued by us. Our reasons for doing this, and the
 procedure for our users to request keys, are documented here:
 http://www.broadinstitute.org/gsa/wiki/index.php/Phone_home

-A GATK user key is an email address plus a cryptographic signature
 signed using our private key, all wrapped in a GZIP container.
 User keys are validated using the public key we now distribute with
 the GATK. Our private key is kept in a secure location.

-Keys are cryptographically secure in that valid keys definitely
 came from us and keys cannot be fabricated; however, keys are not
 "copy-protected" in any way.

-Includes private, standalone utilities to create a new GATK user key
 (GenerateGATKUserKey) and to create a new master public/private key
 pair (GenerateKeyPair). Usage of these tools will be documented on
 the internal wiki shortly.

-Comprehensive unit/integration tests, including tests to ensure the
 continued integrity of the GATK master public/private key pair.

-Generation of new user keys and the new unit/integration tests both
 require access to the GATK private key, which can only be read by
 members of the group "gsagit".
2012-03-06 00:09:43 -05:00
Lechu 027843d791 I've simply added a "library(grid)" call at the beginning of the R script generation since R 2.14.2 doesn't seem to load the "grid" package by default. I haven't tested it on previous R versions (you may edit the R version comment to be more precise if desired), but I'm almost certain that this library call shouldn't do any harm on them.
Signed-off-by: Ryan Poplin <rpoplin@broadinstitute.org>
2012-03-05 21:27:03 -05:00
Ryan Poplin f6905630bb Adding Unit test for Haplotype class. Used in HC's genotype given alleles mode. 2012-03-05 21:08:07 -05:00
Ryan Poplin 9b53250bef Adding Unit test for Haplotype class. Used in HC's genotype given alleles mode. 2012-03-05 21:07:36 -05:00
Ryan Poplin b37461587d Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-03-05 17:54:59 -05:00