Commit Graph

1930 Commits (143e92b79790cdbed7f60b8e9ecd87f9085b3f04)

Author SHA1 Message Date
Guillermo del Angel 143e92b797 Rebasing 2012-04-18 20:05:43 -04:00
Ryan Poplin 4999ae87ad Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-18 15:02:42 -04:00
Ryan Poplin dcc4871468 minor misc optimizations to PairHMM 2012-04-18 15:02:26 -04:00
Eric Banks d3c84e7b1f This should be a User Error since it's provided from the DoC command-line arguments 2012-04-18 13:09:23 -04:00
Eric Banks 392f1903f7 Handling some of the NumberFormatExceptions seen via Tableau that are really user errors. 2012-04-18 12:57:37 -04:00
Ryan Poplin 8a84456626 Following Eric's awesome update to change the VQSR recal file into a VCF file, the ApplyRecalibration step is now scatter/gather-able and tree reducible. 2012-04-18 11:24:04 -04:00
Eric Banks 4448a3ea76 Final tweaks. Added an integration test to cover the case of SNPs and indels that start at the same position. 2012-04-17 23:54:10 -04:00
Eric Banks c1f52b773a Minor tweaks and updated integration tests MD5s 2012-04-17 23:17:28 -04:00
Eric Banks 6d03bce0d3 Important refactoring of the VQSR recal file format: we now use a VCF instead of a CSV file.
The most important reason for this change is that we no longer need to read the entire recal file into memory up front in ApplyRecalibration.  For 1000G calling this was prohibitive in terms of memory requirements.  Now we go through the rod system and pull in just the records we need at a given position.

As an added bonus, once BCF2 is live we can drastically cut down the sizes of these recal files (which can grow large for whole genome calling).
2012-04-17 22:38:18 -04:00
Eric Banks ea793d8e27 Khalid pressured me into adding an integration test that makes sure we don't fail on reads with adjacent I and D events. 2012-04-17 21:21:29 -04:00
Mauricio Carneiro 46a212d8e9 Added "simplify reads" option to PrintReads. 2012-04-17 19:32:34 -04:00
Mauricio Carneiro f0c81b59b0 Implementation of the new BQSR plotting infrastructure
* removed low quality bases from the recalibration report.
   * refactored the Datum (Recal and Accuracy) class structure
   * created a new plotting csv table for optimized performance with the R script
   * added a datum object that carries the accuracy information (AccuracyDatum) for plotting
   * added mean reported quality score to all covariates
   * added QualityScore as a covariate for plotting purposes
   * added unit test to the key manager to operate with one required covariate and multiple optional covariates
   * integrated the plotting into BQSR (automatically generates the pdf with the recalibration tearsheet)
2012-04-17 19:23:55 -04:00
Ryan Poplin 952280bef1 Merge branch 'master' of ssh://nickel.broadinstitute.org/humgen/gsa-scr1/gsa-engineering/git/unstable 2012-04-17 17:00:14 -04:00
Ryan Poplin cf705f6c62 Adding read position rank sum test to the list of annotations that get produced with the HaplotypeCaller 2012-04-17 17:00:00 -04:00
Eric Banks 13c800417e Handle NPE in UG indel code: deletions immediately preceding insertions were not handled well in the code. 2012-04-17 15:51:23 -04:00
Khalid Shakir 91cb654791 AggregateMetrics:
- By porting from jython to java now accessible to Queue via automatic extension generation.
- Better handling for problematic sample names by using PicardAggregationUtils.
GATKReportTable looks up keys using arrays instead of dot-separated strings, which is useful when a sample has a period in the name.
CombineVariants has option to suppress the header with the command line, which is now invoked during VCF gathering.
Added SelectHeaders walker for filtering headers for dbGAP submission.
Generated command line for read filters now correctly prefixes the argument name as --read_filter instead of -read_filter.
Latest WholeGenomePipeline.
Other minor cleanup to utility methods.
2012-04-17 11:45:32 -04:00
Ryan Poplin 1a2e92f8db Merged bug fix from Stable into Unstable 2012-04-17 10:23:05 -04:00
Ryan Poplin adad76b36f Fixing NPE in VQSR for the case of very small callsets. 2012-04-17 10:20:43 -04:00
Mark DePristo 3f6b2423d8 Update VE IT to reflect new fields and bugfixes 2012-04-13 17:00:37 -04:00
Mark DePristo f9190b6fcd VariantEvalUnitTest is better named VariantEvalWalkerUnitTest 2012-04-13 17:00:37 -04:00
Mark DePristo 23ccf772d4 IndelSummary now emits all of the underlying counts for ratios, percentages, etc it computes 2012-04-13 17:00:36 -04:00
Mark DePristo 84d1e8713a Infrastructure for combining VariantEvaluations
-- Not hooked up yet, so the output of VariantEval should be the same as before
-- Implemented a VariantEvalUnitTest that tests the low level strat / eval combinatorics and counting routines
-- Better docs throughout
2012-04-13 17:00:36 -04:00
Mark DePristo 38986e4240 Documentation for StratificationManager 2012-04-13 17:00:36 -04:00
Mark DePristo ab06d53867 Useful test constructor or Unit tests in RefMetaDataTracker 2012-04-13 17:00:36 -04:00
Mark DePristo 285e61a227 Bugfix for IndelSummary
-- multi allelic count should be % not ratio
2012-04-13 17:00:35 -04:00
Mark DePristo e6d5cb46d2 Improvements and bugfixes to IndelSummary
-- Now properly includes both bi and multi-allelic variants.  These are actually counted as well, and emitted as counts and % of sites with multiple alleles
-- Bug fix for gold standard rate
2012-04-13 17:00:35 -04:00
Mark DePristo bfa966a4e9 Bugfix for OneBPIndel
-- Previously was only including 1 bp insertions in stratification
2012-04-13 17:00:35 -04:00
Mark DePristo 2aa2d9aec0 Merged bug fix from Stable into Unstable 2012-04-13 09:25:43 -04:00
Mark DePristo 27e7e17dc7 New way to handle exceptions in multi-threaded GATK
-- HMS no longer tries to grab and throw all exceptions.  Exceptions are just thrown directly now.
-- Proper error handling is handled by functions in HMS, which are used by ShardTraverser and TreeReducer
-- Better printing of stack traces in WalkerTest
2012-04-13 09:23:33 -04:00
Mark DePristo e85e9a8cf5 More extensive testing of type of error thrown in multi-threaded walker test
-- Unfortunately the result of the multi-threaded test is non-deterministic so run the test 10x times to see if the right expection is always thrown
-- Now prints the stack trace and exception message of the caught exception of the wrong type, if this occurs
2012-04-13 09:23:33 -04:00
Eric Banks 297afc7911 Added unit test to ensure that we genotype correctly cases with really large GLs 2012-04-12 15:43:14 -04:00
Eric Banks 818e8c2fb9 Resolving merge conflicts 2012-04-12 15:19:44 -04:00
Eric Banks 0dd571928d Let's not have the indel model emit more than the max possible number of genotypable alt alleles (since we may not be able to subset down to the best ones). 2012-04-12 15:16:29 -04:00
Eric Banks f77a6d18b8 Bad conflict merge before 2012-04-12 09:56:49 -04:00
Eric Banks 33a8bdd75f Resolving merge conflicts 2012-04-12 09:51:55 -04:00
Eric Banks b659b16b31 Generate User Error for bad POS value 2012-04-12 09:49:35 -04:00
Eric Banks cc71baf691 Don't allow users to try to genotype more than the max possible value (catch and throw a User Error at startup). Better docs explaining that users shouldn't play with this value unless they know what they are doing. 2012-04-12 09:18:44 -04:00
Eric Banks 5bf9dd2def A framework to get annotations working in the HaplotypeCaller (and ART walkers in general).
Adding support for active-region-based annotation for most standard annotations.  I need to discuss with Ryan what to do about tests that require offsets into the reads (since I don't have access to the offsets) like e.g. the ReadPosRankSumTest.

IMPORTANT NOTE: this is still very much a dev effort and can only be accessed through private walkers (i.e. the HaplotypeCaller).  The interface is in flux and so we are making no attempt at all to make it clean or to merge this with the Locus-Traversal-based annotation system.  When we are satisfied that it's working properly and have settled on the proper interface, we will clean it up then.
2012-04-11 16:22:12 -04:00
Eric Banks 5b7da3831f Not sure why this didn't make it into the last push, but here's a working MD5 for the NDA annotation in UG 2012-04-11 13:49:50 -04:00
Eric Banks 7aa654d13f New interface for some dev work that Ryan and I are doing; only accessible from private walkers right now 2012-04-11 13:49:09 -04:00
Eric Banks dc90508104 Adding a new annotation to UG calls: NDA = number of discovered (but not necessarily genotyped) alleles for the site. This could help downstream analysis esp. of indels for wonky sites (since we only use the top 2-3 alleles). Not enabled by default but we can change that if this turns out to be useful. 2012-04-11 13:47:10 -04:00
Eric Banks d2142c3aa7 Adding integration test for Flag Stat 2012-04-10 22:40:38 -04:00
Eric Banks f560611fe8 Merged bug fix from Stable into Unstable 2012-04-10 22:26:53 -04:00
Eric Banks f46f7d0590 Fix the stats coming out of FlagStat. I will add an integration test in unstable 2012-04-10 22:26:10 -04:00
Mauricio Carneiro cd842b650e Optimizing DiagnoseTargets
* Fixed output format to get a valid vcf
   * Optimzed the per sample pileup routine O(n^2) => O(n) pileup for samples
   * Added support to overlapping intervals
   * Removed expand target functionality (for now)
   * Removed total depth (pointless metric)
2012-04-10 17:43:59 -04:00
Ryan Poplin 1df0adf862 Fixing ActivityProfile unit test. 2012-04-10 15:28:27 -04:00
Ryan Poplin e3cc7cc59c Resolving merge conflict. 2012-04-10 14:50:27 -04:00
Ryan Poplin a4634624b7 There are now three triggering options in the HaplotypeCaller. The default (mismatches, insertions, deletions, high quality soft clips), an external alleles file (from the UG for example), or extended triggers which include low quality soft clips, bad mates and unmapped mates. Added better algorithm for band pass filtering an ActivityProfile and breaking them apart when they get too big. Greatly increased the specificity of the caller by battening down the hatches on things like base quality and mapping quality thresholds for both the assembler and the likelihood function. 2012-04-10 14:48:23 -04:00
Eric Banks 10e74a71eb We now allow arbitrary annotations other than dbSNP (e.g. HM3) to come out of the Unified Genotyper. This was already set up in the Variant Annotator Engine and was just a matter of hooking UG up to it. Added integration test to ensure correct behavior. 2012-04-10 12:30:35 -04:00
Mark DePristo b43d21056b Merged bug fix from Stable into Unstable 2012-04-10 09:42:09 -04:00