Commit Graph

  • fdc7cc555b Removed extra column name from geliHeaderString that was mislabeling the 10 genotype likelihoods by shifting them over by one andrewk 2009-07-30 21:42:02 +0000
  • f3e63f00bc Exclude secondary base caller code from playground jar. Still TODO: figure out how to deal with the playground jar. hanna 2009-07-30 21:02:46 +0000
  • 0087234ed7 small code cleanup, a couple of little changes to SSGGenotypeCall aaron 2009-07-30 19:47:37 +0000
  • fbc7d44bc7 don't allow users to input priors anymore; they should be using heterozygosity and having the SSG calculate priors. Note that nothing was changed for dbSNP/hapmap priors (not sure what we want to do with these yet - any thoughts?) ebanks 2009-07-30 19:10:33 +0000
  • b282635b05 Complete reworking of Fisher's exact test for strand bias: - fixed math bug (pValue needs to be initialized to pCutoff, not 0) - perform factorial calculations in log space so that huge numbers don't explode - cache factorial calculations so that each value needs to be computed just once for any given instance of the filter ebanks 2009-07-30 18:52:13 +0000
  • 4033c718d2 moving some code around for better organization, some fixes to the fields out of SSG aaron 2009-07-30 15:09:43 +0000
  • 4366ce16e0 Made sure all RODs have a (good) toString() method - and use it in the Venn walker. (thanks, Mark) ebanks 2009-07-30 14:53:27 +0000
  • 9cd53d3273 some initial changes from the first review of the genotype redesign, more to come. aaron 2009-07-30 07:04:05 +0000
  • feb7238f10 Wasn't always returning the correct alt base ebanks 2009-07-30 03:08:04 +0000
  • 5429b4d4a8 A bit of reorganization to help with more flexible output streams. Pushed construction of data sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler to just microschedule. hanna 2009-07-29 23:00:15 +0000
  • bca894ebce Adding the initial changes for the new Genotyping interface. The bullet points are: aaron 2009-07-29 19:43:59 +0000
  • c5c11d5d1c First attempt at modifying the VFW interfaces to support direct emission of relevant training data per feature and exclusion criterion. This way, you could run the program once, get the training sets, and then feed that training set back to the filters and have them automatically choose the optimal thresholds for themselves. This current version is pretty ugly right now... kiran 2009-07-29 19:29:03 +0000
  • 3554897222 allow filters to specify whether they want to work with mapping quality zero reads; the VariantFiltrationWalker passes in the appropriate contextual reads ebanks 2009-07-29 17:38:15 +0000
  • 7a13647c35 Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very* rough initial implementation, but should provide enough support so that people can stop creating SAMFileWriters in reduceInit. hanna 2009-07-29 16:11:45 +0000
  • 56f769f2ce Output improvements to GenotypeConcordance calculations depristo 2009-07-29 12:54:46 +0000
  • 72dda0b85c Fixed calculations for Mark ebanks 2009-07-29 03:21:43 +0000
  • f0378db9b7 added accuracy numbers ebanks 2009-07-29 01:38:33 +0000
  • a5a56f1315 At this point, we are convinced that the new priors are the way to go... ebanks 2009-07-28 17:25:25 +0000
  • df4fd498c5 Improvements and bug fixes galore. (1) Now properly handles Q0 bases, filtering them out; you can disable this if you need to (2) support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to be more empowered to detect variants (see email too) depristo 2009-07-28 13:21:38 +0000
  • 46643d3724 Improvements and bug fixes galore. (1) Now properly handles Q0 bases, filtering them out; you can disable this if you need to (2) support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to be more empowered to detect variants (see email too) depristo 2009-07-28 13:21:27 +0000
  • d665d9714f By default now writes output to JOBID.lsf.output instead of going to email -- based on recommendations from the cancer group depristo 2009-07-28 13:18:58 +0000
  • 3c4410f104 -add basic indel metrics to variant eval -variants need a length method (can't assume it's a SNP)! ebanks 2009-07-28 03:25:03 +0000
  • 1d6d99ed9c walk by reference kcibul 2009-07-27 20:21:04 +0000
  • 089ae85be7 1. output grep-able strings for genotype eval 2. free DB coverage from isSNP restriction ebanks 2009-07-27 17:36:59 +0000
  • 1bca9409a4 calculate freestanding intervals kcibul 2009-07-27 16:40:27 +0000
  • 2499c09256 added minIndelCount (short: minCnt) command line argument. The call is made only if the number of reads supporting the consensus indel is equal to or greater than the specified value (default: 0, so only minFraction filter is on in default runs!) asivache 2009-07-27 15:22:51 +0000
  • 73ddf21bb7 SNPs no longer fail this filter if they are actually hom in reads ebanks 2009-07-27 15:20:43 +0000
  • f2b3fa83ac fix for another bug found by Eric: some indels were printed into the output stream twice (when there's another indel within MISMATCH_WINDOW bases and that other indel requires delayed print in order to accumulate coverage) asivache 2009-07-27 15:07:07 +0000
  • f1109e9070 Added the iterator to SAMDataSource to prevent seeing duplicate reads, only in a byReads traversal. The iterator discards any reads in the current interval that would have been seen in the previous interval. aaron 2009-07-25 22:36:29 +0000
  • 5eca4c353c IndelGenotyper now uses GATK::getMergedReadGroupsByReaders() to sort out which read in the merged stream is for normal, and which is for tumor (in --somatic mode, apparently) asivache 2009-07-24 23:01:18 +0000
  • a361e7b342 SAMDataSource is now exposed by GATK engine; SamFileHeaderMerger is exposed from Resources all the way up to SAMDataSource, so now we can see underlying individual readers should we need them; GATK engine has new methods getSamplesByReaders(), getLibrariesByReaders(), and getMergedReadGroupsByReaders(): each of these methods returns a list of sets, with each element (set) holding, respectively, samples, libraries, or (merged) read groups coming from an individual input bam file (so now when using multiple -I options we can still find out which of the input bams each read comes from) asivache 2009-07-24 22:59:49 +0000
  • 2024fb3e32 Better division of responsibilities between sources and type descriptors. hanna 2009-07-24 22:15:57 +0000
  • 64221907a2 fixed a bug found by Eric: genotyper would crash in the case of an indel too close to the window end, with the next read mapping sufficiently far away on the ref asivache 2009-07-24 21:00:31 +0000
  • 2db86b7829 Move the cleaned read injector test from playground to core. Remove CovariateCounterTest's dependency on the CleanedReadInjector. Start doing a bit of cleanup on the CLP's FieldParsers. hanna 2009-07-24 19:44:04 +0000
  • e2ec703a32 Added indel cleaner and quality scores recalibrator to the GATK package. hanna 2009-07-24 16:20:38 +0000
  • df44bdce7d Retire the pooled caller...it's been eclipsed by other walkers in the tree. hanna 2009-07-24 14:49:03 +0000
  • 884806fc16 Broken and unused. It goes away now. kiran 2009-07-24 14:26:52 +0000
  • d044681fbe change paths to new ones ebanks 2009-07-24 07:28:43 +0000
  • 59f0c00d77 -set indel cleaning walkers to be in core package -move Andrey's alignment utility classes to core ebanks 2009-07-24 05:23:29 +0000
  • bb20462a7c A better way: down-scale second-base ratios until the infinities disappear. This way, high-coverage sites don't cause binomialProbability to explode. kiran 2009-07-24 03:02:00 +0000
  • 0b16253db3 an iterator to fix the problem where read-based interval traversals are getting duplicate reads because reads span the two intervals. aaron 2009-07-23 23:59:48 +0000
  • 7c20be157c Added ability to sample from a list *without* replacement. kiran 2009-07-23 21:00:19 +0000
  • 038cbcf80e If the result from the secondary-base test is 0.0, replace the result with a minimum likelihood such that the log-likelihood doesn't underflow. kiran 2009-07-23 20:59:52 +0000
  • 093550a3f2 Removed secondary-base test from SingleSampleGenotyper. It now lives in the variant filtration system. kiran 2009-07-23 20:58:41 +0000
  • 477502338f moved major indel cleaning pieces to core (yippee!) ebanks 2009-07-23 19:59:51 +0000
  • 4efe26c59a Major: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found. Minor: refactor 454 filtering ebanks 2009-07-23 19:53:51 +0000
  • f7168bd7cf added the ability to build the jars to a different location, like the following: aaron 2009-07-23 04:06:58 +0000
  • f8b1dbe3b3 getBestGenotype() does not necessarily return hets in alphabetical order; the string (unfortunately) needs to be sorted for lookup in the table (otherwise we throw a NullPointerException) TO DO: have the table be smarter instead of sorting each genotype string ebanks 2009-07-23 01:58:47 +0000
  • ee8ed534e0 print full genotype for alt allele ebanks 2009-07-23 01:35:23 +0000
  • 298cc24524 Fix minor bug introduced in filtration, and cleaned up the artificial sam records so that they use SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX and SAMRecord.NO_ALIGNMENT_START rather than hardcoded -1's. hanna 2009-07-22 22:37:41 +0000
  • cac04a407a For Manny: filter out reads where the ref index == NO_ALIGNMENT_REFERENCE_INDEX but the alignment start != NO_ALIGNMENT_START. hanna 2009-07-22 21:19:24 +0000
  • 9c12c02768 AlleleBalance and on/off primary base filters -- version 0.0.1 -- for experimental use only depristo 2009-07-22 17:54:44 +0000
  • 00f9bcd6d1 CoverageEval.py tool right before some major changes to the core of the code andrewk 2009-07-22 16:58:23 +0000
  • 24e81e3e7b moved to wiki ebanks 2009-07-22 16:35:23 +0000
  • c54fd1da09 Beautify the genotype concordance printouts ebanks 2009-07-22 02:53:02 +0000
  • 6e4fd8db4a Better formatting of available walkers, and only output them along with help. Cleanup JVMUtils. hanna 2009-07-21 22:23:28 +0000
  • 761d70faa1 Better printing of multiple rods -- now produces a comma-separated set of values depristo 2009-07-21 21:58:27 +0000
  • 8588f75eb6 Better printing with toSimpleString() -- now prints out chip-genotype string depristo 2009-07-21 21:57:59 +0000
  • 1843684cd2 Cleanup: GATKEngine no longer needs to be lazy loaded, b/c the plugin directory no longer exists. hanna 2009-07-21 18:50:51 +0000
  • b43925c01e Switched to Reflections (http://code.google.com/p/reflections/) project for inspecting the source tree and loading walkers, rather than trying to roll our own by hand. hanna 2009-07-21 18:32:22 +0000
  • 436a196e2b Bug fixes to support hapmap genotyping concordance. kiran 2009-07-21 16:20:10 +0000
  • 7e04313b4e Bug fixes and improvements to CoverageHistogram. Now displays the frequency of the bin. Also correctly prints out the last element in the coverage histogram (<= vs. <) depristo 2009-07-21 11:55:05 +0000
  • f13a1e8591 adding a couple of small changes to support contract with VariantEval aaron 2009-07-21 03:49:15 +0000
  • b4adb5133a GLF rod as an AllelicVariant object. aaron 2009-07-21 00:55:52 +0000
  • f314ef8d84 Features and exclusion criteria are now instantiated in VariantFiltrationWalker's initialize() method, rather than in every map() call. This means the features and exclusion criteria will only ever be initialized once. kiran 2009-07-20 22:47:21 +0000
  • 54fce98056 duh, don't print newline ebanks 2009-07-20 03:04:27 +0000
  • 1d2b545608 add FLT toString method (to be used in PrintRODs) and add it to ROD list ebanks 2009-07-20 02:47:50 +0000
  • 8da754eb4e First implementation of a primary base filter. Assumes on/off bases are distributed according to a binomial. mmelgar 2009-07-17 18:43:35 +0000
  • 24ebfee604 don't print traversal stats ebanks 2009-07-17 16:13:28 +0000
  • 387316ebe1 added indel rod ebanks 2009-07-17 16:05:51 +0000
  • da4af3b620 print indels in the format required for 1KG submissions ebanks 2009-07-17 15:59:18 +0000
  • d45c90b166 ROD to represent simple output from IndelGenotyper ebanks 2009-07-17 14:36:12 +0000
  • f978b04633 A very simple walker to print out (using the ROD's toString method) all of the RODs it sees. This is the easiest solution to get around the (temporary) bug of reads being seen multiple times by reads walkers when close intervals are passed to them (i.e. process full contigs and then use a ref walker to filter the ones within your intervals of choice) ebanks 2009-07-17 14:03:34 +0000
  • 129ad97ce5 performance improvement to GenomeLocParser -- moved regex pattern compile out of local field kcibul 2009-07-17 02:56:25 +0000
  • df1c61e049 Re-add the plugin path. hanna 2009-07-16 22:48:44 +0000
  • 7c30c30d26 Cleaned up some duplicate code in preparation for making plugin dir configurable. hanna 2009-07-16 22:02:21 +0000
  • 31f3f466ca Improvements to support GLF generation -- now correctly handles GLF depristo 2009-07-16 21:10:39 +0000
  • 107f42a01e Hacks for getting GLFs support in the Rod system working depristo 2009-07-16 21:03:47 +0000
  • 0548026a2e Now understands GLFs for calculating genotyping results like callable bases, and avoids emitting stupid amounts of data when doing a genotype evaluation (i.e., ignores non-SNP() calls) depristo 2009-07-16 21:03:26 +0000
  • c5f6ab3dd5 CoverageHistogram now sees 0 coverage sites depristo 2009-07-16 20:58:41 +0000
  • 8bc0832215 Generate chip concordance table. This should work, although I need to test it with some real GLFs ebanks 2009-07-16 17:44:47 +0000
  • 88ffb08af4 Need to return real values for some of the AllelicVariant methods ebanks 2009-07-16 02:31:10 +0000
  • 045d74d09c Cleanup my pathetic prose. hanna 2009-07-15 21:35:13 +0000
  • a04f205a7f GATK readme. hanna 2009-07-15 21:00:08 +0000
  • e1055bcc4c moving to new external repository kcibul 2009-07-15 20:46:08 +0000
  • 4a730adfc1 committing latest changes before moving repositories kcibul 2009-07-15 20:44:02 +0000
  • 692b1e206f stop throwing an exception here: we don't always have allele counts ebanks 2009-07-15 20:34:01 +0000
  • a245ee32fa A walker to split 2 call sets into their intersection/union/disjoint (sub)sets. Yes, the name is retarded, but I'm under pressure here... ebanks 2009-07-15 20:20:47 +0000
  • ba349e8d52 add FLT ROD ebanks 2009-07-15 19:40:50 +0000
  • 800f7e6360 make AllelicVariant extend ReferenceOrderedDatum (not Comparable) since ROD itself is Comparable. Then we can generalize RMD tags. Blame Matt if this doesn't work - he said it wouldn't break anything. ebanks 2009-07-15 19:25:06 +0000
  • 00d49976fb committing latest changes before moving repositories kcibul 2009-07-15 18:41:52 +0000
  • 5be5e1d45f added conversion from iupac format and new rod to deal with FLT file format ebanks 2009-07-15 18:34:41 +0000
  • 702cdd087f Actually listens to justPrint now depristo 2009-07-15 16:52:46 +0000
  • d36e232ed3 adding GLF rods to the module list aaron 2009-07-15 15:42:34 +0000
  • 9ecb3e0015 adding GLFRods with tests and some other code changes aaron 2009-07-15 15:30:19 +0000
  • c25f84a01c Regression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why it's there. hanna 2009-07-15 14:41:37 +0000
  • 1798aff01b VariantEval now understands the difference between a population-level analysis and a genotype analysis, and handles both. All analyses annotated as supporting one or the other or both. Preparation for genotype chip concordance calculations as well as called sites, etc analyses depristo 2009-07-15 14:07:13 +0000
  • 513d43b5f3 now implements AllelicVariant ebanks 2009-07-15 14:06:25 +0000
  • d369136bda deprecate this ROD yet again ebanks 2009-07-15 13:33:03 +0000
  • 5d1345539d Packaging cleanup. hanna 2009-07-14 21:00:02 +0000
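
Note on b282635b05: the log-space factorial idea described in that commit can be illustrated with a minimal sketch. This is not the project's code (which is Java and is not shown in this log); the function names, the two-sided definition of the p-value, and the process-wide cache (rather than one per filter instance) are all assumptions made for illustration.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def log_factorial(n: int) -> float:
    """Return ln(n!), cached so each value is computed only once.

    Hypothetical helper: lgamma(n + 1) equals ln(n!) without ever
    forming the (possibly astronomically large) factorial itself.
    """
    return math.lgamma(n + 1)


def log_table_prob(a: int, b: int, c: int, d: int) -> float:
    """Log of the hypergeometric probability of the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return (
        log_factorial(a + b) + log_factorial(c + d)
        + log_factorial(a + c) + log_factorial(b + d)
        - log_factorial(n)
        - log_factorial(a) - log_factorial(b)
        - log_factorial(c) - log_factorial(d)
    )


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed one (assumed convention).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_p_observed = log_table_prob(a, b, c, d)
    lowest = max(0, row1 + col1 - n)   # smallest feasible top-left cell
    highest = min(row1, col1)          # largest feasible top-left cell
    total = 0.0
    for x in range(lowest, highest + 1):
        log_p = log_table_prob(x, row1 - x, col1 - x, n - row1 - col1 + x)
        # Small tolerance so equally likely tables are not lost to rounding.
        if log_p <= log_p_observed + 1e-9:
            total += math.exp(log_p)
    return min(total, 1.0)
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` evaluates to 34/70 (about 0.4857), and a high-coverage table such as `fisher_exact_two_sided(5000, 5000, 5000, 5000)` stays finite because only logarithms of factorials are ever computed.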