fdc7cc555bRemoved extra column name from geliHeaderString that was mislabeling the 10 genotype likelihoods by shifting them over by one
andrewk
2009-07-30 21:42:02 +0000
f3e63f00bcExclude secondary base caller code from playground jar. Still TODO: figure out how to deal with the playground jar.
hanna
2009-07-30 21:02:46 +0000
0087234ed7small code cleanup, a couple of little changes to SSGGenotypeCall
aaron
2009-07-30 19:47:37 +0000
fbc7d44bc7don't allow users to input priors anymore; they should be using heterozygosity and having the SSG calculate priors. Note that nothing was changed for dbSNP/hapmap priors (not sure what we want to do with these yet - any thoughts?)
ebanks
2009-07-30 19:10:33 +0000
b282635b05Complete reworking of Fisher's exact test for strand bias: - fixed math bug (pValue needs to be initialized to pCutoff, not 0) - perform factorial calculations in log space so that huge numbers don't explode - cache factorial calculations so that each value needs to be computed just once for any given instance of the filter
ebanks
2009-07-30 18:52:13 +0000
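Note on the log-space approach above: a minimal sketch, assuming a standard two-sided Fisher's exact test on a 2x2 table. It is not the filter's actual code; the function names are hypothetical, and the commit's pCutoff initialization is not reproduced here. It only illustrates computing factorials as logarithms (so large counts do not overflow) and caching them so each value is computed once.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def log_factorial(n: int) -> float:
    """Return ln(n!); the cache means each n is computed only once."""
    return math.lgamma(n + 1)


def log_table_probability(a: int, b: int, c: int, d: int) -> float:
    """Log hypergeometric probability of the 2x2 table [[a, b], [c, d]]."""
    return (log_factorial(a + b) + log_factorial(c + d)
            + log_factorial(a + c) + log_factorial(b + d)
            - log_factorial(a) - log_factorial(b)
            - log_factorial(c) - log_factorial(d)
            - log_factorial(a + b + c + d))


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Sum the probabilities of all tables no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    log_p_observed = log_table_probability(a, b, c, d)
    lowest = max(0, row1 + col1 - n)   # smallest feasible top-left cell
    highest = min(row1, col1)          # largest feasible top-left cell
    p_value = 0.0
    for x in range(lowest, highest + 1):
        log_p = log_table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
        # Small tolerance so equally likely tables are not dropped by rounding.
        if log_p <= log_p_observed + 1e-9:
            p_value += math.exp(log_p)
    return min(p_value, 1.0)
```

With counts in the thousands, the factorials themselves would not fit in a double, but their logarithms stay small, which is the point of working in log space.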
4033c718d2moving some code around for better organization, some fixes to the fields out of SSG
aaron
2009-07-30 15:09:43 +0000
4366ce16e0Made sure all RODs have a (good) toString() method - and use it in the Venn walker. (thanks, Mark)
ebanks
2009-07-30 14:53:27 +0000
9cd53d3273some initial changes from the first review of the genotype redesign, more to come.
aaron
2009-07-30 07:04:05 +0000
feb7238f10Wasn't always returning the correct alt base
ebanks
2009-07-30 03:08:04 +0000
5429b4d4a8A bit of reorganization to help with more flexible output streams. Pushed construction of data sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler to just microschedule.
hanna
2009-07-29 23:00:15 +0000
bca894ebceAdding the initial changes for the new Genotyping interface. The bullet points are:
aaron
2009-07-29 19:43:59 +0000
c5c11d5d1cFirst attempt at modifying the VFW interfaces to support direct emission of relevant training data per feature and exclusion criterion. This way, you could run the program once, get the training sets, and then feed that training set back to the filters and have them automatically choose the optimal thresholds for themselves. This current version is pretty ugly right now...
kiran
2009-07-29 19:29:03 +0000
3554897222allow filters to specify whether they want to work with mapping quality zero reads; the VariantFiltrationWalker passes in the appropriate contextual reads
ebanks
2009-07-29 17:38:15 +0000
7a13647c35Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very* rough initial implementation, but should provide enough support so that people can stop creating SAMFileWriters in reduceInit.
hanna
2009-07-29 16:11:45 +0000
56f769f2ceOutput improvements to GenotypeConcordance calculations
depristo
2009-07-29 12:54:46 +0000
72dda0b85cFixed calculations for Mark
ebanks
2009-07-29 03:21:43 +0000
a5a56f1315At this point, we are convinced that the new priors are the way to go...
ebanks
2009-07-28 17:25:25 +0000
df4fd498c5Improvements and bug fixes galore. (1) Now properly handles Q0 bases by filtering them out; you can disable this if you need to. (2) Support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to be more empowered to detect variants (see email too)
depristo
2009-07-28 13:21:38 +0000
46643d3724Improvements and bug fixes galore. (1) Now properly handles Q0 bases by filtering them out; you can disable this if you need to. (2) Support for three-state base probabilities (see email), which is disabled by default (still experimental) but appears to be more empowered to detect variants (see email too)
depristo
2009-07-28 13:21:27 +0000
d665d9714fBy default now writes output to JOBID.lsf.output instead of going to email -- based on recommendations from the cancer group
depristo
2009-07-28 13:18:58 +0000
3c4410f104-add basic indel metrics to variant eval -variants need a length method (can't assume it's a SNP)!
ebanks
2009-07-28 03:25:03 +0000
1d6d99ed9cwalk by reference
kcibul
2009-07-27 20:21:04 +0000
089ae85be71. output grep-able strings for genotype eval 2. free DB coverage from isSNP restriction
ebanks
2009-07-27 17:36:59 +0000
2499c09256added minIndelCount (short: minCnt) command line argument. The call is made only if the number of reads supporting the consensus indel is greater than or equal to the specified value (default: 0, so only the minFraction filter is on in default runs!)
asivache
2009-07-27 15:22:51 +0000
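Note on how the two thresholds above combine: a minimal sketch, assuming the fraction test is also inclusive (>=) and that both tests must pass. The function and parameter names are hypothetical. No default is given for min_fraction because the log does not state one.

```python
def passes_indel_filters(indel_reads: int, total_reads: int,
                         min_fraction: float, min_count: int = 0) -> bool:
    """Return True if a consensus indel has enough supporting reads.

    min_count defaults to 0, so by default only the fraction test can fail.
    """
    if total_reads <= 0:
        return False  # no coverage, nothing to call
    fraction_ok = indel_reads / total_reads >= min_fraction
    count_ok = indel_reads >= min_count
    return fraction_ok and count_ok
```

For example, 2 supporting reads out of 4 pass a 0.3 fraction threshold on their own, but fail once min_count is raised to 3.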
73ddf21bb7SNPs no longer fail this filter if they are actually hom in reads
ebanks
2009-07-27 15:20:43 +0000
f2b3fa83acfix for another bug found by Eric: some indels were printed into the output stream twice (when there's another indel within MISMATCH_WINDOW bases and that other indel requires delayed print in order to accumulate coverage)
asivache
2009-07-27 15:07:07 +0000
f1109e9070Added the iterator to SAMDataSource to prevent seeing duplicate reads, only in a byReads traversal. The iterator discards any reads in the current interval that would have been seen in the previous interval.
aaron
2009-07-25 22:36:29 +0000
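Note on the duplicate-read iterator above: a minimal sketch of the idea, assuming reads and intervals on a single contig with inclusive coordinates. The Read and Interval types and the function names are hypothetical and are not the SAMDataSource API.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Interval(NamedTuple):
    start: int
    stop: int


class Read(NamedTuple):
    name: str
    start: int
    stop: int


def overlaps(read: Read, interval: Interval) -> bool:
    """True if the read shares at least one base with the interval."""
    return read.start <= interval.stop and read.stop >= interval.start


def reads_without_repeats(reads: Iterable[Read],
                          previous: Optional[Interval]) -> Iterator[Read]:
    """Yield the current interval's reads, skipping any that also overlap
    the previous interval and so were already emitted there."""
    for read in reads:
        if previous is not None and overlaps(read, previous):
            continue
        yield read
```

A read spanning the boundary between two adjacent intervals is emitted once, with the first interval, and dropped from the second.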
5eca4c353cIndelGenotyper now uses GATK::getMergedReadGroupsByReaders() to sort out which read in the merged stream is for normal, and which is for tumor (in --somatic mode, apparently)
asivache
2009-07-24 23:01:18 +0000
a361e7b342SAMDataSource is now exposed by GATK engine; SamFileHeaderMerger is exposed from Resources all the way up to SAMDataSource, so now we can see underlying individual readers should we need them; GATK engine has new methods getSamplesByReaders(), getLibrariesByReaders(), and getMergedReadGroupsByReaders(): each of these methods returns a list of sets, with each element (set) holding, respectively, samples, libraries, or (merged) read groups coming from an individual input bam file (so now when using multiple -I options we can still find out which of the input bams each read comes from)
asivache
2009-07-24 22:59:49 +0000
2024fb3e32Better division of responsibilities between sources and type descriptors.
hanna
2009-07-24 22:15:57 +0000
64221907a2fixed a bug found by Eric: genotyper would crash in the case of an indel too close to the window end, with the next read mapping sufficiently far away on the ref
asivache
2009-07-24 21:00:31 +0000
2db86b7829Move the cleaned read injector test from playground to core. Remove CovariateCounterTest's dependency on the CleanedReadInjector. Start doing a bit of cleanup on the CLP's FieldParsers.
hanna
2009-07-24 19:44:04 +0000
e2ec703a32Added indel cleaner and quality scores recalibrator to the GATK package.
hanna
2009-07-24 16:20:38 +0000
df44bdce7dRetire the pooled caller...it's been eclipsed by other walkers in the tree.
hanna
2009-07-24 14:49:03 +0000
884806fc16Broken and unused. It goes away now.
kiran
2009-07-24 14:26:52 +0000
d044681fbechange paths to new ones
ebanks
2009-07-24 07:28:43 +0000
59f0c00d77-set indel cleaning walkers to be in core package -move Andrey's alignment utility classes to core
ebanks
2009-07-24 05:23:29 +0000
bb20462a7cA better way: down-scale second-base ratios until the infinities disappear. This way, high-coverage sites don't cause binomialProbability to explode.
kiran
2009-07-24 03:02:00 +0000
0b16253db3an iterator to fix the problem where read-based interval traversals are getting duplicate reads because reads span the two intervals.
aaron
2009-07-23 23:59:48 +0000
7c20be157cAdded ability to sample from a list *without* replacement.
kiran
2009-07-23 21:00:19 +0000
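Note on sampling without replacement: a minimal sketch using a partial Fisher-Yates shuffle, which is one common way to do it and not necessarily what this commit implemented. The function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_without_replacement(items: Sequence[T], k: int,
                               rng: Optional[random.Random] = None) -> List[T]:
    """Return k distinct elements (by position) chosen uniformly from items."""
    if k < 0 or k > len(items):
        raise ValueError("k must be between 0 and len(items)")
    rng = rng or random.Random()
    pool = list(items)  # copy so the caller's list is left untouched
    for i in range(k):
        # Swap a randomly chosen not-yet-picked element into slot i.
        j = rng.randrange(i, len(pool))
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]
```

Because each chosen element is swapped out of the remaining pool, no position can be picked twice, unlike sampling with replacement.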
038cbcf80eIf the result from the secondary-base test is 0.0, replace the result with a minimum likelihood such that the log-likelihood doesn't underflow.
kiran
2009-07-23 20:59:52 +0000
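Note on the minimum-likelihood floor above: a minimal sketch, assuming a base-10 log-likelihood. The floor value and the names are hypothetical; the commit does not state the constant it uses.

```python
import math

# Hypothetical floor; the actual minimum likelihood is not given in the log.
MIN_LIKELIHOOD = 1e-300


def safe_log10(likelihood: float, floor: float = MIN_LIKELIHOOD) -> float:
    """Return log10(likelihood), clamping the input so 0.0 never reaches the log."""
    return math.log10(max(likelihood, floor))
```

Without the clamp, a test result of 0.0 has no finite logarithm (in Python, math.log10(0.0) raises ValueError), and that value would swamp any sum of log-likelihoods.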
093550a3f2Removed secondary-base test from SingleSampleGenotyper. It now lives in the variant filtration system.
kiran
2009-07-23 20:58:41 +0000
477502338fmoved major indel cleaning pieces to core (yippee!)
ebanks
2009-07-23 19:59:51 +0000
4efe26c59aMajor: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found. Minor: refactor 454 filtering
ebanks
2009-07-23 19:53:51 +0000
f7168bd7cfadded the ability to build the jars to a different location, like the following:
aaron
2009-07-23 04:06:58 +0000
f8b1dbe3b3getBestGenotype() does not necessarily return hets in alphabetical order; the string (unfortunately) needs to be sorted for lookup in the table (otherwise we throw a NullPointerException) TO DO: have the table be smarter instead of sorting each genotype string
ebanks
2009-07-23 01:58:47 +0000
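Note on the sorted-genotype lookup above: a minimal sketch, assuming a table keyed by the 10 unordered diploid genotypes with their bases in alphabetical order. The table contents and the names are hypothetical placeholders.

```python
from itertools import combinations_with_replacement

# Hypothetical table: one entry per unordered genotype, keyed alphabetically
# ("AA", "AC", ..., "TT"). The values here are placeholder indices.
GENOTYPE_TABLE = {
    "".join(pair): index
    for index, pair in enumerate(combinations_with_replacement("ACGT", 2))
}


def canonical_genotype(genotype: str) -> str:
    """Sort the bases so 'TA' and 'AT' map to the same key."""
    return "".join(sorted(genotype.upper()))


def lookup(genotype: str) -> int:
    """Look up a genotype regardless of the order its alleles arrive in."""
    return GENOTYPE_TABLE[canonical_genotype(genotype)]
```

Looking up an unsorted het such as "GA" directly would miss the "AG" key, which is the analogue of the NullPointerException described in the commit.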
ee8ed534e0print full genotype for alt allele
ebanks
2009-07-23 01:35:23 +0000
298cc24524Fix minor bug introduced in filtration, and cleaned up the artificial sam records so that they use SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX and SAMRecord.NO_ALIGNMENT_START rather than hardcoded -1's.
hanna
2009-07-22 22:37:41 +0000
cac04a407aFor Manny: filter out reads where the ref index == NO_ALIGNMENT_REFERENCE_INDEX but the alignment start != NO_ALIGNMENT_START.
hanna
2009-07-22 21:19:24 +0000
9c12c02768AlleleBalance and on/off primary base filters -- version 0.0.1 -- for experimental use only
depristo
2009-07-22 17:54:44 +0000
00f9bcd6d1CoverageEval.py tool right before some major changes to the core of the code
andrewk
2009-07-22 16:58:23 +0000
24e81e3e7bmoved to wiki
ebanks
2009-07-22 16:35:23 +0000
c54fd1da09Beautify the genotype concordance printouts
ebanks
2009-07-22 02:53:02 +0000
6e4fd8db4aBetter formatting of available walkers, and only output them along with help. Cleanup JVMUtils.
hanna
2009-07-21 22:23:28 +0000
761d70faa1Better printing of multiple rods -- now produces a comma-separated set of values
depristo
2009-07-21 21:58:27 +0000
8588f75eb6Better printing with toSimpleString() -- now prints out chip-genotype string
depristo
2009-07-21 21:57:59 +0000
1843684cd2Cleanup: GATKEngine no longer needs to be lazy loaded, b/c the plugin directory no longer exists.
hanna
2009-07-21 18:50:51 +0000
b43925c01eSwitched to Reflections (http://code.google.com/p/reflections/) project for inspecting the source tree and loading walkers, rather than trying to roll our own by hand.
hanna
2009-07-21 18:32:22 +0000
436a196e2bBug fixes to support hapmap genotyping concordance.
kiran
2009-07-21 16:20:10 +0000
7e04313b4eBug fixes and improvements to CoverageHistogram. Now displays the frequency of the bin. Also correctly prints out the last element in the coverage histogram (<= vs. <)
depristo
2009-07-21 11:55:05 +0000
f13a1e8591adding a couple of small changes to support contract with VariantEval
aaron
2009-07-21 03:49:15 +0000
b4adb5133aGLF rod as an AllelicVariant object.
aaron
2009-07-21 00:55:52 +0000
f314ef8d84Features and exclusion criteria are now instantiated in VariantFiltrationWalker's initialize() method, rather than in every map() call. This means the features and exclusion criteria will only ever be initialized once.
kiran
2009-07-20 22:47:21 +0000
1d2b545608add FLT toString method (to be used in PrintRODs) and add it to ROD list
ebanks
2009-07-20 02:47:50 +0000
8da754eb4eFirst implementation of a primary base filter. Assumes on/off bases are distributed according to a binomial.
mmelgar
2009-07-17 18:43:35 +0000
387316ebe1added indel rod
ebanks
2009-07-17 16:05:51 +0000
da4af3b620print indels in the format required for 1KG submissions
ebanks
2009-07-17 15:59:18 +0000
d45c90b166ROD to represent simple output from IndelGenotyper
ebanks
2009-07-17 14:36:12 +0000
f978b04633A very simple walker to print out (using the ROD's toString method) all of the RODs it sees. This is the easiest solution to get around the (temporary) bug of reads being seen multiple times by reads walkers when close intervals are passed to them (i.e. process full contigs and then use a ref walker to filter the ones within your intervals of choice)
ebanks
2009-07-17 14:03:34 +0000
129ad97ce5performance improvement to GenomeLocParser -- moved regex pattern compile out of local field
kcibul
2009-07-17 02:56:25 +0000
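Note on hoisting the regex compile: a minimal sketch of the pattern, assuming a contig:start-stop location string. The regular expression and the function name are hypothetical and are not GenomeLocParser's actual pattern.

```python
import re
from typing import Optional, Tuple

# Compiled once at import time rather than on every call.
# Hypothetical format: "contig", "contig:start", or "contig:start-stop".
_LOC_PATTERN = re.compile(r"^([^:]+)(?::(\d+)(?:-(\d+))?)?$")


def parse_loc(text: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split a location string into (contig, start, stop)."""
    match = _LOC_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unparseable location: {text!r}")
    contig, start, stop = match.groups()
    if start is None:
        return contig, None, None              # whole contig
    start_pos = int(start)
    stop_pos = int(stop) if stop is not None else start_pos
    return contig, start_pos, stop_pos
```

The saving comes from paying the compile cost once instead of on each parse, which matters when the parser is called for every locus.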
df1c61e049Re-add the plugin path.
hanna
2009-07-16 22:48:44 +0000
7c30c30d26Cleaned up some duplicate code in preparation for making plugin dir configurable.
hanna
2009-07-16 22:02:21 +0000
31f3f466caImprovements to support GLF generation -- now correctly handles GLF
depristo
2009-07-16 21:10:39 +0000
107f42a01eHacks for getting GLFs support in the Rod system working
depristo
2009-07-16 21:03:47 +0000
0548026a2eNow understands GLFs for calculating genotyping results like callable bases, and avoids emitting stupid amounts of data when doing a genotype evaluation (i.e., ignores non-SNP() calls)
depristo
2009-07-16 21:03:26 +0000
c5f6ab3dd5CoverageHistogram now sees 0 coverage sites
depristo
2009-07-16 20:58:41 +0000
8bc0832215Generate chip concordance table. This should work, although I need to test it with some real GLFs
ebanks
2009-07-16 17:44:47 +0000
88ffb08af4Need to return real values for some of the AllelicVariant methods
ebanks
2009-07-16 02:31:10 +0000
045d74d09cCleanup my pathetic prose.
hanna
2009-07-15 21:35:13 +0000
a04f205a7fGATK readme.
hanna
2009-07-15 21:00:08 +0000
e1055bcc4cmoving to new external repository
kcibul
2009-07-15 20:46:08 +0000
4a730adfc1committing latest changes before moving repositories
kcibul
2009-07-15 20:44:02 +0000
692b1e206fstop throwing an exception here: we don't always have allele counts
ebanks
2009-07-15 20:34:01 +0000
a245ee32faA walker to split 2 call sets into their intersection/union/disjoint (sub)sets. Yes, the name is retarded, but I'm under pressure here...
ebanks
2009-07-15 20:20:47 +0000
ba349e8d52add FLT ROD
ebanks
2009-07-15 19:40:50 +0000
800f7e6360make AllelicVariant extend ReferenceOrderedDatum (not Comparable) since ROD itself is Comparable. Then we can generalize RMD tags. Blame Matt if this doesn't work - he said it wouldn't break anything.
ebanks
2009-07-15 19:25:06 +0000
00d49976fbcommitting latest changes before moving repositories
kcibul
2009-07-15 18:41:52 +0000
5be5e1d45fadded conversion from iupac format and new rod to deal with FLT file format
ebanks
2009-07-15 18:34:41 +0000
702cdd087fActually listens to justPrint now
depristo
2009-07-15 16:52:46 +0000
d36e232ed3adding GLF rods to the module list
aaron
2009-07-15 15:42:34 +0000
9ecb3e0015adding GLFRods with tests and some other code changes
aaron
2009-07-15 15:30:19 +0000
c25f84a01cRegression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why it's there.
hanna
2009-07-15 14:41:37 +0000
1798aff01bVariantEval now understands the difference between a population-level analysis and a genotype analysis, and handles both. All analyses annotated as supporting one or the other or both. Preparation for genotype chip concordance calculations as well as called sites, etc analyses
depristo
2009-07-15 14:07:13 +0000