Commit Graph

  • 1daa011387 Interval-based traversals were bleeding file handles. Fixed. hanna 2009-04-26 18:35:54 +0000
  • 1e2e78265d Inadvertently removed interval file support in new TbLbR. Fixed. hanna 2009-04-26 18:15:42 +0000
  • c9e9731495 More cleanup. hanna 2009-04-26 17:46:52 +0000
  • 4036f24909 Documentation and cleanup work in preparation for parallelism. hanna 2009-04-26 17:42:00 +0000
  • 0c76a70313 Renamed traversal by "interval" to "locusWindow" ebanks 2009-04-26 02:26:08 +0000
  • 9a299c11d3 Oops, typo and build problems. FYI, fixing typos is better than packing... depristo 2009-04-25 01:37:17 +0000
  • ce470702fc consistency with java naming conventions depristo 2009-04-24 21:44:48 +0000
  • bfce0c93ab removing bad file depristo 2009-04-24 21:40:04 +0000
  • 05c6679321 Enabled ReduceByInterval depristo 2009-04-24 21:39:44 +0000
  • ee2f022c71 Make new TraverseByLociByReference the default. hanna 2009-04-24 19:50:11 +0000
  • e50ae97fe1 Introduce new index-based fasta reader. Clean up MicroManager code, pushing necessary code back into TraversalEngine. hanna 2009-04-24 19:40:21 +0000
  • 3739682bef Actually has working version of the python script to merge multiple bam files depristo 2009-04-24 19:15:55 +0000
  • 40a2b3eeb3 Basic logistic regression support for calibrating qualities; mostly for Andrew to experiment with depristo 2009-04-24 19:09:50 +0000
  • 38c2f73457 LogRegression.py script that converts parameter files for each dinucleotide regression into one file to be read in by correction script. andrewk 2009-04-24 18:31:26 +0000
  • 061f4328b1 Covariate counter now outputs files used by R to do logistic regression. andrewk 2009-04-24 17:11:57 +0000
  • 4e4fd33584 First draft of actual pooled EM caller. jmaguire 2009-04-24 13:43:41 +0000
  • dd408a2a9a First draft of actual pooled EM caller. jmaguire 2009-04-24 13:42:15 +0000
  • 13d4692d2e 1. Added a by-interval traversal. 2. Added a shell for the indel cleaner walker (it's currently being used to test the interval traversal). 3. Fixed small bug in downsampling (make sure to downsample the offsets too) 4. GenomeAnalysisTK.execute => anyone object to my change to "instanceof" instead of trying to catch a ClassCastException (yuck)? ebanks 2009-04-24 04:33:35 +0000
  • 1984bb2d13 Made num_loci_total public because I'm lazy. I'll change it back later. kiran 2009-04-24 03:57:23 +0000
  • 7ce11e152b Simplified. Added option to perform four-base retest of a putative variant. kiran 2009-04-24 03:56:15 +0000
  • 135d3eabeb Now only distributes 80% of the residual probability to the secondary base, 10% each to the other two bases. Nicer labelling for stringified probability distribution output. kiran 2009-04-24 03:34:43 +0000
  • 3cda85f2e3 New implementation of binomial probability that accurately computes values down to around 1e-237. kiran 2009-04-24 03:32:04 +0000
  • 305584b69e Test class for MathUtils with a test for binomialProbability(). kiran 2009-04-24 03:31:02 +0000
  • bd4cacb832 Added code to make a read group and sample name for BAM files that don't annotate them on reads. The defaults for both are now the filename, but this may be shortened in the future. aaron 2009-04-24 00:31:00 +0000
  • 45d962e491 I understood the contig index incorrectly when I initially wrote this code. Fixed. hanna 2009-04-23 22:31:43 +0000
  • 635bfd8604 Added a little bit of hack to get the header back to the walker by initialization time, which was before sharding in the last version. aaron 2009-04-23 21:07:11 +0000
  • 0208d201c7 Forgot this in the last commit... aaron 2009-04-23 20:47:22 +0000
  • 3dc2afd7ab Added the ability to get a merged header in a LociByReference traversal aaron 2009-04-23 20:34:52 +0000
  • 282f1d88b8 Make the operation 'read from the iterator and place on the queue' atomic with respect to hasNext(), next(). hanna 2009-04-23 20:16:26 +0000
  • 998763950c Oops, contig index is a zero not one based value aaron 2009-04-23 19:08:16 +0000
  • 8c13940c5a A lot of changes to support by-read sharding and some from debugging of the by loci traversals aaron 2009-04-23 19:03:14 +0000
  • 32715a6c47 First check-in of walker that produces tables showing covariation of read cycle, and dinucleotide with quality score in a format usable for R analysis and for doing logistic regression. andrewk 2009-04-23 18:58:25 +0000
  • 0720d248ce Adding the test case for by reads sharding of BAM data sources aaron 2009-04-23 18:01:22 +0000
  • cae54ec52d Walker for creating intervals to be used in the indel cleaner ebanks 2009-04-23 17:58:19 +0000
  • 96db1477d4 I meant for default lod threshold to be 5.0, not 0.0. kiran 2009-04-23 17:46:08 +0000
  • ca66cccd2f Privatized constructor to prevent instantiation. kiran 2009-04-23 17:45:39 +0000
  • 77e1e9e2f1 Added a static class to house useful math methods. All this has at the moment are methods for comparing doubles and floats, but I suggest that the bulk of our little math methods should be added here to avoid filling up Utils.java with so much random stuff. kiran 2009-04-23 17:45:19 +0000
  • 3d7575bbb8 Oops...omitted walker.initialize(). hanna 2009-04-23 17:35:28 +0000
  • 11e85f1969 Four-base mode now estimates the genotype using the one-base method and retests the site if the one-base method suggests the site is a het. kiran 2009-04-23 17:23:24 +0000
  • bd719f9c06 When checking that values are not infinite, also prints out the position so that I know which site was giving the error and I can just go there and debug it. kiran 2009-04-23 17:21:58 +0000
  • efba30f1a1 Added a constructor in which the lod threshold can be set. kiran 2009-04-23 17:20:48 +0000
  • 8c1905c7d9 Simple walker to print all of the sample names present in a merged bam file. jmaguire 2009-04-23 12:26:56 +0000
  • ef4a107548 Updated the hello world document to reflect system changes. aaron 2009-04-22 23:25:15 +0000
  • a3a1c9dae8 Suppressed emission of duplicate paths through a four-base pileup. kiran 2009-04-22 21:08:45 +0000
  • 6cef8bd76c added k-best quality path enumeration. jmaguire 2009-04-22 20:26:51 +0000
  • b8a6f6e830 Support for indexBAM command depristo 2009-04-22 19:39:07 +0000
  • d99d67d51c Refactored to clean it up a bit ebanks 2009-04-22 19:18:46 +0000
  • 1bf4d040d8 Increase default shard size from 5 to 100000. hanna 2009-04-22 18:29:44 +0000
  • 3af66a462e Make PrintLocusContextWalker less verbose. hanna 2009-04-22 18:28:02 +0000
  • ffcd672c1c Intermediate commit while working on getting four-base probs to work in the single sample genotyper. Has infrastructure for the new combinatorial approach and just choosing the best base more intelligently given a probability distribution over bases and the reference base. kiran 2009-04-22 18:06:50 +0000
  • 4cafb95be8 TraverseByLoci / TraverseByLociByReference suffered from the same sam-triggered off-by-one (?) bug as TraverseByReference; it was just less obvious here because these versions don't shard. hanna 2009-04-22 15:48:20 +0000
  • cb2f621d01 reverting accidental commit of change to shard size kcibul 2009-04-22 00:33:28 +0000
  • b820130dce * added ability to load multiple BAM files from command line kcibul 2009-04-22 00:28:08 +0000
  • 5b8502745a Added an epsilon (1e-4) to the tertiary and quaternary base hypotheses. kiran 2009-04-22 00:01:37 +0000
  • 2ac240d78b Removed an extraneous print statement. kiran 2009-04-21 23:36:36 +0000
  • 0149c887ff Fixed a bug wherein the residual probability was not being distributed properly when a file had secondary probs and the best and next-best base agreed. kiran 2009-04-21 23:36:09 +0000
  • 5abfc7d079 Added an argument ('extended' or 'ext') that outputs the four-base probs in a long format. kiran 2009-04-21 22:27:26 +0000
  • dac76f041b Added some methods to retrieve the probability distributions of individual bases. kiran 2009-04-21 22:26:25 +0000
  • 5b2a7c9c23 Added some methods to complement a single simple base ([AaCcGgTt]) and reverse-complement a byte-array of bases. kiran 2009-04-21 22:25:33 +0000
  • 521e202a10 updated interface asivache 2009-04-21 21:07:20 +0000
  • 55ca272919 reimplemented; now implements Genotype interface instead of AllelicVariant asivache 2009-04-21 21:06:42 +0000
  • 5f37ba8f26 now can be asked to log at INFO level all concordant or discordant sites, or both asivache 2009-04-21 21:03:44 +0000
  • 1f84b9647d auxiliary data structure for mendelian concordance reporting; it's nice to have the latest version checked in in order for the code to compile... asivache 2009-04-21 21:02:40 +0000
  • ece3e9969e one trivial walker to filter reads; bam in -> filter -> bam out asivache 2009-04-21 20:39:29 +0000
  • 61e855200d latest version... asivache 2009-04-21 20:38:37 +0000
  • 64b2fd866f * extracted core quality-score based genotype likelihood code * precompute expensive operations (log/pow) based on Picard experience kcibul 2009-04-21 18:58:43 +0000
  • 11c520b283 completed my old draft of the old school single sample genotype walker jmaguire 2009-04-21 05:38:04 +0000
  • b8233d92c8 Simple IO walker to test / crush file systems and evaluate I/O performance in general depristo 2009-04-20 14:07:14 +0000
  • bf76eab955 whoops; fix a comment line. jmaguire 2009-04-19 17:54:54 +0000
  • bcba1ff424 Fix a minor rounding bug and putz around with fractional counts in the pooled caller. jmaguire 2009-04-19 17:52:24 +0000
  • af6788fa3d Misc: 1. Added logGamma function to utils 2. Required asserts to be enabled in the allele caller (run with java -ea) 3. put checks and asserts of NaN and Infinity in AlleleFrequencyEstimate 4. Added option FRACTIONAL_COUNTS to the pooled caller (not working right yet) jmaguire 2009-04-19 15:35:07 +0000
  • eafb4633ba Temporary workaround for samtools index bug: there seems to be an off-by-one error. Will file bug report. hanna 2009-04-17 23:14:41 +0000
  • 758db73b98 Fixed SLOWNESS issue. ebanks 2009-04-17 20:10:34 +0000
  • 2a937fa8d3 set SAM file header's sorting order to unsorted, hopefully it will help to speed things up asivache 2009-04-17 19:32:24 +0000
  • 03ec3452f2 a first, simplest version of a walker that filters out reads based on user-specified criteria and writes remaining reads into a new bam file asivache 2009-04-17 18:51:39 +0000
  • 1660379753 Matt's current status. hanna 2009-04-17 18:03:11 +0000
  • 01d000411d added some updates to the omniplan aaron 2009-04-17 17:44:04 +0000
  • f2f9fa3ed4 doc added asivache 2009-04-17 16:43:25 +0000
  • d639ec3776 Remove some copied code to make sure the traversal engine stays in sync with the locus context provider. hanna 2009-04-17 16:41:56 +0000
  • df5aae5ed4 got rid of a couple of warnings and added percentage(x,base) methods asivache 2009-04-17 15:15:21 +0000
  • 50ae1763f7 Support for -continue_after_errors flag in the validating pileup walker in case you want to see errors as they arise, rather than aborting greedily depristo 2009-04-17 03:13:11 +0000
  • ee5ab9536f trivial checking / flagging issues to enable testing of merging iterator performance depristo 2009-04-17 03:11:59 +0000
  • dbf2344cef Fixes for including duplicate reads in the locus traversal; now checks that the ref arg is provided when needed depristo 2009-04-17 01:27:36 +0000
  • e842b543c9 Better validation scripts depristo 2009-04-16 23:18:00 +0000
  • 01be8f09e3 Exception cleanup. All our non-runtime exceptions should extend from StingException, StingException needs to be lower in the tree to build. hanna 2009-04-16 22:17:25 +0000
  • e5c80e59dc fixed the case when you're not seeking, it didn't initialize aaron 2009-04-16 22:16:03 +0000
  • f47f640df6 Better debugging output and testing depristo 2009-04-16 21:54:56 +0000
  • 165e504d1c Turning on new TraverseLociByReference is now only dependent on the -et flag. REGION_STR does not matter. hanna 2009-04-16 19:45:47 +0000
  • 12e1f192c4 Fixed a bug in this code where it would eat reads that didn't start at the beginning of the provided interval. This should fix / help fix Kristian problem aaron 2009-04-16 18:42:00 +0000
  • 835f1067d8 added isHom() and isHet() queries to the Genotype interface (with the obvious meaning) asivache 2009-04-16 18:41:39 +0000
  • 55537c0d1e change class name, now it compiles... asivache 2009-04-16 16:51:00 +0000
  • 4f9bc7206f some cleanup, also ensuring that all reads get written into output asivache 2009-04-16 16:49:25 +0000
  • e8a6cdb386 renamed standalone main asivache 2009-04-16 15:56:46 +0000
  • 832afd3d60 renamed standalone main asivache 2009-04-16 15:56:27 +0000
  • 85308f4ddc resurrected indel tool's standalone main asivache 2009-04-16 15:55:52 +0000
  • 6f56938d42 * added a bit more debugging output kcibul 2009-04-16 15:20:26 +0000
  • d35a542bb9 * fixed bug where the merged header was not being set on the read (although the read group was) kcibul 2009-04-16 12:53:07 +0000
  • 240eb18564 fix a few related issues when not all the reads were written into the output files. now cleaned output still contains all reads either with modified alignments or untouched asivache 2009-04-16 03:56:47 +0000
  • 0d324354ae separate interface for genotypes as opposed to (population) allelic variants asivache 2009-04-16 03:55:16 +0000
  • 7e05b43f40 * added some error checking for read groups kcibul 2009-04-16 03:22:49 +0000