Commit Graph

  • 12752cf893 Added a bunch of fixes: MSRI wasn't working, sharding had broken edge cases, and SAMBAM DS needed to close the file handles. aaron 2009-04-09 00:20:15 +0000
  • 8efedacabf Bump sam jdk to svn rev 207. hanna 2009-04-08 22:16:46 +0000
  • 089bf30cf4 Send things to the out file via the logger. kiran 2009-04-08 21:49:03 +0000
  • 6db9a00a0b SAMFileWriter doesn't appear to flush the buffer when its destructor is called. You have to call the close() method. Also, choose a random base for Ns in the forward and reverse strands so that samtools doesn't pitch a fit. kiran 2009-04-08 21:48:24 +0000
  • eb2f0ebd62 If the first base of a read is 'N', and the alignment cigar says every base matches, samtools calls shenanigans. Now I just output an A, but the real way to do this is to modify the cigar string accordingly. kiran 2009-04-08 19:58:18 +0000
  • 0e7d962eca Oops. Slight twiddle of the math here so that I'm not asking if bestBase == nextBestBase. kiran 2009-04-08 19:56:54 +0000
  • d4ab95c098 Added a constructor, took out a copy constructor, and changed some SAMBAM code. aaron 2009-04-08 19:53:20 +0000
  • 0b81a76420 added support for Picard IntervalList files to --interval_file kcibul 2009-04-08 16:49:43 +0000
  • 295c269a64 Remove the main() I put in for debugging aaron 2009-04-08 16:43:44 +0000
  • d517245beb Fixes for shattering, added JUnit test case aaron 2009-04-08 16:37:34 +0000
  • 62ac7366ed A quick hack to ensure that the sequence, qualities, and secondary qualities are in accordance with the strand flag. kiran 2009-04-08 15:57:28 +0000
  • 25474ebe7e Computes the read error rate for a bam file. Ignores reads with indels, treats low-quality and high-quality reference bases the same. Does not count ambiguous reference bases as mismatches. Optionally allows for best two bases in read to be used. kiran 2009-04-08 15:56:10 +0000
  • 59b2e6a90f Added some stuff for retrieving the base index and probability of a compressed base. kiran 2009-04-08 15:52:58 +0000
  • 8d48bdc9ec it walks... the version committed actually counts snps only asivache 2009-04-08 02:00:41 +0000
  • 62d75ced3c nothing fancy, just a wrapper (aka struct) to pass around a bunch of counts asivache 2009-04-08 01:58:57 +0000
  • 453d13415d count variant as biallelic if it's just a non-ref homogeneous site! asivache 2009-04-08 01:57:27 +0000
  • b49f713336 Enabled multiple argument for GATK driver; first step towards generalized -rods <name> <type> <file> argument structure depristo 2009-04-08 01:52:13 +0000
  • 1ade22121b cruel hack: new toolkit-wide optional cmdline arguments added to allow for loading trio genotyping tracks; to be moved back to walker when walkers can register their data needs with the toolkit asivache 2009-04-07 22:33:26 +0000
  • 8ec427ab66 latest version... still under dev/testing asivache 2009-04-07 22:31:06 +0000
  • 202c501939 Added a sample xml marshaller / unmarshaller. hanna 2009-04-07 22:28:16 +0000
  • abe2d25f10 Added castor dependency. hanna 2009-04-07 22:27:39 +0000
  • 9d35f0ca67 The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available. depristo 2009-04-07 22:21:57 +0000
  • 00722e19bc The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available. depristo 2009-04-07 22:19:54 +0000
  • 9c4fc633aa Make it symmetric: if there is no sequence dictionary, also send a message to the logger, just like we do when we find the dict asivache 2009-04-07 21:44:39 +0000
  • b64e4d1a04 seekForwardOffset changed (improved?): first, compareContigs does *not*, in general, return -1,0 or 1 if no dictionary is available; second, be more flexible in trying to jump to the right contig (current implementation of FastaFile2 will still throw an exception if there's no dictionary, but iterator itself behaves transparently) asivache 2009-04-07 21:42:33 +0000
  • 2663ac3e4a documentation fix aaron 2009-04-07 21:39:50 +0000
  • 8a357a88a2 right...exponential should be exponential, so I might want to increment the exponent aaron 2009-04-07 20:12:05 +0000
  • 6ce9e0f941 delete the old strategy aaron 2009-04-07 19:40:03 +0000
  • 08fddd43af -Replaced adaptive and linear strategies with an adaptive linear strategy -Added the exponential growth strategy -Added factory code that allows you to transition between strategies, so if you want to move from linear to exp at a point, and then back when you've hit a runtime threshold, it will take care of it for you. -Changed the code to return a Shard instead of a GenomeLoc aaron 2009-04-07 19:37:38 +0000
  • 6369d23b43 renamed; these files are more strategy than actual shards aaron 2009-04-07 16:50:56 +0000
  • e95f427965 Added isReference() to AllelicVariant and updated rodDbSNP accordingly asivache 2009-04-07 14:49:20 +0000
  • 99579a1ef8 Math correction. kiran 2009-04-07 02:18:13 +0000
  • 9be978e006 Intermediate commit (debugging info). kiran 2009-04-07 01:20:15 +0000
  • b42d8df646 the new shatter method, independent of the underlying data. The only thing needed to create a Shard is the reference seq, which may be a problem in reference less traversals, so the builder class is there so we can make different construction schemes. aaron 2009-04-07 00:32:57 +0000
  • 0baa8c0f76 We need a base exception so we can differentiate between exceptions we've generated and those external to our code. All our exceptions should extend this exception. I'll migrate the ones I can find later on. aaron 2009-04-07 00:13:45 +0000
  • 150bca30aa typO in the documentation... aaron 2009-04-06 23:05:59 +0000
  • 4aa9c0d591 Matt makes a good point that the Reference Iterator we were using wasn't bounded; The BoundedReferenceIterator takes a GenomeLoc to bound the iterations by aaron 2009-04-06 23:03:56 +0000
  • 5a5c6d1276 Added some debugging stuff (writes model parameters to one file per cycle). kiran 2009-04-06 22:00:58 +0000
  • 0fc8a90553 removing some files from the old approach to dataSource aaron 2009-04-06 21:57:34 +0000
  • 5feb7ee627 temporary fix, relying on an old reference order data constructor aaron 2009-04-06 21:38:41 +0000
  • af5a443e5a add Synchronized to the has_next and next methods aaron 2009-04-06 21:17:11 +0000
  • 97d14abe85 Interface check-in for Matt aaron 2009-04-06 21:14:19 +0000
  • 820cf09198 Updated with last week / next week for 6 April. hanna 2009-04-06 14:05:20 +0000
  • d1c5e986d5 Another check to deal with bad reads (BWA output throws bad exceptions) ebanks 2009-04-06 04:58:22 +0000
  • 3f75fc4e83 Unfortunately, because BWA occasionally outputs crazy reads, we need to make sure not to have an ArrayIndexOutOfBoundsException thrown. ebanks 2009-04-06 03:51:35 +0000
  • f12d40dde8 Simplified SAMRecord construction and emission. kiran 2009-04-05 04:48:31 +0000
  • 0d25e71953 a declaration is made generic asivache 2009-04-04 21:55:02 +0000
  • 551ce9130f added isBiallelic() to the AllelicVariant interface and to rodDbSNP implementation. We probably don't really know how to deal with non-biallelic sites just as yet... asivache 2009-04-04 21:31:16 +0000
  • 2e89d5e46f That was an annoying bug to find. Mark, I want a beer. ebanks 2009-04-03 20:05:24 +0000
  • 4eac3193f7 Added RefMetaDataTracker system as a replacement for the List<ReferenceOrderedData> going into walkers. This system allows you to more easily get a tracker for processing using the lookup(name, default) system. See Pileup for an example. depristo 2009-04-03 19:54:54 +0000
  • c1abcfb014 Fixed problem where we were considering reads out of order because their stop positions were out of order, but with equal starts. This involved a change in the ordering feature of GenomeLoc, which now no longer sorts by both start and stop. So as long as the start positions are equal, things are considered "in order". Perhaps this isn't a good idea to change... depristo 2009-04-03 19:53:33 +0000
  • ef06924f73 JavaDocs! kiran 2009-04-03 19:19:17 +0000
  • 42eb356782 1. modified by read traversals with indexes to be more general 2. GenomeLocs for reads should have ends spanning the read (moved it to GenomeLoc from Utils) 3. Got rid of those stupid unmappable characters from comments in various files ebanks 2009-04-03 18:24:08 +0000
  • 86fc18e9fc Fixed merge bug andrewk 2009-04-03 17:41:58 +0000
  • bef475778f - Updated --hapmap switch to --hapmap-chip to reflect the data being chip data for an individual rather than population allele frequency data in Hapmap - Corrected some bugs to get metrics logging working - Added a switch --force_1base_probs to ignore 4-base probabilities if they exist andrewk 2009-04-03 17:32:31 +0000
  • edc44807af rods now have names. Use getName() to access them. Next step is better interface to accessing rods depristo 2009-04-03 16:41:33 +0000
  • 5019971290 Now outputs four-base SAM record (read name prefixed with KIR) and bustard SAM record (prefixed with BUS) for easy debugging. kiran 2009-04-03 15:48:51 +0000
  • 15151ac125 Corrected the use of the prior. kiran 2009-04-03 15:47:47 +0000
  • b854c24575 Oops. I gave this method the wrong name first time around. kiran 2009-04-03 15:46:26 +0000
  • 9bbce32064 Basic dbSNP and HapMap frequency aware SNP caller... still in progress kcibul 2009-04-03 14:24:09 +0000
  • f031d882c6 ByReference traversals! depristo 2009-04-03 13:23:18 +0000
  • e3ac0cb500 - A lot of code cleaned up; separated metrics code from AlleleFrequencyMetricsWalker into AlleleMetrics and eliminated the former class. AFMW (aside from being a name so long that it warrants an acronym) can now be implemented by passing an option to AlleleFrequencyWalker that logs metrics to a file. - AlleleMetrics and AlleleMetricrsWalker are now ready to take a list of classes that implement the AllelicVariant interface - Switched a genome location in AlleleFrequencyEstimate from String to GenomeLoc which makes way more sense. andrewk 2009-04-03 02:09:10 +0000
  • c6ab60ee04 change variable type to Boolean from boolean to make cmdline parser happy asivache 2009-04-02 22:35:30 +0000
  • 16aa979e34 make -A a true flag not an argument that asks for 'true/false' value! asivache 2009-04-02 22:23:46 +0000
  • decf45664a Removed now-useless FourBaseCaller jar target (FourBaseRecaller replaces it). kiran 2009-04-02 22:13:51 +0000
  • 7d889c0661 Refactored into oblivion. kiran 2009-04-02 22:12:15 +0000
  • dffc879240 Should now be appropriately using Bustard data to call bases (there are some mathematical subtleties that arise when no longer using ICs as initialization data). Also writes some more relevant fields in the SAM records. WAAAAAY simpler than old version. Like, super way. kiran 2009-04-02 22:10:13 +0000
  • 59334b0270 A convenience class for manipulating base probability distributions. kiran 2009-04-02 22:08:31 +0000
  • 399d9b8c1e A class that represents the model parameters for all of the Gaussian models for all cycles. kiran 2009-04-02 22:08:10 +0000
  • f0f94b6c72 A class that represents the model parameters for all of the Gaussian models at a given cycle. Handles the accumulation of parameter initialization data and provides for efficient computation of base probability distribution. kiran 2009-04-02 22:07:47 +0000
  • a8a6c63a32 A class with some static methods that aid the manipulation of quality scores and probabilities (including a method to compress a base and quality score into a byte for SAM output). kiran 2009-04-02 22:06:15 +0000
  • b7a67da775 Expose the underlying SAM reader to the walkers. jmaguire 2009-04-02 21:38:00 +0000
  • 8ce4dabd7c Print coverage per reference base for each sample in a merged BAM file. This is a good example for how to untangle a merged BAM file. jmaguire 2009-04-02 21:35:31 +0000
  • 5d9b068b8b generic declarations added here and there to eliminate a few annoying warnings; no consequential changes asivache 2009-04-02 20:53:01 +0000
  • 4bc035d919 half-way through making rodDbSNP implement AllelicVariant interface; does not work yet asivache 2009-04-02 20:48:59 +0000
  • 4faa680887 *Massive* speed-up for interval-based by-read traversals. [Could do more optimizing, but this simple fix was good enough for now] ebanks 2009-04-02 20:19:39 +0000
  • c192a95998 changes in three files to make the HapMap RODs work: kcibul 2009-04-02 19:55:19 +0000
  • b4cdd1d9a1 correct package name asivache 2009-04-02 18:09:31 +0000
  • 93fc768c38 Fixing problems with SAMQueryIterator and reads depristo 2009-04-02 18:04:28 +0000
  • d202264b23 initial add of pooled calling experiment walker. jmaguire 2009-04-02 17:55:40 +0000
  • 3248176118 Die with appropriate error message if we try to read past the end of a chromosome. ebanks 2009-04-02 16:44:32 +0000
  • 24e8581c30 Slight improvements to allele caller interface; fixed problem with printing progress depristo 2009-04-02 16:44:12 +0000
  • 20d4bcbb2e I said - delete! asivache 2009-04-02 16:21:21 +0000
  • 25ace306b9 GenomeAnalysisTK: better documentation of validation option. AlleleFrequencyWalker: output the last reference interval if it's left hanging open. jmaguire 2009-04-02 16:11:20 +0000
  • 816e768a74 move interface from playground asivache 2009-04-02 15:58:01 +0000
  • f26055c926 interface representing allele variants/genotype calls asivache 2009-04-02 15:57:19 +0000
  • f42b75da72 restore GFF_OUTPUT_FILE to a required argument. jmaguire 2009-04-02 14:34:08 +0000
  • 2cd9a1597f Simple improvements to allele caller depristo 2009-04-02 14:09:14 +0000
  • d952790258 GFF now parses attributes correctly and efficiently. Slightly better interface to Utils.join depristo 2009-04-01 22:54:38 +0000
  • ce57fed2fb Hack to work around an Apache CLI bug, where core arguments couldn't be commingled with walker arguments. These arguments can commingle now. Everybody into the pool. hanna 2009-04-01 20:56:42 +0000
  • 7ce280723f Updated todo list depristo 2009-04-01 20:31:42 +0000
  • 6cc2fa24d5 Added ability to downsample to a particular coverage ebanks 2009-04-01 20:27:06 +0000
  • bb3dbb5756 change default onTraversalDone to use the new output streams jmaguire 2009-04-01 19:50:31 +0000
  • 4faacac315 Now handle the case where we don't actually SEE all of the positions. jmaguire 2009-04-01 19:50:07 +0000
  • 675505646d now makes confident reference intervals. jmaguire 2009-04-01 18:46:14 +0000
  • 6994cca988 added precision ebanks 2009-04-01 16:21:29 +0000
  • 16c2ea4673 Invalid arguments are not always flagged when stopAtNonOption is false. Make sure stopAtNonOption is true when we do final argument validation. hanna 2009-04-01 15:58:57 +0000
  • 7ee792df04 Print correct help if core arguments (--input-file et al) aren't correctly specified. hanna 2009-04-01 15:16:49 +0000
  • 3af4290a49 Added iterator to randomly downsample to a given fraction of the reads. Also, updated sort iterator to allow user to input max sorts. Put in placeholder for downsampling to given coverage. ebanks 2009-04-01 02:11:13 +0000
  • 385736469c High performance pileup code and utilities depristo 2009-04-01 00:47:47 +0000