Commit Graph

  • 56f6847456 Changed interface from contig,pos,length to more common contig,start,stop interface. hanna 2009-04-16 00:04:41 +0000
  • 6c9d110eb9 better description depristo 2009-04-15 22:06:36 +0000
  • 2eabcfedb7 Fixed potential bug with next() operation returning empty contexts when a read contains a large deletion. We can now use the look ahead safely... depristo 2009-04-15 21:41:30 +0000
  • 7261787b71 Fixed potential bug with next() operation returning empty contexts when a read contains a large deletion. We can now use the look ahead safely... depristo 2009-04-15 21:38:28 +0000
  • e70aecf518 bug fix, but important aaron 2009-04-15 21:07:20 +0000
  • feebd8cd55 Latest version of sam. hanna 2009-04-15 21:02:25 +0000
  • 76d833e39b Meaningful assert messages depristo 2009-04-15 19:12:28 +0000
  • 67ea66c866 Bug fix aaron 2009-04-15 19:12:18 +0000
  • 1edfe48194 Better debugging output with .debug depristo 2009-04-15 19:09:18 +0000
  • 9cc808104e Fixed subtle bug in permitting EXPAND_WINDOW to be > 1. We now use the right window size so we avoid including empty hangers. There's still a rare bug to sort out, which occurs in the case where a read with an indel can generate empty hangers. depristo 2009-04-15 19:08:26 +0000
  • 180ff13290 Added a bunch of changes to support the new MicroManager code aaron 2009-04-15 18:29:38 +0000
  • 339261c4a9 Load the dictionary and sanity check it against the index. hanna 2009-04-15 18:04:13 +0000
  • 26e84d7fd6 Added index iteration for ReferenceSequenceFile interface compatibility. Added better error checking for querying past the end of a contig. Lots more testing. hanna 2009-04-15 17:17:11 +0000
  • 3fda8613c3 Minor formatting changes; support for "extended" output kcibul 2009-04-15 15:11:05 +0000
  • 12407b5b1a Deleted the old file aaron 2009-04-15 13:55:01 +0000
  • 6db9127f90 Added changes to shattering, refactored SAMBAM into SAM aaron 2009-04-15 13:52:56 +0000
  • 182626576f Basic indexed fasta POC in place. Requires a more complete implementation of the ReferenceSequenceFile interface, and much more testing. hanna 2009-04-15 13:46:56 +0000
  • 7949e377e4 Intermediate commit. Refactored some simple base manipulation stuff into BaseUtils.java. Generalized some likelihood computation logic to make future possible EM-ing easier. kiran 2009-04-15 04:18:07 +0000
  • d0b8d311e6 Can now optionally print the read and the alignment region of the reference. kiran 2009-04-15 04:10:30 +0000
  • d4aaa1bef4 Fixed (with Matt's help) the argument parsing; outputting UCSC wiggle format kcibul 2009-04-15 02:17:39 +0000
  • 24722a442e Slight code cleanup depristo 2009-04-14 22:21:36 +0000
  • e6fb122d7d Added some fixes and new iterator tests aaron 2009-04-14 22:19:36 +0000
  • 13b0995d54 Adding an iterator that bounds the number of reads aaron 2009-04-14 22:18:31 +0000
  • 72a3d84ed2 General purpose pileup code -- you can use these features to obtain detailed pileup data from reads and offsets. Useful for all pileup based walkers. Expanded support for rodSAMPileup to enable the new ValidatingPileupWalker, which takes a samtools pileup output and checks that GATK gives identical output as samtools on a per base and per qual pileup. It's going to be a very useful validation tool. depristo 2009-04-14 22:13:10 +0000
  • baae98c6d5 and don't allocate new 200M string every time please, just pass byte array! asivache 2009-04-14 21:55:33 +0000
  • 9d56355abe bug fixed when reference name was passed as a string instead of actual reference bases asivache 2009-04-14 21:46:27 +0000
  • 222c4e5865 Commented out some debugging lines kiran 2009-04-14 20:15:41 +0000
  • 49d76014d1 Commented out a debugging line kiran 2009-04-14 20:15:11 +0000
  • b39e584787 Primary or secondary bases that got a quality score of literally zero led to unfortunate infinities. Added an epsilon (1e-5) to every prob. kiran 2009-04-14 20:04:49 +0000
  • d28e9f9b98 search over q's for finding argmax[q] p(D|q) jmaguire 2009-04-14 19:15:45 +0000
  • 96248cdec4 Added some output to all the classes, including built-in runtime analysis aaron 2009-04-14 19:14:53 +0000
  • 647827b18c Transitioned indel code to use GATK and Walkers ebanks 2009-04-14 19:14:15 +0000
  • b363eedd2c Deal with screwy reads by changing logic to determine whether we are past the last interval ebanks 2009-04-14 19:13:16 +0000
  • 0629f79049 Moved fasta support files into their own package. hanna 2009-04-14 18:13:23 +0000
  • 7a4a5a17c0 Made sequence index compatible with Aaron's junit changes. hanna 2009-04-14 17:53:20 +0000
  • 88ebf1a05b Fixed some documentation aaron 2009-04-14 17:41:38 +0000
  • 186c799ffc Class to read an .fai file. hanna 2009-04-14 17:37:18 +0000
  • 704f1bd634 It helps if I check the new base class in with my changes aaron 2009-04-14 17:18:16 +0000
  • 4b3578e1de Added the base test case, fixed the rest of the test cases to follow suit. Added more verbose output to ant for junit tests. aaron 2009-04-14 17:11:38 +0000
  • 961dbbd4ef Now output bases and qhat and qstar into the GFF. jmaguire 2009-04-14 15:23:00 +0000
  • dafdff1974 All bases are now indexed as A:0, C:1, G:2, T:3. kiran 2009-04-14 14:49:43 +0000
  • 40ea22eb17 Added some methods to return the cross-talk partner base of a given base or base index. kiran 2009-04-14 14:49:12 +0000
  • eb4b4a053b A bunch of updates to the SAM/BAM data source, along with test cases for the merging of multiple files (it works!). aaron 2009-04-14 14:19:20 +0000
  • 30121534ed Outputs the secondary bases and quals (if available) in verbose mode. Prefixed with the tag 'SQ='. kiran 2009-04-14 13:58:28 +0000
  • 998fad76c6 Some utility methods for creating pileups of secondary bases and secondary quals. kiran 2009-04-14 13:57:54 +0000
  • 8b2c2e677b Uses the cleaner new GenomeLoc(read) syntax depristo 2009-04-14 00:55:43 +0000
  • 1cee7948ab Added lots of assertions to check for problems. depristo 2009-04-14 00:55:19 +0000
  • 794360c410 Added verbose option to show mapping qualities and base qualities as ints! depristo 2009-04-14 00:54:48 +0000
  • cc75e8f712 Uses the cleaner new GenomeLoc(read) syntax depristo 2009-04-14 00:53:58 +0000
  • 11377ef390 Added lots of assertions to check for problems. The current GenomeLoc needs to be cleaned up and refactored but at least it runs. We need unit tests ASAP depristo 2009-04-14 00:53:08 +0000
  • bb666ce392 Added mappingQualPileup function for use in the verbose mode of Pileup depristo 2009-04-14 00:51:26 +0000
  • bc43c0eefc there really are cases when we cannot merge until we get just two piles; now we do not crash in those cases but print a warning and just show the resulting n piles even when n>2 asivache 2009-04-14 00:45:47 +0000
  • 8e6093d5a5 remove mom/dad/kid cmd line arguments that were needed for mendelian walker; now we can use generic track binding!! asivache 2009-04-14 00:45:34 +0000
  • f838a5e511 Changed some double comparisons of the form a == b to abs(a - b) <= precision. Now we shouldn't be passing or failing some if conditions due to floating-point precision. kiran 2009-04-13 20:05:46 +0000
  • 887adcfc7f Some minor fixes to the last check-in aaron 2009-04-13 18:24:51 +0000
  • f2d0d73309 removed old shard strategy code aaron 2009-04-13 18:13:45 +0000
  • dd604799dc Added some new code for shard support over reads aaron 2009-04-13 18:11:43 +0000
  • d44c30154a added MAX_READ_LENGTH - now we can ignore long reads (454?); a bad idea in general, but the performance hit is too hard to take, at least for preliminary testing runs... asivache 2009-04-13 16:53:12 +0000
  • e91a429c58 A class to print out as much context about the given locus site as is possible. Useful for testing traversal engines -- run old and new code across a given region and diff the output to make sure they have the same context. hanna 2009-04-13 15:29:55 +0000
  • 6652f13a17 more verbose gff output! jmaguire 2009-04-13 15:21:23 +0000
  • cf929a8275 Get rid of test case's dependence on transient methods. hanna 2009-04-13 15:16:42 +0000
  • 6e180ed44e Unified caller is go. jmaguire 2009-04-13 12:29:51 +0000
  • f39092526d Added function RandomSubset jmaguire 2009-04-13 12:14:53 +0000
  • b4136b6d6e a few tweaks to make it more robust: ignore reads with cigars containing anything but I,D,M; don't set up contig ordering manually, rely upon reference sequence and its dictionary; don't die if a record does not have NM tag, but fall back to direct counting instead; now requires reference as a cmdline arg asivache 2009-04-13 04:49:19 +0000
  • 32e000bbfe Added MatchSQTagToStrand jar target. kiran 2009-04-13 00:50:36 +0000
  • 756e6c61d8 Strictness args are presented as lowercase in the help, but only accepted if uppercase. Changed help to list the valid arguments in uppercase. kiran 2009-04-13 00:50:19 +0000
  • c51f51f255 Make sure we always write at least 1000 points per base in each cycle's scatterplot. Print the disagreement rate between Bustard and FourBaseRecaller. kiran 2009-04-13 00:49:41 +0000
  • 1fb16d54e0 For SAM files that have no alignments and when no reference is specified, contigInfo.getSequence() is null, causing an error when getSequenceName() is called on the resulting null pointer. Check for null instead and return that instead of barfing here. kiran 2009-04-13 00:48:21 +0000
  • 5e96ab6161 Helpful functions for converting a base (char) to a base index (A:0, C:1, G:2, T:3; alphabetical and consistent with Illumina conventions, to minimize confusion). kiran 2009-04-13 00:46:23 +0000
  • 35fc002d5d Debugging information is now written in such a way to make it easier to import into R. kiran 2009-04-12 19:45:33 +0000
  • 6ee4fe5a20 Fixed a Bustard/Firecrest file synchronization bug. kiran 2009-04-12 19:44:07 +0000
  • 817278be46 If a SAMRecord is on the negative strand, reverse complement the SQ tag. kiran 2009-04-12 19:42:24 +0000
  • 1d5a22cacf Extracts a Fastq file and the SQ tags to a separate file. kiran 2009-04-12 19:41:44 +0000
  • e410c005c0 A debugging tool to ensure the SQ tag in a four-prob SAM file matches the SAMRecord strand orientation. kiran 2009-04-12 19:40:42 +0000
  • 9c37400c4f Added basic performance testing so I can make sure concurrent access doesn't slow down overall fasta access. hanna 2009-04-12 18:05:56 +0000
  • c7777d46d6 Re-enabled setting of sequence dictionary information on GenomeLoc kcibul 2009-04-12 02:44:14 +0000
  • ce72932a45 Refactored GenomeLoc to use contigIndex internally for performance and fixed several calling classes; added basic unit test for GenomeLoc; fixed bug when parsing genome locations like chr1:5000, where the start position was being left as maxint rather than being set to the same as the stop position. kcibul 2009-04-12 02:25:17 +0000
  • 49fd951d8c Initial test suite for FastaSequenceFile2, so I can add parallelism support with abandon. hanna 2009-04-11 21:10:42 +0000
  • 608a66e6ab TbyLocibyRef previously didn't seem to support traversals with no interval specified. Put in a temporary fix until the threaded approach is in place. hanna 2009-04-10 22:14:06 +0000
  • c2669021b8 Cleanup, and support either by-interval traversals or full traversals in data source-backed code. hanna 2009-04-10 22:09:01 +0000
  • 2322bb7d86 Workaround: use a single ReferenceIterator for an entire micromanaged traversal. We'll have to do something about ReferenceIterator thread safety later. hanna 2009-04-10 20:50:28 +0000
  • 95753e1b34 Should've been calling queryOverlapping in locus mode. hanna 2009-04-10 20:22:04 +0000
  • a2a38a4bbb Removed RepairBadlyCombinedSamFile jar target. kiran 2009-04-10 04:21:19 +0000
  • 2b59110dca CombineSamAndFourProbs is better. kiran 2009-04-10 04:19:53 +0000
  • 56aa98ad30 Ignore null values. kiran 2009-04-10 04:18:20 +0000
  • 2ef2c9e121 Fixed an issue wherein the SQ field was only being pulled from the first read of the pileup, no matter what. Fixed an issue wherein Andrew enumerates his bases as A:0, C:1, T:2, G:3, and Kiran's QualityUtils methods enumerate bases as A:0, C:1, G:2, T:3 (we should standardize this). Fixed an issue wherein the remaining probability was being divided by 3 rather than 2 when four-base probs are enabled. kiran 2009-04-10 04:17:53 +0000
  • 17b3d5b554 New ROD accessing system, including a generalized interface for binding ROD on the command line that doesn't require you to change GenomeAnalysisTK.java depristo 2009-04-09 22:04:59 +0000
  • f5cc2d8b0b Commented out import of IlluminaParser. kiran 2009-04-09 21:30:29 +0000
  • 0d825ccfc1 Oops. Fixed duplicate reference to the reference. hanna 2009-04-09 21:27:57 +0000
  • 9afa101465 Add interval support to the aaron 2009-04-09 21:23:43 +0000
  • c5220c0822 Four-base probs are now decoded with the relevant method in QualityUtils kiran 2009-04-09 20:52:17 +0000
  • 9bc763a835 A better (aka 'working') tool for combining four-base probs with an aligned sam file. kiran 2009-04-09 20:51:37 +0000
  • b7a2e82b46 Can optionally process raw or corrected intensities. kiran 2009-04-09 20:50:11 +0000
  • 6cdad10dd1 Make output type identical to the bustard parser so the values can be easily swapped for one another. kiran 2009-04-09 20:49:34 +0000
  • d0ce56e018 Remember to take the strand flag into account when calculating error rate per cycle as a surrogate for instrument performance. kiran 2009-04-09 20:48:45 +0000
  • 8a1207e4db Bringing up scaffolding for integration of locus traversals by reference with Aaron's data source code. Reverts to original TraverseByLociByReference behavior unless a special combination of command-line flags is used. hanna 2009-04-09 20:28:17 +0000
  • 49b2622e3d Helper utility for merging BAM files depristo 2009-04-09 20:10:41 +0000
  • 8e2f5471a1 Some cleanup to the data source, and another JUnit test case. aaron 2009-04-09 14:58:05 +0000
  • d56193b6df Cleanup of a couple of output statements aaron 2009-04-09 14:09:07 +0000
  • c556a97f17 Skeleton of Somatic Coverage tool kcibul 2009-04-09 02:34:03 +0000