12752cf893 Added a bunch of fixes: MSRI wasn't working, sharding had broken edge cases, and SAMBAM DS needed to close the file handles.
aaron
2009-04-09 00:20:15 +0000
8efedacabf Bump sam jdk to svn rev 207.
hanna
2009-04-08 22:16:46 +0000
089bf30cf4 Send things to the out file via the logger.
kiran
2009-04-08 21:49:03 +0000
6db9a00a0b SAMFileWriter doesn't appear to flush the buffer when its destructor is called. You have to call the close() method. Also, choose a random base for Ns in the forward and reverse strands so that samtools doesn't pitch a fit.
kiran
2009-04-08 21:48:24 +0000
eb2f0ebd62 If the first base of a read is 'N', and the alignment cigar says every base matches, samtools calls shenanigans. Now I just output an A, but the real way to do this is to modify the cigar string accordingly.
kiran
2009-04-08 19:58:18 +0000
0e7d962eca Oops. Slight twiddle of the math here so that I'm not asking if bestBase == nextBestBase.
kiran
2009-04-08 19:56:54 +0000
d4ab95c098 Added a constructor, took out a copy constructor, and changed some SAMBAM code.
aaron
2009-04-08 19:53:20 +0000
0b81a76420 added support for Picard IntervalList files to --interval_file
kcibul
2009-04-08 16:49:43 +0000
295c269a64 Remove the main() I put in for debugging
aaron
2009-04-08 16:43:44 +0000
d517245beb Fixes for shattering, added JUnit test case
aaron
2009-04-08 16:37:34 +0000
62ac7366ed A quick hack to ensure that the sequence, qualities, and secondary qualities are in accordance with the strand flag.
kiran
2009-04-08 15:57:28 +0000
25474ebe7e Computes the read error rate for a bam file. Ignores reads with indels, treats low-quality and high-quality reference bases the same. Does not count ambiguous reference bases as mismatches. Optionally allows the best two bases in the read to be used.
kiran
2009-04-08 15:56:10 +0000
59b2e6a90f Added some stuff for retrieving the base index and probability of a compressed base.
kiran
2009-04-08 15:52:58 +0000
8d48bdc9ec it walks... the committed version actually counts only SNPs
asivache
2009-04-08 02:00:41 +0000
62d75ced3c nothing fancy, just a wrapper (aka struct) to pass around a bunch of counts
asivache
2009-04-08 01:58:57 +0000
453d13415d count variant as biallelic if it's just a non-ref homogeneous site!
asivache
2009-04-08 01:57:27 +0000
b49f713336 Enabled multiple arguments for the GATK driver; first step towards generalized -rods <name> <type> <file> argument structure
depristo
2009-04-08 01:52:13 +0000
1ade22121b cruel hack: new toolkit-wide optional cmdline arguments added to allow for loading trio genotyping tracks; to be moved back to walker when walkers can register their data needs with the toolkit
asivache
2009-04-07 22:33:26 +0000
8ec427ab66 latest version... still under dev/testing
asivache
2009-04-07 22:31:06 +0000
202c501939 Added a sample xml marshaller / unmarshaller.
hanna
2009-04-07 22:28:16 +0000
abe2d25f10 Added castor dependency.
hanna
2009-04-07 22:27:39 +0000
9d35f0ca67 The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available.
depristo
2009-04-07 22:21:57 +0000
00722e19bc The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available.
depristo
2009-04-07 22:19:54 +0000
9c4fc633aa Make it symmetric: if there is no sequence dictionary, also send a message to the logger, just like we do when we find the dict
asivache
2009-04-07 21:44:39 +0000
b64e4d1a04 seekForwardOffset changed (improved?): first, compareContigs does *not*, in general, return -1, 0 or 1 if no dictionary is available; second, be more flexible in trying to jump to the right contig (current implementation of FastaFile2 will still throw an exception if there's no dictionary, but iterator itself behaves transparently)
asivache
2009-04-07 21:42:33 +0000
8a357a88a2 right...exponential should be exponential, so I might want to increment the exponent
aaron
2009-04-07 20:12:05 +0000
6ce9e0f941 delete the old strategy
aaron
2009-04-07 19:40:03 +0000
08fddd43af - Replaced adaptive and linear strategies with an adaptive linear strategy - Added the exponential growth strategy - Added factory code that allows you to transition between strategies, so if you want to move from linear to exp at a point, and then back when you've hit a runtime threshold, it will take care of it for you. - Changed the code to return a Shard instead of a GenomeLoc
aaron
2009-04-07 19:37:38 +0000
6369d23b43 renamed; these files are more strategy than actual shards
aaron
2009-04-07 16:50:56 +0000
e95f427965 Added isReference() to AllelicVariant and updated rodDbSNP accordingly
asivache
2009-04-07 14:49:20 +0000
b42d8df646 the new shatter method, independent of the underlying data. The only thing needed to create a Shard is the reference seq, which may be a problem in reference-less traversals, so the builder class is there so we can make different construction schemes.
aaron
2009-04-07 00:32:57 +0000
0baa8c0f76 We need a base exception so we can differentiate between exceptions we've generated and those external to our code. All our exceptions should extend this exception. I'll migrate the ones I can find later on.
aaron
2009-04-07 00:13:45 +0000
150bca30aa typO in the documentation...
aaron
2009-04-06 23:05:59 +0000
4aa9c0d591 Matt made a good point that the Reference Iterator we were using wasn't bounded. The BoundedReferenceIterator takes a GenomeLoc to bound the iterations by.
aaron
2009-04-06 23:03:56 +0000
5a5c6d1276 Added some debugging stuff (writes model parameters to one file per cycle).
kiran
2009-04-06 22:00:58 +0000
0fc8a90553 removing some files from the old approach to dataSource
aaron
2009-04-06 21:57:34 +0000
5feb7ee627 temporary fix, relying on an old reference-ordered data constructor
aaron
2009-04-06 21:38:41 +0000
af5a443e5a add Synchronized to the has_next and next methods
aaron
2009-04-06 21:17:11 +0000
97d14abe85 Interface check-in for Matt
aaron
2009-04-06 21:14:19 +0000
820cf09198 Updated with last week / next week for 6 April.
hanna
2009-04-06 14:05:20 +0000
d1c5e986d5 Another check to deal with bad reads (BWA output throws bad exceptions)
ebanks
2009-04-06 04:58:22 +0000
3f75fc4e83 Unfortunately, because BWA occasionally outputs crazy reads, we need to make sure not to have an ArrayIndexOutOfBoundsException thrown.
ebanks
2009-04-06 03:51:35 +0000
f12d40dde8 Simplified SAMRecord construction and emission.
kiran
2009-04-05 04:48:31 +0000
0d25e71953 a declaration is made generic
asivache
2009-04-04 21:55:02 +0000
551ce9130f added isBiallelic() to the AllelicVariant interface and to rodDbSNP implementation. We probably don't really know how to deal with non-biallelic sites just as yet...
asivache
2009-04-04 21:31:16 +0000
2e89d5e46f That was an annoying bug to find. Mark, I want a beer.
ebanks
2009-04-03 20:05:24 +0000
4eac3193f7 Added RefMetaDataTracker system as a replacement for the List<ReferenceOrderedData> going into walkers. This system allows you to more easily get a tracker for processing using the lookup(name, default) system. See Pileup for an example.
depristo
2009-04-03 19:54:54 +0000
c1abcfb014 Fixed problem where we were considering reads out of order because their stop positions were out of order, but with equal starts. This involved a change in the ordering feature of GenomeLoc, which now no longer sorts by both start and stop. So as long as the start positions are equal, things are considered "in order". Perhaps this isn't a good idea to change...
depristo
2009-04-03 19:53:33 +0000
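The ordering change above can be illustrated with a minimal sketch. It assumes a hypothetical `Loc` class that stands in for GenomeLoc and carries only a contig index, a start, and a stop. It also assumes a hypothetical `StartOnlyOrder` helper; this is not the actual GenomeLoc code. Locations compare by contig, then by start only, so two reads with equal starts count as "in order" whatever their stop positions are.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for GenomeLoc: only the fields needed to show the ordering.
final class Loc {
    final int contigIndex; // position of the contig in the sequence dictionary
    final long start;
    final long stop;

    Loc(int contigIndex, long start, long stop) {
        this.contigIndex = contigIndex;
        this.start = start;
        this.stop = stop;
    }
}

// Hypothetical helper, not part of the toolkit.
final class StartOnlyOrder {
    // Orders by contig, then start; the stop position is deliberately ignored,
    // so two locations with equal starts compare as equal regardless of their ends.
    static final Comparator<Loc> BY_START =
            Comparator.<Loc>comparingInt(l -> l.contigIndex)
                      .thenComparingLong(l -> l.start);

    // Returns true if the sequence never moves backwards under BY_START.
    static boolean isInOrder(List<Loc> locs) {
        for (int i = 1; i < locs.size(); i++) {
            if (BY_START.compare(locs.get(i - 1), locs.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off the commit worries about shows up directly here. Under `BY_START`, distinct locations with equal starts are not ordered relative to each other. Any code that needs a strict total order, such as a sorted set keyed on locations, would treat them as duplicates.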
42eb356782 1. modified by-read traversals with indexes to be more general 2. GenomeLocs for reads should have ends spanning the read (moved it to GenomeLoc from Utils) 3. Got rid of those stupid unmappable characters from comments in various files
ebanks
2009-04-03 18:24:08 +0000
bef475778f - Updated --hapmap switch to --hapmap-chip to reflect the data being chip data for an individual rather than population allele frequency data in Hapmap - Corrected some bugs to get metrics logging working - Added a switch --force_1base_probs to ignore 4-base probabilities if they exist
andrewk
2009-04-03 17:32:31 +0000
edc44807af rods now have names. Use getName() to access them. Next step is a better interface for accessing rods
depristo
2009-04-03 16:41:33 +0000
5019971290 Now outputs four-base SAM record (read name prefixed with KIR) and bustard SAM record (prefixed with BUS) for easy debugging.
kiran
2009-04-03 15:48:51 +0000
15151ac125 Corrected the use of the prior.
kiran
2009-04-03 15:47:47 +0000
b854c24575 Oops. I gave this method the wrong name first time around.
kiran
2009-04-03 15:46:26 +0000
9bbce32064 Basic dbSNP and HapMap frequency aware SNP caller... still in progress
kcibul
2009-04-03 14:24:09 +0000
e3ac0cb500 - A lot of code cleaned up; separated metrics code from AlleleFrequencyMetricsWalker into AlleleMetrics and eliminated the former class. AFMW (aside from being a name so long that it warrants an acronym) can now be implemented by passing an option to AlleleFrequencyWalker that logs metrics to a file. - AlleleMetrics and AlleleMetricsWalker are now ready to take a list of classes that implement the AllelicVariant interface - Switched a genome location in AlleleFrequencyEstimate from String to GenomeLoc which makes way more sense.
andrewk
2009-04-03 02:09:10 +0000
c6ab60ee04 change variable type to Boolean from boolean to make cmdline parser happy
asivache
2009-04-02 22:35:30 +0000
16aa979e34 make -A a true flag not an argument that asks for 'true/false' value!
asivache
2009-04-02 22:23:46 +0000
7d889c0661 Refactored into oblivion.
kiran
2009-04-02 22:12:15 +0000
dffc879240 Should now be appropriately using Bustard data to call bases (there are some mathematical subtleties that arise when no longer using ICs as initialization data). Also writes some more relevant fields in the SAM records. WAAAAAY simpler than old version. Like, super way.
kiran
2009-04-02 22:10:13 +0000
59334b0270 A convenience class for manipulating base probability distributions.
kiran
2009-04-02 22:08:31 +0000
399d9b8c1e A class that represents the model parameters for all of the Gaussian models for all cycles.
kiran
2009-04-02 22:08:10 +0000
f0f94b6c72 A class that represents the model parameters for all of the Gaussian models at a given cycle. Handles the accumulation of parameter initialization data and provides for efficient computation of base probability distribution.
kiran
2009-04-02 22:07:47 +0000
a8a6c63a32 A class with some static methods that aid the manipulation of quality scores and probabilities (including a method to compress a base and quality score into a byte for SAM output).
kiran
2009-04-02 22:06:15 +0000
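Compressing a base and a quality score into one byte can be sketched as bit packing. This is a minimal sketch, and the layout is an assumption rather than the toolkit's actual encoding. The low 2 bits hold a base index (A=0, C=1, G=2, T=3) and the high 6 bits hold a Phred quality clamped to 0..63. The class `CompressedBase` and its methods are hypothetical names.

```java
// Hypothetical packing scheme, not the toolkit's actual encoding:
// low 2 bits = base index (A=0, C=1, G=2, T=3),
// high 6 bits = Phred quality clamped to 0..63.
final class CompressedBase {
    private static final String BASES = "ACGT";

    // Packs a called base and its quality into a single byte.
    static byte compress(char base, int quality) {
        int index = BASES.indexOf(Character.toUpperCase(base));
        if (index < 0) {
            throw new IllegalArgumentException("Not a called base: " + base);
        }
        int q = Math.max(0, Math.min(63, quality)); // 6 bits available
        return (byte) ((q << 2) | index);
    }

    // Recovers the base index from the low 2 bits.
    static int baseIndex(byte packed) {
        return packed & 0x03;
    }

    // Recovers the quality; mask to 0..255 first because Java bytes are signed.
    static int quality(byte packed) {
        return (packed & 0xFF) >>> 2;
    }
}
```

For example, `compress('G', 30)` packs to `(30 << 2) | 2 = 122`. Qualities above 63 lose information under this layout, and an ambiguous base such as `N` cannot be represented. Both limits are consequences of the assumed 2/6 bit split.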
b7a67da775 Expose the underlying SAM reader to the walkers.
jmaguire
2009-04-02 21:38:00 +0000
8ce4dabd7c Print coverage per reference base for each sample in a merged BAM file. This is a good example for how to untangle a merged BAM file.
jmaguire
2009-04-02 21:35:31 +0000
5d9b068b8b generic declarations added here and there to eliminate a few annoying warnings; no consequential changes
asivache
2009-04-02 20:53:01 +0000
4bc035d919 half-way through making rodDbSNP implement AllelicVariant interface; does not work yet
asivache
2009-04-02 20:48:59 +0000
4faa680887 *Massive* speed-up for interval-based by-read traversals. [Could do more optimizing, but this simple fix was good enough for now]
ebanks
2009-04-02 20:19:39 +0000
c192a95998 changes in three files to make the HapMap RODs work:
kcibul
2009-04-02 19:55:19 +0000
b4cdd1d9a1 correct package name
asivache
2009-04-02 18:09:31 +0000
93fc768c38 Fixing problems with SAMQueryIterator and reads
depristo
2009-04-02 18:04:28 +0000
d202264b23 initial add of pooled calling experiment walker.
jmaguire
2009-04-02 17:55:40 +0000
3248176118 Die with appropriate error message if we try to read past the end of a chromosome.
ebanks
2009-04-02 16:44:32 +0000
24e8581c30 Slight improvements to allele caller interface; fixed problem with printing progress
depristo
2009-04-02 16:44:12 +0000
20d4bcbb2e I said - delete!
asivache
2009-04-02 16:21:21 +0000
25ace306b9 GenomeAnalysisTK: better documentation of validation option. AlleleFrequencyWalker: output the last reference interval if it's left hanging open.
jmaguire
2009-04-02 16:11:20 +0000
816e768a74 move interface from playground
asivache
2009-04-02 15:58:01 +0000
f42b75da72 restore GFF_OUTPUT_FILE to a required argument.
jmaguire
2009-04-02 14:34:08 +0000
2cd9a1597f Simple improvements to allele caller
depristo
2009-04-02 14:09:14 +0000
d952790258 GFF now parses attributes correctly and efficiently. Slightly better interface to Utils.join
depristo
2009-04-01 22:54:38 +0000
ce57fed2fb Hack to work around an Apache CLI bug, where core arguments couldn't be commingled with walker arguments. These arguments can commingle now. Everybody into the pool.
hanna
2009-04-01 20:56:42 +0000
7ce280723f Updated todo list
depristo
2009-04-01 20:31:42 +0000
6cc2fa24d5 Added ability to downsample to a particular coverage
ebanks
2009-04-01 20:27:06 +0000
bb3dbb5756 change default onTraversalDone to use the new output streams
jmaguire
2009-04-01 19:50:31 +0000
4faacac315 Now handle the case where we don't actually SEE all of the positions.
jmaguire
2009-04-01 19:50:07 +0000
675505646d now makes confident reference intervals.
jmaguire
2009-04-01 18:46:14 +0000
16c2ea4673 Invalid arguments are not always flagged when stopAtNonOption is false. Make sure stopAtNonOption is true when we do final argument validation.
hanna
2009-04-01 15:58:57 +0000
7ee792df04 Print correct help if core arguments (--input-file et al) aren't correctly specified.
hanna
2009-04-01 15:16:49 +0000
3af4290a49 Added iterator to randomly downsample to a given fraction of the reads. Also, updated sort iterator to allow user to input max sorts. Put in placeholder for downsampling to given coverage.
ebanks
2009-04-01 02:11:13 +0000
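Random downsampling to a fraction of the reads can be sketched as a wrapping iterator. This is a minimal sketch: `DownsampleIterator` is a hypothetical name, not the toolkit's class. It keeps each element independently with probability `fraction` and uses a seeded `java.util.Random` so runs are reproducible. It assumes the source iterator never yields `null`, because `null` is used internally to mean "exhausted".

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Random;

// Hypothetical sketch: keeps each element of the source independently with
// probability `fraction`. Assumes the source never yields null elements.
final class DownsampleIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final double fraction;
    private final Random random;
    private T next; // next element to emit, or null when exhausted

    DownsampleIterator(Iterator<T> source, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        this.source = source;
        this.fraction = fraction;
        this.random = new Random(seed);
        advance();
    }

    // Pulls from the source until an element survives the coin flip.
    private void advance() {
        next = null;
        while (source.hasNext()) {
            T candidate = source.next();
            // nextDouble() is in [0, 1), so fraction 1.0 keeps all and 0.0 keeps none.
            if (random.nextDouble() < fraction) {
                next = candidate;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public T next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        T result = next;
        advance();
        return result;
    }
}
```

Because each read is kept independently, the output size is only approximately `fraction` times the input. Downsampling to an exact count or to a target coverage would need a different scheme, such as reservoir sampling per locus.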
385736469c High performance pileup code and utilities
depristo
2009-04-01 00:47:47 +0000