Commit Graph

  • d35e20ce21 Better error checking for missing .dict file. hanna 2009-05-17 21:57:12 +0000
  • 7161b8f927 Disable support for short name values directly abutting their arguments. hanna 2009-05-17 16:09:32 +0000
  • 4ab9bfe662 Upped sam-jdk jar to new version in public picard repository. hanna 2009-05-17 15:03:05 +0000
  • d152c2b911 New GATKArgumentCollection caused a subtle bug with argument grouping and the help system. Fixed. hanna 2009-05-17 14:54:25 +0000
  • 94e324b844 Write N for the alt allele when we're hom-ref. Stop EM loop when we've converged (likelihood[t-1] == likelihood[t]). jmaguire 2009-05-17 13:58:11 +0000
  • bd53bc18f9 added new required annotations kcibul 2009-05-17 12:24:06 +0000
  • 28bf7ec8ad Aesthetic cleanup. kiran 2009-05-17 04:09:23 +0000
  • a0464633fd Whoops. Changed denominator from reads to bases. kiran 2009-05-17 03:42:25 +0000
  • 5d60efc498 Factored out some simple stats accumulation. kiran 2009-05-17 03:37:57 +0000
  • 81fac73c01 LOD checks for normal and brute force versions ebanks 2009-05-17 02:56:03 +0000
  • 527df6e57b Massive speed-up, clean-up and tabular output. This program is going to rule. jmaguire 2009-05-16 16:52:40 +0000
  • 3b57a35009 don't be tricked by multiple read groups with the same sample id! jmaguire 2009-05-16 15:28:55 +0000
  • 947bac5cdc vast speedup jmaguire 2009-05-16 15:27:58 +0000
  • 6f1559bd77 Cleaned up a bit. Added some documentation. kiran 2009-05-15 21:22:24 +0000
  • 2c4de7b5c5 Switch TraverseByLoci over to new sharding system, and cleanup some code in passing read files along the pathway from command line to traversal engine. hanna 2009-05-15 21:02:12 +0000
  • 57e5f22987 We now only build the files that have changed. It should speed up compile time as our source tree grows. aaron 2009-05-15 20:48:01 +0000
  • f33f3c0434 added LOD threshold for determining when to clean ebanks 2009-05-15 20:23:59 +0000
  • 99d4ebc26d Added functionality to return the final accumulator of a traversal, so external tools can get the result of a walker. aaron 2009-05-15 20:20:27 +0000
  • dae77bf14a Fixed a typo in a comment. kiran 2009-05-15 20:07:31 +0000
  • bfc40f54f0 Nicer output when training off of perfect reads. Not that that works yet... kiran 2009-05-15 20:07:08 +0000
  • d1f3000afa bed-style output for IGV kcibul 2009-05-15 17:58:44 +0000
  • 36db44620b Improved output. Can optionally limit the number reads actually called. kiran 2009-05-15 00:07:57 +0000
  • 7834b969b4 Better interface to the tabular ROD, now makes writing files easier. Also has corresponding test files depristo 2009-05-14 23:20:11 +0000
  • 50f32b7f61 Added a shard strategy for the reduce-by-interval traversals. Also fixed bugs that I found along the way. aaron 2009-05-14 21:20:18 +0000
  • 0f8e6061b6 Simple interface improvements depristo 2009-05-14 21:08:09 +0000
  • 8e9e2f4502 Revised ROD system. Split the system into Basic type and interface. Enabled more control over rod accessing, including an initialize() function to fetch headers and other options from the file. Added general tabular rod, which has named columns and supports a map<String,String> interface. Comes with shiny new Junit system for RODs. Also, added simple python script for accessing picard data. depristo 2009-05-14 21:06:28 +0000
  • 67293168e7 Support periods in sequence names. hanna 2009-05-14 20:17:57 +0000
  • 641afc4e76 fix a crash in the event that the input file has no read groups! jmaguire 2009-05-14 19:27:41 +0000
  • d8c1b010f1 Fixing the naming of the function I checked in earlier. aaron 2009-05-14 19:27:10 +0000
  • 5858f20902 Documentation. kiran 2009-05-14 18:58:43 +0000
  • 68c9455c0f Moved the base complement method to BaseUtils. kiran 2009-05-14 18:57:48 +0000
  • 3761c0900b Added Bustard vs. Four-prob percent bases consistent output. kiran 2009-05-14 18:01:41 +0000
  • 7a1f85ff86 option to print out the indels found by the cleaner to a file ebanks 2009-05-14 17:50:08 +0000
  • b62bddee42 The header was never being set. Added this hack for now and will alert the authorities ASAP... ebanks 2009-05-14 17:18:51 +0000
  • 959cf09d4b Removed some debugging print statements. kiran 2009-05-14 17:12:42 +0000
  • 2f42a643a8 A new, much simpler (and now, complete) driver program for four-base probs. Serves as a model for anyone who wants to write their own driver program that trains and calls with data from a different source than the raw Illumina data. kiran 2009-05-14 16:58:22 +0000
  • 5824dea0c1 Trains and calls a read at a time rather than a base at a time (which, given its name, it should have done in the first place) kiran 2009-05-14 16:57:00 +0000
  • e4770885fd The four-probs for all bases in a single read. Some utility functions for generating the primary and secondary base strings, as well as generating the SQ tag byte array in a manner that's consistent with the Bustard base calls (meaning the primary Bustard call and the secondary Four-Prob call are not permitted to be the same). kiran 2009-05-14 16:55:49 +0000
  • fdd123fe16 A parser for the raw Illumina data. Allows one to arbitrarily jump from one tile to another. kiran 2009-05-14 16:53:07 +0000
  • 7aa90757ac Moved the iterators over to the StingSAMIterator interface. This will help us ensure that iterators that need to be closed get closed. aaron 2009-05-14 16:52:18 +0000
  • 6d98234555 Holds raw intensities, sequence, and quality scores. kiran 2009-05-14 16:52:03 +0000
  • 241de0b235 A class that implements multiple training strategies and presents the training data in a common form. kiran 2009-05-14 16:51:29 +0000
  • 64c65c7751 New methods to generate compressed SQ quality elements in line with the SAM spec. kiran 2009-05-14 16:50:31 +0000
  • c3b2c66911 The GATK doesn't need the rest aaron 2009-05-14 16:20:45 +0000
  • 0215905bb6 Added an adapter class that will adapt plain iterators and closeable iterators of SAMRecords into StingSAMIterators. Also unit tests. aaron 2009-05-14 15:17:32 +0000
  • 5dda448ae0 1. Add printouts for the cleaner 2. First pass at the entropy interval walker (still needs work) ebanks 2009-05-14 13:59:48 +0000
  • 80c13f7127 Added a getter for command-line arguments. hanna 2009-05-14 13:55:52 +0000
  • 307c6e4ecf Oops. Forgot to add new file to svn. hanna 2009-05-14 00:52:30 +0000
  • d14cab0be7 Added IterableLocusContextQueue and test. Cleaned up tests, adding BaseTest where it didn't exist. Enhanced test runner to run only classes ending in ...Test.java, so that utility classes can sit alongside the tests but won't be run by JUnit. hanna 2009-05-13 21:32:05 +0000
  • 7b59f63f12 and don't forget to close sam writer after we are done... asivache 2009-05-13 20:46:36 +0000
  • de0cce87ea new optional arg added that allows to specify a separate bam file to send all piles that fail to realign to; plus minor fixes asivache 2009-05-13 20:24:23 +0000
  • 8cce3d908f Bumped sam to latest. hanna 2009-05-13 19:19:55 +0000
  • 12ae3a22b6 Break locus context data access providers into modular components in preparation for traverse by loci. hanna 2009-05-13 18:51:16 +0000
  • 7084ecdeb6 a few changes; checked in to allow debugging. jmaguire 2009-05-13 15:50:48 +0000
  • 5f924c46e0 Added documentation for calling the GATK from Matlab. This is to document the extremely basic and experimental support for using Matlab to call the GATK, and is more of a placeholder for when we have time to revisit supporting this. aaron 2009-05-13 15:25:51 +0000
  • 5b47c5ab6c fixing kiran's busted build depristo 2009-05-12 21:29:04 +0000
  • 4f2c8bf0a3 Fixed an import statement that broke when all the files were moved to this directory. kiran 2009-05-12 20:43:16 +0000
  • cedc4c9ccb Refactored into oblivion. kiran 2009-05-12 20:33:07 +0000
  • 01de5cc0ee Moved to org.broadinstitute.sting.secondarybase kiran 2009-05-12 20:28:29 +0000
  • 4e4767e5de Moved to org.broadinstitute.sting.secondarybase kiran 2009-05-12 20:26:43 +0000
  • 219eb60716 Added newly-required documentation to arguments so that build can complete successfully. kiran 2009-05-12 20:26:10 +0000
  • 688358190c Moved secondary base stuff out of playground for the purpose of making it a core utility. Modified package names and imports such that things would build properly. kiran 2009-05-12 20:24:18 +0000
  • 8079acb1d3 basic step0 implementation kcibul 2009-05-12 19:49:39 +0000
  • 57ecb7fbf1 Nicer reporting functions. kiran 2009-05-12 19:48:30 +0000
  • ee99320c83 Removed at Mark's request. hanna 2009-05-12 19:48:21 +0000
  • f1de3d6366 Minor tweaks to how probs are supplied. kiran 2009-05-12 19:47:41 +0000
  • 095dacd154 Experimental refactoring. kiran 2009-05-12 19:46:50 +0000
  • 758f8aa89b Experimental refactoring. kiran 2009-05-12 19:46:34 +0000
  • 1518f8f9bf Update training data creation in CovariateCounterWalker to output much smaller files by counting the number of occurrences of each data point combination rather than outputting a line for each data point (i.e. each base). Also fixed bug in LogisticRecalibrationWalker where a null SAMHeader was being pulled from a function that is now marked deprecated. andrewk 2009-05-12 19:23:14 +0000
  • 4c12df372c Dumb, dumb bug. ebanks 2009-05-12 19:21:33 +0000
  • 6e69193e3c Deprecated calls to getSamReader on both the GenomeAnalysisEngine and the TraversalEngine. This call fails in the new style traversals, but it won't disappear until the cut-over to the new traversals is complete. aaron 2009-05-12 18:52:42 +0000
  • 630066cc0a 1. Merge LocusWindows whose reads overlap. 2. Fix bug (we weren't clearing the "to emit" list) ebanks 2009-05-12 17:33:23 +0000
  • 9f942fdfa0 Added code to correct the violation of the parsing interface. Now the analysis type resides in the command line arg, but is stored into the argument collection before it's passed to the genomeAnalysisEngine. aaron 2009-05-12 15:33:55 +0000
  • c4d89997ca put in a dummy sample_name so it'll compile jmaguire 2009-05-12 15:12:42 +0000
  • c8d7223789 do pooled calling properly for 1kg jmaguire 2009-05-12 15:12:13 +0000
  • 313a6d0fb5 lots of changes to facilitate calling indels and 1kG jmaguire 2009-05-12 15:11:42 +0000
  • add7b6cf65 add sample_name to constructor, misc bug fixes jmaguire 2009-05-12 15:10:17 +0000
  • 0267ccae7f add code for computing indel genotype likelihoods; make reference lods negative jmaguire 2009-05-12 15:09:29 +0000
  • 11723fbcc2 added method indelPileup. Generates a pileup of indel alleles given reads and offsets (as from a locus walker). jmaguire 2009-05-12 15:08:24 +0000
  • ee9077fc69 LocusIterator iterated through LocusContexts, which was fine until now when we need something that iterates through loci (GenomeLocs). Rename LocusIterator to LocusContextIterator. hanna 2009-05-12 13:54:57 +0000
  • 608948210c Check for a reference before extraction. hanna 2009-05-12 13:29:44 +0000
  • 32696b13f5 Fixed method override issue with old-style traversals. hanna 2009-05-12 01:22:18 +0000
  • 862b8a6787 intervals_file + genome_loc => intervals. hanna 2009-05-12 01:04:18 +0000
  • 0bca588629 Botched some boolean logic. hanna 2009-05-11 22:53:52 +0000
  • 23e9e29964 Changed reads traversals from providing a LocusContext from which the reference sequence could be extracted to a char[] containing the reference bases. hanna 2009-05-11 22:45:11 +0000
  • 052819bed5 Switched dependencies of GenomeAnalysisTK to depend on GenomeAnalysisEngine. hanna 2009-05-11 22:33:00 +0000
  • ff1b92acc4 Switch over to the GenomeAnalysisEngine/CommandLineGATK system from the GenomeAnalysisTK code. aaron 2009-05-11 22:05:58 +0000
  • 009e71fcd9 We need to sort cleaned reads ourselves (instead of letting SAMFileWriter do it) because the SAM headers are often screwed up and claim to be "unsorted". While here, I broke off the module from the SortSamIterator in case someone else wants to use it. ebanks 2009-05-11 15:43:42 +0000
  • c735e1f627 small javadoc cleanup. aaron 2009-05-11 03:44:21 +0000
  • e8b8ab5985 Added code to extend Matt's getReferenceBases out to the read walkers, so they can see the corresponding reference for each read. aaron 2009-05-11 03:42:38 +0000
  • 4ce3feba4d my move ended up being a copy, so this is to delete duplicate files. aaron 2009-05-11 02:10:26 +0000
  • 898f65547e Added code to split GenomeAnalysisTK.java into an object concerned with loading command line args, and one that runs the engines. This will allow us to run the GATK from other tools (like Matlab). Also some cleanup to separate out the legacy traversals and the new style traversals. This is not live yet, and any modifications you need should be made to GenomeAnalysisTK.java for now. aaron 2009-05-11 02:07:20 +0000
  • 8d43ec3d7e a fix for a situation where a chromosome on the reference file contains no reads, and doesn't align to the bam file. This came up using reference 18, which has chromosomes like chr1_random that aren't in all BAM files. aaron 2009-05-11 01:39:25 +0000
  • ee02b61068 added support for the argument collections code aaron 2009-05-09 07:07:33 +0000
  • 742840017b added the argument collection annotation for situations where fields in a command line args have embedded fields that should be checked for command line args aaron 2009-05-09 06:59:17 +0000
  • 55c1b688bd Fix mediocre javadoc. hanna 2009-05-08 22:31:16 +0000
  • 522f8b58be Added second method for getting large sequences of the reference for use in reads traversals. hanna 2009-05-08 22:18:04 +0000
  • 517f27f331 Added sharding strat. code that picks the right kind of shard, based on the traversal engine aaron 2009-05-08 21:55:10 +0000
  • 6e394490cb Cleanup in preparation for ByLoci traversal. Also did some work minimizing unit tests. hanna 2009-05-08 21:27:54 +0000
  • ee777c89de Change the default mechanism for adding ROD bindings to the new system. TODO: create a new object type for these triplets. hanna 2009-05-08 18:43:00 +0000