Commit Graph

  • 891f4c2bd9 up on the wiki, removing it from the repo. Ignore my last commit aaron 2009-07-01 15:42:02 +0000
  • 05c5659053 This document is now up as content on the wiki, so I'm removing it from svn. aaron 2009-07-01 15:40:32 +0000
  • d58eeb7539 Don't cry wolf: only one warning is now emitted, instead of tons of warnings. aaron 2009-07-01 13:50:37 +0000
  • a3e0ec20c4 Kill the TraverseByLocusWindows traversal. TraverseLocusWindows will take its place. hanna 2009-07-01 13:46:35 +0000
  • 74e9bb46b4 Contents of the Hello World doc are now in the wiki. hanna 2009-06-30 22:32:56 +0000
  • 93da64db10 Update naming for consistency. hanna 2009-06-30 22:03:21 +0000
  • e93f751bd7 First step in replacing the Hello, World! document. Revamped the HelloWalker and checked it into the source tree, created a special build file for it, and added it to the packaging tool. hanna 2009-06-30 21:59:54 +0000
  • fdff233d70 new injector args and address Kiran's question ebanks 2009-06-30 19:49:22 +0000
  • 8d3dc57c3d Commit to emit in sorted order so we don't have to use /tmp ebanks 2009-06-30 19:47:15 +0000
  • f5cba5a6bb Fixed genome loc to be immutable; the only way to change its values now is through the GenomeLocParser. aaron 2009-06-30 19:17:24 +0000
  • 455275996f Added contents to the wiki. hanna 2009-06-30 18:29:46 +0000
  • 177d6d00b8 added setContigIndex(). NOTE: both setContig() and setContigIndex are UNSAFE as one does not automatically involve updating the other, and there's also no validation asivache 2009-06-30 17:40:37 +0000
  • 9fca79ed62 Read groups are now sorted in the output data, for convenience depristo 2009-06-30 16:50:44 +0000
  • fe421e5712 All IntelliJ best practices info is now on the wiki. hanna 2009-06-30 16:45:52 +0000
  • 08df4771c8 count X/N/etc. as mismatches for the NM attribute in the BAMs ebanks 2009-06-30 16:08:55 +0000
  • d412c5dc2f Updated to use SecondaryBaseAnnotator class. kiran 2009-06-30 16:08:43 +0000
  • e3cdf7ef4b A single class that can be handed reads for training and basecalling. When in training mode, we accumulate no more than 10000 reads and always replace the lowest-quality reads with superior quality reads. Thus, the training set always contains 10000 of the best reads available. After training is complete, the class can be interrogated to return the SQ tag for a given RawRead object. kiran 2009-06-30 16:03:15 +0000
  • 74cc7136f7 All info from the user manual is now in the wiki. Deleting. hanna 2009-06-30 15:29:59 +0000
  • ddf4003536 Updates to picard public / private and sam. hanna 2009-06-30 14:50:55 +0000
  • 8aa3b65e7f fix to guarantee emission in sorted order ebanks 2009-06-30 13:48:41 +0000
  • 03f8177a53 When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions. aaron 2009-06-29 20:51:55 +0000
  • 1dcababad1 a fix to make the test run aaron 2009-06-29 20:24:32 +0000
  • a17bf145f6 fix to respond to the change in IndelLikelihood constructor. jmaguire 2009-06-29 19:05:33 +0000
  • 7ecc43e9a7 Fixed subtle null pointer exception discovered by Kiran. Now deals with the rare situation where you have only, say, Q28 bases at dbSNP sites, so you fail in the table recalibration step with a null pointer error into the data structure indexed by quality score. Bases with a Q score above those seen before are not modified in any way. depristo 2009-06-29 18:57:42 +0000
  • 95e2ae0171 Deal with reads whose ends are aligned off the end of a chromosome. Includes update to ignore non-ATCG bases (not just 'N') (Also, create a BWA dir for future work) ebanks 2009-06-29 16:50:05 +0000
  • 65a788f18a Added a ROD (SangerSNP) for parsing the Sanger's chr20 pilot1 SNP calls. Some doodling around with indel calling in an EM context. jmaguire 2009-06-29 16:32:12 +0000
  • ceeeec13b8 Computes a vector of numbers of reads falling into successive intervals of specified length (e.g. numbers of reads per every 1Mbase) asivache 2009-06-29 16:12:21 +0000
  • 3bacb3db03 updated some defaults ebanks 2009-06-26 19:28:05 +0000
  • eb74b16e39 updated what constitutes removing entropy ebanks 2009-06-26 18:29:00 +0000
  • d7d4298917 Some files to support generic genotype outputting aaron 2009-06-26 15:43:41 +0000
  • 1a97c86f95 don't crash when an unmapped read is encountered, just write it into the output file, it should be ok asivache 2009-06-26 15:33:59 +0000
  • da1f168a3e updated docs ebanks 2009-06-26 05:20:17 +0000
  • 491ed70b44 TraverseByLocusWindow -- assorted bug fixes. hanna 2009-06-25 22:51:38 +0000
  • 5289230eb8 Version 0.2.1 (released) of the TableRecalibrator depristo 2009-06-25 22:50:55 +0000
  • 73caf5db15 This is, strictly speaking, NOT a GATK module. Standalone, picard-level executable except that it uses a couple of gatk utils (GenomeLoc). Remaps alignments from custom reference (such as transcriptome, hyb-sel etc) onto the 'master' reference asivache 2009-06-25 22:04:18 +0000
  • ee2af3b423 I committed this too soon... reverting... kiran 2009-06-25 20:49:12 +0000
  • ad3a3aa350 First pass at passing lists of files / lists of interval arguments work. Note that the interval ROD system will throw up its hands and not deal with intervals at all if multiple interval files are passed in (see JIRA GSA-95). hanna 2009-06-25 20:44:23 +0000
  • 23680a9a16 Replaced an expensive sort with an inexpensive direct computation. kiran 2009-06-25 20:25:12 +0000
  • 83816fb801 Stop using the annoying refIterator (temp change until new traversal is green lighted) ebanks 2009-06-25 20:05:39 +0000
  • 0c3aabd1c5 logger output should be less verbose by default. Also fixed a printout in my read validation walker aaron 2009-06-25 19:47:29 +0000
  • 11d83ac7d0 pushing up to test on unix box kcibul 2009-06-25 19:00:48 +0000
  • 0d9041380d remove printouts ebanks 2009-06-25 18:54:14 +0000
  • 0a16519aa2 a couple of additions to the tests, plus a change to the artificial resource pool to support the queryContained flag aaron 2009-06-25 18:30:32 +0000
  • 2c97c5e873 Compute a simple histogram of depth of coverage. jmaguire 2009-06-25 18:30:11 +0000
  • 102b38c055 Sketch of new version of TraverseByLocusWindow, and a flag to conditionally turn it on. hanna 2009-06-25 18:20:56 +0000
  • 4e04370f14 forgot a file aaron 2009-06-25 17:56:17 +0000
  • 5b1c23a7f2 changes to fix and test the interval based traversals aaron 2009-06-25 17:54:15 +0000
  • 3b24264c2b incorporating skew check, further output of metrics kcibul 2009-06-25 16:01:07 +0000
  • ea2426dcd0 one more change needed to commit ebanks 2009-06-25 15:09:53 +0000
  • f6eeb36c93 updated doc ebanks 2009-06-25 14:32:51 +0000
  • 347608cfe0 remove hacked traversal in preparation for move to Matt's new one ebanks 2009-06-25 14:32:05 +0000
  • 940d75171a Big cleaner changes: 1. Added a Walker to merge intervals before cleaning 2. (Almost) all Walkers can filter out 454 reads (and do by default) 3. Got rid of -all command and related pieces (time to switch to CleanedReadsInjector) ebanks 2009-06-25 14:31:24 +0000
  • 3cb6d7048e don't freak out if two reference intervals a custom contig is built of are strictly adjacent; instead politely warn user that her data suck and proceed asivache 2009-06-24 19:08:10 +0000
  • d4f3ca1a10 A utility class for keeping the mapping from 'custom' reference (e.g. transcriptome) onto the 'master' reference (e.g. whole genome), and for remapping SAM records from the former onto the latter. It's Arachne's BaitMultiMap, pretty much asivache 2009-06-24 18:16:15 +0000
  • 69dc502174 I forgot that this depends on BoundedScoringSet. kiran 2009-06-24 17:18:53 +0000
  • 61ce4e5983 quick doc change aaron 2009-06-24 16:35:46 +0000
  • a9c30c5fcc added -nosort cmdline flag; if specified, the output writer does not attempt to sort reads on the fly (sorting involves use of sorting collection backed up by temporary disk storage and can lead to crashes if temp size is low and/or filesystem is not behaving). Output can be later sorted externally by samtools asivache 2009-06-24 15:58:00 +0000
  • 7b5d8d7604 Changed the intensities array order from cycle,channel to channel,cycle. This, I'm told, is a far more efficient allocation strategy. kiran 2009-06-24 15:41:06 +0000
  • 3112302ec9 A priority-queue-like container that allows you to add a specified number of elements. When the limit has been reached, new additions replace the lower scoring elements. kiran 2009-06-24 15:39:47 +0000
  • 0a50f2e160 Updated and near final version of tabular recalibration system. Uses 'yates' correction for low-occupancy quality bins. Faster and more robust handling of input and output depristo 2009-06-24 03:52:12 +0000
  • caf5aef0f8 Much improved python analysis routines, as well as easier / more correct merging utility. Better R scripts, which now close recalibration data by the confidence of the quality score itself depristo 2009-06-24 01:12:35 +0000
  • ef546868bf Pooling of unmapped reads -- improves runtime of files with tons of unmapped reads by an order of magnitude. Desperately needs cleanup. hanna 2009-06-23 23:48:06 +0000
  • dfa2efbcf5 not crashing when refseq annotation track is not requested is a nice added feature asivache 2009-06-23 22:52:40 +0000
  • eb999f880a incorporating skew check kcibul 2009-06-23 19:51:51 +0000
  • 1339f3f3e3 make refseq annotation file an optional argument; if specified, indels will be annotated as genomic/utr/intron/coding (accidentally appearing 'unknowns' probably mean that there's something wrong with refseq annotations?) asivache 2009-06-23 18:17:03 +0000
  • 9c0dba6979 Some quick documentation and typo changes aaron 2009-06-23 13:40:13 +0000
  • 6b5560e1e9 Fairly detailed first pass at a README for the MSA realigner ebanks 2009-06-23 01:54:38 +0000
  • cb9c6f18ef spelling fix ebanks 2009-06-23 01:46:35 +0000
  • 630d9e6a37 Fixed a typo. kiran 2009-06-22 21:37:46 +0000
  • 8b4d0412ca Changed the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine. aaron 2009-06-22 21:11:18 +0000
  • 4a92a999a0 made the constructors protected. Protected also means package-protected, so other methods in the utils class can call these constructors (mainly the parser), as well as any inheriting classes. Also fixed some IntelliJ suggested clean-ups and documentation aaron 2009-06-22 16:01:59 +0000
  • 9e25229014 use better entropy threshold and don't print out "new" SNPs (since they're just an artifact of the low (arbitrary) threshold) ebanks 2009-06-22 15:30:08 +0000
  • bcb64d92e9 Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, separating the creation of a genome loc (with the reference setup) to a parser class. GenomeLoc now just represents the actual genomic position. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future. aaron 2009-06-22 14:39:41 +0000
  • 26eb362f52 Added novel / known split to variant eval. That is, emits all of the standard analyses on SNP partitioned into those known in the provided known db and those novel. Also fixed problem with counting bases within subsets depristo 2009-06-21 21:27:40 +0000
  • d3f0c51944 longer update times so we don't overwhelm when running genome-wide depristo 2009-06-21 14:10:02 +0000
  • a21c2a7e48 don't make mapping quality too high ebanks 2009-06-21 04:51:42 +0000
  • 686c8133ed massive change in the way the cleaner works, mostly revolving around the fact that we no longer trust indels from the alignments (although we do use it as a good alternate consensus possibility). Other changes include better "greedy mode" performance and allowing the user to have just the cleaned reads themselves be printed out (mostly for Matt's CleanedReadInjector). ebanks 2009-06-21 03:56:59 +0000
  • c9e6cb72e1 Major improvements to python analysis code -- now computes a host of statistics about quality scores from the recal_data.csv file emitted by countcovariates. Includes average Q scores, medians, modes, stdev, coefficients of variation, RMSE, and % bases > q10, q20, q30. Can finally quantify and compare the improvement of quality score recalibration. depristo 2009-06-20 19:50:37 +0000
  • 9e26550b0d Approach v2. Added python analysis script, so java no longer must be used to analyze quality score data. About to refactor out lots of unneeded code depristo 2009-06-20 16:00:23 +0000
  • dde52e33eb Cleanup of the cleaned read injector based on Eric's feedback. hanna 2009-06-19 22:04:47 +0000
  • a0a3cf2f9f VariantFiltrationWalker can now apply specified exclusion tests after the feature tests. For a given variant, all reasons for exclusions are printed to screen. kiran 2009-06-19 21:12:01 +0000
  • 8ac40e8e2d Updated version of the recalibration tool depristo 2009-06-19 17:45:47 +0000
  • aef519b427 more comparisons ebanks 2009-06-19 16:46:05 +0000
  • 58b132ee10 Eliminate redundant computation. jmaguire 2009-06-19 16:31:57 +0000
  • 3a1b58ca65 remove unused argument lodThreshold. jmaguire 2009-06-19 12:40:12 +0000
  • 9a0151b7e1 Added an option to list all available feature classes and exit. kiran 2009-06-19 00:00:12 +0000
  • ed7afd8b70 Added javadocs. Now throws an exception if an unknown feature is specified. General cleanup. kiran 2009-06-18 23:28:38 +0000
  • 284fd6a5fb VariantFiltrationWalker now inspects its parent package and determines the list of features that can be applied. Command-line specification of filters to run look at the simple names of these features and do a case-insensitive match to determine which features to apply. A new verbose mode allows the user to see how the likelihoods are changing with the application of each subsequent feature. kiran 2009-06-18 22:45:36 +0000
  • 0a0ef573f7 Methods for finding classes given a path and finding classes that implement a given interface. This stuff was mostly copied from private methods in WalkerManager, so there's some code redundancy. At some point, those calls could be replaced with these. kiran 2009-06-18 22:43:19 +0000
  • d748c85dc4 Cleaned code and reorganized -- moving in the right direction for v2 depristo 2009-06-18 22:28:34 +0000
  • 9fe330cdf1 Bump picard and sam to latest version. hanna 2009-06-18 21:59:40 +0000
  • af7a759ba4 Convert the somatic coverage tool to output from the packaging tool rather than from the dist target. hanna 2009-06-18 21:29:30 +0000
  • 1bca144119 Moving things around depristo 2009-06-18 21:06:46 +0000
  • ca8a3bd85e Another temp checking for rearranging things depristo 2009-06-18 21:04:36 +0000
  • 3c40db260d Added REFERENCE_BASES required annotation for performance depristo 2009-06-18 21:03:57 +0000
  • 03fe166994 Wrote a public static version of loadFirstNReasonableReadsTrainingSet() so Alec can call it. kiran 2009-06-18 20:18:17 +0000
  • a4fa02f11c Moved output outside of for loop so I don't have 10 different versions of the same variant (though, now that I think of it, that's not necessarily a terrible thing for debugging...) kiran 2009-06-18 19:59:26 +0000
  • 768a16e791 An experimental, tile-parallel version of the secondary base annotator. kiran 2009-06-18 19:58:09 +0000
  • e26df45e8e Different features can now be specified by repeatedly supplying the -F "featurename:arguments" option. kiran 2009-06-18 18:45:03 +0000
  • 17a5b50ea4 Script that aligns paired-end BAMs using BWA. andrewk 2009-06-18 18:14:58 +0000
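
Note on 3112302ec9 / e3cdf7ef4b: those commits describe a priority-queue-like container that holds a fixed number of elements and, once full, lets new additions replace the lowest-scoring ones (used to keep the best 10000 training reads). The following is only a minimal Python sketch of that idea, not the code from the repository. The class name is borrowed from the BoundedScoringSet mentioned in 69dc502174, but the methods (add, items), the heap-based layout, and the sample read names and scores are all assumptions made for illustration.

```python
import heapq
from itertools import count


class BoundedScoringSet:
    """Keep at most `capacity` items, evicting the lowest-scoring one when full.

    Hypothetical sketch; the method names and behavior here are assumed,
    not taken from the original Java class.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._heap = []            # min-heap of (score, tiebreak, item)
        self._tiebreak = count()   # unique counter so items are never compared

    def add(self, item, score):
        """Add an item; return True if it was kept, False if it was rejected."""
        entry = (score, next(self._tiebreak), item)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        # Full: only accept the newcomer if it beats the current worst score.
        if score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def items(self):
        """Return the retained items, best score first."""
        return [item for _, _, item in sorted(self._heap, reverse=True)]


# Usage: keep the three best of five made-up reads.
reads = BoundedScoringSet(capacity=3)
for name, quality in [("r1", 10), ("r2", 35), ("r3", 20), ("r4", 5), ("r5", 40)]:
    reads.add(name, quality)
print(reads.items())  # ['r5', 'r2', 'r3']
```

A min-heap keeps the current worst element at the root, so each insertion costs O(log n) and the container never needs a full re-sort; whether the original class was built this way is not stated in the log.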