Commit Graph

886 Commits (5289230eb8417e37050e52f4ff00eacb7957e44b)

Author SHA1 Message Date
depristo 5289230eb8 Version 0.2.1 (released) of the TableRecalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
asivache 73caf5db15 This is, strictly speaking, NOT a GATK module. Standalone, picard-level executable, except that it uses a couple of GATK utils (GenomeLoc). Remaps alignments from a custom reference (such as transcriptome, hyb-sel etc) onto the 'master' reference
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1107 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:04:18 +00:00
kiran ee2af3b423 I committed this too soon... reverting...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1106 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:49:12 +00:00
hanna ad3a3aa350 First pass at making lists of files / lists of interval arguments work. Note that the interval
ROD system will throw up its hands and not deal with intervals at all if multiple interval files
are passed in (see JIRA GSA-95).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1105 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:44:23 +00:00
kiran 23680a9a16 Replaced an expensive sort with an inexpensive direct computation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1104 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:25:12 +00:00
ebanks 83816fb801 Stop using the annoying refIterator (temp change until new traversal is green lighted)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1103 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:05:39 +00:00
aaron 0c3aabd1c5 logger output should be less verbose by default. Also fixed a printout in my read validation walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1102 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:47:29 +00:00
kcibul 11d83ac7d0 pushing up to test on unix box
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1101 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:00:48 +00:00
ebanks 0d9041380d remove printouts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1100 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:54:14 +00:00
aaron 0a16519aa2 a couple of additions to the tests, plus a change to the artificial resource pool to support the queryContained flag
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1099 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:30:32 +00:00
jmaguire 2c97c5e873 Compute a simple histogram of depth of coverage.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1098 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:30:11 +00:00
hanna 102b38c055 Sketch of new version of TraverseByLocusWindow, and a flag to conditionally turn it on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1097 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:20:56 +00:00
aaron 4e04370f14 forgot a file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1096 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:56:17 +00:00
aaron 5b1c23a7f2 changes to fix and test the interval based traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1095 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:54:15 +00:00
kcibul 3b24264c2b incorporating skew check, further output of metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1094 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 16:01:07 +00:00
ebanks ea2426dcd0 one more change needed to commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1093 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 15:09:53 +00:00
ebanks 347608cfe0 remove hacked traversal in preparation for move to Matt's new one
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1091 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:32:05 +00:00
ebanks 940d75171a Big cleaner changes:
1. Added a Walker to merge intervals before cleaning
2. (Almost) all Walkers can filter out 454 reads (and do by default)
3. Got rid of -all command and related pieces (time to switch to CleanedReadsInjector)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1090 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:31:24 +00:00
asivache 3cb6d7048e don't freak out if two reference intervals a custom contig is built of are strictly adjacent; instead politely warn user that her data suck and proceed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1089 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 19:08:10 +00:00
asivache d4f3ca1a10 A utility class for keeping the mapping from 'custom' reference (e.g. transcriptome) onto the 'master' reference (e.g. whole genome), and for remapping SAM records from the former onto the latter. It's Arachne's BaitMultiMap, pretty much
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1088 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 18:16:15 +00:00
kiran 69dc502174 I forgot that this depends on BoundedScoringSet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1087 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 17:18:53 +00:00
aaron 61ce4e5983 quick doc change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1086 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 16:35:46 +00:00
asivache a9c30c5fcc added -nosort cmdline flag; if specified, the output writer does not attempt to sort reads on the fly (sorting uses a sorting collection backed by temporary disk storage and can lead to crashes if temp space is low and/or the filesystem is not behaving). Output can be sorted later externally by samtools
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1085 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:58:00 +00:00
kiran 7b5d8d7604 Changed the intensities array order from cycle,channel to channel,cycle. This, I'm told, is a far more efficient allocation strategy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1084 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:41:06 +00:00
kiran 3112302ec9 A priority-queue-like container that allows you to add a specified number of elements. When the limit has been reached, new additions replace the lower scoring elements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1083 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:39:47 +00:00
depristo 0a50f2e160 Updated and near final version of tabular recalibration system. Uses 'yates' correction for low-occupancy quality bins. Faster and more robust handling of input and output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1082 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 03:52:12 +00:00
hanna ef546868bf Pooling of unmapped reads -- improves runtime of files with tons of unmapped reads by an order of magnitude.
Desperately needs cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1080 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 23:48:06 +00:00
asivache dfa2efbcf5 not crashing when refseq annotation track is not requested is a nice added feature
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1079 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 22:52:40 +00:00
kcibul eb999f880a incorporating skew check
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1078 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 19:51:51 +00:00
asivache 1339f3f3e3 make refseq annotation file an optional argument; if specified, indels will be annotated as genomic/utr/intron/coding (accidentally appearing 'unknowns' probably mean that there's something wrong with refseq annotations?)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1077 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 18:17:03 +00:00
aaron 9c0dba6979 Some quick documentation and typo changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1076 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 13:40:13 +00:00
ebanks cb9c6f18ef spelling fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1074 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 01:46:35 +00:00
kiran 630d9e6a37 Fixed a typo.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1073 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:37:46 +00:00
aaron 8b4d0412ca Changed the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1072 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:11:18 +00:00
aaron 4a92a999a0 made the constructors protected. Protected also means package-protected, so other methods in the utils class can call these constructors (mainly the parser), as well as any inheriting classes. Also fixed some IntelliJ-suggested clean-ups and documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1071 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 16:01:59 +00:00
ebanks 9e25229014 use better entropy threshold and don't print out "new" SNPs (since they're just an artifact of the low (arbitrary) threshold)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1070 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 15:30:08 +00:00
aaron bcb64d92e9 Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, separating the creation of a genome loc (with the reference setup) into a parser class. GenomeLoc now just represents the actual genomic position. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
depristo 26eb362f52 Added novel / known split to variant eval. That is, emits all of the standard analyses on SNPs partitioned into those known in the provided known db and those that are novel. Also fixed problem with counting bases within subsets
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1068 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 21:27:40 +00:00
depristo d3f0c51944 longer update times so we don't overwhelm when running genome-wide
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1067 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 14:10:02 +00:00
ebanks a21c2a7e48 don't make mapping quality too high
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1066 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 04:51:42 +00:00
ebanks 686c8133ed massive change in the way the cleaner works, mostly revolving around the fact
that we no longer trust indels from the alignments (although we do use them as
a good alternate consensus possibility).
Other changes include better "greedy mode" performance and allowing the user
to have just the cleaned reads themselves be printed out (mostly for Matt's
CleanedReadInjector).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1065 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 03:56:59 +00:00
depristo 9e26550b0d Approach v2. Added python analysis script, so Java no longer needs to be used to analyze quality score data. About to refactor out lots of unneeded code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1063 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-20 16:00:23 +00:00
hanna dde52e33eb Cleanup of the cleaned read injector based on Eric's feedback.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1062 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 22:04:47 +00:00
kiran a0a3cf2f9f VariantFiltrationWalker can now apply specified exclusion tests after the feature tests. For a given variant, all reasons for exclusions are printed to screen.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1061 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 21:12:01 +00:00
depristo 8ac40e8e2d Updated version of the recalibration tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 17:45:47 +00:00
ebanks aef519b427 more comparisons
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1059 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 16:46:05 +00:00
jmaguire 58b132ee10 Eliminate redundant computation.
Still room for more optimization, but I called chr20 (60Mb) in a couple hours on the queue this morning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1058 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 16:31:57 +00:00
jmaguire 3a1b58ca65 remove unused argument lodThreshold.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1057 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 12:40:12 +00:00
kiran 9a0151b7e1 Added an option to list all available feature classes and exit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1056 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 00:00:12 +00:00
kiran ed7afd8b70 Added javadocs. Now throws an exception if an unknown feature is specified. General cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1055 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 23:28:38 +00:00