aaron
4e04370f14
forgot a file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1096 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:56:17 +00:00
aaron
5b1c23a7f2
changes to fix and test the interval based traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1095 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:54:15 +00:00
kcibul
3b24264c2b
incorporating skew check, further output of metrics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1094 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 16:01:07 +00:00
ebanks
ea2426dcd0
one more change needed to commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1093 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 15:09:53 +00:00
ebanks
347608cfe0
remove hacked traversal in preparation for move to Matt's new one
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1091 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:32:05 +00:00
ebanks
940d75171a
Big cleaner changes:
1. Added a Walker to merge intervals before cleaning
2. (Almost) all Walkers can filter out 454 reads (and do by default)
3. Got rid of -all command and related pieces (time to switch to CleanedReadsInjector)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1090 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:31:24 +00:00
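Item 1 above adds a walker that merges intervals before cleaning. The sort-and-sweep merge below is a minimal Python sketch of that step. It assumes 1-based closed intervals and that overlapping or directly abutting intervals on the same contig should be combined; the walker's actual merge rule is not stated in this log, and `merge_intervals` is a hypothetical name.

```python
from typing import List, Tuple

# (contig, start, stop); assumed 1-based and closed
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Hypothetical helper: merge overlapping or abutting intervals per contig."""
    merged: List[Interval] = []
    # Sorting by (contig, start, stop) groups each contig and orders it by start.
    for contig, start, stop in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            # Overlaps or touches the last kept interval: extend it.
            _, prev_start, prev_stop = merged[-1]
            merged[-1] = (contig, prev_start, max(prev_stop, stop))
        else:
            merged.append((contig, start, stop))
    return merged


print(merge_intervals([("chr1", 150, 300), ("chr1", 100, 200),
                       ("chr1", 301, 310), ("chr2", 5, 10)]))
# [('chr1', 100, 310), ('chr2', 5, 10)]
```

Contigs are sorted alphabetically here for simplicity; a real implementation would more likely follow the reference sequence-dictionary order.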
asivache
3cb6d7048e
don't freak out if two of the reference intervals a custom contig is built from are strictly adjacent; instead, politely warn the user that her data suck and proceed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1089 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 19:08:10 +00:00
asivache
d4f3ca1a10
A utility class for keeping the mapping from 'custom' reference (e.g. transcriptome) onto the 'master' reference (e.g. whole genome), and for remapping SAM records from the former onto the latter. It's Arachne's BaitMultiMap, pretty much
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1088 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 18:16:15 +00:00
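The core of such a mapping is coordinate translation from a custom contig back onto the master reference. A minimal Python sketch, assuming the custom contig is a forward-strand concatenation of 1-based closed master intervals in the listed order; `CustomContigMap` and `to_master` are hypothetical names, not the utility class's API.

```python
from bisect import bisect_right
from typing import List, Tuple

# A piece of the master reference: (contig, start, stop), assumed 1-based and closed.
Piece = Tuple[str, int, int]


class CustomContigMap:
    """Hypothetical sketch: one custom contig (e.g. a transcript) built by
    concatenating master-reference pieces in the given order."""

    def __init__(self, pieces: List[Piece]):
        if not pieces:
            raise ValueError("a custom contig needs at least one piece")
        self.pieces = list(pieces)
        self.offsets: List[int] = []  # custom-contig position where each piece begins
        pos = 1
        for _, start, stop in self.pieces:
            self.offsets.append(pos)
            pos += stop - start + 1
        self.length = pos - 1

    def to_master(self, custom_pos: int) -> Tuple[str, int]:
        """Translate a 1-based custom-contig position to (master contig, position)."""
        if not 1 <= custom_pos <= self.length:
            raise ValueError(f"position {custom_pos} is outside the custom contig")
        # Index of the last piece that starts at or before custom_pos.
        i = bisect_right(self.offsets, custom_pos) - 1
        contig, start, _ = self.pieces[i]
        return contig, start + (custom_pos - self.offsets[i])


exon_map = CustomContigMap([("chr1", 1000, 1099), ("chr1", 2000, 2049)])
print(exon_map.to_master(100), exon_map.to_master(101))
# ('chr1', 1099) ('chr1', 2000)
```

Only single positions are translated here. Remapping a whole SAM record would also mean rewriting the CIGAR for reads that cross a junction between pieces, and handling pieces on the reverse strand; this sketch attempts neither.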
kiran
69dc502174
I forgot that this depends on BoundedScoringSet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1087 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 17:18:53 +00:00
aaron
61ce4e5983
quick doc change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1086 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 16:35:46 +00:00
asivache
a9c30c5fcc
added -nosort cmdline flag; if specified, the output writer does not attempt to sort reads on the fly (sorting uses a sorting collection backed by temporary disk storage and can lead to crashes if temp space is low and/or the filesystem is misbehaving). Output can be sorted externally later with samtools.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1085 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:58:00 +00:00
kiran
7b5d8d7604
Changed the intensities array order from cycle,channel to channel,cycle. This, I'm told, is a far more efficient allocation strategy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1084 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:41:06 +00:00
kiran
3112302ec9
A priority-queue-like container that allows you to add a specified number of elements. When the limit has been reached, new additions replace the lower scoring elements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1083 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:39:47 +00:00
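A minimal Python sketch of the container described above, using a min-heap so that the lowest-scoring element is always the one evicted. It assumes a new element is kept only when it outscores the current minimum; `BoundedScoringSetSketch` is a hypothetical stand-in, not the original Java BoundedScoringSet.

```python
import heapq
from itertools import count
from typing import Generic, List, Tuple, TypeVar

T = TypeVar("T")


class BoundedScoringSetSketch(Generic[T]):
    """Keeps at most `capacity` items, retaining the highest-scoring ones."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Min-heap keyed on score, so the lowest-scoring kept item is at index 0.
        self._heap: List[Tuple[float, int, T]] = []
        self._order = count()  # tie-breaker so items themselves are never compared

    def add(self, item: T, score: float) -> bool:
        """Add an item; return False if it scored too low to be kept."""
        entry = (score, next(self._order), item)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the current lowest scorer
            return True
        return False

    def items(self) -> List[T]:
        """Kept items, highest score first."""
        return [item for _, _, item in sorted(self._heap, reverse=True)]


s = BoundedScoringSetSketch(capacity=2)
for name, score in [("a", 1.0), ("b", 5.0), ("c", 3.0), ("d", 0.5)]:
    s.add(name, score)
print(s.items())  # ['b', 'c']
```

Each insertion costs O(log capacity), and memory stays bounded regardless of how many elements are offered.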
depristo
0a50f2e160
Updated and near-final version of the tabular recalibration system. Uses Yates correction for low-occupancy quality bins. Faster and more robust handling of input and output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1082 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 03:52:12 +00:00
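The log does not spell out the correction. One plausible reading, assumed here, is a pseudocount added to each bin's mismatch and observation counts before the empirical error rate is converted to a Phred score, Q = -10 log10(p), so that a sparse bin with zero mismatches gets a finite quality instead of infinity. The function name and the pseudocount of 1 are assumptions.

```python
import math


def empirical_quality(mismatches: int, observations: int, pseudocount: float = 1.0) -> float:
    """Phred-scaled empirical quality of one bin (hypothetical helper).

    The pseudocount is an assumed stand-in for the 'Yates' correction: without it,
    a bin with no observed mismatches would get an infinite quality.
    """
    if observations < 0 or not 0 <= mismatches <= observations:
        raise ValueError("need 0 <= mismatches <= observations")
    p_error = (mismatches + pseudocount) / (observations + pseudocount)
    return -10.0 * math.log10(p_error)


print(round(empirical_quality(0, 99), 2))   # 20.0: error rate 1 / 100
print(round(empirical_quality(9, 999), 2))  # 20.0: error rate 10 / 1000
```

An empty bin comes out as Q0 under this scheme, which is a conservative default for bins with no data.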
hanna
ef546868bf
Pooling of unmapped reads -- improves runtime of files with tons of unmapped reads by an order of magnitude.
Desperately needs cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1080 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 23:48:06 +00:00
asivache
dfa2efbcf5
not crashing when refseq annotation track is not requested is a nice added feature
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1079 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 22:52:40 +00:00
kcibul
eb999f880a
incorporating skew check
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1078 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 19:51:51 +00:00
asivache
1339f3f3e3
make refseq annotation file an optional argument; if specified, indels will be annotated as genomic/utr/intron/coding (accidentally appearing 'unknowns' probably mean that there's something wrong with refseq annotations?)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1077 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 18:17:03 +00:00
aaron
9c0dba6979
Some quick documentation and typo changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1076 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 13:40:13 +00:00
ebanks
cb9c6f18ef
spelling fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1074 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 01:46:35 +00:00
kiran
630d9e6a37
Fixed a typo.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1073 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:37:46 +00:00
aaron
8b4d0412ca
Changed the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1072 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:11:18 +00:00
aaron
4a92a999a0
made the constructors protected. In Java, protected also grants package access, so other classes in the utils package (mainly the parser) can call these constructors, as can any inheriting classes. Also made some IntelliJ-suggested clean-ups and documentation fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1071 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 16:01:59 +00:00
ebanks
9e25229014
use a better entropy threshold and don't print out "new" SNPs (since they're just an artifact of the low (arbitrary) threshold)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1070 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 15:30:08 +00:00
aaron
bcb64d92e9
Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, separating the creation of a genome loc (with the reference setup) into a parser class. GenomeLoc now just represents the actual genomic position. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
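A Python sketch of the split described above: the location is a plain immutable value, and a parser holding the contig dictionary is the only place where locations are created and validated. Python has no package-private constructors, so that restriction is only a convention here; the class names and the "contig:start-stop" syntax are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GenomeLocSketch:
    """Just a genomic position: no knowledge of the reference."""
    contig: str
    contig_index: int
    start: int
    stop: int


class GenomeLocParserSketch:
    """Owns the reference setup and is the intended way to build locations."""

    _PATTERN = re.compile(r"^([^:]+):(\d+)(?:-(\d+))?$")  # assumed "contig:start[-stop]"

    def __init__(self, contig_lengths: Dict[str, int]):
        self._lengths = dict(contig_lengths)
        # Contig index follows the order given, like a sequence dictionary.
        self._index = {name: i for i, name in enumerate(contig_lengths)}

    def parse(self, text: str) -> GenomeLocSketch:
        match = self._PATTERN.match(text)
        if match is None:
            raise ValueError(f"cannot parse location '{text}'")
        contig = match.group(1)
        if contig not in self._lengths:
            raise ValueError(f"contig '{contig}' is not in the reference")
        start = int(match.group(2))
        stop = int(match.group(3)) if match.group(3) else start
        if not 1 <= start <= stop <= self._lengths[contig]:
            raise ValueError(f"'{text}' is out of bounds for {contig}")
        return GenomeLocSketch(contig, self._index[contig], start, stop)


parser = GenomeLocParserSketch({"chr1": 1000, "chr2": 500})
print(parser.parse("chr2:10-20"))
# GenomeLocSketch(contig='chr2', contig_index=1, start=10, stop=20)
```

Keeping reference knowledge in the parser means a location object never has to reach for global reference state to be valid.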
depristo
26eb362f52
Added novel / known split to variant eval. That is, it emits all of the standard analyses on SNPs partitioned into those known in the provided known db and those that are novel. Also fixed a problem with counting bases within subsets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1068 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 21:27:40 +00:00
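The partitioning step can be sketched in a few lines of Python, assuming a call counts as known when its (contig, position) appears in the known database; a real comparison might also check alleles. `split_known_novel` is a hypothetical name, and each resulting subset would then be fed through the same standard analyses.

```python
from typing import Dict, Iterable, List, Set, Tuple

Site = Tuple[str, int]  # (contig, position)


def split_known_novel(calls: Iterable[Site], known_sites: Set[Site]) -> Dict[str, List[Site]]:
    """Partition SNP calls so every analysis can run on the all, known and novel subsets."""
    groups: Dict[str, List[Site]] = {"all": [], "known": [], "novel": []}
    for site in calls:
        groups["all"].append(site)
        groups["known" if site in known_sites else "novel"].append(site)
    return groups


calls = [("chr1", 100), ("chr1", 250), ("chr2", 40)]
known_db = {("chr1", 100), ("chr2", 40)}
for subset, sites in split_known_novel(calls, known_db).items():
    print(subset, len(sites))  # prints: all 3, known 2, novel 1
```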
depristo
d3f0c51944
longer update times so we don't overwhelm when running genome-wide
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1067 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 14:10:02 +00:00
ebanks
a21c2a7e48
don't make mapping quality too high
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1066 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 04:51:42 +00:00
ebanks
686c8133ed
massive change in the way the cleaner works, mostly revolving around the fact
that we no longer trust indels from the alignments (although we do use them as
a good alternate consensus possibility).
Other changes include better "greedy mode" performance and allowing the user
to have just the cleaned reads themselves printed out (mostly for Matt's
CleanedReadInjector).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1065 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 03:56:59 +00:00
depristo
9e26550b0d
Approach v2. Added a Python analysis script, so Java no longer has to be used to analyze quality score data. About to refactor out lots of unneeded code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1063 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-20 16:00:23 +00:00
hanna
dde52e33eb
Cleanup of the cleaned read injector based on Eric's feedback.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1062 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 22:04:47 +00:00
kiran
a0a3cf2f9f
VariantFiltrationWalker can now apply specified exclusion tests after the feature tests. For a given variant, all reasons for exclusions are printed to screen.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1061 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 21:12:01 +00:00
depristo
8ac40e8e2d
Updated version of the recalibration tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 17:45:47 +00:00
ebanks
aef519b427
more comparisons
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1059 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 16:46:05 +00:00
jmaguire
58b132ee10
Eliminate redundant computation.
Still room for more optimization, but I called chr20 (60Mb) in a couple hours on the queue this morning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1058 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 16:31:57 +00:00
jmaguire
3a1b58ca65
remove unused argument lodThreshold.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1057 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 12:40:12 +00:00
kiran
9a0151b7e1
Added an option to list all available feature classes and exit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1056 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 00:00:12 +00:00
kiran
ed7afd8b70
Added javadocs. Now throws an exception if an unknown feature is specified. General cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1055 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 23:28:38 +00:00
kiran
284fd6a5fb
VariantFiltrationWalker now inspects its parent package and determines the list of features that can be applied. Filters requested on the command line are matched case-insensitively against the simple names of these features to determine which features to apply. A new verbose mode lets the user see how the likelihoods change with the application of each subsequent feature.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1054 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:45:36 +00:00
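Python has no direct equivalent of listing the classes in a Java package, so this sketch uses `__subclasses__()` on a hypothetical base class as a stand-in for the package scan. It shows the case-insensitive match on simple class names and rejects unknown names with an error; all class and function names are illustrative.

```python
from typing import Dict, List, Type


class VariantFeature:
    """Hypothetical base class standing in for the feature interface."""


class ExampleDepthFeature(VariantFeature):  # hypothetical feature
    pass


class ExampleStrandFeature(VariantFeature):  # hypothetical feature
    pass


def available_features() -> Dict[str, Type[VariantFeature]]:
    # Stand-in for scanning the parent package: only *direct* subclasses are found.
    return {cls.__name__.lower(): cls for cls in VariantFeature.__subclasses__()}


def resolve_features(requested: List[str]) -> List[Type[VariantFeature]]:
    """Case-insensitive match of requested names against feature simple names."""
    features = available_features()
    resolved = []
    for name in requested:
        cls = features.get(name.lower())
        if cls is None:
            raise ValueError(f"unknown feature '{name}'; available: {sorted(features)}")
        resolved.append(cls)
    return resolved


print([cls.__name__ for cls in resolve_features(["exampleDEPTHfeature"])])
# ['ExampleDepthFeature']
```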
kiran
0a0ef573f7
Methods for finding classes given a path and finding classes that implement a given interface. This stuff was mostly copied from private methods in WalkerManager, so there's some code redundancy. At some point, those calls could be replaced with these.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1053 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:43:19 +00:00
depristo
d748c85dc4
Cleaned code and reorganized -- moving in the right direction for v2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1052 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:28:34 +00:00
hanna
af7a759ba4
Convert the somatic coverage tool to output from the packaging tool rather than from the dist target.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1050 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:29:30 +00:00
depristo
1bca144119
Moving things around
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1049 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:06:46 +00:00
depristo
ca8a3bd85e
Another temporary check-in for rearranging things
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1048 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:04:36 +00:00
depristo
3c40db260d
Added REFERENCE_BASES required annotation for performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1047 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:03:57 +00:00
kiran
03fe166994
Wrote a public static version of loadFirstNReasonableReadsTrainingSet() so Alec can call it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1046 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 20:18:17 +00:00
kiran
a4fa02f11c
Moved output outside of the for loop so I don't get 10 different versions of the same variant (though, now that I think of it, that's not necessarily a terrible thing for debugging...).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1045 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 19:59:26 +00:00
kiran
768a16e791
An experimental, tile-parallel version of the secondary base annotator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1044 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 19:58:09 +00:00
kiran
e26df45e8e
Different features can now be specified by repeatedly supplying the -F "featurename:arguments" option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1043 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 18:45:03 +00:00
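A sketch of how repeated `-F "featurename:arguments"` options can be collected and split, with Python's `argparse` (`action="append"`) standing in for the tool's own argument parser. The feature names in the example are hypothetical, and splitting at the first colon is an assumption about the format.

```python
import argparse
from typing import List, Tuple


def parse_feature_specs(argv: List[str]) -> List[Tuple[str, str]]:
    """Collect every -F occurrence and split it at the first ':' into (name, arguments)."""
    parser = argparse.ArgumentParser()
    # action="append" gathers each repeated -F into a list, in command-line order.
    parser.add_argument("-F", dest="features", action="append", default=[],
                        metavar="featurename:arguments")
    args = parser.parse_args(argv)
    specs = []
    for spec in args.features:
        name, _, arguments = spec.partition(":")  # arguments is "" when no ':' is given
        specs.append((name, arguments))
    return specs


print(parse_feature_specs(["-F", "depth:max=100", "-F", "strandbias"]))
# [('depth', 'max=100'), ('strandbias', '')]
```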
kiran
7a921c908c
Can now adjust the genotype likelihoods of a variant returned from the rod. This automatically causes the lodBtr, lodBtnb, and genotype to be recomputed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1041 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 07:26:37 +00:00
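A small sketch of that recomputation, assuming lodBtr is the log10 likelihood of the best genotype minus that of the reference genotype, lodBtnb is best minus second best, and adjustments are applied as additive log10 factors. The function name and the additive-adjustment scheme are assumptions, not the walker's actual code.

```python
from typing import Dict, Tuple


def adjust_and_recompute(log10_likelihoods: Dict[str, float],
                         log10_adjustments: Dict[str, float],
                         ref_genotype: str) -> Tuple[str, float, float]:
    """Apply per-genotype adjustments, then recompute genotype, lodBtr and lodBtnb.

    Assumed definitions: lodBtr = best - reference, lodBtnb = best - second best,
    all in log10 likelihood space.
    """
    if len(log10_likelihoods) < 2:
        raise ValueError("need at least two genotypes")
    # Multiplying a likelihood by a factor is adding the factor's log10.
    adjusted = {gt: ll + log10_adjustments.get(gt, 0.0) for gt, ll in log10_likelihoods.items()}
    ranked = sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
    (best_gt, best), (_, second) = ranked[0], ranked[1]
    return best_gt, best - adjusted[ref_genotype], best - second


likelihoods = {"AA": -10.0, "AC": -2.0, "CC": -5.0}
print(adjust_and_recompute(likelihoods, {}, "AA"))            # ('AC', 8.0, 3.0)
print(adjust_and_recompute(likelihoods, {"AC": -4.0}, "AA"))  # ('CC', 5.0, 1.0)
```

Because both LOD scores and the genotype call are derived from the adjusted likelihoods, any feature that changes the likelihoods automatically changes all three.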