Commit Graph

849 Commits (26eb362f52b9f4c3205b911885fc2e4ab981b7ed)

Author SHA1 Message Date
depristo 26eb362f52 Added a novel / known split to variant eval. That is, it emits all of the standard analyses on SNPs partitioned into those present in the provided known database and those that are novel. Also fixed a problem with counting bases within subsets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1068 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 21:27:40 +00:00
depristo d3f0c51944 longer update times so we don't overwhelm when running genome-wide
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1067 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 14:10:02 +00:00
ebanks a21c2a7e48 don't make mapping quality too high
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1066 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 04:51:42 +00:00
ebanks 686c8133ed Massive change in the way the cleaner works, mostly revolving around the fact
that we no longer trust indels from the alignments (although we do use them as
good alternate consensus candidates).
Other changes include better "greedy mode" performance and allowing the user
to have just the cleaned reads themselves be printed out (mostly for Matt's
CleanedReadInjector).

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1065 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 03:56:59 +00:00
depristo 9e26550b0d Approach v2. Added a Python analysis script, so Java no longer has to be used to analyze quality score data. About to refactor out lots of unneeded code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1063 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-20 16:00:23 +00:00
hanna dde52e33eb Cleanup of the cleaned read injector based on Eric's feedback.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1062 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 22:04:47 +00:00
kiran a0a3cf2f9f VariantFiltrationWalker can now apply specified exclusion tests after the feature tests. For a given variant, all reasons for exclusion are printed to the screen.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1061 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 21:12:01 +00:00
depristo 8ac40e8e2d Updated version of the recalibration tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 17:45:47 +00:00
ebanks aef519b427 more comparisons
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1059 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 16:46:05 +00:00
jmaguire 58b132ee10 Eliminate redundant computation.
Still room for more optimization, but I called chr20 (60Mb) in a couple hours on the queue this morning.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1058 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 16:31:57 +00:00
jmaguire 3a1b58ca65 remove unused argument lodThreshold.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1057 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 12:40:12 +00:00
kiran 9a0151b7e1 Added an option to list all available feature classes and exit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1056 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 00:00:12 +00:00
kiran ed7afd8b70 Added javadocs. Now throws an exception if an unknown feature is specified. General cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1055 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 23:28:38 +00:00
kiran 284fd6a5fb VariantFiltrationWalker now inspects its parent package and determines the list of features that can be applied. The filters requested on the command line are matched case-insensitively against the simple names of these features to determine which features to apply. A new verbose mode lets the user see how the likelihoods change as each subsequent feature is applied.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1054 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:45:36 +00:00
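The case-insensitive lookup described in the commit above might look roughly like this. It is a minimal sketch, not the walker's code: the `Feature` interface and the two feature classes are hypothetical, and a fixed list stands in for the package inspection the commit describes.

```java
import java.util.List;

public class FeatureLookup {
    // Hypothetical stand-ins for the walker's feature classes.
    interface Feature {}
    static class DepthOfCoverage implements Feature {}
    static class AlleleBalance implements Feature {}

    // A fixed list replaces the package scan the walker performs.
    static final List<Class<? extends Feature>> AVAILABLE =
            List.<Class<? extends Feature>>of(DepthOfCoverage.class, AlleleBalance.class);

    /** Returns the feature class whose simple name matches name, ignoring case. */
    static Class<? extends Feature> find(String name) {
        for (Class<? extends Feature> c : AVAILABLE) {
            if (c.getSimpleName().equalsIgnoreCase(name)) {
                return c;
            }
        }
        // Reject names that match no known feature instead of silently ignoring them.
        throw new IllegalArgumentException("Unknown feature: " + name);
    }

    public static void main(String[] args) {
        System.out.println(find("depthofcoverage").getSimpleName()); // DepthOfCoverage
        System.out.println(find("ALLELEBALANCE").getSimpleName());   // AlleleBalance
    }
}
```

Matching on `getSimpleName()` means users type only the class name, not its package, which is what makes the case-insensitive comparison practical.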
kiran 0a0ef573f7 Methods for finding classes given a path and finding classes that implement a given interface. This stuff was mostly copied from private methods in WalkerManager, so there's some code redundancy. At some point, those calls could be replaced with these.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1053 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:43:19 +00:00
depristo d748c85dc4 Cleaned code and reorganized -- moving in the right direction for v2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1052 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:28:34 +00:00
hanna af7a759ba4 Convert the somatic coverage tool to output from the packaging tool rather than from the dist target.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1050 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:29:30 +00:00
depristo 1bca144119 Moving things around
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1049 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:06:46 +00:00
depristo ca8a3bd85e Another temporary checkin for rearranging things
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1048 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:04:36 +00:00
depristo 3c40db260d Added REFERENCE_BASES required annotation for performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1047 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:03:57 +00:00
kiran 03fe166994 Wrote a public static version of loadFirstNReasonableReadsTrainingSet() so Alec can call it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1046 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 20:18:17 +00:00
kiran a4fa02f11c Moved output outside of the for loop so I don't have 10 different versions of the same variant (though, now that I think of it, that's not necessarily a terrible thing for debugging...)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1045 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 19:59:26 +00:00
kiran 768a16e791 An experimental, tile-parallel version of the secondary base annotator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1044 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 19:58:09 +00:00
kiran e26df45e8e Different features can now be specified by repeatedly supplying the -F "featurename:arguments" option.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1043 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 18:45:03 +00:00
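A repeated `-F "featurename:arguments"` value could be split as sketched below, assuming the feature name ends at the first colon and everything after it is the argument string. `FeatureSpecParser` and the `max=100` argument syntax are hypothetical; the real argument format is not described in this log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FeatureSpecParser {
    /**
     * Splits each -F value at its first ':' into (feature name, argument string).
     * A value without ':' gets an empty argument string. A list keeps the
     * command-line order and allows the same feature to be given more than once.
     */
    static List<Map.Entry<String, String>> parse(List<String> specs) {
        List<Map.Entry<String, String>> features = new ArrayList<>();
        for (String spec : specs) {
            int colon = spec.indexOf(':');
            String name = colon < 0 ? spec : spec.substring(0, colon);
            String arguments = colon < 0 ? "" : spec.substring(colon + 1);
            features.add(Map.entry(name, arguments));
        }
        return features;
    }

    public static void main(String[] args) {
        // Hypothetical equivalent of: -F "DepthOfCoverage:max=100" -F "AlleleBalance"
        for (Map.Entry<String, String> f : parse(List.of("DepthOfCoverage:max=100", "AlleleBalance"))) {
            System.out.println(f.getKey() + " -> '" + f.getValue() + "'");
        }
    }
}
```

Splitting only at the first colon leaves any further colons inside the argument string for the feature itself to interpret.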
kiran 7a921c908c Can now adjust the genotype likelihoods of a variant returned from the rod. This automatically causes the lodBtr, lodBtnb, and genotype to be recomputed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1041 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 07:26:37 +00:00
kiran 9a7cec7d2e Directory to house variant calling and filtration tools.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1040 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 07:20:38 +00:00
jmaguire 5992d88409 skip N's in the reference (rather than crash. doh!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1039 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 23:22:35 +00:00
kiran c4d9058f32 Added module rodVariants.class to the list of allowable RODs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1037 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 21:33:13 +00:00
kiran ab2a80f3ea A new ROD type that allows one to input a geli.calls file back into a walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1036 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 21:32:21 +00:00
kiran 9ef391706c Added outputting of genotype posteriors to geli.calls file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1035 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 21:31:46 +00:00
kcibul 615572ea06 output to out... not System.out...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1034 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 20:43:10 +00:00
aaron b947fd586f Fixed a nasty bug in GenomeLoc compareContigs: we were using '==' to compare Integer contig IDs. The surprising thing is that it actually works for values from -128 to 127 (the JVM caches those Integer objects, so equal values in that range happen to be the same object). Switched GenomeLoc contigs over to int-based IDs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1033 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 20:19:47 +00:00
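The Integer comparison bug in the commit above can be reproduced in a few lines. Autoboxing goes through `Integer.valueOf`, which is required to cache values from -128 to 127, so `==` on two equal boxed values in that range sees the same object; outside it, separate objects are normally created and `==` compares references. The class and method names are illustrative, not GenomeLoc's.

```java
public class ContigIdComparison {
    /** The bug: '==' on Integer compares object references, not values. */
    static boolean sameContigBuggy(Integer a, Integer b) {
        return a == b;
    }

    /** The fix described in the commit: keep contig IDs as primitive ints. */
    static boolean sameContigFixed(int a, int b) {
        return a == b;
    }

    public static void main(String[] args) {
        // 100 is inside the guaranteed cache range, so both arguments box to the same object.
        System.out.println(sameContigBuggy(100, 100));   // true
        // 1000 is outside the default cache, so two distinct Integer objects are
        // normally created and the reference comparison reports false.
        System.out.println(sameContigBuggy(1000, 1000));
        // Primitive comparison is correct for every value.
        System.out.println(sameContigFixed(1000, 1000)); // true
    }
}
```

Calling `equals` on the boxed values would also fix the comparison, but primitive ints avoid both the pitfall and the boxing overhead.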
hanna cba9025983 More package-level documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1030 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 16:28:45 +00:00
hanna 43a28750e0 Package level documentation -- helps new users get acclimated to the codebase more quickly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1029 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 16:27:48 +00:00
kcibul 673205ed5f additional output tweaking
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1028 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 15:37:38 +00:00
depristo 7d281296a7 Finishing checkin for building
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1027 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 14:12:40 +00:00
depristo d1e25bfe88 Intermediate checkin for safety -- now compiles
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1026 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 13:16:55 +00:00
depristo 2250769a42 Intermediate checkin for safety -- do not use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1025 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 13:07:19 +00:00
depristo 86c8c08375 Intermediate checkin for safety -- do not use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1024 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 13:06:24 +00:00
aaron 78b7fb25c7 allow contig names to have spaces in the fai. This is not yet supported by samtools fai generator (which truncates at the first space), but we might as well fix it on our side.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1022 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 22:23:12 +00:00
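A `.fai` line holds five tab-separated fields (name, length, byte offset of the first base, bases per line, bytes per line), so splitting on tabs alone keeps a contig name such as `chr 1` intact. This is a sketch of that idea rather than the project's parser; `FaiEntry` is a hypothetical name.

```java
public class FaiEntry {
    final String contig;
    final long length;       // sequence length in bases
    final long offset;       // byte offset of the first base in the FASTA file
    final int basesPerLine;
    final int bytesPerLine;  // includes the line terminator

    FaiEntry(String contig, long length, long offset, int basesPerLine, int bytesPerLine) {
        this.contig = contig;
        this.length = length;
        this.offset = offset;
        this.basesPerLine = basesPerLine;
        this.bytesPerLine = bytesPerLine;
    }

    /** Parses one .fai line. Splitting only on tabs lets spaces remain inside the contig name. */
    static FaiEntry parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 5) {
            throw new IllegalArgumentException("Expected 5 tab-separated fields: " + line);
        }
        return new FaiEntry(fields[0],
                Long.parseLong(fields[1]), Long.parseLong(fields[2]),
                Integer.parseInt(fields[3]), Integer.parseInt(fields[4]));
    }

    public static void main(String[] args) {
        // ">chr 1\n" is 7 bytes, so the first base starts at offset 7.
        FaiEntry e = parse("chr 1\t1000\t7\t60\t61");
        System.out.println("[" + e.contig + "] length=" + e.length); // [chr 1] length=1000
    }
}
```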
aaron 6ee64c7e43 Added changes to support Alec's toUnmappedRead seek. Huge improvements (orders of magnitude) in unmapped read performance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1021 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 22:15:56 +00:00
jmaguire 4f6d26849f Behold MultiSampleCaller!
Complete re-write of PoolCaller algorithm, now basically beta quality code. 

Improvements over PoolCaller include:

	- more correct strand test
	- fractional counts from genotypes (which means no individual lod threshold needed)
	- significantly cleaner code; first beta-quality code I've written since BaitDesigner so long ago.
	- faster, less likely to crash!

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1020 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 20:03:24 +00:00
aaron 7db4497013 fixing the readTraversal output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1019 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 19:44:38 +00:00
aaron b11c5a7cd5 doing some read validation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1018 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 19:25:43 +00:00
asivache 010304fe44 bug: printing incorrect coordinates into output, finally fixed (?)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1017 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 18:08:56 +00:00
ebanks 647b8a1ab0 Fix TabularROD printing and testing so Aaron stops nagging me.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1016 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 15:49:26 +00:00
aaron a0a549557f Added a check of the sort order to the query methods, so that we detect an unsorted file much earlier. Also added some verbosity to the exception; it now contains information about the raw value we saw for 'SO', the sort order attribute of the BAM file.
Also fixed a bunch of documentation

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1015 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 22:15:03 +00:00
asivache 2259dc3a8f Added filtering of indels with high levels of noise (mismatches) remaining in close proximity; also fixed a bug in recording deletion coordinates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1014 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 21:13:28 +00:00
ebanks a6477df6d1 Now optionally outputs whether "SNPs" are maintained/cleaned out/introduced by cleaning
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1013 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 20:02:02 +00:00
ebanks 11aa715630 added capability for filtering by platform
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1011 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 19:19:50 +00:00