Commit Graph

43 Commits (7b452ea2b9ad2d2f3e8bbfcbfb0e818f0a87f18d)

Author SHA1 Message Date
chartl 328f89f66a Minor changes to MannWhitneyU:
- Comment fixes to better explain why two-sided test wants to use the LOWER (not higher) value for U
 - Much more direct testing of MWU functions
 - Uniform approximation was always using the < cumulant (sometimes the > cumulant should be used instead)
 - Uniform approximation currently not used (regime in which it was being used was not the right one -- not necessarily bad, but not an improvement over normal)
    + this particular approximation is for major imbalances of the form m >> n. Code may be altered in the future to use this method for this particular regime, if the method's not too slow.
 - Hook into one-sided test.

RegionalAssociationRecalibrator: NaNs were being caused by the presence of Infinity and -Infinity values coming out of the walker. Currently I'm just resetting them to arbitrary post-whitened values, but the walker will be changed to prevent output of these values, and the "fix" will be undone.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5539 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-30 17:03:02 +00:00
chartl f6dfdc7f3b Single-tailed hypothesis testing in MWU
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5533 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-29 15:53:40 +00:00
chartl 5a79f16ea4 Fixed an edge case where an exception was thrown if either of the sets was empty for the MWU test. Also altered the output format so that U itself is not printed (which, though interesting, isn't so useful for recalibration); instead it prints a value I call V (really the deviation of U from its expectation).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5490 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-22 16:28:44 +00:00
chartl 60ddc08cdf Added a boatload of new case-control association modules. Switched the U-test to use longs rather than ints (it just so happened that I overflowed and started getting negative U statistics. Not good.) Added the ALL association type for ease of specifying that we want to throw the book at something. Added an svn-commit.tmp~ because I can't get rid of it even with --force. Hopefully I can remove it after.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5386 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 21:58:12 +00:00
chartl a40a8006b5 Added unit tests for the statistics calculated by the test runner, along with bug fixes to the calculations, so we have some assurance that the statistics coming out of the back end are correct.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@5380 348d0f76-0448-11de-a6fe-93d51630548a
2011-03-06 16:54:02 +00:00
hanna 861ee3e37a Changing testing framework from junit -> testng, for its enhanced configurability.
Initial test to see how Bamboo will respond.  More detailed email to follow.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4609 348d0f76-0448-11de-a6fe-93d51630548a
2010-11-01 21:31:44 +00:00
aaron b968af5db5 The tribble indexes are now updated with correct sequence lengths for each contig they have in their sequence dictionary. Also clean-up in the RMD track builder.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4321 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-21 18:21:22 +00:00
depristo 40e6179911 Penultimate step in exception system overhaul. UserError is now UserException. This class should be used for all communication with the USER for problems with their inputs. Engine now validates sequence dictionaries for compatibility, detecting not only lack of overlap but now inconsistent headers (b36 ref with v37 BAM, for example) as well as ref / bam order inconsistency. New -U option to allow users to tolerate dangerous seq dict issues. WalkerTest system now supports testing for exceptions (see email and wiki for docs). Tests for vcf and bam vs. ref incompatibility. Waiting on Tribble seq dict improvements to detect b36 VCF with b37 ref (currently cannot tell this is wrong).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4258 348d0f76-0448-11de-a6fe-93d51630548a
2010-09-12 14:02:43 +00:00
hanna bf0b6bd486 Update integration tests to use the new ROD syntax.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@4112 348d0f76-0448-11de-a6fe-93d51630548a
2010-08-25 18:13:30 +00:00
aaron 844cb2ed33 fixing a bug that Eric found with RODs for reads, where some records could be omitted. Sorry Eric!
Also putting more tolerance into the timing on the tribble index tests (that check to make sure we're deleting out-of-date indexes, and not deleting perfectly good indexes).  It seems that some of the farm nodes aren't great with a stopwatch.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3674 348d0f76-0448-11de-a6fe-93d51630548a
2010-06-29 21:38:55 +00:00
ebanks e702bea99f Moving VE2 to core; calling it "VariantEval" (one more checkin coming)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3179 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 20:25:47 +00:00
ebanks 5f7564bf0a Better naming of output columns
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3175 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-15 18:08:07 +00:00
ebanks 04909fa6ad Removing arbitrary selects
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3169 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-14 17:46:39 +00:00
ebanks dde092fb61 Added the ability in VE2 to select which eval modules to run, so that you aren't forced to use all of them. You can use --list to list all of the possible modules to run.
Heads up everyone: by default, *no* modules are run.  Please add "-all" to your scripts to maintain the previous behavior.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3161 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-13 22:15:58 +00:00
rpoplin c2a37e4b5c Variant Quality Score modules in VariantEval2 no longer create huge lists which hold all of the quality scores encountered and instead cast the quality score to an integer and use hash tables. Bug fix for files in which all the quality scores are set to -1.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3146 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-09 18:36:06 +00:00
aaron 8fd59c8823 Modified the report system based on Ryan's feedback: tables are now created independently to avoid the permutation problem when they were all compressed in rows, and removed our dependency on FreeMarker. The Grep format stays the same.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3130 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 20:39:55 +00:00
depristo 918b746798 More detailed validation output. Fixes for genotyping overflow -- these are temporary and need to be properly resolved
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3129 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-07 16:38:28 +00:00
rpoplin 60c227d67f Added new VE2 module to create a plot of titv ratio by variant quality score
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3125 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-06 15:19:27 +00:00
aaron 585cc880a2 changed jexl expressions to jexl names in the VariantEval2 output, fixed integration test, and fixed a problem where a line was getting dropped in CSV output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3108 348d0f76-0448-11de-a6fe-93d51630548a
2010-04-01 16:23:14 +00:00
chartl dc802aa26f Moved CoverageStatistics to core. This will be (soon) renamed DepthOfCoverage; so please use CoverageStatistics
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3090 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-29 13:32:00 +00:00
aaron 074ec77dcc First go of the new output system for VE2. There are three different report types supported right now (Table, Grep, CSV), which can be specified with the reportType command line option in VE2.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@3083 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-27 03:59:32 +00:00
ebanks 73d6167bd6 Fixing broken integration tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2998 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-14 23:18:49 +00:00
depristo b39b5edca8 Bug fix in variant eval 2. Preliminary (slow and buggy) support for -XL exclude lists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2991 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-12 19:23:12 +00:00
depristo 4f4555c80f PPV and Sensitivity added to validation tool output; support for arbitrary -sample arguments to subset variant contexts by sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2978 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 22:28:31 +00:00
depristo 486bef9318 Support for validationRate calculation in variant eval 2; better error messages for failed genome loc parsing; tolerance to odd whitespace in plinkrod, and fix for monomorphic sites in vcf2variantcontext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2976 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-10 16:25:16 +00:00
chartl 0a49dffa8f Row/Column names are now R-friendly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2966 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-09 19:01:03 +00:00
chartl bca9bdcc68 Add integration test for quartiles overflowing on interval reduce
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2957 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-08 16:18:45 +00:00
depristo ee913eca07 Forgot to check in fix this morning
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2943 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 21:07:19 +00:00
chartl 8738c544f1 Minor refactoring of CoverageStatistics to allow simultaneous output of per-sample and per-read group statistics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2940 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-05 17:06:52 +00:00
aaron 366771d5a6 another test-with-multiple outputs fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2934 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 22:46:15 +00:00
chartl 706d49d84c Commit for Aaron
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2932 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 21:29:07 +00:00
aaron 80cc6bbeb4 add a way to test files generated by a walker that aren't command-line arguments; added some example code in CoverageStatisticsIntegrationTest for Chris.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2929 348d0f76-0448-11de-a6fe-93d51630548a
2010-03-04 20:20:58 +00:00
chartl 87f8fb7282 Quick commit in advance of Aaron's. Just a bunch of refactoring (private classes separated out, put in proper package). Also support added for coverage by read group rather than sample.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2897 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 16:39:47 +00:00
depristo 9a6b384adb Support for no qual fields in VCF; better support for Mendelian violation calculations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2893 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-26 00:29:17 +00:00
chartl 3d92e5a737 Initial commit of integration test(s) for CoverageStatistics, currently in progress [midway commit is for Matt]
Modifications to CoverageStatistics - now includes and extends much of the behavior of DepthOfCoverage (per-base output, per-target output).

Additional functionality (coverage without deletions, base counts, by read group instead of by sample) is upcoming.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2888 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-25 20:25:07 +00:00
depristo 5f74fffa02 Massive improvements to VE2 infrastructure. Now supports VCF writing of interesting sites; multiple comp and eval tracks. Eric will be taking it over and expanding functionality over the next few weeks until it's ready to replace VE1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2832 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-12 15:26:52 +00:00
depristo c66861746a improvements to ve2, including more meaningful mendelian violation counting. Support for VCF emitted interesting sites, annotated according to the evaluations themselves. Basic integration test for VE2 started
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2819 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-10 16:12:29 +00:00
depristo c6d86da4b8 almost managed to move things around perfectly in one go
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2788 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-05 14:18:26 +00:00
depristo 1d86dd7fd1 Interface changes following Matt's advice. VariantContexts are now immutable, and there are special mutable versions, in case you need to change things. AttributedObject is now an InferredGeneticContext and package protected. VariantContexts are now named, which makes them easier to use with the rod system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2780 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-04 20:55:49 +00:00
depristo d9671dffba Documentation for VariantContext. Please read it and start using it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2756 348d0f76-0448-11de-a6fe-93d51630548a
2010-02-01 17:49:51 +00:00
depristo 956b570c8e V5 improvements to VariantContext. Now fully supports genotypes. Filtering enabled. Significant tests throughout system. Support for rebuilding variant contexts from subsets of genotypes. Some code cleanup around repository
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2721 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 18:37:17 +00:00
depristo f6bca7873c V3 of VariantContext. Support for Genotypes and NO_CALL alleles. QUAL fields fully implemented. Can parse VCF records and dbSNP. More complete validation. Detailed testing routines for VariantContext and Allele.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2718 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-28 04:10:16 +00:00
depristo 3399ad9691 Incremental update 2 -- refined allele and VariantContext classes; support for AttributedObject class; extensive testing for Allele class, and partial for VariantContext. Now possible to easily convert dbSNP to VariantContext.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2705 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-27 17:19:37 +00:00