Commit Graph

1879 Commits (985daec76e31b1f2bad0e12ec3a4ead296e1cf47)

Author SHA1 Message Date
ebanks 58937bf9ba You can now use the -exp flag to tell the Genotyper to include experimental annotations when it calls out to VariantAnnotator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2256 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 04:45:05 +00:00
ebanks b05e73a914 Finished implementation of the Wilcoxon Rank Sum Test thanks to Tim Fennell (calculating the normal approximation) and Nick Patterson (dithering to break tie bands).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2255 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 04:04:39 +00:00
ebanks 861221d046 - Moved various header line printing into a single method
- Fixed output for coverage above min depth



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2254 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 02:15:43 +00:00
ebanks aef4be5610 Moved CoarseCoverageWalker to core and packaged both coverage walkers in coverage/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2249 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:53:36 +00:00
ebanks df4e001a07 Renamed to more accurately describe its function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2248 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:34:49 +00:00
ebanks c2017cc91b PrintCoverageWalker functionality moved to DepthOfCoverageWalker. Added integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2247 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:23:59 +00:00
ebanks 01cf5cc741 1. Merged CoverageHistogram into DepthOfCoverageWalker
2. Fixed bug in histogram calculation for small intervals
3. Better output in DoCWalker
4. Comments added to code



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2245 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:01:53 +00:00
ebanks 44b9f60735 PercentOfBasesCovered functionality moved to DepthOfCoverageWalker. Added integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2244 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 16:11:09 +00:00
ebanks 126d1eca35 Move to core (qc/)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2243 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 15:45:58 +00:00
ebanks 9da5cc25ad More archiving (with permission from Andrey) plus a move to core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2242 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 15:40:27 +00:00
aaron b3bdcd0e60 make sure we close the error log stream in CommandLineProgram if it's opened; unit tests and clean-up for BasicVariation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2241 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 06:59:27 +00:00
ebanks a88202c3f6 Refactored DoCWalker to output in a more helpful and usable style. It now outputs in tabular format with 2 different sections: per locus and then per interval.
I am now at a point where I can merge the functionality from other coverage walkers into this one.
Thanks to Andrew for input.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2239 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 05:28:21 +00:00
ebanks d7e4cd4c82 Moving some useful and stable walkers to core:
- ClipReads
- PrintRODs (generalized to print all RODs that are Variations)
- FixBAMSortOrderTag (added documentation to walker so that people know what it does and why)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2238 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 03:00:45 +00:00
rpoplin 46f3d3e39b Added comments to AnalyzeCovariates and R scripts. R script prevents residuals from going off the edge of the plot. Added skeleton code to the recalibration walkers showing how we plan to handle SOLID reference inserting behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2233 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 23:15:52 +00:00
aaron 451a20ed55 commenting out some broken integration tests, to be uncommented if needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2232 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 23:13:24 +00:00
depristo c776f9fb90 Simple utilities for dealing with Complete Genomics data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2230 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 22:51:41 +00:00
aaron 9d598f1c82 some integration test clean-up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2229 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 21:11:02 +00:00
ebanks a09fee2b5e Moved some more walkers to oneoffprojects and killed an old indel-related walker that isn't being used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2228 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:28:07 +00:00
depristo dec0a781c2 Un-reinventing the wheel. --sleep argument removed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2227 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:19:28 +00:00
ebanks a3343c75db Move and rename a hybrid-selection-specific coverage calculation to hybridselection/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2225 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:11:22 +00:00
ebanks 2c83f2f2bc Move MSG - plus now obsolete classes which it depends on -- to oneoffprojects (with permission from Jared).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2224 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:04:22 +00:00
chartl 6a9e7bea05 Removing experimental annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2220 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 19:03:55 +00:00
jmaguire c180a76b05 Added option "append": if set, and the specified discovery output already exists, don't re-call anything that's already present in that file. Append new calls to it.
Great for resuming long jobs that died partway through.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2219 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 18:56:19 +00:00
ebanks 0a2304eff8 - Rename minConfidenceScore in VariantEval to minPhredConfidenceScore
- Moved validation walkers to new qc dir
- Killed unused test



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2218 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:59:19 +00:00
ebanks a5dfc9107d - Cleaned up annotation code some more
- Use QualityUtils when phred-scaling now



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2217 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:45:29 +00:00
ebanks 7055a3ea2d - All annotations are now required to return their VCF INFO keys and descriptions
- Renamed keys to fit with the standard naming
- FisherStrand is no longer standard
- Integration tests no longer test experimental annotations since they're not stable



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2216 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:24:06 +00:00
rpoplin 67179e2412 Initial checkin of AnalyzeCovariates.java which replaces analyzeRecalQuals_1KG.py and is updated to use the new Covariates system. It creates similar plots of residual error for each covariate that was used in the calculation. There is also an option to filter out base qualities below a given threshold.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2215 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 16:47:35 +00:00
ebanks 2838629724 -VCF writer now checks whether the allele frequency has been set before trying to write it out.
-Renamed methods to be more consistent.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2214 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 16:25:32 +00:00
depristo 6231637615 fixes for VariantAnnotations and second bases. Misc. removal of failing (and unstable) integration tests that require rereview
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2213 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 15:41:35 +00:00
aaron d487428468 remove incorrect parentheses
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2211 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 06:46:32 +00:00
chartl 886c44303a -Removing BTTJ integration test -- this broke a few revisions ago (2169) and it is unclear whether the resulting change was a correction to something that had previously been incorrect, or a true build-breaker. I'm currently investigating which case this is, but since Bamboo is back up I'm removing this _temporarily_ so that other testing can occur, and will make whatever changes to the test necessary to reflect the truth, then replace the test itself. Additional (and related) pileup tests are upcoming as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2210 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 05:37:15 +00:00
ebanks b979bd2ced - Optimized implementation of -byReadGroup in DoCWalker
- Added implementation of -bySample in DoCWalker
- Removed CoverageBySample and added a watered down version to the examples directory



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2209 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 03:39:24 +00:00
ebanks 7c73496e72 Moved DoC walker over to new pileup system so it no longer moves like it's stuck in molasses.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2208 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 02:46:39 +00:00
ebanks ba8a8febc6 Thanks to Steve Hershman for finding this bug:
getNegLog10PError() does not equal the confidence score (you need to multiply by 10 as confidence is traditionally phred scaled).  Probably we should change the method to be getNeg10Log10PError().  Anyone have strong feelings on this?



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2207 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 01:59:03 +00:00
ebanks 3303808a8f Yet more walkers moved to oneoffprojects.
Made hybridselection subdir in playground.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2205 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:29:12 +00:00
ebanks 05923f7fba Started transition to oneoffprojects.
Moved/killed a few other walkers (with permission).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2204 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:19:02 +00:00
ebanks c36069355e Trivial change to verbose
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2203 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 20:48:10 +00:00
jmaguire 74f6526e09 VCFHomogenizer: A class that extends InputStream and dynamically re-writes pilot1 VCF's to be on-spec.
VCFTool: A command-line tool with various useful VCF functions (validate, grep, concordance).




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2202 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 17:55:42 +00:00
jmaguire adf8f1f8b3 Add an InputStream constructor, which is immensely useful for various reasons.
Also a minor performance optimization.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2201 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 17:25:00 +00:00
ebanks e581cceab6 Got Kris's permission to delete these walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2200 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:57:28 +00:00
rpoplin 3180fffd43 Eliminated unnecessary boxing of longs in RecalDatum. Changes to RecalDatum in preparation for new AnalyzeCovariates script. Updated TableRecalibrationWalker to make use of these changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2199 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:49:05 +00:00
chartl 21a9a717e4 Some minor changes and test:
- DepthOfCoverage is now by reference (so locus-by-locus output correctly reports zero-coverage bases)
  - VariantsToVCF now lets you bind variants with any string except intervals and dbsnp (not just NA######)
  - A PileupWalker integration test on a particularly nasty FHS site
  - Two second-base annotation related integration tests on that same site
       + outputs were all hand-validated in matlab; within a certain tolerance for the annotations




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2197 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 15:15:54 +00:00
ebanks 084337087e Removing deprecated code and walkers for which I had the green light from repository.
Moved piecemealannotator and secondarybases to archive.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2195 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:58:20 +00:00
ebanks 2c16c18a04 Move Andrey's old indel code (plus MSG accuracy test, which depends on it) to archive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2194 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:29:00 +00:00
ebanks 7c6c490652 An unfinished implementation of the Wilcoxon rank sum test and a variant annotation that uses it. I need to merge and update this code with Tim's implementation somehow - but that won't happen until later this week, so I'm committing this before I accidentally blow it away.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2193 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 04:56:17 +00:00
ebanks 00f15ea909 Improved performance of deletion-free pileup and added mapping-quality-zero-free pileup convenience method.
Finished converting genotyper and annotator code to new ReadBackedPileup system.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2192 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 04:50:47 +00:00
rpoplin 6bb864da2a More misc cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2191 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 22:29:07 +00:00
rpoplin b89b9adb2c misc code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2190 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 21:16:00 +00:00
depristo e793e62fc9 minor code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2189 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:57:20 +00:00
rpoplin 4969cb1957 CountCovariates uses new optimized ReadBackedPileup. It also smarter about re-doing calculations for the dnsnp variation rate sanity check.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2188 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:35:40 +00:00