aaron
a34c2442c0
moved hard-coded file paths to the oneKGLocation, validationDataLocation, and seqLocation variables setup in the BaseTest.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2460 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 07:40:48 +00:00
rpoplin
562db45fa5
Sites that were marked NO_DINUC no longer get dinuc-corrected but are still recalibrated using the other available covariates. Solid cycle is now the same as Illumina cycle pending an analysis that looks at the effect of PrimerRoundCovariate. Solid color space methods cleaned up to reduce number of calls to read.getAttribute(). Polished NHashMap sort method in preparation for move to core/utils. Added additional plots in AnalyzeCovariates to look at reported quality as a function of the covariate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2451 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 20:19:37 +00:00
ebanks
438d21842a
The new recalibrator had been mimicking the behavior of the old one in that if there was no dinuc available (following a no-call base or at either end of a read), it didn't try to recalibrate. Now that Ryan has modularized the system, we no longer need to skip the base completely (we just need to skip the dinuc value)... which is good because the Picard people complained after realizing that cycle #1 never got recalibrated.
...
The major effects of this commit are as follows:
1. We no longer skip any good bases (of course, this change alone breaks every single integration test).
2. The dinuc covariate returns a "no dinuc" value for the first base of a read (but not for the last base anymore, since there is a valid dinuc) or if the previous base is a bad base (e.g. 'N').
I've done a bunch of testing on real data and everything looks right; however, let's wait until the recalibrator guru gets back from vacation next week and can double-check everything before shipping this out in another early access release.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2443 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 20:41:29 +00:00
ebanks
6df40876a3
Un-reverted Matt's previous changes and fixed integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2441 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 02:47:00 +00:00
ebanks
872a9d1c7b
I'm making this change now (as opposed to waiting until Monday) to honor Tim's request.
...
The cycle covariate is now first/second of pair aware. I'm taking it on faith from both Chris Hartl (waiting on slides from him) and Tim that this is the right thing to do. We'll have Ryan confirm it all next week.
The only change is that if a read is the second of a pair, we multiple the cycle by -1 (a simple way of separating its index from that of its mate).
Of course, this broke all integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2431 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 16:26:43 +00:00
rpoplin
fdf542c214
The CycleCovariate for 454 data is now the TACG flow cycle. That is, each flow grabs all the T's, A's, C's, and G's in order in a single cycle. This is changed from incrementing the cycle whenever there is a discontinuous nucleotide along the direction of the read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2408 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:39:51 +00:00
rpoplin
6fbf77be95
Updating the two solid_recal_mode options to also change the previous base since solid aligner prefers single color mismatch alignments over true SNP alignments. COUNT_AS_MISMATCH mode has been removed completely. The default mode is now SET_Q_ZERO.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2394 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 20:07:26 +00:00
hanna
07f1859290
Added integration test for running the recalibrator with no index.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2393 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:10:53 +00:00
rpoplin
8e44bfd2ef
CycleCovariate and PrimerRoundCovariate now correctly handle negative strand 454 and SOLID reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2349 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:52:30 +00:00
rpoplin
1d5b9883db
Added --solid_recal_mode argument to experiment with different ways of dealing with solid reference bias. Currently the default option is DO_NOTHING which means use the same behavior as the old recalibrator. Eventually the new methods in RecalDataManager will be moved over to a SolidUtils class. Added transition and transversion methods to BaseUtils that work like simpleComplement, used with the color space in my solid methods. Also, initial check-in of HomopolymerCovariate.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2276 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 14:26:27 +00:00
rpoplin
3180fffd43
Eliminated unnecessary boxing of longs in RecalDatum. Changes to RecalDatum in preparation for new AnalyzeCovariates script. Updated TableRecalibrationWalker to make use of these changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2199 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:49:05 +00:00
rpoplin
d8146ab23d
Changed the format of the recalibration csv file slightly so that it is easier to load the file into something like R and look at the values of the covariates.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2183 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 17:55:23 +00:00
rpoplin
6ff8526592
Added arguments to the recalibration walkers so the user can specify the default read group id and platform to use when a read has no read group. There are also options to force every read group and every platform to be the specified values. Added integration tests that use a bam file with no read groups. Added comments to all the covariates to explain what each of the methods in the Covariate interface are used for.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2157 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 15:41:12 +00:00
rpoplin
c9ff5f209c
Added a CountCovariates integration test that uses a vcf file as the list of variant sites to skip over instead of the usual dbSNP rod.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2152 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:51:38 +00:00
rpoplin
dffa46b380
BAM files created by TableRecalibration now have the version number and list of covariates used appended to their header with a new 'PG' tag. Eventually the entire list of command line args will be put in there as well. Big thanks to Matt and Aaron. The integration test uses the --no_pg_tag so that the md5 doesn't change every time the version number changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2148 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 20:53:57 +00:00
rpoplin
7f947f6b60
Updated recalibrator integration tests to use all three platforms as well as a bam with multi-platform reads intermingled. CountCovariates v2.0.1: Once again uses a read filter to filter out zero mapping quality reads. Added --sorted_output option to output the table recalibration file in sorted order
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2122 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 19:51:36 +00:00
rpoplin
1d46de6d34
The old recalibrator is replaced with the refactored recalibrator. Added a version message to the logger output. These walkers start at version 2.0.0
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2117 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 14:58:33 +00:00
rpoplin
22aaf8c5e0
Added the old recalibrator integration tests to the refactored recalibrator sitting in playground.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2096 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 22:43:28 +00:00
aaron
2e4949c4d6
Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:37:59 +00:00
ebanks
b0fa19a0b2
Fixed recal integration test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1689 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:22:32 +00:00
depristo
d9588e6083
bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:36:12 +00:00
aaron
0df6a9da5c
-Seperating out normal (unit) tests and integration tests. From now on if your test are more of an integration test (i.e. you're testing a walker and all the subunits it relies on) please name the test "______IntegrationTest.java" instead of "______Test.java".
...
-Bamboo will now run the integration tests once a day, and the normal units tests on each check-in.
-Also added a bunch of unit tests for VariantEval walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1555 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:01:40 +00:00
depristo
1c3d67f0f3
Improvements to the CountCovariates and TableRecablirator, as well as regression tests for SLX and 454 data
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
depristo
8e129d76fd
Support for original quality scores OQ flag. pQ flag in TableRecalibation to preserve quality scores below a threshold (defaulting to 5)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1474 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 14:14:21 +00:00
hanna
2db86b7829
Move the cleaned read injector test from playground to core. Remove CovariateCounterTest's dependency on the CleanedReadInjector. Start doing a bit of cleanup on the CLP's FieldParsers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1312 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 19:44:04 +00:00
depristo
b9d533042e
Two-tailed HardyWeinberg test implemented. VariantEval now separate violations from summary outputs for clarity; Fixing problems with CovariateCounterTest and TabularRodTest
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1177 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:02:04 +00:00
aaron
1dcababad1
a fix to make the test run
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1120 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 20:24:32 +00:00
depristo
5289230eb8
Version 0.2.1 (released) of the TableRecalibrator
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
depristo
8ac40e8e2d
Updated version of the recalibration tool
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 17:45:47 +00:00