2294 Commits (5eac510b2f0d922d8cf7e976f8a3cffd9dffd0b2)

Author SHA1 Message Date
hanna 5eac510b2f Refactor the code I gave Eric yesterday to output command line arguments.
Convert it from a completely wonky solution to a slightly less wonky solution
that will work in more cases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2310 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 18:57:54 +00:00
hanna 74b8055b6a Only show extra walker help if the user didn't specify a walker or specified
an invalid walker.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2309 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 16:43:06 +00:00
ebanks e6f541fdca Forgot to update integration test last night
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2308 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 12:57:10 +00:00
depristo b2dfe85648 Better support for reading truth file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2307 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 12:16:05 +00:00
ebanks 0fae798b3a 1. Discoverable base calculations don't care about Genotypes (use Variation's PError regardless of whether the call is ref or var - it's the correct value even for ref calls).
2. Call a base genotypable if any of the Genotypes is above the threshold (you can't assume there's a single Genotype associated with the Variation).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2306 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:26:06 +00:00
ebanks a45adadf1f VCFGenotypeRecord already defines all the methods needed to be SampleBacked, so let's annotate it as being SampleBacked. This way, when used as a generic Genotype, sample data can be retrieved.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2305 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:16:21 +00:00
ebanks 78d5ac9bc2 Don't check het count when there are multiple Genotypes per Variation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2304 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:07:47 +00:00
ebanks ee691b8899 Added a whole bunch of unit tests for VCF reading.
We could still use more, but this is a good start.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2303 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 03:31:23 +00:00
chartl 6a4118ad3c grr, ought to actually assign it to the TRUTH_CALLS variable
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2302 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 23:31:46 +00:00
chartl 987fced151 Should read truth data from the parser options rather than direct from args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2301 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 23:26:26 +00:00
ebanks f7c44ad019 - Read in arguments for the header based on reflection
- Hook up Variation and Genotype in SSG



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2300 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 21:35:33 +00:00
chartl 8825211fdb Adding this to subversion so it's protected
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2299 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 21:26:17 +00:00
rpoplin 12ec154f01 Make the AnalyzeCovariate plots look a little nicer when there are a small number of data points
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2298 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 21:22:40 +00:00
hanna 408f6f3dee Refactoring of prior commit: better handling of unnamed package within the help system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2297 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 20:12:35 +00:00
hanna 1d2151adcf Better handling of nulls output by
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2296 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 19:34:56 +00:00
ebanks 40c2d7a4bc Fix all-bases-mode and genotype-mode in the UG and add integration tests for them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2295 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 17:41:30 +00:00
ebanks 4e54b91ce4 UG now outputs the FORMAT header fields when there's genotype data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2294 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 16:31:07 +00:00
rpoplin 12c49ea485 Added DuplicateReadFilter to filter out reads that are marked as duplicates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2293 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 15:42:53 +00:00
ebanks fb900b12e1 VariantFiltration now details the filters it has used in the header of the VCF it produces.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2292 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 15:36:15 +00:00
ebanks 7a76e13459 Better explanation in the exception being thrown.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2291 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 03:59:36 +00:00
ebanks 8d67d9ade3 -Minor fix in UG for all-bases mode
-Make minConfidenceScore in VariantEval a double so non-integer values can be used (requested by Steve H).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2290 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 03:49:10 +00:00
ebanks 8a1c876104 Weird. I thought I had updated these md5s...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2289 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 03:31:41 +00:00
ebanks 717eb1de96 - Depth annotation now includes MQ0 reads
- Removed MQ0 annotation
- Updated RMS MQ annotation to use new pileup
- UG now outputs all of its arguments as key/value pairs in the header (for VCF)
- Cleaned up VCFGenotypeWriterAdapter interface a bit



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2288 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 02:53:00 +00:00
ebanks e8822a3fb4 Stage 3 of Variation refactoring:
We are now VCF3.3 compliant.
(Only a few more stages left.  Sigh.)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2287 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 21:43:28 +00:00
hanna 9e2f831206 A bit of cleanup in preparation for Picard patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2286 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 16:09:04 +00:00
hanna d3b78338da Get rid of characters in the docs that aren't universally compatible with
character sets used throughout the group.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2285 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 21:41:07 +00:00
hanna d75d3a361a Clean up some of the walker help output based on additional experience and
feedback received.  Also, add a flag to build.xml to disable generation of
docs on demand (use ant -Ddisable.doc=true to disable docs).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2284 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 21:33:11 +00:00
alecw 2cf21317f9 Create package that contains just what Picard needs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2282 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 21:22:07 +00:00
hanna a3e88c0b1c Cleanup results of bad merge.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2281 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 19:30:49 +00:00
hanna 10be5a5de9 Move some files around to reflect our growing help infrastructure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2280 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 19:23:12 +00:00
hanna 16ef500139 Tweak the build.sysclasspath option to force the system classpath to always be appended to additional jars added to the classpath by us. These seemed to be set differently depending on the platform or distribution before.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2279 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 19:19:29 +00:00
alecw c9e385f541 Add TileCovariate to GenomeAnalysisTK package
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2277 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 16:39:59 +00:00
rpoplin 1d5b9883db Added --solid_recal_mode argument to experiment with different ways of dealing with solid reference bias. Currently the default option is DO_NOTHING which means use the same behavior as the old recalibrator. Eventually the new methods in RecalDataManager will be moved over to a SolidUtils class. Added transition and transversion methods to BaseUtils that work like simpleComplement, used with the color space in my solid methods. Also, initial check-in of HomopolymerCovariate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2276 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 14:26:27 +00:00
depristo 2632cb6b58 minor improvements to snp selector
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2275 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 03:37:14 +00:00
depristo 8f461d3c40 Critical bug fix for VariantEval dbSNP calculations. Moved the system over to the new improved ROD iterators, resulting in dbSNP rates jumping 5% or so, due to masking of true SNPs by preceding indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2274 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 03:36:38 +00:00
hanna 8089aa3c50 Adding support to override the help text.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2273 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 00:16:26 +00:00
ebanks c0528cd88e Updated the CallsetConcordance classes to use new VCF Variation code... and uncovered a whole bunch of VCF bugs in the process. I'm not convinced that I got them all, so I'll unit test like crazy when the refactoring is done.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2272 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 11:43:40 +00:00
ebanks b6f8e33f4c Stage 2 of Variation refactoring:
VCFRecord now implements Variation, VCFGenotypeRecord now implements Genotype.

Because of this change, RodVCF is now just a wrapper around the VCFRecord and does nothing else.  Also, one can call toVariation on the VCFGenotypeRecord and it returns the VCFRecord.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2271 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 06:48:03 +00:00
hanna 3b440e0dbc Add a taglet to allow users to override the display name in command-line help.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2270 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 04:12:10 +00:00
ebanks 08f2214f14 Stage 1 of massive Variation/Genotype refactoring.
This stage consists only of the code originating in the Genotyper and flowing through to the genotype writers.  I haven't finished refactoring the writers and haven't even touched the readers at all.

The major changes here are that
1. Variations which are BackedByGenotypes are now correctly associated with those Genotypes
2. Genotypes which have an associated Variation can actually be associated with it (and then return it when toVariation() is called).

The only integration tests which need to be updated are MSG-related (because the refactoring now made it easy for me to prevent MSG from emitting tri-allelic sites).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2269 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 03:12:41 +00:00
chartl b817db0962 Syzygy has a default LOD score of 0.91 on bases with no coverage, which is problematic. Set the minimum LOD threshold to 1 because I just don't want to see that codswallop.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2268 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 23:29:14 +00:00
hanna b04de77952 First pass at a reorganized walker info display. Groups walkers by package
and displays walker data extracted from the JavaDoc.  Needs a bit of help,
both in content and flexibility of package naming.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2267 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 23:24:29 +00:00
depristo 07b88621c5 Improved RankSum calculations and RankSum annotation. Much more meaningful
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2266 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 22:16:40 +00:00
depristo 0753315156 updates to the python snp selector -- now sorts info fields and we stop printing unnecessary debugging info in vcf2table
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2265 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 22:16:02 +00:00
chartl 0f89a38473 forgot to commit this earlier
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2264 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 22:10:16 +00:00
hanna 4c147329a9 Turn javadoc comments for packages and classes into key/value pairs in a properties file. Embed the properties file
in GenomeAnalysisTK.jar.  Still no support for actually displaying the archived javadoc.  Also change the approach
to providing package javadocs: retired the deprecated package.html file in favor of Java 1.5-style package-info.java.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2263 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 20:08:41 +00:00
chartl c1263e841c stop printing the debug info -- hurr
Also, it turns out that sometimes there can be a call with zero total non-I/non-D bases -- so add one to the numerator and denominator to prevent a divide-by-zero error



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2262 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 16:17:38 +00:00
chartl 0c2d6d7e41 A brute-force script to convert Syzygy lod-score calls files into a proper VCF -- with some useful annotations.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2261 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 16:07:06 +00:00
ebanks 1e8dcc30da -dbSNP rod should not implement VariantBackedByGenotype since dbSNP records have no genotype data
-Added code to cache the allele list so it doesn't need to be recomputed each time it is requested.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2260 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 14:56:48 +00:00
rpoplin 855face681 Histogram of covariate values now goes from 0 to max value which makes it look nicer in most cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2259 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 14:44:03 +00:00