Commit Graph

2170 Commits (2a116bb5d68dbbed4d04f86b30a766aa426edaa7)

Author SHA1 Message Date
hanna 0d890e1bf0 Rework Eric's output management code given that the behavior of the UG changes drastically
depending on its output format.  Current implementation is probably a bit overkill-ish and
we can whittle this down to what's absolutely necessary.
Writing VCFs to the 'out' protected printstream may not work at this moment.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2425 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 00:33:43 +00:00
ebanks f448a263e9 The cleaner now cleans duplicate reads (instead of ignoring them) - although it doesn't include them for scoring ref or alt consenses
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2424 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 21:01:55 +00:00
ebanks cf303810d3 VCF reader now creates the correct type of header line for each header type
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2423 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 20:39:06 +00:00
ebanks e06dfe44c4 Check for null platform (even when the read group isn't null) and assign it the default platform if it is
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2420 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 07:01:41 +00:00
ebanks 87e5a41964 Fixed a bug that accounted for a bunch of my remaining mis-cleaned indels.
Also, slightly optimized the cleaner by using readBases (instead of readString) and caching cigar element lengths.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2419 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 05:46:16 +00:00
hanna b780ffb34a Add a getFormat() method to get the output format from the writer. The need for
this call suggests that I may be thinking about the typing of the GenotypeWriter object the wrong way.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2418 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 01:46:26 +00:00
hanna 11cbfcec9c Get rid of backlink from ArgumentDefinitions to ArgumentSources. This will help in the future with multiple
source -> single definition mapping sets.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2417 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 00:39:36 +00:00
hanna 9e53c06328 First revision of command-line argument support for GenotypeWriter. Also, fixed the damn build.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2416 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 19:19:23 +00:00
ebanks 4ff61097cf Trivial change: < -> <=
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2415 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 03:35:27 +00:00
ebanks 566b556b50 Give user ability to turn off max allowed interval size
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2414 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 03:20:22 +00:00
ebanks a5f75cbfd4 The previous commit broke the build, so this is a temporary patch to get it to compile. ConcordanceTruthTable should use enums (esp. now that all of the concordance variables need to be public), but VariantEval will need to be rewritten soon anyways so I'll just push it off until then.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2413 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 02:34:41 +00:00
depristo ee8bcdc61d PooledConcordance calculations have been reformatted and bugs fixed. Now properly handles monomorphic sites. Also works with -G option now, correctly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2412 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:22:36 +00:00
depristo 9bf2d12c64 Misc. improvements to the LMW code. Support for emitting all sites, regardless of genotype. Min and max quality scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2411 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:20:57 +00:00
aaron 7e0f69dab5 Changed the GLF record to store it's contig name and position in each record instead of in the Reader. Integration tests all stay the same.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2410 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:54:56 +00:00
hanna 80b3eb85fa Fixed curiously epic failure in read-backed pileup: size() mismatched the numReads-numDeletions at that locus in the case where includeReadsWithDeletionsAtLoci == false, causing failures including bad output from pileup walker. Also fixed up ValidatingPileup to run with the new ReadBackedPileup instead of just compiling successfully.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2409 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:52:44 +00:00
rpoplin fdf542c214 The CycleCovariate for 454 data is now the TACG flow cycle. That is, each flow grabs all the T's, A's, C's, and G's in order in a single cycle. This is changed from incrementing the cycle whenever there is a discontinuous nucleotide along the direction of the read.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2408 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:39:51 +00:00
aaron c39675d2c1 VCFTool.java got left off of the last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2407 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 21:33:53 +00:00
ebanks 4ea31fd949 Pushed header initialization out of the GenotypeWriter constructors and into a writeHeader method, in preparation for parallelization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2406 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 19:16:41 +00:00
ebanks eeddf0d08e Adding sample utils for convenience methods to pull out samples from e.g. SAMFileHeader or Genotype objects
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2405 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 18:51:21 +00:00
chartl 79b997f43d Minor fix to getValue (thanks Ryan!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2404 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:45:51 +00:00
aaron 9971a8da9a adding a check to the RodVCF to ensure that records are in-order in the underlying VCF file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2403 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:24:45 +00:00
chartl 38563bbc2d The values used to be integers (-1 for unpaired, 0 for unmapped, 1 for first, 2 for second); but i switched to strings before commit so it was more clear. Forgot to update the OTHER getValue method.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2402 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:05:14 +00:00
chartl 7b5e332ff3 Added - PairedQualityScoreCountsWalker: counts quality scores (e.g. as a histogram) on first reads of a pair and second reads of a pair. Turns out there's a consistent difference in quality scores; even after recalibrating without the pair ordering as a covariate (there's a bit of averaging -- but not as much as I initially thought).
Added - A paired read order covariate to use with recalibration. Currently experimental: for instance, what's a proper pair versus just a pair? Nobody should use this one...



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2401 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:01:01 +00:00
ebanks 4f59bfd513 Updates to the various GenotypeWriters to make them do simple things like write records (plus allow GLFReader to close).
Adding first pass of stub and storage classes for the GenotypeWriters so that UG can be parallelizable.  Not hooked up yet, so UG is unchanged.
The mergeInto() code in the storage class is ugly, but it's all Tribble's fault.  We can clean it up later if this whole thing works.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2400 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 07:20:23 +00:00
ebanks 1cde4161b7 Fixed another test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2399 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 05:05:03 +00:00
ebanks 94f5edb68a 1. Fixed VCFGenotypeRecord bug (it needs to emit fields in the order specified by the GenotypeFormatString)
2. isNoCall() added to Genotype interface so that we can distinguish between ref and no calls (all we had before was isVariant())
3. Added Hardy-Weinberg annotation; still experimental - not working yet so don't use it.
4. Move 'output type' argument out of the UnifiedArgumentCollection and into the UnifiedGenotyper, in preparation for parallelization.
5. Improved some of the UG integration tests.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2398 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 04:14:14 +00:00
jmaguire 98839193b7 compatibility with VCF lib's switch to GenomeLoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2397 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:52:48 +00:00
jmaguire 8787dd4c5e Various and sundry additions to VCF tools. Some useful to the general public, some one-offs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2396 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:35:45 +00:00
rpoplin 6fbf77be95 Updating the two solid_recal_mode options to also change the previous base since solid aligner prefers single color mismatch alignments over true SNP alignments. COUNT_AS_MISMATCH mode has been removed completely. The default mode is now SET_Q_ZERO.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2394 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 20:07:26 +00:00
hanna 07f1859290 Added integration test for running the recalibrator with no index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2393 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:10:53 +00:00
ebanks c75ec67f84 When called as a standalone, VariantAnnotator now emits samples in sorted (as opposed to random) order in VCFs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2392 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:01:08 +00:00
rpoplin aa86f3710d Updating HomopolymerCovariate to only count the consecutive previous bases. I left in the code but commented out for if somebody wants to worry about carry forward homopolymer problems.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2391 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 18:25:09 +00:00
hanna b863fffdf6 Fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2390 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 17:55:00 +00:00
hanna 9143822822 Fix half-hearted attempt to try to move classes from package to package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2389 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 17:41:42 +00:00
asivache e6cc7dab26 fixing md5 sum; new version of IndelIntervalWalker does the right thing...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2388 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 01:04:13 +00:00
asivache acb4d477da sync...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2387 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 01:03:01 +00:00
asivache ba86508854 remove debug print command
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2386 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 00:00:01 +00:00
asivache d72d332239 1) changed to search specifically for D and I cigar elements (and to process properly/ignore H,S,P elements) and print out only intervals that encompass actual indels. There's still one interval per read (at most) generated, which is the smallest intervals that covers ALL indels (D or I elements) present in the read; 2) if an interval (thus the original read itself and indels in it) sticks beyond the end of the chromosome, the read is ignored and this interval is NOT printed into the output; instead, a warning is printed to STDOUT (should we send it to logger.warn() instead?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2385 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 23:29:07 +00:00
hanna 5b78354efd Fixed NPE in index check with RefWalkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2384 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 22:37:45 +00:00
hanna e6127cd6c5 Temporary hack for Tim Fennell: introduce a sharding strategy that stuffs all data into a single
shard for cases when the index file isn't available.  Works for the case in question, but is not
guaranteed to work in general.  Will be replaced once the new sharding system comes online.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2383 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:55:42 +00:00
ebanks bef1c50b3b Some cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2382 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:41:06 +00:00
ebanks bb92e31118 Optimizations:
1. push the ReadBackedPileup filtering up into the ReadFilters for read-based filters
2. stop querying the cigar for its length (just do it once)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2381 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:39:58 +00:00
andrewk 36875fca89 Update documentation in the new help system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2380 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:33:12 +00:00
hanna ee47eb4367 Make filters used available to the walker via getToolkit().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2379 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:26:04 +00:00
ebanks b626fc0684 Joint Estimate is now the default calculation model.
Reworked all of the integration tests so that they're now more comprehensive, cover more of what we wan to test, and don't take forever to run.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2376 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 19:41:02 +00:00
ebanks e051311e8c Added convenience methods in RodVCF to pull out all of the VCF data from the VCFRecord (e.g. getID(), getSamples(), getInfoValues())
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2374 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 17:58:41 +00:00
ebanks bb312814a2 UG is now officially in the business of making good SNP calls (as opposed to being hyper-aggressive in its calls and expecting the end-user to filter).
Bad/suspicious bases/reads (high mismatch rate, low MQ, low BQ, bad mates) are now filtered out by default (and not used for the annotations either), although this can all be turned off.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2373 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 17:28:09 +00:00
aaron af440943a4 Fixing a bug that Steven uncovered; we had an abigous contract for peek() in PushbackIterator, and SeekableRODIterator wasn't checking to see if it's PushbackIterator hasNext() was true before calling peek().
Changed peek() to element() to be consistant with the Java standards of the Queue and Stack classes (element() throws an exception if a record isn't available).  

Also updated some of the ROD iterator next() methods to throw NoSuchElementException if next() is called when a record isn't available.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2372 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 23:04:40 +00:00
andrewk 1035abc85f Add minimum base quality thresholding to depth of coverage via getBaseAndMappingFilteredPileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2371 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 22:58:30 +00:00
sjia 2deae95df9 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2370 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:31:47 +00:00
hanna 555976d575 One more walker with formatting to fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2369 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:23:13 +00:00
hanna cf46472419 Fix up Sherman's new docs in compliance with javadoc specs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2368 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:20:38 +00:00
sjia df79ed8db1 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2367 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:53:41 +00:00
sjia a80a5f1036 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2366 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:52:08 +00:00
sjia 18f61d2586 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2365 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:45:19 +00:00
sjia 5974c42468 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2364 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:41:35 +00:00
sjia d8cfd707bc Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2363 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:35:18 +00:00
sjia 4322beeb35 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2362 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:33:38 +00:00
sjia 4148991d81 Now also encodes amino acids, includes documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2361 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:26:56 +00:00
ebanks 9b0bdbbf29 Fix for homopolymer bug: ref was lowercase, alt allele was uppercase, so alt != ref. Yuck.
This is a temporary fix - pushed more elegant solution over to Matt.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2360 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 19:02:23 +00:00
depristo a810586418 Check-in without javadoc = smackdown
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2359 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 15:32:39 +00:00
ebanks b234019cf5 Readded locus printing suppression to DoC walker
(and removed unused import from UG)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2358 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 14:50:56 +00:00
depristo 0d2a761460 Bugfix for minBaseQuality to ignore deletion reads. LocusMismatch walker now allows us to skip every nths eligable site
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2357 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 14:38:39 +00:00
ebanks bf7bab754e Made getPileupWithoutMappingQualityZeroReads() and getPileupWithoutDeletions() more efficient, per Mark's cue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2356 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 04:35:21 +00:00
ebanks 874552ff75 Pull the genotype (and genotype quality) calculation out of the VCF code and into the Genotyper.
[Also, enable Mark's new UG arguments]



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2355 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 04:29:28 +00:00
depristo 2cbc85cc7a min mapping quality and min base quality arguments for UG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2354 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 03:57:27 +00:00
depristo faa638532a Correct location
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2353 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 02:42:21 +00:00
depristo 1da97ebb85 Walker for calculating non-independent base errors, v1. Will be moved to somewhere not in core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2352 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 02:40:15 +00:00
chartl 1389ac6bdf Hurrr -- this uses power as part of its output. Changes to the power calculation broke the md5s RIGHT AFTER I HAD FIXED THEM arghflrg.
Will fix again.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2351 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 22:42:50 +00:00
chartl b42fc905e8 Added - new tests (Hapmap was re-added)
Modified - Hapmap now takes a -q command to filter out variants by quality
Modified - MathUtils - cumBinomialProbLog now uses BigDecimal to handle some numerical imprecisions
Modified - PowerBelowFrequency - returns 0.0 if called with a negative number (can't be done from inside the walker itself, but since it's called elsewhere one can't be too careful)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2350 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:57:20 +00:00
rpoplin 8e44bfd2ef CycleCovariate and PrimerRoundCovariate now correctly handle negative strand 454 and SOLID reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2349 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:52:30 +00:00
ebanks c7b23d6ca5 Now that VCFGenotypeRecords implement SampleBacked (as they should), a quick fix was needed to get the GenotypeConcordance working when no direct samples were provided in a samples file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2348 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 04:27:16 +00:00
asivache bd7b07f3f1 added PrimitivePair.Long and a few shortcut utility methods to PrimitivePairs: add(pair), subtract(pair), assignFrom(pair)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2347 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 00:15:44 +00:00
ebanks 97618663ef Refactored and generalized the VCF header info code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2346 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 21:02:45 +00:00
depristo 05b8782d5f Documentation updates. Moved CountX.java walkers to QC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2345 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 18:40:22 +00:00
depristo 92307361a4 In preparation for move
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2344 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 18:28:06 +00:00
ebanks 45199136f0 Completed my documentation responsibilities - based on Mark's reasonable assignment and not the one Matt made up while on Meth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2342 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 04:13:30 +00:00
ebanks bd2a46ab4c I want to move over to hpprojects tonight, so I'm checking in various changes all in one go:
1. Initial code for annotating calls with the base mismatch rate within a reference window (still needs analysis).
2. Move error checking code from rodVCF to VCFRecord.
3. More improvements to SNP Genotype callset concordance.
4. Fixed some comments in Variation/Genotype



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2341 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 02:52:18 +00:00
kiran 2748eb60e1 Added short documentation for each class so that it appears in the walker command-line documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2340 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 21:41:07 +00:00
rpoplin 78e94b5a84 TableRecalibration now puts the full list of walker arguments into the PG tag of the bam file it creates. Thanks Matt and Eric. Also, the default nback for the HomopolymerCovariate is 8, down from 10.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2339 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 17:29:41 +00:00
rpoplin 014013630f Added hieracrchy to the covariate classes: Required, Standard, and Experimental. Required covariates (rg and reported quality) are added for the user whether or not they are specified in the -cov list. There is now a -standard option in CountCovariates which will add in all of the standard covariates so the user doesn't have to type them all out or even know which ones are the standard. There is logger output to say which covariates are being used of course. The list of covariates used is also added to the PG tag in the bam file produced by TableRecalibration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2338 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 16:34:05 +00:00
hanna 6955b5bf53 Cleanup of the doc system, and introduce Kiran's concept of a detailed summary
below the specific command-line arguments for the walker.  Also introduced
@help.summary to override summary descriptions if required.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2337 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 04:04:37 +00:00
hanna cdfe204d19 Incorporated feedback from Kiran. Use the Javadoc first sentence extraction capability to just show the first sentence from each line of Javadoc. @help.description can still be used to produce exceptionally verbose descriptions.
Also increased the line width as much as I could tolerate (100 characters -> 120 characters).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2336 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 21:59:55 +00:00
rpoplin 4fa4e95fbc Updated AnalyzeCovariates to extend org.broadinstitute.sting.utils.cmdLine.CommandLineProgram and use the standard argument parsing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2335 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 21:57:18 +00:00
kiran 38d9f7b903 Renamed ReferenceContext's getSimpleBase() method to getBaseIndex()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2334 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 20:14:39 +00:00
aaron 09811b9f34 Now that we always output the VCF header, make sure that we correctly handle the situation where there are no records in the file. Added unit tests as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2333 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 19:51:05 +00:00
hanna 0da2105e3c Moving DuplicateQualsWalker to oneoffprojects.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2332 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 19:22:32 +00:00
rpoplin 60c3eb4b60 Added help.description to the recalibration walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2331 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 19:02:29 +00:00
ebanks 2ea7632b76 The SNP genotype concordance module is now more comprehensive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2330 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 18:34:33 +00:00
hanna 590aeee7d2 Documentation for more basic walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2329 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 18:15:40 +00:00
hanna d1815f3559 More documentation for walkers that I'm familiar with in the collection of core walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2328 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 18:02:33 +00:00
hanna 956c36a2c8 Help for the qc package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2327 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:32:47 +00:00
hanna 450ea233a5 Docs for the basic walkers: CountLoci, CountReads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2326 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:17:34 +00:00
hanna f97ac939fa Punch up the help documentation for CombineDuplicates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2325 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:09:35 +00:00
aaron 86dc98bfb5 update the documentation for CombineDuplicates for the new help system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2324 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 17:01:42 +00:00
aaron 420725441a documentation updates for the new help system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2323 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 16:15:44 +00:00
hanna 23d96b1d43 Help system content for the alignment module.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2322 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 16:01:25 +00:00
ebanks 2de7e1a178 Move VariantAnnotator over to use a StratifiedAlignmentContext split by sample.
The only major difference is that we are now able to get accurate allele balance ratios.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2321 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 05:28:28 +00:00
depristo 8f7554d44f A few improvements to pooled concordance calcluations. Now will show you FN with the -V option. BasicGenotype now prints out a reasonable representaiton wiwth toString
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2320 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 23:09:10 +00:00
aaron f64a4c66ac some tweaks for the GATK paper genotyper to better work with shared memory parallelization, added documentation changes for Matt's new help system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2319 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 22:33:51 +00:00
andrewk a7cd172628 Added 8x coverage field and minimum base quality command line option in order to be able to compare to U. Wash. exome metrics.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2318 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 22:14:44 +00:00
ebanks 2869270c11 Fixed deletion depth calculation plus mis-spelling in ReadBackedPileup method.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2315 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 21:11:42 +00:00
ebanks 31b1d60d28 Generalized the StratifiedAlignmentContext code so that it's easy to add new ways to stratify. Then added an MQ0-free stratification so we don't need to be carrying around 2 different alignment contexts (full vs. mq0-free) anymore.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2314 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 19:50:06 +00:00
hanna 0c396f04a2 Fix obvious cut/paste error in output stream management code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2313 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 19:23:13 +00:00
ebanks 11ac7885b0 Pull out StratifiedAlignmentContext code so other walkers can use it.
This is basically a wrapper class around AlignmentContext which allows you to stratify a context by e.g. reads on forward vs. reverse strands.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2312 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 19:21:16 +00:00
hanna adb2fdbee7 Before, we were only checking that the reference was present if @Requires required that a reference was present. Now we always check that a reference is present, so that we get an intelligent error message.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2311 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 19:15:48 +00:00
hanna 5eac510b2f Refactor the code I gave Eric yesterday to output command line arguments.
Convert it from a completely wonky solution to a slightly less wonky solution
that will work in more cases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2310 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 18:57:54 +00:00
hanna 74b8055b6a Only show extra walker help if the user didn't specify a walker or specified
an invalid walker.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2309 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 16:43:06 +00:00
ebanks e6f541fdca Forgot to update integration test last night
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2308 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 12:57:10 +00:00
ebanks 0fae798b3a 1. Discoverable base calculations don't care about Genotypes (use Variation's PError regardless of whether the call is ref or var - it's the correct value even for ref calls).
2. Call a base genotypable if any of the Genotypes is above the threshold (you can't assume there's a single Genotype associated with the Variation).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2306 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:26:06 +00:00
ebanks a45adadf1f VCFGenotypeRecord already defines all the methods needed to be SampleBacked, so let's annotate it as being SampleBacked. This way, when used as a generic Genotype, sample data can be retrieved.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2305 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:16:21 +00:00
ebanks 78d5ac9bc2 Don't check het count when there are multiple Genotypes per Variation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2304 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 04:07:47 +00:00
ebanks ee691b8899 Added a whole bunch of unit tests for VCF reading.
We could still use more, but this is a good start.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2303 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-10 03:31:23 +00:00
ebanks f7c44ad019 - Read in arguments for the header based on reflection
- Hook up Variation and Genotype in SSG



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2300 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 21:35:33 +00:00
hanna 408f6f3dee Refactoring of prior commit: better handling of unnamed package within the help system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2297 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 20:12:35 +00:00
hanna 1d2151adcf Better handling of nulls output by
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2296 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 19:34:56 +00:00
ebanks 40c2d7a4bc Fix all-bases-mode and genotype-mode in the UG and add integration tests for them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2295 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 17:41:30 +00:00
ebanks 4e54b91ce4 UG now outputs the FORMAT header fields when there's genotype data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2294 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 16:31:07 +00:00
rpoplin 12c49ea485 Added DuplicateReadFilter to filter out reads that are marked as duplicates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2293 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 15:42:53 +00:00
ebanks fb900b12e1 VariantFiltration now details the filters it has used in the header of the VCF it produces.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2292 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 15:36:15 +00:00
ebanks 7a76e13459 Better explanation in the exception being thrown.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2291 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 03:59:36 +00:00
ebanks 8d67d9ade3 -Minor fix in UG for all-bases mode
-Make minConfidenceScore in VariantEval a double so non-integer values can be used (requested by Steve H).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2290 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 03:49:10 +00:00
ebanks 8a1c876104 Weird. I thought I had updated these md5s...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2289 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 03:31:41 +00:00
ebanks 717eb1de96 - Depth annotation now includes MQ0 reads
- Removed MQ0 annotation
- Updated RMS MQ annotation to use new pileup
- UG now outputs all of its arguments as key/value pairs in the header (for VCF)
- Cleaned up VCFGenotypeWriterAdapter interface a bit



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2288 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-09 02:53:00 +00:00
ebanks e8822a3fb4 Stage 3 of Variation refactoring:
We are now VCF3.3 compliant.
(Only a few more stages left.  Sigh.)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2287 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 21:43:28 +00:00
hanna 9e2f831206 A bit of cleanup in preparation for Picard patch.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2286 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-08 16:09:04 +00:00
hanna d3b78338da Get rid of characters in the docs that aren't universally compatible with
character sets used throughout the group.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2285 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 21:41:07 +00:00
hanna d75d3a361a Clean up some of the walker help output based on additional experience and
feedback received.  Also, add a flag to build.xml to disable generation of
docs on demand (use ant -Ddisable.doc=true to disable docs).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2284 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 21:33:11 +00:00
hanna a3e88c0b1c Cleanup results of bad merge.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2281 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 19:30:49 +00:00
hanna 10be5a5de9 Move some files around to reflect our growing help infrastructure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2280 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 19:23:12 +00:00
rpoplin 1d5b9883db Added --solid_recal_mode argument to experiment with different ways of dealing with solid reference bias. Currently the default option is DO_NOTHING which means use the same behavior as the old recalibrator. Eventually the new methods in RecalDataManager will be moved over to a SolidUtils class. Added transition and transversion methods to BaseUtils that work like simpleComplement, used with the color space in my solid methods. Also, initial check-in of HomopolymerCovariate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2276 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 14:26:27 +00:00
depristo 8f461d3c40 Critical bug fix for VariantEval dbSNP calculations. Moved the system over to the new improved ROD iterators, resulting in dbSNP rates jumping 5% or so, due to masking of true SNPs by preceding indels.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2274 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 03:36:38 +00:00
hanna 8089aa3c50 Adding support to override the help text.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2273 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-07 00:16:26 +00:00
ebanks c0528cd88e Updated the CallsetConcordance classes to use new VCF Variation code... and uncovered a whole bunch of VCF bugs in the process. I'm not convinced that I got them all, so I'll unit test like crazy when the refactoring is done.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2272 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 11:43:40 +00:00
ebanks b6f8e33f4c Stage 2 of Variation refactoring:
VCFRecord now implements Variation, VCFGenotypeRecord now implements Genotype.

Because of this change, RodVCF is now just a wrapper around the VCFRecord and does nothing else.  Also, one can call toVariation on the VCFGenotypeRecord and it returns the VCFRecord.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2271 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 06:48:03 +00:00
hanna 3b440e0dbc Add a taglet to allow users to override the display name in command-line help.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2270 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 04:12:10 +00:00
ebanks 08f2214f14 Stage 1 of massive Variation/Genotype refactoring.
This stage consists only of the code originating in the Genotyper and flowing through to the genotype writers.  I haven't finished refactoring the writers and haven't even touched the readers at all.

The major changes here are that
1. Variations which are BackedByGenotypes are now correctly associated with those Genotypes
2. Genotypes which have an associated Variation can actually be associated with it (and then return it when toVariation() is called).

The only integration tests which need to be updated are MSG-related (because the refactoring now made it easy for me to prevent MSG from emitting tri-allelic sites).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2269 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-06 03:12:41 +00:00
hanna b04de77952 First pass at a reorganized walker info display. Groups walkers by package
and displays walker data extracted from the JavaDoc.  Needs a bit of help,
both in content and flexibility of package naming.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2267 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 23:24:29 +00:00
depristo 07b88621c5 Improved RankSum calculations and RankSum annotation. Much more meaningful
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2266 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 22:16:40 +00:00
hanna 4c147329a9 Turn javadoc comments for packages and classes into key/value pairs in a properties file. Embed the properties file
in GenomeAnalysisTK.jar.  Still no support for actually displaying the archived javadoc.  Also change the approach 
to providing package javadocs: retired the deprecated package.html file in favor of Java1.5-style package-info.java.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2263 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 20:08:41 +00:00
ebanks 1e8dcc30da -dbSNP rod should not implement VariantBackedByGenotype since dbsnp records have no genotype data
-added code to cache the allele list so it didn't need to get recomputed each time it was requested.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2260 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 14:56:48 +00:00
ebanks 58937bf9ba You can now use the -exp flag to tell the Genotyper to include experimental annotations when it calls out to VariantAnnotator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2256 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 04:45:05 +00:00
ebanks b05e73a914 Finished implementation of the Wilcoxon Rank Sum Test thanks to Tim Fennell (calculating the normal approximation) and Nick Patterson (dithering to break tie bands).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2255 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 04:04:39 +00:00
ebanks 861221d046 - Moved various header line printing into a single method
- Fixed output for coverage above min depth



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2254 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-04 02:15:43 +00:00
ebanks aef4be5610 Moved CoarseCoverageWalker to core and packaged both coverage walkers in coverage/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2249 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:53:36 +00:00
ebanks df4e001a07 Renamed to more accurately describe its function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2248 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:34:49 +00:00
ebanks c2017cc91b PrintCoverageWalker functionality moved to DepthOfCoverageWalker. Added integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2247 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:23:59 +00:00
ebanks 01cf5cc741 1. Merged CoverageHistogram into DepthOfCoverageWalker
2. Fixed bug in histogram calculation for small intervals
3. Better output in DoCWalker
4. Comments added to code



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2245 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 17:01:53 +00:00
ebanks 44b9f60735 PercentOfBasesCovered functionality moved to DepthOfCoverageWalker. Added integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2244 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 16:11:09 +00:00
ebanks 126d1eca35 Move to core (qc/)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2243 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 15:45:58 +00:00
ebanks 9da5cc25ad More archiving (with permission from Andrey) plus a move to core.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2242 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 15:40:27 +00:00
aaron b3bdcd0e60 make sure we close the error log stream in CommandLineProgram if it's opened; unit tests and clean-up for BasicVariation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2241 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 06:59:27 +00:00
ebanks a88202c3f6 Refactored DoCWalker to output in a more helpful and usable style. It now outputs in tabular format with 2 different sections: per locus and then per interval.
I am now at a point where I can merge the functionality from other coverage walkers into this one.
Thanks to Andrew for input.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2239 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 05:28:21 +00:00
ebanks d7e4cd4c82 Moving some useful and stable walkers to core:
- ClipReads
- PrintRODs (generalized to print all RODs that are Variations)
- FixBAMSortOrderTag (added documentation to walker so that people know what it does and why)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2238 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-03 03:00:45 +00:00
rpoplin 46f3d3e39b Added comments to AnalyzeCovariates and R scripts. R script prevents residuals from going off the edge of the plot. Added skeleton code to the recalibration walkers showing how we plan to handle SOLID reference inserting behavior.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2233 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 23:15:52 +00:00
aaron 451a20ed55 commenting out some broken integration tests, to be uncommented if needed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2232 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 23:13:24 +00:00
depristo c776f9fb90 Simple utilities for dealing with Complete Genomics data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2230 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 22:51:41 +00:00
aaron 9d598f1c82 some integration test clean-up
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2229 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 21:11:02 +00:00
ebanks a09fee2b5e Moved some more walkers to oneoffprojects and killed an old indel-related walker that isn't being used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2228 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:28:07 +00:00
depristo dec0a781c2 Un-reinventing the wheel. --sleep argument removed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2227 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:19:28 +00:00
ebanks a3343c75db Move and rename a hybrid-selection-specific coverage calculation to hybridselection/
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2225 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:11:22 +00:00
ebanks 2c83f2f2bc Move MSG - plus now obsolete classes which it depends on -- to oneoffprojects (with permission from Jared).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2224 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 20:04:22 +00:00
chartl 6a9e7bea05 Removing experimental annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2220 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 19:03:55 +00:00
jmaguire c180a76b05 Added option "append": if set, and the specified discovery output already exists, don't re-call anything that's already present in that file. Append new calls to it.
Great for resuming long jobs that died partway through.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2219 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 18:56:19 +00:00
ebanks 0a2304eff8 - Rename minConfidenceScore in VariantEval to minPhredConfidenceScore
- Moved validation walkers to new qc dir
- Killed unused test



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2218 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:59:19 +00:00
ebanks a5dfc9107d - Cleaned up annotation code some more
- Use QualityUtils when phred-scaling now



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2217 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:45:29 +00:00
ebanks 7055a3ea2d - All annotations are now required to return their VCF INFO keys and descriptions
- Renamed keys to fit with the standard naming
- FisherStrand is no longer standard
- Integration tests no longer test experimental annotations since they're not stable



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2216 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 17:24:06 +00:00
rpoplin 67179e2412 Initial checkin of AnalyzeCovariates.java which replaces analyzeRecalQuals_1KG.py and is updated to use the new Covariates system. It creates similar plots of residual error for each covariate that was used in the calculation. There is also an option to filter out base qualities below a given threshold.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2215 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 16:47:35 +00:00
ebanks 2838629724 -VCF writer now checks whether the allele frequency has been set before trying to write it out.
-Renamed methods to be more consistent.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2214 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 16:25:32 +00:00
depristo 6231637615 fixes for VariantAnnotations and second bases. Misc. removal of failing (and unstable) integration tests that require rereview
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2213 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 15:41:35 +00:00
aaron d487428468 remove incorrect parentheses
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2211 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 06:46:32 +00:00
chartl 886c44303a -Removing BTTJ integration test -- this broke a few revisions ago (2169) and it is unclear whether the resulting change was a correction to something that had previously been incorrect, or a true build-breaker. I'm currently investigating which case this is, but since Bamboo is back up I'm removing this _temporarily_ so that other testing can occur, and will make whatever changes to the test necessary to reflect the truth, then replace the test itself. Additional (and related) pileup tests are upcoming as well.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2210 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 05:37:15 +00:00
ebanks b979bd2ced - Optimized implementation of -byReadGroup in DoCWalker
- Added implementation of -bySample in DoCWalker
- Removed CoverageBySample and added a watered down version to the examples directory



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2209 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 03:39:24 +00:00
ebanks 7c73496e72 Moved DoC walker over to new pileup system so it no longer moves like it's stuck in molasses.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2208 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 02:46:39 +00:00
ebanks ba8a8febc6 Thanks to Steve Hershman for finding this bug:
getNegLog10PError() does not equal the confidence score (you need to multiply by 10 as confidence is traditionally phred scaled).  Probably we should change the method to be getNeg10Log10PError().  Anyone have strong feelings on this?



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2207 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-02 01:59:03 +00:00
ebanks 3303808a8f Yet more walkers moved to oneoffprojects.
Made hybridselection subdir in playground.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2205 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:29:12 +00:00
ebanks 05923f7fba Started transition to oneoffprojects.
Moved/killed a few other walkers (with permission).



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2204 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:19:02 +00:00
ebanks c36069355e Trivial change to verbose
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2203 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 20:48:10 +00:00
jmaguire 74f6526e09 VCFHomogenizer: A class that extends InputStream and dynamically re-writes pilot1 VCF's to be on-spec.
VCFTool: A command-line tool with various useful VCF functions (validate, grep, concordance).




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2202 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 17:55:42 +00:00
jmaguire adf8f1f8b3 Add an InputStream constructor, which is immensely useful for various reasons.
Also a minor performance optimization.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2201 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 17:25:00 +00:00
ebanks e581cceab6 Got Kris's permission to delete these walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2200 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:57:28 +00:00
rpoplin 3180fffd43 Eliminated unnecessary boxing of longs in RecalDatum. Changes to RecalDatum in preparation for new AnalyzeCovariates script. Updated TableRecalibrationWalker to make use of these changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2199 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:49:05 +00:00
chartl 21a9a717e4 Some minor changes and test:
- DepthOfCoverage is now by reference (so locus-by-locus output correctly reports zero-coverage bases)
  - VariantsToVCF now lets you bind variants with any string except intervals and dbsnp (not just NA######)
  - A PileupWalker integration test on a particularly nasty FHS site
  - Two second-base annotation related integration tests on that same site
       + outputs were all hand-validated in matlab; within a certain tolerance for the annotations




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2197 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 15:15:54 +00:00
ebanks 084337087e Removing deprecated code and walkers for which I had the green light from repository.
Moved piecemealannotator and secondarybases to archive.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2195 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:58:20 +00:00
ebanks 2c16c18a04 Move Andrey's old indel code (plus MSG accuracy test, which depends on it) to archive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2194 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:29:00 +00:00
ebanks 7c6c490652 An unfinished implementation of the Wilcoxon rank sum test and a variant annotation that uses it. I need to merge and update this code with Tim's implementation somehow - but that won't happen until later this week, so I'm committing this before I accidentally blow it away.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2193 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 04:56:17 +00:00
ebanks 00f15ea909 Improved performance of deletion-free pileup and added mapping-quality-zero-free pileup convenience method.
Finished converting genotyper and annotator code to new ReadBackedPileup system.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2192 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 04:50:47 +00:00
rpoplin 6bb864da2a More misc cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2191 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 22:29:07 +00:00
rpoplin b89b9adb2c misc code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2190 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 21:16:00 +00:00
depristo e793e62fc9 minor code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2189 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:57:20 +00:00
rpoplin 4969cb1957 CountCovariates uses new optimized ReadBackedPileup. It also smarter about re-doing calculations for the dnsnp variation rate sanity check.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2188 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:35:40 +00:00
ebanks add2fa7ab4 more use of new ReadBackedPileup optimizations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2187 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:04:01 +00:00
rpoplin 817e2cb8c5 Recalibrator makes use of the new GATKSAMRecord wrapper and now no longer has to hash the SAMRecord. Covariate's getValue method signature has changed to take the SAMRecord instead of the ReadHashDatum. ReadHashDatum removed completely.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2185 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 19:59:17 +00:00
ebanks e9a8156cfb Use new optimized ReadBackedPileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2184 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 18:17:18 +00:00
rpoplin d8146ab23d Changed the format of the recalibration csv file slightly so that it is easier to load the file into something like R and look at the values of the covariates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2183 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 17:55:23 +00:00
ebanks a184d28ce9 Completing the optimization started by Matt: we now wrap SAMRecords and SAMReadGroupRecords with our own versions which cache oft-used variables (e.g. platform, readString, strand flag). All walkers automagically get this speedup since the wrapping occurs in the engine.
I note that all integration/unit tests pass except for BaseTransitionTableCalculatorJava, which is already broken.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2182 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 17:39:29 +00:00
depristo af22ca1b47 Bug fixes for VariantEval. dbCoverage now reports dbSNP rate, not some wierd eval_snps_in_db as before. We now separate non-indel and non-snp db sites in dbcoverage. Some dbSNP records don't fit into these two categories. Also fixed a consistency issue where novel / known sites where being determined solely by whether dbSNP had a record there, rather than the stricter dbcoverage screen for isSNP().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2180 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 01:39:01 +00:00
chartl 27651d8dc2 Oops. numReads is now called size
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2175 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 06:59:17 +00:00
chartl 21744e024b Quick walker that determines % of bases covered at (user - defined depth)x . I've been maintaining it in my directories alone, but now that i've accidentally deleted it twice, into playground it goes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2174 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 06:51:19 +00:00
hanna 3300ca906a An iterator for Eric to use when injecting his new wrapping reads -- a stopgap solution for getting additional caching
functionality into a SAMRecord.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2173 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-27 22:25:52 +00:00