Commit Graph

2186 Commits (ac4756db207c34cb65e2ce7d46a6aaa6e28e103e)

Author SHA1 Message Date
hanna 29c129aced Added very primitive read fishing walker with lots of hard coding. Fixed
bugs encountered when testing read fishing in Ecoli.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2496 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-04 00:54:57 +00:00
ebanks 7b702b086f You don't need to be bi-allelic to have a non-ref alt allele frequnecy, but you do have to be a variant.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2495 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-03 22:02:39 +00:00
ebanks b668d32cf1 Updated the min mapping quality and min base quality defaults to be 10 in both cases (and updated all integration tests) as suggested by Mark.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2494 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-03 21:31:04 +00:00
hanna b6ecc9e151 Support for ad-hoc reference sequences. Also reenabled BWA/Java integration test, which was commented out
and the data backing it up deleted without my knowledge.  Unfortunately, since the data was deleted, I had
to regenerate the data and a new md5.  Hopefully the aligner output is still correct.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2493 348d0f76-0448-11de-a6fe-93d51630548a
2010-01-02 20:19:14 +00:00
asivache ad549eacfd Now that we changed how deletions are represented, got to update MD5...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2491 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 22:00:58 +00:00
asivache 46362ce532 In extended event lines, now prints deletions in verbose format as well (e.g. "-AAT")
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2490 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:57:20 +00:00
asivache a18e31f5b8 If alignment context at the locus holds extended event, get rod metadata and (importantly) reference bases for the whole span of the event (if it is a deletion that is, insertions still have length 0 on the ref!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2489 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:56:25 +00:00
asivache a41cb0701b Now can generate verbose String representation of deletions (e.g. "-AAT") if reference bases are provided as an argument to getEventStringWithCounts().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2488 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:54:50 +00:00
asivache 89791d730e Compute and cache the length of the longest deletion observed at the site; ReadBackedExtendedEventPileup now has a getter to access that value.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2487 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:19:39 +00:00
asivache 9c41ac252f Disable testSingleBPFailure - getReferenceContext() now whould agree to accept length > 1 genome locs as its argument, so there's nothing to test...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2486 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 21:12:00 +00:00
asivache 8932e67325 Removed sanity check that required GenomeLoc argument to be strictly 1-base long. We need to relax this in order to be able to pass around a reference context containing full-length chunk of deleted reference bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2485 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 20:14:08 +00:00
hanna 497ae700c4 A rethink of the existing BAM block extraction code: rather than working in
chunk space directly, stream data in block space, converting to chunk space
on demand.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2484 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 18:19:51 +00:00
rpoplin 80658fd99e AnalyzeCovariates gets the same performance improvements as the recalibrator. NHashMap class is removed completely.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2483 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 18:10:10 +00:00
rpoplin 9b2733a54a Misc clean up in the recalibrator related to the nested hash map implementation. CountCovariates no longer creates the full flattened set of keys and iterates over them. The output csv file is in sorted order by default now but there is a new option -unsorted which can be used to save a little bit of run time.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2482 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-30 16:58:04 +00:00
asivache 4aeb50c87d Added: integration test for extended pileup (with indels included)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2481 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 23:02:23 +00:00
asivache c928347c0c Extended event pileups are more verbose now: following a sequence of 'D','I', and '.' symbols, actual distinct events are listed along with their counts (example: +AAA:3,+AAC:1 for the total of 4 indel observations with 3 reads showing +AAA and one read showing +AAC)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2480 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:44:18 +00:00
asivache 8330058216 method added: getEventStringsWithCounts()
Returns list of Pairs <String,Integer>, where each pair consists of a unique indel event observed at the site and the total number of observations of that event. String representation for insertions is verbose (e.g. +ACT), while deletions are represented as "5D" (since read backed pileup has no reference information, so we can not get actual sequence of deleted bases)

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2479 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:41:58 +00:00
asivache cf3e59eb4a back to archive
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2478 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:00:38 +00:00
asivache 295d16572e synch; will go back to archive in a sec
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2477 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 22:00:03 +00:00
asivache e286313b67 Fix for reads that have insertion as their last (mapped) cigar elements (i.e. not followed by M)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2476 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:13:16 +00:00
hanna 05deb8796b Simplify handling of reference sequence for unmapped reads. Improvement made based on a suggestion from Alec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2475 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:06:20 +00:00
rpoplin 96c4929b3c Recalibrator now uses NestedHashMap instead of NHashMap. The keys are now nested hash maps instead of Lists of Comparables. These results in a big speed up (thanks Tim!). There is still a little bit of clean up to do, but everything works now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2474 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 21:01:32 +00:00
depristo 7826e144a1 forgot to update md5s
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2473 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:31:29 +00:00
asivache bfd6bf9ec5 PileupWalker just got a new option: --showIndelPileups. When this option is used, two lines are printed for every genomic location that has indels associated with it: first line is a conventional base pileup, the second line is an "extended event" (indel) pileup. The refence base in that second line is always set to "E" (for Extended), and the pileup string contains I,D,. symbols for insertion, deletion, noevent, respectively. Only this simple short format for indel pileups is implemented so far.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2472 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:16:34 +00:00
asivache 9652692019 Modified to enable locus traversals firing additional calls to walker's map() with alignment context filled with extended events (indels). Walker should override generateExtendedEvents() to return true, and it should make sure that it catches those additional indel pileups and processes them differently, as needed. If there are indels associated with a specific reference base, TWO map() calls will be issued in locus traversal at that location: first one will have a context filled with a regular base pileup, the second call will provide the context filled with indel pileup (pileup elements will have insertion, deletion, or noevent type associated with them and will also carry information about the full length of the event and inserted bases).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2471 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:13:25 +00:00
asivache 06eb576924 Can now be constructed with either base pileup or extended event (indel) pileup; has query methods checking what kind of pileup is served by the context, and getter methods return the appropriate pileup. TODO: while it is impossible right now to create a context that contains both types of pileups simultaneously, this restriction is only weakly enforced through the lack of appropriate constructor. Either we keep it this way, or some getters may become ambiguous and have to be fixed!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2470 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:07:29 +00:00
asivache f445745c56 Pileup element and corresponding container class tweaked for representing pileups of extended events (indels) at a given locus. There's some redundancy with PileupElement and ReadBackedPileup (should we rename them to BasePileupElement and ReadBackedBasePileup?), so that abstracting a basic interface/abstract base from these classes can be considered in the future
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2469 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 20:03:39 +00:00
depristo 87e863b48d Removed used routines in duputils; duplicatequals to archive; docs for new duplicate traversal code; general code cleanup; bug fixes for combineduplicates; integration tests for combine duplicates walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2468 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 19:46:29 +00:00
depristo 29f94119d1 Fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2466 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 18:08:41 +00:00
ebanks 5fdf17fccb Removed the VCF "NS" annotation (which wasn't working for pooled calls anyways) since it's ambiguous and not useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2465 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 17:30:47 +00:00
hanna e32174fbc4 UnifiedGenotyper now works without -varout or -vf set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2464 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 16:46:24 +00:00
hanna b125571a98 Intermediate check in: transfer responsibility of wrapping the GenotypeWriter around the output stream to the output
management code.  Currently, will not work when neither -varout nor -vf are specified, but should work in all other
cases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2463 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 16:11:11 +00:00
ebanks aeb34758e6 Adding a validation stringency to the VCF writers (which defaults to STRICT). If set to SILENT, it will not throw an exception for (reasonable) off-spec requests but will instead ignore such requests and silently move on.
This change allows the pooled calculation model to work correctly with multiple threads.  Boys, the Genotyper is now officially parallelized.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2462 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 15:33:53 +00:00
rpoplin 29a3d9b47a AnalyzeCovariates also has to skip over NO_DINUC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2461 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 14:36:05 +00:00
aaron a34c2442c0 moved hard-coded file paths to the oneKGLocation, validationDataLocation, and seqLocation variables setup in the BaseTest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2460 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-29 07:40:48 +00:00
depristo 9d263b2565 Integration tests for count duplicates walker validated on a TCGA hybrid capture lane.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2459 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:57:25 +00:00
depristo fcc80e8632 Completely rewritten duplicate traversal, more free of bugs, with integration tests for count duplicates walker validated on a TCGA hybrid capture lane.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2458 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:56:49 +00:00
hanna 4617052b3c For Alec, and others at the Broad who want to run our unit/integration tests off of gsa1/gsa2: put a ceiling on the amount of memory that integration tests can use. Reduce the memory footprint of the fasta reader test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2457 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 23:42:46 +00:00
alecw b5e5e27225 New versions of picard-private, sam and picard jars for TileCovariate and regeneration of NM tag
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2456 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 22:18:55 +00:00
hanna d4ee999ef9 Creates files supplemental to the reference sequence, consumed by BWA.
ANN - Alternate form of the sequence dictionary.  Should be created from a sequence dictionary with full contig names.
AMB - A map of 'holes' in the genome, aka runs of non-ACGTacgt bases.  This skeletal implementation always reports no
      holes.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2455 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 21:40:44 +00:00
rpoplin fcc52fbcd1 Fixed the build. Added missing import line.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2454 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 21:26:00 +00:00
ebanks 893c9c85fa Added previous optimization to diploid (non-pool) model and shaved off 20% of runtime from it. Moved out some common functionality to joint estimate parent class.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2453 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 21:20:48 +00:00
rpoplin 92e3682991 Moved NHashMap to sting/utils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2452 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 20:57:32 +00:00
rpoplin 562db45fa5 Sites that were marked NO_DINUC no longer get dinuc-corrected but are still recalibrated using the other available covariates. Solid cycle is now the same as Illumina cycle pending an analysis that looks at the effect of PrimerRoundCovariate. Solid color space methods cleaned up to reduce number of calls to read.getAttribute(). Polished NHashMap sort method in preparation for move to core/utils. Added additional plots in AnalyzeCovariates to look at reported quality as a function of the covariate.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2451 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 20:19:37 +00:00
asivache 2a704e83df Reads now have new traversal flag: generateExtendedEvents(). Support added to GenomeAnalysisEngine and Walker. This is a silent and transparent framework change that no existing code is going to see. The actual code that makes use of the new flag (which is false by default) will be committed separately...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2450 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 19:52:44 +00:00
ebanks c8d0e6e004 Optimization to pooled calculation model: stop calculating P(D|AF) if we are beyond the max likelihood such that subsequent likelihoods won't factor into the confidence score. Also, use new Pileup interface.
Pooled calling now takes less than half the time it used to.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2449 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 18:39:55 +00:00
ebanks b1ac4b81d5 Optimization: look up diploid genotypes from a static matrix instead of creating them on the fly (with String.format); bases no longer need to be ordered appropriately
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2448 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 17:28:51 +00:00
andrewk 57516582c2 Converter from HapMap chip genotype data to VCF added; HapMapGenotypeROD adjusted to not convert from Hg18 to b36 formatting of contigs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2447 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-28 01:36:08 +00:00
ebanks d2770f380c Writing calls to standard out now works again (it got broken when we introduced parallelization)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2446 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-27 04:36:45 +00:00
ebanks 12990c5e7a Added qual-by-depth annotation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2445 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-25 02:30:30 +00:00
ebanks 0571d9dcb9 Point MAX_QUAL_SCORE to SAMUtils.MAX_PHRED_SCORE.
Also, array size for caches should be max score + 1.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2444 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 20:47:32 +00:00
ebanks 438d21842a The new recalibrator had been mimicking the behavior of the old one in that if there was no dinuc available (following a no-call base or at either end of a read), it didn't try to recalibrate. Now that Ryan has modularized the system, we no longer need to skip the base completely (we just need to skip the dinuc value)... which is good because the Picard people complained after realizing that cycle #1 never got recalibrated.
The major effects of this commit are as follows:
1. We no longer skip any good bases (of course, this change alone breaks every single integration test).
2. The dinuc covariate returns a "no dinuc" value for the first base of a read (but not for the last base anymore, since there is a valid dinuc) or if the previous base is a bad base (e.g. 'N').

I've done a bunch of testing on real data and everything looks right; however, let's wait until the recalibrator guru gets back from vacation next week and can double-check everything before shipping this out in another early access release.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2443 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 20:41:29 +00:00
ebanks aaf674d9db Cleaned up this annotation.
Still experimental.  As of now, it's not useful.  More analysis is needed to determine how to handle cases where UG is unsure whether a sample is het or hom.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2442 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 03:06:46 +00:00
ebanks 6df40876a3 Un-reverted Matt's previous changes and fixed integration tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2441 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 02:47:00 +00:00
hanna 2bd0b1bbf7 After further review, it's unclear that my patch in RecalDataManager was the right choice. Reverting.
Also updating other IntervalCleanerIntegrationTest failures that were masked by my first patch.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2440 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-24 00:32:33 +00:00
hanna 98c268483e Fixed issues with the integration tests:
1) sam-jdk apparently no longer supports custom tags with type int[] values.
2) BAM output for indel cleaner integration test changed in a way that's so subtle it can't be seen after converting the output to .sam.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2439 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 23:12:22 +00:00
aaron b134e0052f added changes to the code to allow different types of interval merging,
1: all overlapping and abutting intervals merged (ALL), 
2: just overlapping, not abutting intervals (OVERLAPPING_ONLY), 
3: no merging (NONE).  This option is not currently allowed, it will throw an exception.  Once we're more certain that unmerged lists are going to work in all cases in the GATK, we'll enable that.  

The command line option is --interval_merging or -im


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2437 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:59:14 +00:00
alecw 159778416c In TableRecalibrationWalker, update UQ tag if it was present in the original SAMRecord. This required a new sam.jar, which caused some other files to need to be changed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2435 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:42:36 +00:00
hanna 87ff2b15d4 First step in introducing a patch to Picard: create our ideal interface into the BAM file for sharding.
This commit can iterate over the BAM file, pulling out information about the blocks in the file without actually loading
or decompressing the reads.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2434 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 21:35:08 +00:00
ebanks 770093a40e Oops - forgot to check this one in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2433 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 19:53:28 +00:00
ebanks dc96879861 2 separate changes which both affect lots of UG integration md5s, so I'm committing them together:
1. allele balance annotation is now weighted by genotype quality (so we don't get misled by borderline het calls)

2. Updates to the Unified Genotyper for parallelization:
   a. verbose writing now works again; arg was moved from UAC to UG
   b. UG checks for command that don't work with parallelization
   c. some cleanup



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2432 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 19:03:56 +00:00
ebanks 872a9d1c7b I'm making this change now (as opposed to waiting until Monday) to honor Tim's request.
The cycle covariate is now first/second of pair aware.  I'm taking it on faith from both Chris Hartl (waiting on slides from him) and Tim that this is the right thing to do.  We'll have Ryan confirm it all next week.
The only change is that if a read is the second of a pair, we multiple the cycle by -1 (a simple way of separating its index from that of its mate).
Of course, this broke all integration tests.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2431 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 16:26:43 +00:00
hanna e29e8e52b9 Multithreading support for the unified genotyper. Tests on a 10Mbase region on pilot 1 show a 6.8x improvement
when running 8 ways parallel.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2430 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-23 00:48:06 +00:00
kiran 164a94a3d0 Modified the walker documentation so that the stray punctuation wouldn't cause the GATK to stop parsing the help documenation early (aka I changed one word).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2429 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 20:50:01 +00:00
kiran 4ee6a478e3 Creates a table of reference allele percentage and alternate allele percentage at Hapmap-chip sites in a BAM file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2428 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 20:43:44 +00:00
ebanks 03bf75e335 Now implements TreeReducible
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2427 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 17:52:51 +00:00
hanna 0d890e1bf0 Rework Eric's output management code given that the behavior of the UG changes drastically
depending on its output format.  Current implementation is probably a bit overkill-ish and
we can whittle this down to what's absolutely necessary.
Writing VCFs to the 'out' protected printstream may not work at this moment.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2425 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-22 00:33:43 +00:00
ebanks f448a263e9 The cleaner now cleans duplicate reads (instead of ignoring them) - although it doesn't include them for scoring ref or alt consenses
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2424 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 21:01:55 +00:00
ebanks cf303810d3 VCF reader now creates the correct type of header line for each header type
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2423 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 20:39:06 +00:00
ebanks e06dfe44c4 Check for null platform (even when the read group isn't null) and assign it the default platform if it is
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2420 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 07:01:41 +00:00
ebanks 87e5a41964 Fixed a bug that accounted for a bunch of my remaining mis-cleaned indels.
Also, slightly optimized the cleaner by using readBases (instead of readString) and caching cigar element lengths.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2419 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 05:46:16 +00:00
hanna b780ffb34a Add a getFormat() method to get the output format from the writer. The need for
this call suggests that I may be thinking about the typing of the GenotypeWriter object the wrong way.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2418 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 01:46:26 +00:00
hanna 11cbfcec9c Get rid of backlink from ArgumentDefinitions to ArgumentSources. This will help in the future with multiple
source -> single definition mapping sets.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2417 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-21 00:39:36 +00:00
hanna 9e53c06328 First revision of command-line argument support for GenotypeWriter. Also, fixed the damn build.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2416 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 19:19:23 +00:00
ebanks 4ff61097cf Trivial change: < -> <=
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2415 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 03:35:27 +00:00
ebanks 566b556b50 Give user ability to turn off max allowed interval size
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2414 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 03:20:22 +00:00
ebanks a5f75cbfd4 The previous commit broke the build, so this is a temporary patch to get it to compile. ConcordanceTruthTable should use enums (esp. now that all of the concordance variables need to be public), but VariantEval will need to be rewritten soon anyways so I'll just push it off until then.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2413 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-20 02:34:41 +00:00
depristo ee8bcdc61d PooledConcordance calculations have been reformatted and bugs fixed. Now properly handles monomorphic sites. Also works with -G option now, correctly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2412 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:22:36 +00:00
depristo 9bf2d12c64 Misc. improvements to the LMW code. Support for emitting all sites, regardless of genotype. Min and max quality scores.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2411 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-19 23:20:57 +00:00
aaron 7e0f69dab5 Changed the GLF record to store it's contig name and position in each record instead of in the Reader. Integration tests all stay the same.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2410 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:54:56 +00:00
hanna 80b3eb85fa Fixed curiously epic failure in read-backed pileup: size() mismatched the numReads-numDeletions at that locus in the case where includeReadsWithDeletionsAtLoci == false, causing failures including bad output from pileup walker. Also fixed up ValidatingPileup to run with the new ReadBackedPileup instead of just compiling successfully.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2409 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:52:44 +00:00
rpoplin fdf542c214 The CycleCovariate for 454 data is now the TACG flow cycle. That is, each flow grabs all the T's, A's, C's, and G's in order in a single cycle. This is changed from incrementing the cycle whenever there is a discontinuous nucleotide along the direction of the read.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2408 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 22:39:51 +00:00
aaron c39675d2c1 VCFTool.java got left off of the last commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2407 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 21:33:53 +00:00
ebanks 4ea31fd949 Pushed header initialization out of the GenotypeWriter constructors and into a writeHeader method, in preparation for parallelization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2406 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 19:16:41 +00:00
ebanks eeddf0d08e Adding sample utils for convenience methods to pull out samples from e.g. SAMFileHeader or Genotype objects
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2405 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 18:51:21 +00:00
chartl 79b997f43d Minor fix to getValue (thanks Ryan!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2404 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:45:51 +00:00
aaron 9971a8da9a adding a check to the RodVCF to ensure that records are in-order in the underlying VCF file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2403 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:24:45 +00:00
chartl 38563bbc2d The values used to be integers (-1 for unpaired, 0 for unmapped, 1 for first, 2 for second); but i switched to strings before commit so it was more clear. Forgot to update the OTHER getValue method.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2402 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:05:14 +00:00
chartl 7b5e332ff3 Added - PairedQualityScoreCountsWalker: counts quality scores (e.g. as a histogram) on first reads of a pair and second reads of a pair. Turns out there's a consistent difference in quality scores; even after recalibrating without the pair ordering as a covariate (there's a bit of averaging -- but not as much as I initially thought).
Added - A paired read order covariate to use with recalibration. Currently experimental: for instance, what's a proper pair versus just a pair? Nobody should use this one...



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2401 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 15:01:01 +00:00
ebanks 4f59bfd513 Updates to the various GenotypeWriters to make them do simple things like write records (plus allow GLFReader to close).
Adding first pass of stub and storage classes for the GenotypeWriters so that UG can be parallelizable.  Not hooked up yet, so UG is unchanged.
The mergeInto() code in the storage class is ugly, but it's all Tribble's fault.  We can clean it up later if this whole thing works.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2400 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 07:20:23 +00:00
ebanks 1cde4161b7 Fixed another test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2399 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 05:05:03 +00:00
ebanks 94f5edb68a 1. Fixed VCFGenotypeRecord bug (it needs to emit fields in the order specified by the GenotypeFormatString)
2. isNoCall() added to Genotype interface so that we can distinguish between ref and no calls (all we had before was isVariant())
3. Added Hardy-Weinberg annotation; still experimental - not working yet so don't use it.
4. Move 'output type' argument out of the UnifiedArgumentCollection and into the UnifiedGenotyper, in preparation for parallelization.
5. Improved some of the UG integration tests.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2398 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 04:14:14 +00:00
jmaguire 98839193b7 compatibility with VCF lib's switch to GenomeLoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2397 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:52:48 +00:00
jmaguire 8787dd4c5e Various and sundry additions to VCF tools. Some useful to the general public, some one-offs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2396 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-18 00:35:45 +00:00
rpoplin 6fbf77be95 Updating the two solid_recal_mode options to also change the previous base since solid aligner prefers single color mismatch alignments over true SNP alignments. COUNT_AS_MISMATCH mode has been removed completely. The default mode is now SET_Q_ZERO.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2394 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 20:07:26 +00:00
hanna 07f1859290 Added integration test for running the recalibrator with no index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2393 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:10:53 +00:00
ebanks c75ec67f84 When called as a standalone, VariantAnnotator now emits samples in sorted (as opposed to random) order in VCFs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2392 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 19:01:08 +00:00
rpoplin aa86f3710d Updating HomopolymerCovariate to only count the consecutive previous bases. I left in the code but commented out for if somebody wants to worry about carry forward homopolymer problems.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2391 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 18:25:09 +00:00
hanna b863fffdf6 Fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2390 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 17:55:00 +00:00
hanna 9143822822 Fix half-hearted attempt to try to move classes from package to package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2389 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 17:41:42 +00:00
asivache e6cc7dab26 fixing md5 sum; new version of IndelIntervalWalker does the right thing...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2388 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 01:04:13 +00:00
asivache acb4d477da sync...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2387 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 01:03:01 +00:00
asivache ba86508854 remove debug print command
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2386 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-17 00:00:01 +00:00
asivache d72d332239 1) changed to search specifically for D and I cigar elements (and to process properly/ignore H,S,P elements) and print out only intervals that encompass actual indels. There's still one interval per read (at most) generated, which is the smallest intervals that covers ALL indels (D or I elements) present in the read; 2) if an interval (thus the original read itself and indels in it) sticks beyond the end of the chromosome, the read is ignored and this interval is NOT printed into the output; instead, a warning is printed to STDOUT (should we send it to logger.warn() instead?
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2385 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 23:29:07 +00:00
hanna 5b78354efd Fixed NPE in index check with RefWalkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2384 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 22:37:45 +00:00
hanna e6127cd6c5 Temporary hack for Tim Fennell: introduce a sharding strategy that stuffs all data into a single
shard for cases when the index file isn't available.  Works for the case in question, but is not
guaranteed to work in general.  Will be replaced once the new sharding system comes online.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2383 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:55:42 +00:00
ebanks bef1c50b3b Some cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2382 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:41:06 +00:00
ebanks bb92e31118 Optimizations:
1. push the ReadBackedPileup filtering up into the ReadFilters for read-based filters
2. stop querying the cigar for its length (just do it once)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2381 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:39:58 +00:00
andrewk 36875fca89 Update documentation in the new help system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2380 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:33:12 +00:00
hanna ee47eb4367 Make filters used available to the walker via getToolkit().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2379 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 21:26:04 +00:00
ebanks b626fc0684 Joint Estimate is now the default calculation model.
Reworked all of the integration tests so that they're now more comprehensive, cover more of what we wan to test, and don't take forever to run.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2376 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 19:41:02 +00:00
ebanks e051311e8c Added convenience methods in RodVCF to pull out all of the VCF data from the VCFRecord (e.g. getID(), getSamples(), getInfoValues())
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2374 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 17:58:41 +00:00
ebanks bb312814a2 UG is now officially in the business of making good SNP calls (as opposed to being hyper-aggressive in its calls and expecting the end-user to filter).
Bad/suspicious bases/reads (high mismatch rate, low MQ, low BQ, bad mates) are now filtered out by default (and not used for the annotations either), although this can all be turned off.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2373 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-16 17:28:09 +00:00
aaron af440943a4 Fixing a bug that Steven uncovered; we had an abigous contract for peek() in PushbackIterator, and SeekableRODIterator wasn't checking to see if it's PushbackIterator hasNext() was true before calling peek().
Changed peek() to element() to be consistant with the Java standards of the Queue and Stack classes (element() throws an exception if a record isn't available).  

Also updated some of the ROD iterator next() methods to throw NoSuchElementException if next() is called when a record isn't available.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2372 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 23:04:40 +00:00
andrewk 1035abc85f Add minimum base quality thresholding to depth of coverage via getBaseAndMappingFilteredPileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2371 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 22:58:30 +00:00
sjia 2deae95df9 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2370 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:31:47 +00:00
hanna 555976d575 One more walker with formatting to fix.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2369 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:23:13 +00:00
hanna cf46472419 Fix up Sherman's new docs in compliance with javadoc specs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2368 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 21:20:38 +00:00
sjia df79ed8db1 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2367 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:53:41 +00:00
sjia a80a5f1036 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2366 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:52:08 +00:00
sjia 18f61d2586 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2365 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:45:19 +00:00
sjia 5974c42468 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2364 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:41:35 +00:00
sjia d8cfd707bc Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2363 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:35:18 +00:00
sjia 4322beeb35 Updated documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2362 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:33:38 +00:00
sjia 4148991d81 Now also encodes amino acids, includes documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2361 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 20:26:56 +00:00
ebanks 9b0bdbbf29 Fix for homopolymer bug: ref was lowercase, alt allele was uppercase, so alt != ref. Yuck.
This is a temporary fix - pushed more elegant solution over to Matt.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2360 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 19:02:23 +00:00
depristo a810586418 Check-in without javadoc = smackdown
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2359 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 15:32:39 +00:00
ebanks b234019cf5 Readded locus printing suppression to DoC walker
(and removed unused import from UG)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2358 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 14:50:56 +00:00
depristo 0d2a761460 Bugfix for minBaseQuality to ignore deletion reads. LocusMismatch walker now allows us to skip every nths eligable site
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2357 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 14:38:39 +00:00
ebanks bf7bab754e Made getPileupWithoutMappingQualityZeroReads() and getPileupWithoutDeletions() more efficient, per Mark's cue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2356 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 04:35:21 +00:00
ebanks 874552ff75 Pull the genotype (and genotype quality) calculation out of the VCF code and into the Genotyper.
[Also, enable Mark's new UG arguments]



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2355 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 04:29:28 +00:00
depristo 2cbc85cc7a min mapping quality and min base quality arguments for UG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2354 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 03:57:27 +00:00
depristo faa638532a Correct location
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2353 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 02:42:21 +00:00
depristo 1da97ebb85 Walker for calculating non-independent base errors, v1. Will be moved to somewhere not in core
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2352 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-15 02:40:15 +00:00
chartl 1389ac6bdf Hurrr -- this uses power as part of its output. Changes to the power calculation broke the md5s RIGHT AFTER I HAD FIXED THEM arghflrg.
Will fix again.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2351 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 22:42:50 +00:00
chartl b42fc905e8 Added - new tests (Hapmap was re-added)
Modified - Hapmap now takes a -q command to filter out variants by quality
Modified - MathUtils - cumBinomialProbLog now uses BigDecimal to handle some numerical imprecisions
Modified - PowerBelowFrequency - returns 0.0 if called with a negative number (can't be done from inside the walker itself, but since it's called elsewhere one can't be too careful)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2350 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:57:20 +00:00
rpoplin 8e44bfd2ef CycleCovariate and PrimerRoundCovariate now correctly handle negative strand 454 and SOLID reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2349 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 21:52:30 +00:00
ebanks c7b23d6ca5 Now that VCFGenotypeRecords implement SampleBacked (as they should), a quick fix was needed to get the GenotypeConcordance working when no direct samples were provided in a samples file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2348 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 04:27:16 +00:00
asivache bd7b07f3f1 added PrimitivePair.Long and a few shortcut utility methods to PrimitivePairs: add(pair), subtract(pair), assignFrom(pair)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2347 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-14 00:15:44 +00:00
ebanks 97618663ef Refactored and generalized the VCF header info code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2346 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 21:02:45 +00:00
depristo 05b8782d5f Documentation updates. Moved CountX.java walkers to QC
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2345 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 18:40:22 +00:00
depristo 92307361a4 In preparation for move
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2344 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 18:28:06 +00:00
ebanks 45199136f0 Completed my documentation responsibilities - based on Mark's reasonable assignment and not the one Matt made up while on Meth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2342 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 04:13:30 +00:00
ebanks bd2a46ab4c I want to move over to hpprojects tonight, so I'm checking in various changes all in one go:
1. Initial code for annotating calls with the base mismatch rate within a reference window (still needs analysis).
2. Move error checking code from rodVCF to VCFRecord.
3. More improvements to SNP Genotype callset concordance.
4. Fixed some comments in Variation/Genotype



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2341 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-13 02:52:18 +00:00
kiran 2748eb60e1 Added short documentation for each class so that it appears in the walker command-line documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2340 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 21:41:07 +00:00
rpoplin 78e94b5a84 TableRecalibration now puts the full list of walker arguments into the PG tag of the bam file it creates. Thanks Matt and Eric. Also, the default nback for the HomopolymerCovariate is 8, down from 10.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2339 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 17:29:41 +00:00
rpoplin 014013630f Added hieracrchy to the covariate classes: Required, Standard, and Experimental. Required covariates (rg and reported quality) are added for the user whether or not they are specified in the -cov list. There is now a -standard option in CountCovariates which will add in all of the standard covariates so the user doesn't have to type them all out or even know which ones are the standard. There is logger output to say which covariates are being used of course. The list of covariates used is also added to the PG tag in the bam file produced by TableRecalibration.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2338 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 16:34:05 +00:00
hanna 6955b5bf53 Cleanup of the doc system, and introduce Kiran's concept of a detailed summary
below the specific command-line arguments for the walker.  Also introduced
@help.summary to override summary descriptions if required.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2337 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-12 04:04:37 +00:00
hanna cdfe204d19 Incorporated feedback from Kiran. Use the Javadoc first sentence extraction capability to just show the first sentence from each line of Javadoc. @help.description can still be used to produce exceptionally verbose descriptions.
Also increased the line width as much as I could tolerate (100 characters -> 120 characters).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2336 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 21:59:55 +00:00
rpoplin 4fa4e95fbc Updated AnalyzeCovariates to extend org.broadinstitute.sting.utils.cmdLine.CommandLineProgram and use the standard argument parsing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2335 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-11 21:57:18 +00:00