alecw
ac1b289d55
Add tile to ReadHashDatum, and implement TileCovariate
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2166 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 21:41:42 +00:00
depristo
db40e28e54
ReadBackedPileup in all its glory. Documented, aligned with the output of LocusIteratorByState, and caching common outputs for performance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2165 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 20:54:44 +00:00
rpoplin
b44363d20a
Removed silly casts from Integer to int.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2164 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 19:59:21 +00:00
ebanks
d0f673f0c0
Use Math.abs so we don't get (inconsistent) -0's
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2160 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 19:08:34 +00:00
rpoplin
6ff8526592
Added arguments to the recalibration walkers so the user can specify the default read group id and platform to use when a read has no read group. There are also options to force every read group and every platform to be the specified values. Added integration tests that use a bam file with no read groups. Added comments to all the covariates to explain what each of the methods in the Covariate interface are used for.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2157 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 15:41:12 +00:00
aaron
cfbd9332b0
small cleanups for the GATK paper genotyper; switched to the managed output system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2156 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 08:04:13 +00:00
ebanks
e1e5b35b19
Don't have the spanning deletions argument be a hard cutoff, but instead be a percentage of the reads in the pileup. Default is now 5% of reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2155 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 04:54:44 +00:00
depristo
03342c1fdd
Restructuring and interface change to ReadBackedPileup. We now lower support the Pileup interface, the BasicPileup static methods, and the ReadBackedPileup class. Now everything is a ReadBackedPileup and all methods to manipulate pileups are off of it. Also provides the recommended iterable() interface of pileup elements so you can use the syntax for (PileupElement p : pileup) and access directly from p.getBase() and p.getQual() and p.getSecondBase(). Only a few straggler walkers use the old style interface -- but those walkers will be retired soon. Documentation coming in the AM. Please everyone use the new syntax, it's safer, and will be more efficient as soon as the LocusIteratorByState directly emits the ReadBackedPileup for the Alignment context, as opposed to the current interface. In the process of the change over, discovered several bugs in the second-best base code due to things getting out of sync, but these changes were resolved manually. All other integrationtests passed without modification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2154 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 03:51:41 +00:00
ebanks
2cb3e53b0b
Verbose mode shouldn't be printing out 'NaN's and 'Infinity's
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2153 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 22:01:00 +00:00
rpoplin
c9ff5f209c
Added a CountCovariates integration test that uses a vcf file as the list of variant sites to skip over instead of the usual dbSNP rod.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2152 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:51:38 +00:00
ebanks
3484f652e7
1. Variation is now passed to VariantAnnotator along with the List of Genotypes so non-genotype calls has access to all relevant info.
...
2. Killed OnOffGenoype
3. SpanningDeletions is now SpanningDeletionFraction
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2151 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:47:20 +00:00
ebanks
e05cb346f3
GenotypeLocusData now extends Variation.
...
Also, Variations should be INSERTIONs or DELETIONs (and not just INDELs).
Technically, VCF records can be indels now.
More changes coming
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2150 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:07:55 +00:00
rpoplin
8b30279edc
style update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2149 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 20:56:31 +00:00
rpoplin
dffa46b380
BAM files created by TableRecalibration now have the version number and list of covariates used appended to their header with a new 'PG' tag. Eventually the entire list of command line args will be put in there as well. Big thanks to Matt and Aaron. The integration test uses the --no_pg_tag so that the md5 doesn't change every time the version number changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2148 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 20:53:57 +00:00
aaron
8fbc0c8473
fix for bug GSA-234: fasta index files couldn't handle anything but letters, numbers, or spaces in the contig name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2147 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 19:19:47 +00:00
andrewk
3fca23cd16
Added a stub treeReduce function for debugging multi-threaded execution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2146 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:51:19 +00:00
rpoplin
277e6d6b32
Further optimizations of TableRecalibration. This completes my goal of having the only math done in the map function be addition, subtraction and rounding the quality score to an integer. Everything else has been moved to the initialize method and only done once.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2145 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:21:57 +00:00
andrewk
e4546f802c
Accumulates coverage across hybrid selection bait intervals to assess effect of bait adjacency. Requires input bait intervals that have an overhang beyond the actual bait interval to capture coverage data at these points. Outputs R parseable file that has all data in lists and then does some basic plotting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2144 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:12:34 +00:00
andrewk
e5106c9924
Hybrid selection performance statistics now include counts of the number of adjacent baits (0,1,2) using OverlapDetector and optionally include assayed bait quantities input via interval lists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2143 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:07:23 +00:00
ebanks
87c1860398
I'm not sure I believe it, but JProfiler claims that calling FourBaseProbs.isVerbose() was taking 5% of my runtime...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2142 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 17:00:32 +00:00
ebanks
b3f561710f
Optimizations:
...
1. Only do calculations in UG for alternate allele with highest sum of quality scores (note that this also constitutes a bug fix for a precision problem we were having).
2. Avoid using Strings in DiploidGenotype when we can (it was taking 1.5% of my compute according to JProfiler)
UG now runs in half the time for JOINT_ESTIMATE model.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2141 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 16:27:39 +00:00
rpoplin
a59e5b5e1a
Added dbSNP sanity check to CountCovariates. If the mismatch rate is too low at dbSNP sites it warns the user that the dbSNP file is suspicious. Added option in CountCovariates and TableRecalibration to ignore read group id's and collapse them together. Also, If the read group is null the walkers no long crash with NullPointerException but instead warn the user the read group and platform are defaulting to some values. Default window size in MinimumNQSCovariate is 5 (two bases in either direction) based on rereading of Chris's analysis.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2140 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 16:16:44 +00:00
alecw
e5e6d515c3
Fix misunderstanding of GenomeLoc interval
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2138 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 15:12:49 +00:00
ebanks
cb6d6f2686
Very minor performance improvements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2137 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 05:21:07 +00:00
ebanks
c90bea39a1
read.getReadString().charAt(offset) --> read.getReadBases()[offset]
...
[As a courtesy I fixed all instances once I was updating GenotypeLikelihoods]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2136 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 04:25:19 +00:00
ebanks
ec321abd7b
Added ability to filter on the QUAL field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2135 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 04:08:22 +00:00
ebanks
36d493e645
All standard annotations now inherit from StandardVariantAnnotation. Users can specify whether they want all annotations, just the standard annotations, or specific annotations. When calling in from another walker, the default is just the standard ones.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2134 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 03:55:12 +00:00
ebanks
ee5093d2c6
-Added VariantFiltration integration tests
...
-Added integration test for GLFs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2133 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 02:36:27 +00:00
ebanks
be6a549e7b
Added the capability to allow expressions in an integration test command (i.e. -filter 'foo') by escaping them in the command.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2132 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 02:34:48 +00:00
hanna
903342745d
Basic integration test for the aligner.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2131 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 23:08:05 +00:00
hanna
4837fe919c
Convenience changes. If no -BWT option is specified, pull the BWT location from the reference.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2130 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 22:46:05 +00:00
rpoplin
9e4eadc37c
CountCovariates v2.0.2: Added a --process_nth_locus <int> argument to only use every Nth covered locus when creating the recalibration table.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2129 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 22:07:38 +00:00
chartl
6a52ca3db6
Update to the UG integration test. Why I had to rm -rf my entire sting directory to get it to correctly fail we may never know.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2128 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 21:23:00 +00:00
ebanks
ed4cf3de57
Check that we're biallelic before calling isSNP()
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2127 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 20:20:48 +00:00
rpoplin
5744a1d968
The covariates don't care about SAMRecord's anymore - Cleaning up the import statements.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2126 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 20:10:12 +00:00
chartl
23983b2fd8
New annotation: ResidualQuality
...
Computes a metric for how much error is left that isn't explained by ref or snp bases. This is the sum of Q scores, weighted by the proportion of non-ref non-snp bases to non-snp bases. Reported in Log space.
Update to the integration test so bamboo doesn't look as though someone murdered it with a spork
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2124 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 20:04:01 +00:00
ebanks
70059a0fc9
Refactored joint estimation model to allow subclasses to overload PofD calculation over all frequencies. Pooled model now takes only 20% of time that it used to.
...
Added integration test for pooled model and updated other joint estimation tests to be more comprehensive now that they are faster.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2123 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 20:03:38 +00:00
rpoplin
7f947f6b60
Updated recalibrator integration tests to use all three platforms as well as a bam with multi-platform reads intermingled. CountCovariates v2.0.1: Once again uses a read filter to filter out zero mapping quality reads. Added --sorted_output option to output the table recalibration file in sorted order
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2122 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 19:51:36 +00:00
ebanks
c299ca5f49
It would help if I copied the MD5s from the right integration test...
...
I hate Mondays.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2121 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 17:21:36 +00:00
ebanks
ff4797acbb
Forgot to check in integration test update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2120 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 17:13:51 +00:00
ebanks
14bf6ce83c
1. Newest version of the joint estimation model. Faster than previous version and now qscores can get to be > 39.8 for hets.
...
2. More sanity checks in annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2119 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 17:05:50 +00:00
hanna
ee2abd30c4
Count the best alignments and emit them to a file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2118 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 16:37:59 +00:00
rpoplin
1d46de6d34
The old recalibrator is replaced with the refactored recalibrator. Added a version message to the logger output. These walkers start at version 2.0.0
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2117 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 14:58:33 +00:00
ebanks
dfe7d69471
1. VCF: don't print slod if it's never set
...
2. UG: don't print slod if lods are infinite (todo: figure out a good guess instead)
3. UG: if probF=0 for 2 alt alleles are both 0 (because of precision), use log values to discriminate
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2116 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 02:55:43 +00:00
ebanks
753cb100a3
Add checks for weird situations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2115 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 02:14:25 +00:00
ebanks
04d6ac940c
Always print out VCF header - not just when there is genotype data present.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2114 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 01:44:10 +00:00
ebanks
bf935a6ab1
1. Fixed bug in PrimaryBaseSecondaryBaseSymmetry code (not checking for null before trying to access object's methods) which was causing Integration Tests to fail.
...
2. Retired allele frequency range from UG, which wasn't very useful.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2113 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-23 01:31:48 +00:00
rpoplin
b24240664f
Reduced the number of calls to new ArrayList() in TableRecalibration. This results in a speed up of perhaps up to 6 percent (timed trials are hard).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2112 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 17:24:31 +00:00
hanna
c9c4999354
BWA: odds and ends. Get rid of some spurious debug code that was accidentally
...
checked in. Add a better way to write out unmapped reads (thanks Kiran!) Add
a pre-built version of the shared library to the repository for early adoption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2111 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 15:26:07 +00:00
depristo
9c206abb97
removing unnecessary printing
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2110 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 12:41:48 +00:00
chartl
59416ae06a
This is an annotation adapted from one that Mark Daly suggested some time ago. Right now it calculates:
...
- For all reference bases, the proportion of their second best bases that support the SNP
- the proportion of non-reference bases that support the SNP
and reports the difference between the two. Initially I was taking depth into account as well, but that did not appear to work as nicely as I'd like (even at 20,000x depth, if 95% of the non-reference bases are C, and 98% of the reference second-best-bases are C, then we would want to be suspicious of it; but perhaps slightly less so than if the depth were only 20...)
Anyway it's now available. I'm not sure how useful it will be, but I spawned the FHS annotation jobs again, so we'll see.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2109 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 00:47:49 +00:00
rpoplin
98f921fe24
The refactored CountCovariates now hashes the read object into a HashMap which holds all the properties the covariates pull out of the read over and over again such as read group string, bases string and its complement string, quality scores, etc. This results in a big speed up. CountCovariatesRefactored is now just slightly slower than CountCovariates (perhaps 1.07x according to my latest time trial). Thanks to Alec for suggesting IdentityHashMap. CycleCovariate now warns the user that is is defaulting to the Solexa definition of cycle when the platform string pulled out of the read is unrecognized instead of halting with an Exception.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2108 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 20:38:17 +00:00
depristo
27122f7f97
Performance improvements for pooled caller. Now possible to actually run on real data in a finite amount of time. Minor changes to GL interface (making strandIndex public) to support cached calculations in pooled caller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2107 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 15:07:40 +00:00
ebanks
797bb83209
New VariantFiltration.
...
Wiki docs are updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2105 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 19:50:26 +00:00
hanna
a78bc60c0f
Minor tweak to improve ease-of-use of iterator system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2104 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 18:24:19 +00:00
hanna
4fbb6d05d0
Refactoring. Push the revisions to the common aligner interface down into
...
the aligner base classes. Hack the managed implementation to support the
new interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2103 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 17:08:09 +00:00
ebanks
d84444200b
The Unified Genotyper now sorts the sample names in the vcf that it outputs.
...
[There was no reason to enforce that every VCF being output from the GATK should have the samples sorted, since someone might want them ordered non-alphabetically]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2102 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 16:13:18 +00:00
hanna
38a030f2ba
Finishing off data transfer conduits for single alignment generator.
...
Misc bug fixes elsewhere.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2101 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 15:21:59 +00:00
ebanks
2a5349d886
VariantAnnotator now adds dbsnp id if a dbsnp rod is supplied and it's not already set for a record
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2100 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 03:26:09 +00:00
ebanks
b434c1c240
Check for null entries before adding
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2099 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 03:12:20 +00:00
depristo
82fd824c4d
Continuing improvements to unified genotyper
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2098 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 01:39:29 +00:00
aaron
33dcfc858d
updates to the paper genotyper based on Mark's comments. There's still more work to do, including more testing.
...
Also a 250% improvement in the getBases() and getQuals() of BasicPileup, which was nearly all of the runtime for the genotyper (using primitives instead of objects when possible).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2097 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 23:06:49 +00:00
rpoplin
22aaf8c5e0
Added the old recalibrator integration tests to the refactored recalibrator sitting in playground.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2096 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 22:43:28 +00:00
hanna
a95302fe98
Single alignment generator, another checkpoint. Does generate single alignments, but some of the data still
...
needs to plumbed through and it may leak memory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2095 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 21:20:03 +00:00
hanna
a972b2769f
Checkpoint. Add first phase of single alignment interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2094 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 19:03:43 +00:00
chartl
306f4624c6
oops forgot to update the md5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2093 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 18:22:29 +00:00
aaron
6ba1f3321d
Fixed the sample mix-up bug Kiran discovered, and added a unit test in the VCF reader class (Thanks for the good example files Kiran). Also renamed the toStringRepresentation function to toStringEncoding, and added a matching method in VCFGenotypeRecord.
...
Updated the integration tests that were failing to due to different ordering of genotyping entries in VCF, I'll check in the VCF diff tool I wrote when I get a cycle or two.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2092 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 18:17:47 +00:00
chartl
b4babb82eb
adding an extra bit of data to come out of CTT (number of chips with actual data)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2091 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:46:10 +00:00
alecw
7623b39927
Add rodPicardDbSNP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2088 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:27:46 +00:00
alecw
b2b4ff7eca
Cache SAMReadGroup rather than get it twice
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2087 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:27:18 +00:00
chartl
b3872386c9
Test to ensure that ConcordanceTruthTable and those walkers which rely on it for tabulating pooled truth information from truth information of the individuals within the pool is doing that calculation correctly. Tests single het, single hom (with/without reference), together, together without reference, and a mix of everything.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2082 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 15:26:32 +00:00
depristo
eeb3a3fffb
comments for Aaron
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2081 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 12:56:04 +00:00
aaron
7997455f38
first go of the genotyper for the GATK paper. More testing and review tomorrow to call it done.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2080 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 07:55:24 +00:00
ebanks
7b957d3e2e
Make the whining from Khalid's office stop already
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2079 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 03:04:48 +00:00
hanna
85bc9d3e91
(Hopefully) temporary hack: load contig information by contig name rather than contig id to avoid
...
off-by-one errors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2078 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 23:33:27 +00:00
rpoplin
0fbd81766b
CountCovariates now uses any rod of type VariationRod with the name dbsnp as the source of known variant sites to skip over. It also grabs the platform string out of the read group when deciding which algorithm to use to calculate machine cycle. In this way it can now handle multi-platform bams. I added a new covariate: PositionCovariate. This is simply the offset regardless of which platform the read came from. This will be useful for comparing between the two covariates. Finally, this message serves as a warning that I will be killing the old recalibrator tomorrow after I've updated and verified new integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2077 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 23:03:47 +00:00
ebanks
f667bed7fc
-Don't annotate allele balance or on-off genotype if there's no genotype data
...
-If qscore is infinity (because of precision) make a best guess instead
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2076 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 22:01:32 +00:00
chartl
90212c643b
more effective & efficient test for SecondBaseSkew
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2075 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 20:53:32 +00:00
ebanks
087e01a439
minor changes for --noSLOD
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2074 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 18:48:01 +00:00
ebanks
a70cf2b763
A bunch of changes needed to make outputting pooled calls possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2073 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 18:42:57 +00:00
ebanks
0a35c8e0ba
1. The joint estimation model now constrains genotypes to be AA,AB,or BB only (i.e. to use a single alternate allele). Note that this doesn't work for the old models (point estimate or SSG) because calculations aren't divided by alternate allele.
...
2. Allele frequency spectrum is not emitted for single samples (since it doesn't make sense).
3. If in pooled mode, throw an exception of pool size isn't set appropriately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2072 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:43:15 +00:00
chartl
405c6bf2c1
VariantEval genotype concordance for pools! Integration test coming soon
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2071 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:24:54 +00:00
depristo
6fe1c337ff
Pileup cleanup; pooled caller v1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2070 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:03:48 +00:00
rpoplin
f0a234ab29
TableRecalibration is now much smarter about hashing calculations, taking advantage of the sequential recalibration formulation. Instead of hashing RecalDatums it hashes the empirical quality score itself. This cuts the runtime by 20 percent. TableRecalibration also now skips over reads with zero mapping quality (outputs them to the new bam but doesn't touch their base quality scores).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2069 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 16:47:44 +00:00
chartl
be31d7f4cc
Added - a walker that outputs relevant information about false negatives given a bunch of hapmap individuals and corresponding integration tests for it.
...
This will output for hapmap variant sites:
chromosome position ref allele variant allele number of variant alleles of the individuals depth of coverage power to detect singletons at lod 3 number of variant bases seen whether or not variant was called
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2068 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 15:47:52 +00:00
chartl
b68d6e06b7
Rollback of the previous "fix" and implementation of the real fix.
...
We totally *do* want to annotate the call if called by another walker. Totally boneheaded misenterpretation of what the code was doing -- Eric, please forgive me for being an idiot.
Instead, change the StingException to what it really should be -- an IllegalStateException, which is not coincidentally already handled by the calling function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2067 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 06:09:24 +00:00
chartl
95f1be94c0
Fix for the broken build:
...
do **not** attempt to annotate if UnifiedGenotyper is called from another walker! Why this didn't break the build earlier I have no idea.
Ultimately, there should be a better way of interfacing UG with another walker -- what if some other walker wants the annotations from UG? But since we're calling map directly -- and the annotations don't get returned directly from map -- this needs to be handled differently, while the map function should ultimately return the LOD score or quality under the GCM alone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2066 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 05:56:31 +00:00
ebanks
9fb50e9bd9
Further refactoring so that pooled calling will work.
...
Okay, Mark, you should be all set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2065 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 00:18:13 +00:00
chartl
539f6f15e5
Added --
...
Second base skew annotations and integration tests. Nothing need be given except -A SecondBaseSkew; the statistic it annotates calls with is a chi-square statistic given by the deviation of the observed proportion of reference second-best-bases from the expected 1/3. Future additions may be to ask that the deviation be instead from a given transition table.
A big note for all users: All IllegalStateExceptions from the variation ROD (e.g. the RodGeliText) are dealt with SILENTLY. I understand this isn't optimal, but I'd rather simply not annotate a non-bi-allelic site than fail completely (there are quite a few such sites even on the regions over which the integration test has been written).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2064 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 00:11:13 +00:00
depristo
42a0bbaf46
Minor reformating for pooled calling
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2063 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 22:06:11 +00:00
rpoplin
ec1a870905
Working with byte arrays is faster than working with Strings so the Covariates now take in byte arrays. None of the Covariates themselves used the reference base so I removed it. DinucCovariate now returns a Dinuc object which implements Comparable instead of returning a String because it was too slow. CountCovariates now uses a read filter to filter out unmapped reads and allows the user to specify -cov all which will use all of the available covariates, of which there are 7 now. If no covariates are specified it defaults to ReadGroup and QualityScore, the two required covariates. Initial code in place to leave SOLID bases alone if they have bad color space quality. TableRecalibration uses @Requires to tell the GATK to not give the reference bases since they weren't being used for anything.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2062 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 21:50:52 +00:00
ebanks
4d9c826766
Integration tests actually run on real data now.
...
<tries to hide sheepish grin>
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2061 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 21:04:14 +00:00
ebanks
5e126875ea
temporarily disable (tests are broken)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2060 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 20:45:52 +00:00
ebanks
a048f5cdf1
-Refactored JointEstimation code so that pooled calling will work
...
-Use phred-scale for fisher strand test
-Use only 2N allele frequency estimation points
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2059 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 20:21:15 +00:00
chartl
43bd4c8e8f
Ignoring deletions in the primary pileup by default was causing the primary pileup to become shorter than the secondary pileup when building up the secondary base pileup string. This fix makes sure to include the primary Ds within the pileup so that not only are the pileups guaranteed to be the same size, the same offsets will truly correspond with the same read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2058 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 17:20:13 +00:00
aaron
aece7fa4c7
a convenience method to join a map into a single string, which I need for some VCF work. Added some documentation to the join method as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2057 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 16:50:01 +00:00
asivache
21729d9311
Do not print debug message when debug mode is not requested!!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2056 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 20:28:41 +00:00
rpoplin
967215066d
The old CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2055 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 19:16:46 +00:00
rpoplin
eb07c7f7f8
CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2054 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 18:44:54 +00:00
ebanks
4558375575
Stage 1 of the VariantFiltration refactoring is now complete. There now exists a parallel tool called VariantAnnotator which simply takes variant calls and annotates them with the same type of data that we used to use for filtering (e.g. DoC, allele balance). The output is a VCF with the INFO field appropriately annotated.
...
VariantAnnotator can be called as a standalone walker or by another walker, as it is by the UnifiedGenotyper. UG now no longer computes any of this meta data - it relegates the task completely to the annotator (assuming the output format accepts it).
This is a fairly all-encompassing check in. It involves changes to all of the UG code, bug fixes to much of the VCF code as things popped up, and other changes throughout. All integration tests pass and I've tediously confirmed that the annotation values are correct, but this framework could use some more rigorous testing.
Stage 2 of the process will happen later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2053 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 02:41:20 +00:00