chartl
59416ae06a
This is an annotation adapted from one that Mark Daly suggested some time ago. Right now it calculates:
...
- For all reference bases, the proportion of their second best bases that support the SNP
- the proportion of non-reference bases that support the SNP
and reports the difference between the two. Initially I was taking depth into account as well, but that did not appear to work as nicely as I'd like (even at 20,000x depth, if 95% of the non-reference bases are C, and 98% of the reference second-best-bases are C, then we would want to be suspicious of it; but perhaps slightly less so than if the depth were only 20...)
Anyway it's now available. I'm not sure how useful it will be, but I spawned the FHS annotation jobs again, so we'll see.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2109 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-22 00:47:49 +00:00
rpoplin
98f921fe24
The refactored CountCovariates now hashes the read object into a HashMap which holds all the properties the covariates pull out of the read over and over again such as read group string, bases string and its complement string, quality scores, etc. This results in a big speed up. CountCovariatesRefactored is now just slightly slower than CountCovariates (perhaps 1.07x according to my latest time trial). Thanks to Alec for suggesting IdentityHashMap. CycleCovariate now warns the user that is is defaulting to the Solexa definition of cycle when the platform string pulled out of the read is unrecognized instead of halting with an Exception.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2108 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 20:38:17 +00:00
depristo
27122f7f97
Performance improvements for pooled caller. Now possible to actually run on real data in a finite amount of time. Minor changes to GL interface (making strandIndex public) to support cached calculations in pooled caller.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2107 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-21 15:07:40 +00:00
ebanks
797bb83209
New VariantFiltration.
...
Wiki docs are updated.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2105 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 19:50:26 +00:00
hanna
a78bc60c0f
Minor tweak to improve ease-of-use of iterator system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2104 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 18:24:19 +00:00
hanna
4fbb6d05d0
Refactoring. Push the revisions to the common aligner interface down into
...
the aligner base classes. Hack the managed implementation to support the
new interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2103 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 17:08:09 +00:00
ebanks
d84444200b
The Unified Genotyper now sorts the sample names in the vcf that it outputs.
...
[There was no reason to enforce that every VCF being output from the GATK should have the samples sorted, since someone might want them ordered non-alphabetically]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2102 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 16:13:18 +00:00
hanna
38a030f2ba
Finishing off data transfer conduits for single alignment generator.
...
Misc bug fixes elsewhere.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2101 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 15:21:59 +00:00
ebanks
2a5349d886
VariantAnnotator now adds dbsnp id if a dbsnp rod is supplied and it's not already set for a record
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2100 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 03:26:09 +00:00
ebanks
b434c1c240
Check for null entries before adding
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2099 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 03:12:20 +00:00
depristo
82fd824c4d
Continuing improvements to unified genotyper
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2098 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-20 01:39:29 +00:00
aaron
33dcfc858d
updates to the paper genotyper based on Mark's comments. There's still more work to do, including more testing.
...
Also a 250% improvement in the getBases() and getQuals() of BasicPileup, which was nearly all of the runtime for the genotyper (using primitives instead of objects when possible).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2097 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 23:06:49 +00:00
rpoplin
22aaf8c5e0
Added the old recalibrator integration tests to the refactored recalibrator sitting in playground.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2096 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 22:43:28 +00:00
hanna
a95302fe98
Single alignment generator, another checkpoint. Does generate single alignments, but some of the data still
...
needs to plumbed through and it may leak memory.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2095 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 21:20:03 +00:00
hanna
a972b2769f
Checkpoint. Add first phase of single alignment interface.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2094 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 19:03:43 +00:00
chartl
306f4624c6
oops forgot to update the md5s
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2093 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 18:22:29 +00:00
aaron
6ba1f3321d
Fixed the sample mix-up bug Kiran discovered, and added a unit test in the VCF reader class (Thanks for the good example files Kiran). Also renamed the toStringRepresentation function to toStringEncoding, and added a matching method in VCFGenotypeRecord.
...
Updated the integration tests that were failing to due to different ordering of genotyping entries in VCF, I'll check in the VCF diff tool I wrote when I get a cycle or two.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2092 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 18:17:47 +00:00
chartl
b4babb82eb
adding an extra bit of data to come out of CTT (number of chips with actual data)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2091 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:46:10 +00:00
alecw
7623b39927
Add rodPicardDbSNP
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2088 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:27:46 +00:00
alecw
b2b4ff7eca
Cache SAMReadGroup rather than get it twice
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2087 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 17:27:18 +00:00
chartl
b3872386c9
Test to ensure that ConcordanceTruthTable and those walkers which rely on it for tabulating pooled truth information from truth information of the individuals within the pool is doing that calculation correctly. Tests single het, single hom (with/without reference), together, together without reference, and a mix of everything.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2082 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 15:26:32 +00:00
depristo
eeb3a3fffb
comments for Aaron
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2081 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 12:56:04 +00:00
aaron
7997455f38
first go of the genotyper for the GATK paper. More testing and review tomorrow to call it done.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2080 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 07:55:24 +00:00
ebanks
7b957d3e2e
Make the whining from Khalid's office stop already
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2079 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-19 03:04:48 +00:00
hanna
85bc9d3e91
(Hopefully) temporary hack: load contig information by contig name rather than contig id to avoid
...
off-by-one errors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2078 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 23:33:27 +00:00
rpoplin
0fbd81766b
CountCovariates now uses any rod of type VariationRod with the name dbsnp as the source of known variant sites to skip over. It also grabs the platform string out of the read group when deciding which algorithm to use to calculate machine cycle. In this way it can now handle multi-platform bams. I added a new covariate: PositionCovariate. This is simply the offset regardless of which platform the read came from. This will be useful for comparing between the two covariates. Finally, this message serves as a warning that I will be killing the old recalibrator tomorrow after I've updated and verified new integration tests.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2077 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 23:03:47 +00:00
ebanks
f667bed7fc
-Don't annotate allele balance or on-off genotype if there's no genotype data
...
-If qscore is infinity (because of precision) make a best guess instead
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2076 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 22:01:32 +00:00
chartl
90212c643b
more effective & efficient test for SecondBaseSkew
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2075 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 20:53:32 +00:00
ebanks
087e01a439
minor changes for --noSLOD
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2074 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 18:48:01 +00:00
ebanks
a70cf2b763
A bunch of changes needed to make outputting pooled calls possible
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2073 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 18:42:57 +00:00
ebanks
0a35c8e0ba
1. The joint estimation model now constrains genotypes to be AA,AB,or BB only (i.e. to use a single alternate allele). Note that this doesn't work for the old models (point estimate or SSG) because calculations aren't divided by alternate allele.
...
2. Allele frequency spectrum is not emitted for single samples (since it doesn't make sense).
3. If in pooled mode, throw an exception of pool size isn't set appropriately.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2072 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:43:15 +00:00
chartl
405c6bf2c1
VariantEval genotype concordance for pools! Integration test coming soon
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2071 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:24:54 +00:00
depristo
6fe1c337ff
Pileup cleanup; pooled caller v1
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2070 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 17:03:48 +00:00
rpoplin
f0a234ab29
TableRecalibration is now much smarter about hashing calculations, taking advantage of the sequential recalibration formulation. Instead of hashing RecalDatums it hashes the empirical quality score itself. This cuts the runtime by 20 percent. TableRecalibration also now skips over reads with zero mapping quality (outputs them to the new bam but doesn't touch their base quality scores).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2069 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 16:47:44 +00:00
chartl
be31d7f4cc
Added - a walker that outputs relevant information about false negatives given a bunch of hapmap individuals and corresponding integration tests for it.
...
This will output for hapmap variant sites:
chromosome position ref allele variant allele number of variant alleles of the individuals depth of coverage power to detect singletons at lod 3 number of variant bases seen whether or not variant was called
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2068 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 15:47:52 +00:00
chartl
b68d6e06b7
Rollback of the previous "fix" and implementation of the real fix.
...
We totally *do* want to annotate the call if called by another walker. Totally boneheaded misenterpretation of what the code was doing -- Eric, please forgive me for being an idiot.
Instead, change the StingException to what it really should be -- an IllegalStateException, which is not coincidentally already handled by the calling function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2067 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 06:09:24 +00:00
chartl
95f1be94c0
Fix for the broken build:
...
do **not** attempt to annotate if UnifiedGenotyper is called from another walker! Why this didn't break the build earlier I have no idea.
Ultimately, there should be a better way of interfacing UG with another walker -- what if some other walker wants the annotations from UG? But since we're calling map directly -- and the annotations don't get returned directly from map -- this needs to be handled differently, while the map function should ultimately return the LOD score or quality under the GCM alone.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2066 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 05:56:31 +00:00
ebanks
9fb50e9bd9
Further refactoring so that pooled calling will work.
...
Okay, Mark, you should be all set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2065 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 00:18:13 +00:00
chartl
539f6f15e5
Added --
...
Second base skew annotations and integration tests. Nothing need be given except -A SecondBaseSkew; the statistic it annotates calls with is a chi-square statistic given by the deviation of the observed proportion of reference second-best-bases from the expected 1/3. Future additions may be to ask that the deviation be instead from a given transition table.
A big note for all users: All IllegalStateExceptions from the variation ROD (e.g. the RodGeliText) are dealt with SILENTLY. I understand this isn't optimal, but I'd rather simply not annotate a non-bi-allelic site than fail completely (there are quite a few such sites even on the regions over which the integration test has been written).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2064 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-18 00:11:13 +00:00
depristo
42a0bbaf46
Minor reformating for pooled calling
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2063 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 22:06:11 +00:00
rpoplin
ec1a870905
Working with byte arrays is faster than working with Strings so the Covariates now take in byte arrays. None of the Covariates themselves used the reference base so I removed it. DinucCovariate now returns a Dinuc object which implements Comparable instead of returning a String because it was too slow. CountCovariates now uses a read filter to filter out unmapped reads and allows the user to specify -cov all which will use all of the available covariates, of which there are 7 now. If no covariates are specified it defaults to ReadGroup and QualityScore, the two required covariates. Initial code in place to leave SOLID bases alone if they have bad color space quality. TableRecalibration uses @Requires to tell the GATK to not give the reference bases since they weren't being used for anything.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2062 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 21:50:52 +00:00
ebanks
4d9c826766
Integration tests actually run on real data now.
...
<tries to hide sheepish grin>
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2061 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 21:04:14 +00:00
ebanks
5e126875ea
temporarily disable (tests are broken)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2060 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 20:45:52 +00:00
ebanks
a048f5cdf1
-Refactored JointEstimation code so that pooled calling will work
...
-Use phred-scale for fisher strand test
-Use only 2N allele frequency estimation points
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2059 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 20:21:15 +00:00
chartl
43bd4c8e8f
Ignoring deletions in the primary pileup by default was causing the primary pileup to become shorter than the secondary pileup when building up the secondary base pileup string. This fix makes sure to include the primary Ds within the pileup so that not only are the pileups guaranteed to be the same size, the same offsets will truly correspond with the same read.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2058 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 17:20:13 +00:00
aaron
aece7fa4c7
a convenience method to join a map into a single string, which I need for some VCF work. Added some documentation to the join method as well.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2057 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-17 16:50:01 +00:00
asivache
21729d9311
Do not print debug message when debug mode is not requested!!
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2056 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 20:28:41 +00:00
rpoplin
967215066d
The old CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2055 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 19:16:46 +00:00
rpoplin
eb07c7f7f8
CountCovariates now warns the user if they didn't supply a dbSNP rod file. Thanks Kiran for the use case.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2054 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 18:44:54 +00:00
ebanks
4558375575
Stage 1 of the VariantFiltration refactoring is now complete. There now exists a parallel tool called VariantAnnotator which simply takes variant calls and annotates them with the same type of data that we used to use for filtering (e.g. DoC, allele balance). The output is a VCF with the INFO field appropriately annotated.
...
VariantAnnotator can be called as a standalone walker or by another walker, as it is by the UnifiedGenotyper. UG now no longer computes any of this meta data - it relegates the task completely to the annotator (assuming the output format accepts it).
This is a fairly all-encompassing check in. It involves changes to all of the UG code, bug fixes to much of the VCF code as things popped up, and other changes throughout. All integration tests pass and I've tediously confirmed that the annotation values are correct, but this framework could use some more rigorous testing.
Stage 2 of the process will happen later this week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2053 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 02:41:20 +00:00
hanna
ce5034dc5d
Finally reinstate the iterator-style interface. Get rid of some scaffolding code.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2052 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-16 02:34:19 +00:00
kiran
103763fc84
An accessor for the VCF header
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2051 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-15 09:28:25 +00:00
kiran
97ed945797
Example code for a bug in the VCF implementation. See JIRA entry at http://jira.broadinstitute.org:8008/browse/GSA-225
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2050 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-15 09:27:12 +00:00
rpoplin
88fd762436
The -rf argument is now being used for read filter and is colliding with my walkers. Changed mine to -recalFile
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2048 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-14 19:37:46 +00:00
rpoplin
b05119987c
Clarified some of the comments in the individual covariates now that things have been moved around to speed up the code. In general most error checking and adjustments to the data are done per read instead of per base. This means that functionality was moved out of the covariate modules and into CovariateCounterWalker and TableRecalibrationWalker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2047 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-14 18:44:05 +00:00
rpoplin
672472789e
Added some documentation to the helper classes. Fixed an error case in TableRecalibrationWalker.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2046 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-14 18:13:43 +00:00
hanna
15c14add4d
Repackage the aligner for better partitioning. The C aligner, for example, is now
...
partitioned from the Java aligner, and both are partitioned from the more general-
purpose BWT reader.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2045 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 22:55:27 +00:00
rpoplin
d1b525b428
Default window size for NQS covariate is 3
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2040 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 19:24:27 +00:00
rpoplin
394c839974
Implemented NQS covariate. Extended Cycle covariate to handle 454 and SOLID reads. Added a Primer Round covariate for SOLID reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2039 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 19:22:21 +00:00
ebanks
bf451873ff
1. Bug fix: check that AF=0 doesn't contain more probability than 1-fraction
...
2. Fix for Kiran: allow UG to call SNPs at deletion sites; we'll add an annotation to the VariantAnotator for deletions at the locus (next week).
3. Added integration tests for joint estimation model
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2038 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 18:02:18 +00:00
asivache
1be36ca959
Bug fix: when cleanedReadIterator is initialized, it gets immediately set to the contig of the first cleaned read; when the first uncleaned read coming in is on the lower contig, this would trigger 'readNextContig' with that lower contig as an argument. As the result, the whole cleaned reads file would be read through the end and no cleaned reads would be ever seen by the code afterwards. Now we do not call readNextContig if the (uncleaned) read's contig is lower than the current contig already loaded into cleanedReadIterator. the 'readNextContig' method now also throws an exception if requested contig is less than the currently loaded one
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2037 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 15:41:26 +00:00
rpoplin
b1376e4216
structure refactored throughout for performance improvements
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2036 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 15:41:09 +00:00
depristo
cff31f2d06
comments for eric
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2035 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 14:19:31 +00:00
aaron
234bb71747
changed the toVariation() method to take a reference base, instead of using the reference base loaded from the underlying data source (if it was reference aware). Also changed some isVariant() methods which weren't using the passed in ref base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2034 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 06:54:38 +00:00
ebanks
902cf84448
Bug fix: if the most likely allele frequency is 0, don't make a variant call (even if the Qscore for AF=1/n > threshold)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2033 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 04:10:32 +00:00
ebanks
555fb975de
1. Print out allele frequency range (from joint estimation model only).
...
2. Don't print verbose output from SLOD calculation (it's just a repeat of previous output).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2032 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 03:59:13 +00:00
mmelgar
72825c4848
A walker that generates a table of secondary base counts in a bam file.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2031 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 02:11:23 +00:00
hanna
7c386fa428
Another case of reordering of read groups blowing up checksums.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2030 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-13 00:07:35 +00:00
hanna
8145ed4672
Take 2, updating picard with bug fix for bam files containing no reads.
...
Just stomped on the existing md5s because that's what Eric told me to do.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2029 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 22:52:08 +00:00
ebanks
61b5fb82ce
2 major changes:
...
1. Add dbsnp RS ID to VCF output from genotyper; to do this I needed to fix the dbsnp rod which did not correctly return this value.
2. Remove AlleleBalanceBacked and instead generalize the arbitrary info fields backing VCFs (and potentially others) in preparation for refactoring VariantFiltration next week.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2028 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 22:51:49 +00:00
mmelgar
3742a05760
Now can read E2 or SQ tag.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2027 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 15:18:21 +00:00
aaron
c3c001e02e
cleanup of the traversal output code
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2026 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 06:18:10 +00:00
ebanks
0922400ca9
Don't try to calculate ratios when DoC is zero (which happens when calls are made by an LD-aware genotyper)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2025 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 02:51:44 +00:00
ebanks
697d7e02c8
Remove the lazy initialize functionality. When no calls are made by the genotyper, we still want a vcf file to be output with valid header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2024 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 02:14:50 +00:00
hanna
2ea85fb62b
Fix some problematic command-line argument naming and descriptions.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2023 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-12 02:12:26 +00:00
hanna
0c2a957ae0
Better configuration support. Now supports everything that people have expressed interest in except edit distance.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2021 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 20:54:49 +00:00
depristo
6c9f86bb4d
Removed unnecessary output and added debugging print() routine
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2020 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 18:37:36 +00:00
ebanks
578dcc54a4
Don't create a record if ref=N
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2018 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 04:32:17 +00:00
hanna
8406325247
New Picard is breaking one of the integration tests.
...
Revert until we find out whether the cause is legit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2017 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 03:59:32 +00:00
hanna
499e7d1d75
Push forward some more delicate merging routines.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2016 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 03:07:34 +00:00
hanna
bae4d3f7ea
Updated Picard with fix for Doug Voet. Thanks Alec.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2015 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-11 02:01:08 +00:00
hanna
2e4782f202
Command-line arguments for SamReadFilters.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2014 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 23:36:17 +00:00
rpoplin
a13cbe1df0
The refactored recalibrator now passes the integration tests as well as my own validation tests. I'm ready to have other people start jamming on the files. I'll make an updated wiki page soon. The refactored recalibrator is currently a bit slower than the old one but there were a lot of great, easy ideas today for how to improve it.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2013 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 22:20:06 +00:00
hanna
2cf9670d1e
Allow users to directly specify filters from the command-line, applicable to
...
any walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2012 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 18:40:16 +00:00
ebanks
6a37090529
Output changes for VCF and UG:
...
1. Don't cap q-scores at 99
2. Scale SLOD to allow more resolution in the output
3. UG outputs weighted allele balance (AB) and on-off genotype (OO) info fields for het genotype calls (works for joint estimation model and SSG)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2011 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 16:31:31 +00:00
rpoplin
1e7ddd2d9f
Added a validateOldRecalibrator option to CovariateCounterWalker which reorders the output to match the old recalibrator exactly. This facilitates direct comparison of output. Changed the -cov argument slightly to require the user to specify both ReadGroupCovariate and QualityScoreCovariate to make it more clear to the user which covariates are being used. Some speed up improvements throughout.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2010 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 15:55:56 +00:00
depristo
7e30fe230a
oops, missing file
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2009 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 13:25:18 +00:00
depristo
d316cbad4c
VariantFilteration now accepts a VCF rod in addition to an input geli. It will then annotate this VCF file with filtering information in the INFO field too. --OnlyAnnotate will not write in filtering output
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2008 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 13:24:58 +00:00
aaron
f9819d5f13
a little clean-up
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2007 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 06:18:34 +00:00
aaron
2ed423ed56
print the current location in read walkers (in addition to the number of reads processed), along with some refactoring to support the change.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2006 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 05:57:01 +00:00
ebanks
c9c3cf477a
Based on feedback from Kiran, we know uniquify sample names as sample.rodName (instead of sample.1, sample.2, ...)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2005 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-10 02:41:37 +00:00
ebanks
2fa2ae43ec
Enough people have found this useful, so...
...
Moving Callset Concordance tool to core and adding integration test.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2003 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:59:18 +00:00
ebanks
3793519bd4
-Added convenience method to VCF record to tell if it's a no call and have rodVCF use it before querying for info fields
...
-Don't restrict info fields to 2-letter keys
[about to move these to core]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2002 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:52:51 +00:00
rpoplin
740a5484c4
Added some documentation to the code, mostly especially to CovariateCounterWalker but various comments added throughout. Also changed the HashMap data structure to accept an estimated initial capacity. This had a very modest improvement to the speed.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2001 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 20:13:56 +00:00
ebanks
74751a8ed3
-Some minor fixes to get accurate vcf record merging done
...
-Improvement to snp genotype concordance test
And with that, it looks like I get revision #2000 .
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2000 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 06:40:55 +00:00
ebanks
ab705565cf
Completely refactored the Callset Concordance code. Now, it takes in VCF rods and emits a single VCF file which has merged calls from all inputs and is annotated (in the INFO fields) with the appropriate concordance test(s).
...
Still needs a bit of polish...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1999 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 05:03:13 +00:00
ebanks
bc6f24e88f
Added VCFUtils which contains some useful VCF-related functions (e.g. ability to merge VCF records).
...
Also, various minor improvements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1998 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:53:32 +00:00
ebanks
cff645e98b
convenience method to deal with genotypes that are unsorted (e.g. CA vs. AC)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1997 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:45:49 +00:00
kiran
7fde6c0bf4
One more output tweak.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1996 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:42:55 +00:00
kiran
00a7113d7a
Tweaks to formatting of output table.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1995 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:33:36 +00:00
ebanks
7ce0df76f8
Added accessors to the rod data sources so that walkers can access the name/file/type triplets for input rods. This is necessary if e.g. you want to create a vcf writer based on all of the samples being input.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1994 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:25:39 +00:00
ebanks
d07f3bb6f6
Added methods to get strand bias and to test if record has allele freq or bias fields set.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1993 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-09 04:20:35 +00:00
kiran
3313b0ddb4
Fixed a minor bug where the lodThreshold wasn't being printed in the header.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1992 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:51:36 +00:00
kiran
95d381efe2
Optionally computes the error rate using the best base and a random base.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1991 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:47:34 +00:00
kiran
567f5758d2
Optionally lists read depths by read group.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1990 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:39:19 +00:00
kiran
a679bdde18
FindContaminatingReadGroupsWalker lists read groups in a single-sample BAM file that appear to be contaminants by searching for evidence of systematic underperformance at likely homozygous-variant sites.
...
Procedure:
1. Sites that are likely homozygous-variant but are called as heterozygous are identified.
2. For each site and read group, we compute the proportion of bases in the pileup supporting an alternate allele.
3. A one-sample, left-tailed t-test is performed with the null hypothesis being that the alternate allele distribution has a mean of 0.95 and the alternate hypothesis being that the true mean is statistically significantly less than expected (pValue < 1e-9).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1989 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:36:39 +00:00
kiran
2225d8176e
A convenience class for maintaining a dynamically growing table of values with access to the elements by named row and column identifiers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1988 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 16:34:35 +00:00
hanna
21c5f543fa
Fix sharding bug -- loci to which >100,000 (= 1 shard) reads are assigned an
...
alignment start will confuse the sharding system and cause it to return duplicate reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1987 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-08 14:27:26 +00:00
rpoplin
84ba604611
Sequential quality score calculation is now in place in the refactored recalibrator and matches the quality scores calculated by the old recalibrator exactly; at least on the small sets of data used so far. Validation, documentation, and optimization work is on going.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1985 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 15:55:16 +00:00
depristo
bf1bc94060
Fixes for PooledConcordance bugs and lack of safety checking
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1984 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-07 01:54:10 +00:00
rpoplin
66d4a995e6
Initial check in of refactored Recalibrator. The new walkers are called CountCovariatesRefactored and TableRecalibrationRefactored. More work is needed to finish up the sequential calculation and to document the code sufficiently. These files are not ready to be used by other people quite yet.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1982 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 22:33:55 +00:00
ebanks
6fdfc97db6
Added optional field DP to VCF output for Mark.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1981 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 20:03:22 +00:00
ebanks
0a55fa5bb1
Completely refactored the Genotype Concordance module(s).
...
Now PooledConcordance and GenotypeConcordance inherit from the same super class (and can therefore share data structures and functionality). Also, they now use ConcordanceTruthTable to keep track of necessary info.
GenotypeConcordance passes integration tests.
PooledConcordance needs to be finished by Chris.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1979 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-06 16:27:16 +00:00
ebanks
d549347f25
Refactored GenotypeLikelihoods to use an underlying 4-base model.
...
It needs to be modified a bit and then hooked up to a pooled model, but that is now possible.
At this point, there is no difference to the Unified Genotyper.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1978 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 21:59:25 +00:00
jmaguire
4d3871c655
don't flush anymore.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1977 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 19:11:51 +00:00
aaron
aacd72854f
a fix for a bug Andrey discovered: in read-based interval traversals we're dupplicating reads in rare cases. The problem was that to accomidate a bug in SAM JDK indexing, we were forced to add one to the stop of our QueryOverlapping() calls to ensure we always got all of the overlapping reads.
...
Added a PlusOneFixIterator that wraps other iterators, and eliminates reads that start outside of our intended interval (interval stop - 1). Updated and checked BamToFastqIntegrationTest MD5 sums.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1976 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-05 05:26:33 +00:00
hanna
43c3ee61d5
Fix minor mapping quality bug.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1973 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-04 14:33:23 +00:00
ebanks
a545859c62
Joint Estimation model now emits a reasonable slod
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1969 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 21:12:42 +00:00
ebanks
11d950abe0
No longer allow the lod_threshold argument - use confidence instead.
...
Have UG output qscores in all cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1968 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:18:51 +00:00
asivache
2fb45dbd73
Make window size a command line argument
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1967 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:13:35 +00:00
asivache
55f61b1f88
Bug fix in adjustment of the shift position.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1966 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 16:08:11 +00:00
depristo
5d5dc989e7
improvements to VCF and variant eval support of VCF -- now listens to the filter field
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1963 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 12:09:30 +00:00
hanna
c63af32fc7
The BWA/C bindings were triggering the local aligner to repeatedly reload the
...
ref genome. Make sure the reference genome is cached.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1961 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-03 00:01:55 +00:00
ebanks
3a33401822
2nd stage of the genotyper output refactoring is complete.
...
Now, all output is generalized and all of the intelligence lies where it is supposed to.
Next stage is syncing up old and new models and making sure we're outputting exactly what we should.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1960 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 22:43:08 +00:00
aaron
ba67c7f02b
added a warning for those using bed files; we properly convert bed to the internal representation but the user needs to be aware that any output will be one-based closed intervals
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1959 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 21:09:18 +00:00
aaron
b71b66bd88
the underlying parameter is a float so we need to use Float.valueOf() instead; Noticed by external user Hou Huabin
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1958 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 20:22:25 +00:00
hanna
5a510e6d98
New PackageUtils interferes with the packaging utility. Revert until Aaron and
...
I can get together to make this work.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1957 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 19:14:14 +00:00
aaron
de6ae51f7e
Scala walkers can now be build and run like any other walker in the GATK. Added the getUrlsForClasspath to PackageUtils, the Reflections package isn't getting the manifest files from jars in the classpath, and so we weren't seeing any walkers outside of the GenomeAnalysisTK.jar.
...
A couple of notes:
-Commented out BaseTransitionTableCalculator.scala because it's won't build; Chris could you fix this one (or kill it if it's not needed).
-Removed the PrintReadsScala walker; moved the code over to a ScalaCountLoci walker (which is what the code was really doing).
-Added configurations items to the ivy xml file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1956 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 06:02:41 +00:00
hanna
1896f334d9
Fixed collection of bugs in reads aligning to multiple locations.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1955 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-02 04:02:09 +00:00
ebanks
af6d0003f8
-Generalized the GenotypeConcordance module to deal with any number of individuals (although it will default to its old behavior if the -samples argument is left out).
...
-Make rods return the appropriate type of Genotype calls from getGenotype().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1954 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-01 05:35:47 +00:00
hanna
b95165e39c
Make alignment (temporarily) part of main GenomeAnalysisTK.jar. Add some extra logging errors on failure.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1953 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-01 00:33:18 +00:00
asivache
4b0796ba58
After fixing a few glitches and bugs, this version finally works as intended
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1952 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-31 04:59:58 +00:00
depristo
7d0ac7c6f2
Fix for long-term VariantEval bug plus new intergration test to catch it
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1951 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-31 00:00:33 +00:00
asivache
ea8d5c7077
Some internal refactoring. Now "safely" ignores duplicate records (NOT duplicate reads but rather malformed bam files!) resulting from the bug/feature in CleanedReadInjector.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1949 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 17:50:51 +00:00
hanna
a3da475c88
Documentation and cleanup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1946 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 15:40:28 +00:00
hanna
2d15891719
Created walkers for alignment, validation.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1945 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 15:04:07 +00:00
ebanks
51fffc7f69
Comments for Ryan (which also apply to ReadQualityScoreWalker).
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1944 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 14:44:04 +00:00
ebanks
ccd7440730
We can actually make this a bit simpler (and faster)
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1943 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 04:21:03 +00:00
ebanks
1b6333e4ab
Enough people have asked for this that it just needed to get written.
...
One can now split up any number of sets into an N-way Venn (although it doesn't check for discordance in the calls, so you'll still want to use SimpleVenn for 2-way comparisons).
Wiki docs are updated.
To do: update to use Ryan's generic hash map when it's ready for public use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1942 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 04:08:45 +00:00
ebanks
4bdb5b03bd
tell UnifiedGenotyper to return calls at all bases
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1941 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 03:10:44 +00:00
ebanks
4ee1d6f733
-Have the calculation models determine whether a call passes the lod/confidence thresholds (as opposed to returning everything and letting the UG decide); this way, walkers which call map() will get only the good calls.
...
-Do the right thing in all models for all-base-mode (for Kiran).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1940 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 02:35:51 +00:00
ebanks
64ac956885
Okay, I caved in:
...
CallsetConcordance now gets possible concordance types by looking at classes that implement ConcordanceType instead of having them hard-coded in.
Thanks to Kiran this was pretty easy...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1939 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-30 00:32:26 +00:00
hanna
1f0d852a48
Fix bug where alignments with indels would be busted because bwa reverses
...
the read bases to undo a previous read base reverse that doesn't occur in the
libbwa codepath.
Also fixed some memory management issues.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1938 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 21:33:13 +00:00
asivache
e3b4d4cbed
Genotyper reimplemented. Does the same thing, at least for now, but internal data structures redesign enables collecting various statistics for indel-containing/reference-matching reads. The statistics are not yet used by the caller itself to make a better judgement w.r.t. the validity of the calls it makes, but they are now printed into the output stream (--verbose). The statistics (for both normal and tumor) include: indel observation count/total coverage, av. number of mismatches per indel-containing and per ref-matching read, av. mapping quality, av. mismatch rate and av. base quality within an NQS windoew around the indel, numbers of indel and ref observations per strand.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1936 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 19:09:16 +00:00
hanna
f04b80d7db
Fixed epic memory leak.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1934 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 16:32:43 +00:00
ebanks
2b96b2e4e7
better multi-sample integration test
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1933 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:51:51 +00:00
ebanks
1c4ca9d383
-Mark just reminded me: actually force the ref/loc to be immutable
...
-VCF writer should be blind to the score/confidence/lod value - just print the thing out as is
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1932 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:41:53 +00:00
ebanks
5cdbdd9e5b
now that the design is stable, pull the setReference and setLocation methods back out of Genotype and stick them into constructors of implementing classes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1931 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 13:27:37 +00:00
ebanks
3091443dc7
Sweeping changes to the genotype output system, as per several discussions with Matt & Aaron.
...
Some things still need to be changed, but it will entail some more design decisions first (which means I get to bug M&A again tomorrow!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1930 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 03:46:41 +00:00
depristo
86573177d1
Reverting rod walkers to use underlying refwalker implementation while we work on ROD2 and reenable the system. Added some serious sparse file parsing to variant eval tests
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1929 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-29 01:04:37 +00:00