ebanks
3303808a8f
Yet more walkers moved to oneoffprojects.
...
Made hybridselection subdir in playground.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2205 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:29:12 +00:00
ebanks
05923f7fba
Started transition to oneoffprojects.
...
Moved/killed a few other walkers (with permission).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2204 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 21:19:02 +00:00
ebanks
c36069355e
Trivial change to verbose
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2203 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 20:48:10 +00:00
jmaguire
74f6526e09
VCFHomogenizer: A class that extends InputStream and dynamically rewrites pilot1 VCFs to be on-spec.
...
VCFTool: A command-line tool with various useful VCF functions (validate, grep, concordance).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2202 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 17:55:42 +00:00
jmaguire
adf8f1f8b3
Add an InputStream constructor, which is immensely useful for various reasons.
...
Also a minor performance optimization.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2201 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 17:25:00 +00:00
ebanks
e581cceab6
Got Kris's permission to delete these walkers.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2200 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:57:28 +00:00
rpoplin
3180fffd43
Eliminated unnecessary boxing of longs in RecalDatum. Changes to RecalDatum in preparation for new AnalyzeCovariates script. Updated TableRecalibrationWalker to make use of these changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2199 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 16:49:05 +00:00
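Note on the boxing change above: a minimal sketch of why a boxed Long counter costs more than a primitive long. The class and field names are hypothetical and are not RecalDatum's actual members; this only illustrates the general Java behavior.

```java
// Hypothetical illustration; not code from the repository.
class BoxedCounter {
    // Boxed field: each increment unboxes, adds, and re-boxes,
    // which may allocate a new Long once the value leaves the small-value cache.
    private Long numObservations = 0L;

    void increment() { numObservations++; }

    long get() { return numObservations; }
}

class PrimitiveCounter {
    // Primitive field: an increment is a plain arithmetic operation.
    private long numObservations = 0L;

    void increment() { numObservations++; }

    long get() { return numObservations; }
}
```

Both counters report the same totals; the difference is only in the per-increment overhead.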
chartl
21a9a717e4
Some minor changes and tests:
...
- DepthOfCoverage is now by reference (so locus-by-locus output correctly reports zero-coverage bases)
- VariantsToVCF now lets you bind variants with any string except intervals and dbsnp (not just NA######)
- A PileupWalker integration test on a particularly nasty FHS site
- Two second-base annotation related integration tests on that same site
+ outputs were all hand-validated in matlab; within a certain tolerance for the annotations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2197 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 15:15:54 +00:00
ebanks
084337087e
Removing from the repository deprecated code and walkers for which I had the green light.
...
Moved piecemealannotator and secondarybases to archive.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2195 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:58:20 +00:00
ebanks
2c16c18a04
Move Andrey's old indel code (plus MSG accuracy test, which depends on it) to archive.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2194 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 05:29:00 +00:00
ebanks
7c6c490652
An unfinished implementation of the Wilcoxon rank sum test and a variant annotation that uses it. I need to merge and update this code with Tim's implementation somehow - but that won't happen until later this week, so I'm committing this before I accidentally blow it away.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2193 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 04:56:17 +00:00
ebanks
00f15ea909
Improved performance of deletion-free pileup and added mapping-quality-zero-free pileup convenience method.
...
Finished converting genotyper and annotator code to new ReadBackedPileup system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2192 348d0f76-0448-11de-a6fe-93d51630548a
2009-12-01 04:50:47 +00:00
rpoplin
6bb864da2a
More misc cleanup.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2191 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 22:29:07 +00:00
rpoplin
b89b9adb2c
misc code cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2190 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 21:16:00 +00:00
depristo
e793e62fc9
minor code cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2189 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:57:20 +00:00
rpoplin
4969cb1957
CountCovariates uses the new optimized ReadBackedPileup. It is also smarter about redoing calculations for the dbSNP variation rate sanity check.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2188 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:35:40 +00:00
ebanks
add2fa7ab4
more use of new ReadBackedPileup optimizations
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2187 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 20:04:01 +00:00
rpoplin
817e2cb8c5
Recalibrator makes use of the new GATKSAMRecord wrapper and now no longer has to hash the SAMRecord. Covariate's getValue method signature has changed to take the SAMRecord instead of the ReadHashDatum. ReadHashDatum removed completely.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2185 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 19:59:17 +00:00
ebanks
e9a8156cfb
Use new optimized ReadBackedPileup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2184 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 18:17:18 +00:00
rpoplin
d8146ab23d
Changed the format of the recalibration csv file slightly so that it is easier to load the file into something like R and look at the values of the covariates.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2183 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 17:55:23 +00:00
ebanks
a184d28ce9
Completing the optimization started by Matt: we now wrap SAMRecords and SAMReadGroupRecords with our own versions which cache oft-used variables (e.g. platform, readString, strand flag). All walkers automagically get this speedup since the wrapping occurs in the engine.
...
I note that all integration/unit tests pass except for BaseTransitionTableCalculatorJava, which is already broken.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2182 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 17:39:29 +00:00
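Note on the wrapping optimization above: a minimal sketch of the caching idea, in which a wrapper computes an often-used value once and reuses it. ReadSketch and CachingReadSketch are hypothetical stand-ins; the commit does not show how the real wrappers around SAMRecord and SAMReadGroupRecord are built, so this is an assumption about the general pattern only.

```java
// Hypothetical stand-in for a record whose accessor is costly to evaluate.
interface ReadSketch {
    String getReadString();
}

// Wrapper that asks the underlying record once and then serves the cached value.
class CachingReadSketch implements ReadSketch {
    private final ReadSketch delegate;
    private String cachedReadString;   // null until first requested

    CachingReadSketch(ReadSketch delegate) {
        this.delegate = delegate;
    }

    @Override
    public String getReadString() {
        if (cachedReadString == null) {
            cachedReadString = delegate.getReadString();
        }
        return cachedReadString;
    }
}
```

Because callers only see the ReadSketch interface, code that receives the wrapper gets the cached behavior without any change, which mirrors the point in the commit that the wrapping happens in one central place. In this sketch a delegate that returns null is simply queried again on the next call.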
depristo
af22ca1b47
Bug fixes for VariantEval. dbCoverage now reports dbSNP rate, not some weird eval_snps_in_db as before. We now separate non-indel and non-snp db sites in dbcoverage. Some dbSNP records don't fit into these two categories. Also fixed a consistency issue where novel / known sites were being determined solely by whether dbSNP had a record there, rather than the stricter dbcoverage screen for isSNP().
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2180 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-30 01:39:01 +00:00
chartl
27651d8dc2
Oops. numReads is now called size
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2175 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 06:59:17 +00:00
chartl
21744e024b
Quick walker that determines % of bases covered at (user-defined depth)x. I've been maintaining it in my directories alone, but now that I've accidentally deleted it twice, into playground it goes
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2174 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-29 06:51:19 +00:00
hanna
3300ca906a
An iterator for Eric to use when injecting his new wrapping reads -- a stopgap solution for getting additional caching
...
functionality into a SAMRecord.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2173 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-27 22:25:52 +00:00
rpoplin
26db15be5c
Added SingleReadGroupFilter to only use reads from a specific read group, filtering out all others.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2172 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-27 20:33:59 +00:00
rpoplin
91f5672a32
misc cleanup
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2171 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-27 19:56:20 +00:00
rpoplin
d1298dda13
Encapsulated the sections of code that were shared by the two Recalibration walkers. This includes both the shared command line arguments and the section of code in the map methods which pull out data from the SAMRecord and stuff it into the ReadHashDatum. Command line arguments are now passed to the Covariates using a new initialize method that all Covariates must implement. Updated the dbsnp sanity check warning message to be less cryptic.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2170 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-27 19:54:10 +00:00
depristo
75b61a3663
Updated, optimized ReadBackedPileup. Updated test that was breaking the build -- it created a pileup from reads without bases...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2169 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 23:30:39 +00:00
alecw
ac1b289d55
Add tile to ReadHashDatum, and implement TileCovariate
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2166 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 21:41:42 +00:00
depristo
db40e28e54
ReadBackedPileup in all its glory. Documented, aligned with the output of LocusIteratorByState, and caching common outputs for performance
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2165 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 20:54:44 +00:00
rpoplin
b44363d20a
Removed silly casts from Integer to int.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2164 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 19:59:21 +00:00
ebanks
d0f673f0c0
Use Math.abs so we don't get (inconsistent) -0's
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2160 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 19:08:34 +00:00
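Note on the -0 fix above: in Java, floating-point negative zero prints as "-0.0", so a value that is mathematically zero can appear with or without a sign depending on how it was computed. A minimal sketch of the Math.abs approach follows; the class and method names are hypothetical, and the sketch assumes the value being reported is never meant to be negative.

```java
// Hypothetical illustration; not code from the repository.
class NegativeZeroDemo {
    // Reports a non-negative quantity so that negative zero comes out as plain zero.
    static double normalize(double value) {
        return Math.abs(value);
    }

    public static void main(String[] args) {
        double negZero = -1.0 * 0.0;              // IEEE 754 negative zero
        System.out.println(negZero);              // prints -0.0
        System.out.println(normalize(negZero));   // prints 0.0
    }
}
```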
rpoplin
6ff8526592
Added arguments to the recalibration walkers so the user can specify the default read group id and platform to use when a read has no read group. There are also options to force every read group and every platform to be the specified values. Added integration tests that use a bam file with no read groups. Added comments to all the covariates to explain what each of the methods in the Covariate interface is used for.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2157 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 15:41:12 +00:00
aaron
cfbd9332b0
small cleanups for the GATK paper genotyper; switched to the managed output system.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2156 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 08:04:13 +00:00
ebanks
e1e5b35b19
Don't have the spanning deletions argument be a hard cutoff, but instead be a percentage of the reads in the pileup. Default is now 5% of reads.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2155 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 04:54:44 +00:00
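Note on the spanning deletions change above: a minimal sketch of a fractional threshold as opposed to a fixed read count. The class, method, and parameter names are hypothetical, and whether the real comparison is strict or inclusive at the boundary is an assumption here.

```java
// Hypothetical illustration; not code from the repository.
class SpanningDeletionFilterSketch {
    // Returns true when the share of reads carrying a deletion across the locus
    // exceeds maxFraction (e.g. 0.05 for the 5% default mentioned in the commit).
    static boolean tooManySpanningDeletions(int deletionReads, int totalReads, double maxFraction) {
        if (totalReads == 0) {
            return false;   // an empty pileup has nothing to reject
        }
        return (double) deletionReads / totalReads > maxFraction;
    }
}
```

With a fractional cutoff the same setting scales with depth: 6 deletion reads out of 100 exceed 5%, while 6 out of 1000 do not.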
depristo
03342c1fdd
Restructuring and interface change to ReadBackedPileup. We no longer support the Pileup interface, the BasicPileup static methods, and the ReadBackedPileup class. Now everything is a ReadBackedPileup and all methods to manipulate pileups are off of it. Also provides the recommended iterable() interface of pileup elements so you can use the syntax for (PileupElement p : pileup) and access directly from p.getBase() and p.getQual() and p.getSecondBase(). Only a few straggler walkers use the old style interface -- but those walkers will be retired soon. Documentation coming in the AM. Please everyone use the new syntax, it's safer, and will be more efficient as soon as the LocusIteratorByState directly emits the ReadBackedPileup for the Alignment context, as opposed to the current interface. In the process of the changeover, discovered several bugs in the second-best base code due to things getting out of sync, but these changes were resolved manually. All other integration tests passed without modification.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2154 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-25 03:51:41 +00:00
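Note on the iteration syntax described above: a minimal sketch of a pileup exposed as an Iterable of elements, so callers can write a for-each loop and read the base and quality from each element. PileupElementSketch and PileupSketch are hypothetical stand-ins, not the real PileupElement and ReadBackedPileup; only getBase() and getQual() are modeled, and getSecondBase() is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for one read's contribution at a locus.
class PileupElementSketch {
    private final byte base;
    private final byte qual;

    PileupElementSketch(byte base, byte qual) {
        this.base = base;
        this.qual = qual;
    }

    byte getBase() { return base; }

    byte getQual() { return qual; }
}

// Hypothetical stand-in for a pileup that can be walked with for-each.
class PileupSketch implements Iterable<PileupElementSketch> {
    private final List<PileupElementSketch> elements = new ArrayList<>();

    void add(char base, int qual) {
        elements.add(new PileupElementSketch((byte) base, (byte) qual));
    }

    @Override
    public Iterator<PileupElementSketch> iterator() {
        return elements.iterator();
    }

    // Counts elements showing the given base with quality of at least minQual.
    static int countBase(PileupSketch pileup, char base, int minQual) {
        int count = 0;
        for (PileupElementSketch p : pileup) {
            if (p.getBase() == (byte) base && p.getQual() >= minQual) {
                count++;
            }
        }
        return count;
    }
}
```

Keeping base and quality on the same element is what avoids the out-of-sync problem the commit mentions, since there are no parallel arrays to drift apart.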
ebanks
2cb3e53b0b
Verbose mode shouldn't be printing out 'NaN's and 'Infinity's
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2153 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 22:01:00 +00:00
rpoplin
c9ff5f209c
Added a CountCovariates integration test that uses a vcf file as the list of variant sites to skip over instead of the usual dbSNP rod.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2152 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:51:38 +00:00
ebanks
3484f652e7
1. Variation is now passed to VariantAnnotator along with the List of Genotypes so non-genotype calls have access to all relevant info.
...
2. Killed OnOffGenotype
3. SpanningDeletions is now SpanningDeletionFraction
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2151 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:47:20 +00:00
ebanks
e05cb346f3
GenotypeLocusData now extends Variation.
...
Also, Variations should be INSERTIONs or DELETIONs (and not just INDELs).
Technically, VCF records can be indels now.
More changes coming
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2150 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 21:07:55 +00:00
rpoplin
8b30279edc
style update
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2149 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 20:56:31 +00:00
rpoplin
dffa46b380
BAM files created by TableRecalibration now have the version number and list of covariates used appended to their header with a new 'PG' tag. Eventually the entire list of command line args will be put in there as well. Big thanks to Matt and Aaron. The integration test uses the --no_pg_tag argument so that the md5 doesn't change every time the version number changes.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2148 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 20:53:57 +00:00
aaron
8fbc0c8473
fix for bug GSA-234: fasta index files couldn't handle anything but letters, numbers, or spaces in the contig name
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2147 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 19:19:47 +00:00
andrewk
3fca23cd16
Added a stub treeReduce function for debugging multi-threaded execution.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2146 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:51:19 +00:00
rpoplin
277e6d6b32
Further optimizations of TableRecalibration. This completes my goal of having the only math done in the map function be addition, subtraction and rounding the quality score to an integer. Everything else has been moved to the initialize method and only done once.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2145 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:21:57 +00:00
andrewk
e4546f802c
Accumulates coverage across hybrid selection bait intervals to assess effect of bait adjacency. Requires input bait intervals that have an overhang beyond the actual bait interval to capture coverage data at these points. Outputs an R-parseable file that has all data in lists and then does some basic plotting.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2144 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:12:34 +00:00
andrewk
e5106c9924
Hybrid selection performance statistics now include counts of the number of adjacent baits (0,1,2) using OverlapDetector and optionally include assayed bait quantities input via interval lists.
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2143 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 18:07:23 +00:00
ebanks
87c1860398
I'm not sure I believe it, but JProfiler claims that calling FourBaseProbs.isVerbose() was taking 5% of my runtime...
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2142 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 17:00:32 +00:00
ebanks
b3f561710f
Optimizations:
...
1. Only do calculations in UG for alternate allele with highest sum of quality scores (note that this also constitutes a bug fix for a precision problem we were having).
2. Avoid using Strings in DiploidGenotype when we can (it was taking 1.5% of my compute according to JProfiler)
UG now runs in half the time for JOINT_ESTIMATE model.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@2141 348d0f76-0448-11de-a6fe-93d51630548a
2009-11-24 16:27:39 +00:00