454 Commits (d3b78338da2fcdd3c7711cbbaef7f9e4ea9ae3f9)

Author SHA1 Message Date
depristo 4f7ed69242 toString() implemented
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1472 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 01:03:58 +00:00
depristo a639459112 Trivial consistency change from char in to char out, not char in to byte out
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1466 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 23:37:37 +00:00
chartl 8740124cda @ListUtils - Bugfix in getQScoreOrderStatistic: method would attempt to access an empty list fed into it. Now it checks for null pointers and returns 0.
@MathUtils - added a new method: cumBinomialProbLog which calculates a cumulant from any start point to any end point using the BinomProbabilityLog calculation.

@PoolUtils - added a new utility class specifically for items related to pooled sequencing. A major part of the power calculation is now to calculate powers
             independently by read direction. The only method in this class (currently) takes your reads and offsets, and splits them into two groups
             by read direction.

@CoverageAndPowerWalker - completely rewritten to split coverage, median qualities, and power by read direction. Makes use of cumBinomialProbLog rather than
                          doing that calculation within the object itself.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1462 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 19:31:53 +00:00
chartl 92ea947c33 Added binomProbabilityLog(int k, int n, double p) to MathUtils:
binomialProbabilityLog uses a log-space calculation of the
       binomial pmf to avoid the coefficient blowing up and thus
       returning Infinity or NaN (or in some very strange cases
       -Infinity). The log calculation compares very well, it seems
       with our current method. It's in MathUtils but could stand
       testing against rigorous truth data before becoming standard.
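
A minimal sketch of the log-space idea described above, not the committed MathUtils code: the class name, the helper logBinomialCoefficient, and the edge-case handling are assumptions made for illustration. The coefficient is accumulated as a sum of logs, so no large factorial is ever formed.

```java
final class BinomialLogSketch {
    /** log C(n, k), built up as a sum of logs of the k ratio terms (n-k+i)/i. */
    static double logBinomialCoefficient(int k, int n) {
        double logCoeff = 0.0;
        for (int i = 1; i <= k; i++) {
            logCoeff += Math.log(n - k + i) - Math.log(i);
        }
        return logCoeff;
    }

    /** Binomial pmf P(X = k) for n trials with success probability p, computed in log space. */
    static double binomialProbabilityLog(int k, int n, double p) {
        if (k < 0 || k > n) return 0.0;
        // Degenerate probabilities are handled directly to avoid log(0).
        if (p <= 0.0) return k == 0 ? 1.0 : 0.0;
        if (p >= 1.0) return k == n ? 1.0 : 0.0;
        double logProb = logBinomialCoefficient(k, n)
                + k * Math.log(p)
                + (n - k) * Math.log1p(-p);
        return Math.exp(logProb);
    }
}
```

For example, BinomialLogSketch.binomialProbabilityLog(2, 4, 0.5) evaluates exp(log 6 + 4 log 0.5), which is 6/16. The same call stays finite for n in the thousands, where a directly computed coefficient would overflow a double.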

Added median calculator functions to ListUtils

getQScoreMedian is a new utility I wrote that given reads and
       offsets will find the median Q score. While I was at it, I wrote
       a similar method, getMedian, which will return the median of any
       list of Comparables, independent of initial order. These are in
       ListUtils.
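
A sketch of an order-independent median over Comparables, assuming nothing about the actual ListUtils code beyond the description above. The class name, the null return for an empty list, and the choice of the upper middle element for even-sized lists are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MedianSketch {
    /** Returns the median element; the input list is copied, so its order is left untouched. */
    static <T extends Comparable<? super T>> T getMedian(List<T> values) {
        if (values == null || values.isEmpty()) return null;
        List<T> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // For an even number of elements this picks the upper of the two middle values.
        return sorted.get(sorted.size() / 2);
    }
}
```

Sorting a copy costs O(n log n); a selection algorithm would be faster for very large lists, but the copy keeps the caller's list unchanged.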

Added a new poolseq directory and three walkers

CoverageAndPowerWalker is built on top of the PrintCoverage walker
       and prints out the power to detect a mutant allele in a pool of
       2*(number of individuals in the pool) alleles. It can be flagged
       either to do this by bootstrapping, or by pure math with a
       probability of error based on the median Q-score. This walker
       compiles, runs, and gives quite reasonable outputs that compare
       visually well to the power calculation computed by Syzygy.

ArtificialPoolWalker is designed to take multiple single-sample
       .bam files and create a (random) artificial pool. The coverage of
       that pool is a user-defined proportion of the total coverage over
       all of the input files. The output is not only a new .bam file,
       but also an auxiliary file that has for each locus, the genotype
       of the individuals, the confidence of that call, and that person's
       representation in the artificial pool .bam at that locus. This
       walker compiles and, uhh, looks pretty. Needs some testing.

AnalyzePowerWalker extends CoverageAndPowerWalker so that it can read previous power
calculations (e.g. from Syzygy) and print them to the output file as well for direct
downstream comparisons.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1460 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 21:27:50 +00:00
aaron 811503d67b vcf changes from Richard's comments, fixed a test case
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1456 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 14:32:16 +00:00
hanna ccdb4a0313 General-purpose management of output streams.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-23 00:56:02 +00:00
aaron b316abd20f catch a malformed column header name more gracefully
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1453 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 21:05:28 +00:00
aaron 0364f8e989 added the ability of the VCFReader to take in compressed gzipped files natively, which is really useful for the validator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1452 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 18:40:38 +00:00
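
One way a reader can accept gzipped input transparently is to peek at the first two bytes for the gzip magic number. This is a sketch under that assumption; the commit does not say how VCFReader detects compression, and the class and method names here are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

final class GzipAwareOpen {
    /** True when the two leading bytes are the gzip magic number 0x1f 0x8b. */
    static boolean looksGzipped(int b0, int b1) {
        return b0 == 0x1f && b1 == 0x8b;
    }

    /** Wraps the stream in a GZIPInputStream only if it starts with the gzip magic bytes. */
    static InputStream maybeDecompress(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);              // remember the start so the peeked bytes can be replayed
        int b0 = in.read();
        int b1 = in.read();
        in.reset();
        return looksGzipped(b0, b1) ? new GZIPInputStream(in) : in;
    }
}
```

Checking content rather than the file extension means a misnamed file is still read correctly.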
aaron 647a367680 Made the size zero interval file checker emit a warnUser if we're not in unsafe mode.
Also changed the default logger level from error to warn.  Does anyone object?  It makes sense for users to always get their warn user statements in the default logging level.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1451 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 14:40:57 +00:00
aaron df9133c90b the doc on File.length states it returns 0L if it doesn't exist, added a check to make sure it exists (and length < 1)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1450 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 05:55:17 +00:00
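
The check described above can be sketched as follows. java.io.File.length() returns 0L for a path that does not exist, so a bare length test cannot tell a missing file from an empty one. The class and method names are hypothetical.

```java
import java.io.File;

final class IntervalFileCheck {
    /** True only for a file that is present on disk and has no content. */
    static boolean existsButIsEmpty(File f) {
        // exists() comes first: length() alone reports 0 for missing files as well.
        return f.exists() && f.length() < 1;
    }
}
```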
aaron cd711d7697 Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric.
also fixed some verbiage in the GAEngine.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1449 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 05:35:49 +00:00
aaron 6313c465fb we want the RMS of the read qualities, not the RMS of the RMS of the read qualities.
Also the VCF version tag seems to be standardized as VCR.  Updated the VCF code.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 21:56:29 +00:00
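
For reference, a root-mean-square over a set of quality values can be sketched as below. The class name and the zero result for an empty input are assumptions, and this is not the MathUtils routine itself.

```java
final class RmsSketch {
    /** Root mean square: sqrt of the mean of the squared values. */
    static double rms(int[] values) {
        if (values.length == 0) return 0.0;
        double sumSq = 0.0;
        for (int v : values) {
            sumSq += (double) v * v;
        }
        return Math.sqrt(sumSq / values.length);
    }
}
```

The distinction in the commit message matters because RMS does not compose: for values 10, 30 and 20, the pooled RMS is sqrt(1400/3), while taking the RMS of the group {10, 30}, then the RMS of that result with 20, gives sqrt(450).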
aaron 0386e110cf some documentation changes, add a couple of simple checks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1445 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 05:20:27 +00:00
aaron 5725de56dc fixes in VCF, some changes to get it ready to move out of the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1441 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 23:31:03 +00:00
aaron 0b927f44fa created a better separation between instantiation of a VCF object and the object itself
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1440 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 20:32:50 +00:00
hanna 21091b9839 Fix for invalid format error when outputting BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1438 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 19:42:39 +00:00
aaron 4cf9110468 Adding a lot of changes to the VCF code, plus a new basic validator. Also removing an extra copy of the Artificial SAM generator that got checked in at some point.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1437 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 05:08:28 +00:00
aaron 63d90702d6 another iteration of the VCFReader and VCFRecord, introducing the VCFWriter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1429 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 22:17:34 +00:00
aaron 8403618846 the start to the VCF implementation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1425 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 04:34:15 +00:00
asivache 144b424933 Added : String reverse(String s) - reverses a string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1416 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:16:22 +00:00
depristo bbd7bec5db Continuing cleanup of SSG. GenotypeLikelihoods now have extensive testing routines. DiploidGenotype supports het, homref, etc calculations. SSG has been cleaned up to remove old garbage functionality. Also now supports output to standard output by simply omitting varout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1387 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 22:25:30 +00:00
hanna 48713e154c Windowed access to the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1383 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:29:15 +00:00
depristo 5487ab0ee6 Added several useful routines to MathUtils for summing and bounds checking of doubles
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1379 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:41:31 +00:00
hanna 21d1eba502 Cleaned division of responsibilities between arguments to map function. Reference has been changed
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:01:37 +00:00
depristo 4986b2abd6 Fixing bug in SSG -- genotyping and discovery were mixed up by name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1371 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 22:13:35 +00:00
depristo 3485397483 Reorganization of the genotyping system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1370 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:55:31 +00:00
ebanks 9f1d3aed26 -Output single filtration stats file with input from all filters
-move out isHet test to GenotypeUtils so all can use it


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1369 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:44:21 +00:00
depristo 880a01cb5d Slight reorganization of genotype interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1367 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:18:41 +00:00
depristo d840a47b11 Slight reorganization of genotype interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1366 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:17:15 +00:00
depristo 20986a03de cleanup before moving files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1365 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:08:24 +00:00
ebanks e3b08f245f Pull out RMS calculation into MathUtils for all to use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1364 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 17:00:20 +00:00
ebanks ba07f057ac finish the math for RMS
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1362 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 16:18:09 +00:00
aaron 9dfee7a75c the "-genotype" option now acts correctly as a discovery mode caller in SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1359 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 18:31:45 +00:00
aaron c2c80dd946 cleanup and moving some things around to more logical locations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1358 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 16:28:39 +00:00
aaron 9a0761cd8f accidentally committed some debug code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1356 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 15:25:22 +00:00
aaron 2f2c8576a5 GLF output is now well validated, and some changes for new Genotypes interface code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1355 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 15:21:28 +00:00
aaron 2a7dfce9ae fix the header string mismatch that Andrew found
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1349 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 22:26:34 +00:00
aaron 0087234ed7 small code cleanup, a couple of little changes to SSGGenotypeCall
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1343 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 19:47:37 +00:00
aaron 4033c718d2 moving some code around for better organizations, some fixes to the fields out of SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1340 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 15:09:43 +00:00
aaron 9cd53d3273 some initial changes from the first review of the genotype redesign, more to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1338 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 07:04:05 +00:00
aaron bca894ebce Adding the initial changes for the new Genotyping interface. The bullet points are:
- SSG is much simpler now
- GeliText has been added as a GenotypeWriter
- AlleleFrequencyWalker will be deleted when I untangle the AlleleMetric's dependence on it
- GenotypeLikelihoods now implements GenotypeGenerator, but could still use cleanup

There is still a lot more work to do, but this is a good initial check-in.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1335 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 19:43:59 +00:00
hanna 7a13647c35 Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
ebanks 3c4410f104 -add basic indel metrics to variant eval
-variants need a length method (can't assume it's a SNP)!


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 03:25:03 +00:00
hanna 2024fb3e32 Better division of responsibilities between sources and type descriptors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1314 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:15:57 +00:00
hanna 2db86b7829 Move the cleaned read injector test from playground to core. Remove CovariateCounterTest's dependency on the CleanedReadInjector. Start doing a bit of cleanup on the CLP's FieldParsers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1312 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 19:44:04 +00:00
ebanks 59f0c00d77 -set indel cleaning walkers to be in core package
-move Andrey's alignment utility classes to core


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1307 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 05:23:29 +00:00
kiran 7c20be157c Added ability to sample from a list *without* replacement.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1304 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 21:00:19 +00:00
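
Sampling without replacement can be sketched by shuffling a copy and taking a prefix. This illustrates the technique only; the method name, signature, and argument checks are assumptions rather than the committed code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SampleSketch {
    /** Picks n distinct positions from items; the input list is not modified. */
    static <T> List<T> sampleWithoutReplacement(List<T> items, int n, Random rng) {
        if (n < 0 || n > items.size()) {
            throw new IllegalArgumentException("n must be between 0 and items.size()");
        }
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, rng);             // uniform random permutation
        return new ArrayList<>(copy.subList(0, n)); // first n entries of the permutation
    }
}
```

Passing the Random in lets callers fix a seed for reproducible tests.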
ebanks 4efe26c59a Major: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found.
Minor: refactor 454 filtering


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1300 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:53:51 +00:00
hanna 298cc24524 Fix minor bug introduced in filtration, and cleaned up the artificial sam records so that they use SAMRecord.NO_ALIGNMENT_REFERENCE_INDEX and SAMRecord.NO_ALIGNMENT_START rather than hardcoded -1's.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1296 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 22:37:41 +00:00
hanna cac04a407a For Manny: filter out reads where the ref index ==
NO_ALIGNMENT_REFERENCE_INDEX but the alignment start != NO_ALIGNMENT_START.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1295 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 21:19:24 +00:00
depristo 9c12c02768 AlleleBalance and on/off primary base filters -- version 0.0.1 -- for experimental use only
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1294 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 17:54:44 +00:00
hanna 6e4fd8db4a Better formatting of available walkers, and only output them along with help. Cleanup JVMUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1290 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 22:23:28 +00:00
hanna b43925c01e Switched to Reflections (http://code.google.com/p/reflections/) project for
inspecting the source tree and loading walkers, rather than trying to roll
our own by hand.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1286 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:32:22 +00:00
aaron b4adb5133a GLF rod as an AllelicVariant object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1282 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 00:55:52 +00:00
kcibul 129ad97ce5 performance improvement to GenomeLocParser -- moved regex pattern compile out of local field
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1272 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 02:56:25 +00:00
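
The usual form of this optimization is to compile the regular expression once into a static final field instead of on every call. The sketch below assumes a contig:start-stop interval syntax and invents the class and method names; it is not the GenomeLocParser code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocParseSketch {
    // Compiled once at class load; Pattern instances are immutable and safe to share.
    private static final Pattern LOC = Pattern.compile("([^:]+):(\\d+)-(\\d+)");

    /** Returns {contig, start, stop} as strings, or null if the text does not match. */
    static String[] parse(String text) {
        Matcher m = LOC.matcher(text);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }
}
```

Pattern.compile is comparatively expensive, so hoisting it pays off when the parser runs once per interval over a large interval list.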
depristo 107f42a01e Hacks for getting GLFs support in the Rod system working
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1268 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 21:03:47 +00:00
ebanks 692b1e206f stop throwing an exception here: we don't always have allele counts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1259 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 20:34:01 +00:00
ebanks 5be5e1d45f added conversion from iupac format and new rod to deal with FLT file format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1254 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 18:34:41 +00:00
aaron 9ecb3e0015 adding GLFRods with tests and some other code changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1251 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 15:30:19 +00:00
aaron 99ddd8ab15 bug fix for transitioning between chromosomes in GLF output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1237 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 17:58:04 +00:00
aaron 01fc8da270 adding the GenotypeLikelihoodsWalker, which generates GLF genotype likelihoods that are pretty much identical to the samtools calls.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1235 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:57:18 +00:00
aaron 36819ed908 Initial changes to the SSG to output GLF by default
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1231 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 08:46:04 +00:00
aaron e4152af387 added a big speed-up for interval list input processing. With large interval sets this was taking way too long...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1227 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 22:00:00 +00:00
hanna 9f0fb9f3aa Fix for GSA-90: GATK banner and error messages should point to the wiki website.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1226 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 21:56:41 +00:00
hanna b18caa2052 Fix for GSA-90: System isn't failing with an error when you use the wrong reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1225 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 20:42:12 +00:00
hanna 5c321f9630 Oops! Accidentally deactivated the ArgumentFactory, needed by the CleanedReadInjector, while refactoring last night.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1223 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 16:41:55 +00:00
hanna b61f9af4d7 Cleaning up, preparing to incorporate a better fix for Eric's problems with validation stringency in BAM files opened directly from the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1222 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 01:42:13 +00:00
hanna aa4f60d980 Make sure that only reads marked as 'mapped' are filtered based on validity of alignment.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1217 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 17:44:06 +00:00
hanna 03e1713988 Better support for specifying read filters to apply directly from the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1212 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 23:59:53 +00:00
aaron d86717db93 Refactoring of the traversal engine base class, I removed a lot of old code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1209 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 21:57:00 +00:00
hanna 60a86fb34a Better handling of fasta files with non-standard extensions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1206 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 18:18:48 +00:00
aaron 8ee5c7de8e GLF reader and writer check in.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1202 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 23:06:37 +00:00
hanna da4d26b1ea Enum support for command-line argument system, and some cleanup for hacks to the CleanedReadInjector that were required because Enum support was missing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1199 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:26:16 +00:00
aaron e106cf73d8 A quick change to provide more verbose output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1197 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 19:08:19 +00:00
hanna 433ad1f060 Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1196 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 18:49:08 +00:00
ebanks 787c84d68b only compare pair position for paired end reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1190 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 04:07:08 +00:00
andrewk d3daecfc4d Added unit tests for function in ListUtils to randomly sample lists with replacement, updated AlleleFrequencyEstimate to provide a callType of HomRef, HetSNP, HomSNP, update indices in CoverageEval.py, and made a lot of changes to CoverageWalker biggest one being that it directly calls SingleSampleGenotyper instead of implementing some parts of SSG itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1189 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 02:05:40 +00:00
hanna 4ba2194b5e Filter reads whose alignment starts past the end of the contig to which it allegedly aligns.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1188 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 22:27:44 +00:00
hanna 5d7393d7cb Temporary fix for Eric's problems with SOLiD reads: make sure the command-line argument system takes the --validation-strictness command-line argument into account when creating SAMFileReaders.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1183 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 15:18:05 +00:00
aaron 033bafe7a1 fixed sam by reads test for the new filtering code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1180 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 05:45:50 +00:00
aaron 2a86f2f833 an initial pass at the GLF reader, and some other genotype changes to phase out the LikelihoodObject I created.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1179 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 04:30:27 +00:00
hanna 5735c87581 Basic infrastructure for filtering malformed reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1178 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:50:22 +00:00
depristo b9d533042e Two-tailed HardyWeinberg test implemented. VariantEval now separate violations from summary outputs for clarity; Fixing problems with CovariateCounterTest and TabularRodTest
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1177 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:02:04 +00:00
hanna d19366eaad Cleanup emergency fixes for out-of-bounds issues in reference retrieval. Fix spelling mistakes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1173 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 15:41:30 +00:00
andrewk dcb8892568 Lots of code for coverage evaluation tools, including a first version of the python script to evaluate the downsampled SSG calls made and the java code to make all the calls at Hapmap chip sites at various downsampling levels; ListUtils contains functions for randomly subsetting lists (with replacement), which are useful for subsetting the same elements in both the reads and the offsets lists of a LocusWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1162 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 08:07:02 +00:00
depristo 6684cb8bc9 copySamFileHeader() utility function
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1154 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 12:55:51 +00:00
aaron d4d3af20f2 made a fake fasta generator, so we can now generate a complete bam / fasta combo of made up data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1150 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 21:35:34 +00:00
asivache 7462f3f344 Bug in setContig() fixed: sequence dictionary's .getSequences().contains() and .getSequences().indexOf() do NOT work when applied to contig names (Strings), since getSequences() returns a list of SAMSequenceRecord's; changed to querying the dictionary directly for specified contig name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1147 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 20:50:09 +00:00
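
The pitfall described above can be reproduced with stand-in types: List.contains(Object) and List.indexOf(Object) compile for any argument, so passing a String to a list of records silently never matches. SeqRecord and DictionarySketch below are hypothetical stand-ins, not the Picard classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SeqRecord {
    final String name;
    SeqRecord(String name) { this.name = name; }
}

final class DictionarySketch {
    private final List<SeqRecord> sequences = new ArrayList<>();
    private final Map<String, Integer> indexByName = new HashMap<>();

    void add(String name) {
        indexByName.put(name, sequences.size());
        sequences.add(new SeqRecord(name));
    }

    List<SeqRecord> getSequences() { return sequences; }

    /** Looks the contig up by name directly; returns -1 when it is unknown. */
    int indexOfName(String name) {
        Integer idx = indexByName.get(name);
        return idx == null ? -1 : idx;
    }
}
```

With contigs "chr1" and "chr2" added, getSequences().contains("chr2") is false and getSequences().indexOf("chr2") is -1, because a String never equals a SeqRecord, while indexOfName("chr2") finds the entry.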
hanna b43d4d909e Fix CleanedReadInjectorTest to work with new CleanedReadInjector.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1142 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 15:48:06 +00:00
aaron f5cba5a6bb Fixed genome loc to be immutable; the only way to change its values now is through the GenomeLocParser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
asivache 177d6d00b8 added setContigIndex(). NOTE: both setContig() and setContigIndex are UNSAFE as one does not automatically involve updating the other, and there's also no validation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1130 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 17:40:37 +00:00
aaron d7d4298917 Some files to support generic genotype outputing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1112 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:43:41 +00:00
depristo 5289230eb8 Version 0.2.1 (released) of the TableRecalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
aaron 0c3aabd1c5 logger output should be less verbose by default. Also fixed a printout in my read validation walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1102 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 19:47:29 +00:00
aaron 4e04370f14 forgot a file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1096 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:56:17 +00:00
ebanks ea2426dcd0 one more change needed to commit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1093 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 15:09:53 +00:00
aaron 61ce4e5983 quick doc change
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1086 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 16:35:46 +00:00
kiran 3112302ec9 A priority-queue-like container that allows you to add a specified number of elements. When the limit has been reached, new additions replace the lower scoring elements.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1083 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 15:39:47 +00:00
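
A size-limited container of this kind can be sketched with a min-heap: once the limit is reached, a new element replaces the current lowest-scoring one only if it scores higher. The class name, the use of plain double scores, and the descending() accessor are assumptions; the committed container may behave differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class BoundedTopScores {
    private final int capacity;
    // Min-heap: the lowest retained score is always at the head.
    private final PriorityQueue<Double> heap = new PriorityQueue<>();

    BoundedTopScores(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    void add(double score) {
        if (heap.size() < capacity) {
            heap.add(score);
        } else if (score > heap.peek()) {
            heap.poll();        // evict the lowest-scoring element
            heap.add(score);
        }
    }

    /** Retained scores, highest first. */
    List<Double> descending() {
        List<Double> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }
}
```

Each add costs O(log capacity), and memory stays bounded no matter how many elements are offered.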
depristo 0a50f2e160 Updated and near final version of tabular recalibration system. Uses 'yates' correction for low-occupancy quality bins. Faster and more robust handling of input and output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1082 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 03:52:12 +00:00
hanna ef546868bf Pooling of unmapped reads -- improves runtime of files with tons of unmapped reads by an order of magnitude.
Desperately needs cleanup.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1080 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 23:48:06 +00:00
aaron 4a92a999a0 made the constructors protected. Protected also means package-protected, so other methods in the utils class can call these constructors (mainly the parser), as well as any inheriting classes. Also fixed some Intellij suggested clean-ups and documentation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1071 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 16:01:59 +00:00
aaron bcb64d92e9 Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, separating the creation of a genome loc (with the reference setup) to a parser class. GenomeLoc now just represents the actual genomic position. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
depristo 8ac40e8e2d Updated version of the recalibration tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 17:45:47 +00:00
ebanks aef519b427 more comparisons
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1059 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 16:46:05 +00:00
kiran 0a0ef573f7 Methods for finding classes given a path and finding classes that implement a given interface. This stuff was mostly copied from private methods in WalkerManager, so there's some code redundancy. At some point, those calls could be replaced with these.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1053 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:43:19 +00:00
depristo d748c85dc4 Cleaned code and reorganized -- moving in the right direction for v2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1052 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:28:34 +00:00
aaron b947fd586f Fixed a nasty bug in GenomeLoc compareContigs; we were using '==' to compare Integer contig IDs. The surprising thing is that it actually works for Integers from -128 through 127 (autoboxed values in that range are cached, so '==' sees the same object). Switched over GenomeLoc contigs to int based.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1033 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 20:19:47 +00:00
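
The boxed-Integer pitfall can be sketched as follows. The class and method names are hypothetical; only the '==' versus value comparison is taken from the commit message.

```java
final class ContigCompareSketch {
    /** Buggy form: '==' on Integer compares object references, not numeric values. */
    static boolean sameContigByReference(Integer a, Integer b) {
        return a == b;
    }

    /** Safe form: unbox explicitly so the primitive int values are compared. */
    static boolean sameContig(Integer a, Integer b) {
        return a.intValue() == b.intValue();
    }
}
```

Integer.valueOf is required to return cached instances for -128 through 127, so the reference comparison happens to succeed for small contig indices. Outside that range, two boxed values that are numerically equal are typically distinct objects, and the reference comparison then fails. Storing contigs as primitive int, as the commit does, removes the hazard entirely.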
hanna 43a28750e0 Package level documentation -- helps new users get acclimated to the codebase more quickly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1029 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 16:27:48 +00:00
depristo 7d281296a7 Finishing checking for building
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1027 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 14:12:40 +00:00
aaron 78b7fb25c7 allow contig names to have spaces in the fai. This is not yet supported by samtools fai generator (which truncates at the first space), but we might as well fix it on our side.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1022 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 22:23:12 +00:00
aaron 6ee64c7e43 added changes to support alec toUnmappedRead seek. Huge improvements (orders of magnitude) in unmapped read performance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1021 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 22:15:56 +00:00
jmaguire 4f6d26849f Behold MultiSampleCaller!
Complete re-write of PoolCaller algorithm, now basically beta quality code.

Improvements over PoolCaller include:

	- more correct strand test
	- fractional counts from genotypes (which means no individual lod threshold needed)
	- significantly cleaner code; first beta-quality code I've written since BaitDesigner so long ago.
	- faster, less likely to crash!

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1020 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 20:03:24 +00:00
hanna 5859948e80 Fixed bugs in CleanedReadInjector arising from integration testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@999 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 17:37:33 +00:00
depristo fb7ba47fff Now does real neighbor distance calculation, as well as true SNP cluster counting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@998 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 16:29:26 +00:00
hanna 71e3825fa1 First pass of a walker for Eric that searches through an input BAM file for unclean reads, injecting the cleaned reads in their place and outputting the composite result.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@989 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:18:13 +00:00
aaron 63b5c12cbd Changed dataSources to datasources, to be consistent with the rest of our package names. Also, this makes me champion in the largest check-in contest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@985 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:13:22 +00:00
aaron 195b4ea7b4 a rename for consistency of Sam to SAM, creating a genotype utils dir, and moving the GLF code into it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@984 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 17:46:06 +00:00
kiran b0cc763eb5 Added some methods to format bases such that read bases on the forward strand are in uppercase, while those on the negative strand are lowercase. This does *not* affect the default functionality of the standard PileupWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@969 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:31:00 +00:00
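The strand-aware formatting described in the commit above could look roughly like this. It is a sketch only: `formatBase` and `formatPileup` are hypothetical names, and the per-read strand flags are assumed to arrive as a parallel boolean array.

```java
final class StrandBaseFormatSketch {
    // Forward-strand bases are shown in uppercase, negative-strand bases in lowercase.
    static char formatBase(char base, boolean negativeStrand) {
        return negativeStrand ? Character.toLowerCase(base) : Character.toUpperCase(base);
    }

    // Formats one pileup column; negativeStrand[i] describes the read that supplied bases[i].
    static String formatPileup(char[] bases, boolean[] negativeStrand) {
        if (bases.length != negativeStrand.length) {
            throw new IllegalArgumentException("bases and strand flags must have the same length");
        }
        StringBuilder sb = new StringBuilder(bases.length);
        for (int i = 0; i < bases.length; i++) {
            sb.append(formatBase(bases[i], negativeStrand[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] bases = {'A', 'c', 'G', 't'};
        boolean[] negative = {false, false, true, true};
        System.out.println(formatPileup(bases, negative)); // ACgt
    }
}
```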
aaron ec2f015447 fixed a bunch of comments and license headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@964 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:10:46 +00:00
kiran 2b0e7f612b Handles bam pileups where some of the reads have SQ tags and some don't.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@958 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:17:15 +00:00
aaron 36c98b9d6c added tools to test read based traversals using the artificial in-memory SAM file tools, and testing of the PrintReadsWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@957 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 01:52:25 +00:00
aaron eb962fe52a adding an artificial sam file writer, used to unit test some of the walkers (mainly the PrintReadsWalker)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@956 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 21:47:49 +00:00
kiran 681e67c72c Added some methods to generate random bases or random base indexes, optionally disallowing the generation of a specified base or base index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@943 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:47:54 +00:00
asivache ce431b5d2d added hashCode()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@937 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 16:52:02 +00:00
asivache 3b4dc6e7b5 added sequencePeriod(String seq, int minPeriod) - finds smallest period equal to or greater than minPeriod for the specified text string seq; this is a trivial (hopefully correct) back-of-the-envelope implementation for a well-known and well-studied problem; there should be more efficient algorithms in the wild
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@925 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 23:05:24 +00:00
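A naive sketch of the period search described in the commit above. It keeps the commit's `sequencePeriod(String seq, int minPeriod)` signature, but the body is an assumption rather than the committed code. Here p counts as a period when seq[i] == seq[i + p] for every valid i, and the sequence length is returned when no shorter period qualifies (also an assumption).

```java
final class SequencePeriodSketch {
    // Returns the smallest period p >= minPeriod, or seq.length() if none is shorter.
    // Quadratic in the worst case; faster string algorithms exist.
    static int sequencePeriod(String seq, int minPeriod) {
        int n = seq.length();
        for (int p = Math.max(1, minPeriod); p < n; p++) {
            boolean isPeriod = true;
            for (int i = 0; i + p < n; i++) {
                if (seq.charAt(i) != seq.charAt(i + p)) {
                    isPeriod = false;
                    break;
                }
            }
            if (isPeriod) {
                return p;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(sequencePeriod("ACACAC", 1));   // 2
        System.out.println(sequencePeriod("ACACAC", 3));   // 4
        System.out.println(sequencePeriod("ACGACGAC", 1)); // 3
    }
}
```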
depristo 819862e04e major restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversion bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system automatically makes this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 23:34:37 +00:00
aaron 199be46c36 changed the warning that is outputted when the GenomeLoc constructor can't find the given contig in the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@913 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:49:03 +00:00
aaron 37efd78c7e fixed the logger call so we get output that indicates this class generated the message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@911 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:02:17 +00:00
ebanks 36fb6ca3c5 Allow user to specify the compression to be used when writing out BAM files.
Updated most of the walkers to reflect this change.
Now it won't take forever to write BAMs!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@909 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 08:48:34 +00:00
aaron 109bef6c08 We're no longer in the read-dropping business.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@901 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:37:51 +00:00
hanna 40bc4ae39a The building blocks for segmenting covariate counting data by read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@899 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 19:55:24 +00:00
depristo 13be846c2a qualsAsInt argument for Pileup -- fixing stupid bug [again]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@898 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:52:12 +00:00
depristo 97c8ff75dd qualsAsInt argument for Pileup -- fixing stupid bug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@897 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:51:17 +00:00
depristo 9de3e58aa8 qualsAsInt argument for Pileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@896 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:37:39 +00:00
asivache 4d654f30d4 slightly improved error message printed upon failure to parse interval list file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@895 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:24:43 +00:00
aaron 40af4f085c Adding some utilities to test unmapped reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@887 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 07:40:34 +00:00
hanna fa93661133 Eric wins the prize for pointing out that doubles weren't valid command-line arguments. Made all primitive types parseable as command-line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@884 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 22:41:10 +00:00
depristo 7e7c83ddca fixing insidious bugs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@879 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 18:33:45 +00:00
hanna 6e60cddfed A fix for the 'rod blows up when it hits a GenomeLoc outside the reference' issue.  Really a stopgap; error handling in the RODs needs to be addressed in a more comprehensive way.  Right now, hasNext() isn't guaranteed to be correct.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@878 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 18:14:46 +00:00
aaron 82aa0533b8 added some more documentation to the GLF writer and its supporting classes, and some other fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@875 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 14:53:58 +00:00
aaron e712d69382 GLF writing support
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@872 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 21:30:18 +00:00
hanna fc7320133c Cleaned up error when fasta index is missing. Code still throws an exception, but the message is more direct (no more 'error while micromanaging') and tells the user to run 'samtools faidx' to fix the issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@867 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 15:34:38 +00:00
asivache d601548d53 added reallocate(int[] orig_array, int new_size) and int[] indexOfAll(String s, int ch); the former is self-explanatory, while the latter returns array of indices of all occurrences of ch in the specified string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@856 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 20:15:00 +00:00
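The two helpers named in the commit above might be sketched as follows. The method names and parameter types follow the commit message. The bodies and the enclosing class name are assumptions, including the choice to truncate when new_size is smaller than the original length.

```java
import java.util.Arrays;

final class ArrayUtilsSketch {
    // Copies origArray into a new array of newSize, zero-padding or truncating as needed.
    static int[] reallocate(int[] origArray, int newSize) {
        return Arrays.copyOf(origArray, newSize);
    }

    // Returns the indices of every occurrence of ch in s, in ascending order.
    static int[] indexOfAll(String s, int ch) {
        int[] hits = new int[s.length()];
        int count = 0;
        for (int i = s.indexOf(ch); i >= 0; i = s.indexOf(ch, i + 1)) {
            hits[count++] = i;
        }
        return Arrays.copyOf(hits, count);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(indexOfAll("ACGTA", 'A')));            // [0, 4]
        System.out.println(Arrays.toString(reallocate(new int[]{1, 2, 3}, 5)));   // [1, 2, 3, 0, 0]
    }
}
```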
asivache fe3b843b65 intercept NullPointerException and rethrow it with (marginally) comprehensible error message when an attempt to get class source code location fails
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@854 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 15:56:55 +00:00
aaron b43deda6c9 iterative changes to GLF files; also a test of checking-in over sshfs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@850 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:24:30 +00:00
hanna 5e8c08ee63 Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
hanna aa17c4a468 Farewell, functionalj. You promised much, but you could not deliver.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@847 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 01:35:49 +00:00
aaron d275c18e58 adding some objects we need for the GLF format.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@846 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 22:32:25 +00:00
aaron 6fab1a64fa Started work on GLF input / output basics. Do not use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@827 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 22:49:59 +00:00
hanna a488d2dbb2 Lazy creation of output streams. Only create output streams when absolutely necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@824 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:56:57 +00:00
asivache 9ef1a21112 minor changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@817 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:06 +00:00
aaron d994544c47 Added back end code support for Sharding based on genomic location for reads. Changed the sharding
code to take GenomeLocSortedSet instead of a list<GenomeLoc>, and added a bunch of much simpler
and cleaner test cases.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@816 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:57:46 +00:00
aaron d056f9f3e8 Changed the name to reflect the sorted nature of the set, added some fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@810 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 22:34:24 +00:00
aaron 831d430025 Added a collection for storing GenomeLocs, that also has functions for removing by genomic region (that may span multiple GenomeLoc's in the collection), and adding regions, which are then merged with any overlapping regions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@809 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:52:40 +00:00
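The merge-on-add and remove-by-region behaviour described in the commit above can be illustrated with a simplified interval set. This is a sketch under stated assumptions, not the committed collection: `IntervalSetSketch` is a hypothetical name, intervals are closed integer ranges on a single contig, and a linear scan is used for clarity rather than speed.

```java
import java.util.ArrayList;
import java.util.List;

final class IntervalSetSketch {
    // Sorted, non-overlapping closed intervals stored as {start, stop}.
    private final List<int[]> intervals = new ArrayList<>();

    // Adds [start, stop], merging it with every interval it overlaps.
    void add(int start, int stop) {
        List<int[]> merged = new ArrayList<>();
        boolean placed = false;
        for (int[] iv : intervals) {
            if (iv[1] < start) {
                merged.add(iv);                       // entirely before the new region
            } else if (iv[0] > stop) {
                if (!placed) {
                    merged.add(new int[]{start, stop});
                    placed = true;
                }
                merged.add(iv);                       // entirely after the new region
            } else {
                start = Math.min(start, iv[0]);       // overlap: grow the new region
                stop = Math.max(stop, iv[1]);
            }
        }
        if (!placed) {
            merged.add(new int[]{start, stop});
        }
        intervals.clear();
        intervals.addAll(merged);
    }

    // Removes [start, stop], trimming or dropping every interval it touches.
    void remove(int start, int stop) {
        List<int[]> kept = new ArrayList<>();
        for (int[] iv : intervals) {
            if (iv[1] < start || iv[0] > stop) {
                kept.add(iv);                         // untouched
                continue;
            }
            if (iv[0] < start) {
                kept.add(new int[]{iv[0], start - 1}); // left remainder
            }
            if (iv[1] > stop) {
                kept.add(new int[]{stop + 1, iv[1]});  // right remainder
            }
        }
        intervals.clear();
        intervals.addAll(kept);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < intervals.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(intervals.get(i)[0]).append('-').append(intervals.get(i)[1]);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        IntervalSetSketch set = new IntervalSetSketch();
        set.add(1, 5);
        set.add(10, 20);
        set.add(4, 12);
        System.out.println(set); // [1-20]
        set.add(30, 40);
        set.remove(15, 35);
        System.out.println(set); // [1-14, 36-40]
    }
}
```

The removal in the usage example spans two stored intervals, which is the multi-GenomeLoc case the commit message mentions.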
kiran 454a6d1df7 Fixed an egregious error in simpleReverseComplement wherein the RC'd string would be composed entirely of the last base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@804 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:32:20 +00:00
asivache 02fc4f145f refactoring: a couple of general purpose (hopefully useful?) methods/classes extracted into a standalone utils class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@802 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 18:54:40 +00:00
depristo 7a979859a9 Intermediate checking for evaluation -- now supports transition / transversion evaluation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@793 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:05:06 +00:00
depristo dc17a5661d Better accessors for dealing with second base prob pileups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@785 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:25:16 +00:00
depristo d261459c48 Useful function to create a string with N copies of a same char
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@784 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:23:52 +00:00
kiran 83e1454a11 Added a method to determine the fraction of a sequence that's taken up by the most frequent base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@781 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:31 +00:00
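One way to compute the fraction described in the commit above, as a sketch only. `mostFrequentBaseFraction` is a hypothetical name, and returning 0.0 for an empty sequence is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class BaseFractionSketch {
    // Fraction of the sequence occupied by its most frequent character.
    static double mostFrequentBaseFraction(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        Map<Character, Integer> counts = new HashMap<>();
        int best = 0;
        for (char base : sequence.toCharArray()) {
            int count = counts.merge(base, 1, Integer::sum); // updated count for this base
            best = Math.max(best, count);
        }
        return (double) best / sequence.length();
    }

    public static void main(String[] args) {
        System.out.println(mostFrequentBaseFraction("AAAC")); // 0.75
        System.out.println(mostFrequentBaseFraction("ACGT")); // 0.25
    }
}
```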
kiran 1a9d5cea29 Added a method to reverse-complement a String object, preserving 'N' and '.' bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@776 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:39:39 +00:00
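A sketch of a reverse complement that preserves 'N' and '.', as the commit above describes. The names are hypothetical and only uppercase A, C, G and T are complemented. Every other character, including 'N' and '.', is passed through unchanged.

```java
final class ReverseComplementSketch {
    // Complements a single base; anything that is not A, C, G or T is returned as-is.
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return base;
        }
    }

    // Walks the input from the last base to the first, appending each complement.
    static String reverseComplement(String bases) {
        StringBuilder rc = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            rc.append(complement(bases.charAt(i)));
        }
        return rc.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("ACGTN.")); // .NACGT
        System.out.println(reverseComplement("AAC"));    // GTT
    }
}
```

Indexing the source with the loop variable on every iteration matters here: reading a fixed position instead would yield a string made of a single repeated base.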
kiran a687c6bc03 Added a method to refresh an NFS mount point (necessary to prevent NFS flakiness when running on the LSF farm).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@774 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:31:54 +00:00
aaron 8515247575 Adding some functions I keep reinventing, especially for testing purposes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@772 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:30:44 +00:00
andrewk 0219d33e10 QualityUtils: added reverse function to reverse an array of bytes (without complementing it). BaseUtils: split qualToProb into itself and qualToErrProb. CovariateCounterWalker and LogisticRecalibrationWalker: several changes, including proper accounting (only partly complete) for reversing AND complementing bases that are on the negative strand. PrintReadsWalker: created option to output reads to a BAM file rather than just to the screen (useful for creating a downsampled BAM file).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@770 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 18:30:45 +00:00
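A sketch of the qualToProb / qualToErrProb split and the byte-array reverse mentioned in the commit above. It assumes the standard Phred definition, where the error probability is 10^(-Q/10). The method bodies and the class name are illustrative, not the committed code.

```java
final class QualitySketch {
    // Phred-scaled quality Q to the probability that the base call is wrong.
    static double qualToErrProb(byte qual) {
        return Math.pow(10.0, -qual / 10.0);
    }

    // Phred-scaled quality Q to the probability that the base call is right.
    static double qualToProb(byte qual) {
        return 1.0 - qualToErrProb(qual);
    }

    // Returns a reversed copy of the array; the values themselves are not changed.
    static byte[] reverse(byte[] values) {
        byte[] reversed = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            reversed[i] = values[values.length - 1 - i];
        }
        return reversed;
    }

    public static void main(String[] args) {
        System.out.println(qualToErrProb((byte) 20)); // approximately 0.01
        System.out.println(qualToProb((byte) 10));    // approximately 0.9
    }
}
```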
hanna dc748d9c9c Integrate more feedback on command-line argument system. Focus on help
formatter: separate required from optional but otherwise keep ordering
the same, reorder GATK arguments by usage.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@764 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 19:01:25 +00:00
hanna 01a3cb27c7 @Required / @Allows flags for main arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@751 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 23:26:17 +00:00
kiran 40dbc21df7 Moved ParseException to its own file and made it public.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@750 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 14:42:44 +00:00
hanna e6ce80c8e3 Fix for GSA-44...don't throw exception when user specifies -h.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@742 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 00:42:00 +00:00
hanna d35e20ce21 Better error checking for missing .dict file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@741 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 21:57:12 +00:00
hanna 7161b8f927 Disable support for short name values directly abutting their arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@740 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 16:09:32 +00:00
hanna d152c2b911 New GATKArgumentCollection caused a subtle bug with argument grouping and the help system. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@738 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-17 14:54:25 +00:00
depristo 8e9e2f4502 Revised ROD system. Split the system into Basic type and interface. Enabled more control over rod accessing, including an initialize() function to fetch headers and other options from the file. Added general tabular rod, which has named columns and supports a map<String,String> interface. Comes with shiny new JUnit system for RODs. Also, added simple python script for accessing picard data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@716 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:06:28 +00:00
hanna 67293168e7 Support periods in sequence names.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@715 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 20:17:57 +00:00
kiran 68c9455c0f Moved the base complement method to BaseUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@711 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 18:57:48 +00:00
kiran 64c65c7751 New methods to generate compressed SQ quality elements in line with the SAM spec.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@699 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:50:31 +00:00
hanna 12ae3a22b6 Break locus context data access providers into modular components in preparation for traverse by loci.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@689 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 18:51:16 +00:00
jmaguire 11723fbcc2 added method indelPileup. Generates a pileup of indel alleles given reads and offsets (as from a locus walker).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@663 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:08:24 +00:00
hanna 32696b13f5 Fixed method override issue with old-style traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@660 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:22:18 +00:00
hanna 23e9e29964 Changed reads traversals from providing a LocusContext from which the reference sequence
could be extracted to a char[] containing the reference bases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@657 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:45:11 +00:00
ebanks 009e71fcd9 We need to sort cleaned reads ourselves (instead of letting SAMFileWriter
do it) because the SAM headers are often screwed up and claim to be
"unsorted".  While here, I broke off the module from the SortSamIterator
in case someone else wants to use it.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@654 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 15:43:42 +00:00
aaron 4ce3feba4d my move ended up being a copy, so this is to delete duplicate files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@651 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 02:10:26 +00:00
aaron 898f65547e Added code to split GenomeAnalysisTK.java into an object concerned with loading command line args, and one that runs the engines. This will allow us to run the GATK from other tools (like Matlab). Also some cleanup to separate out the legacy traversals and the new style traversals. This is not live yet, and any modifications you need should be made to GenomeAnalysisTK.java for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@650 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 02:07:20 +00:00
aaron ee02b61068 added support for the argument collections code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@648 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-09 07:07:33 +00:00
aaron 742840017b added the argument collection annotation for situations where fields in a command line args have embedded fields that should be checked for command line args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@647 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-09 06:59:17 +00:00
aaron bae4256574 Started the process to make the GATK engine into a runnable object so we can call it from other processes. Step 1: make a configuration object that can serialize to and from an XML file. This way we can store the information everyone uses shell scripts for. Also we can now pull the list of params out of the GenomeAnalysisTK.java. More to come...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@636 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 01:25:26 +00:00
hanna 7f8850a8a2 Argument validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@631 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 20:28:56 +00:00
hanna a3d8febbf2 Error message cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@630 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 19:31:32 +00:00
hanna c241d386a7 Beefed up command-line usage string.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@629 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 19:08:19 +00:00
depristo 5a6892900e fixing oddities in duplicates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@628 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:55:45 +00:00
depristo 93211c1cd8 template for windowmaker utility -- totally non-functional
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@625 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:13:03 +00:00
depristo 71e8f47a6c boundQual function for capping qual values
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@623 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:04:18 +00:00
depristo e848f34896 countOccurances of char in string and max of a list of bytes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@622 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:03:49 +00:00
depristo 5a4bb76cc3 More capabilities for the pileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@621 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:03:13 +00:00
depristo 89a26a7078 Utilities for handling duplicates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@620 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:02:24 +00:00
hanna 4f85062004 Cleanup parsing method to make it less generic.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@619 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 16:21:17 +00:00
hanna 2f3ab53888 Oops. Arguments didn't load into applications with non-plugins (basically everything except the GATK).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@617 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 13:37:19 +00:00
hanna 4177560543 Mutually exclusive options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@616 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 13:27:48 +00:00
hanna 752928df94 Switch to better mechanism for supplying a default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@615 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 01:22:01 +00:00
hanna 9c0b81e946 Default flags to 'not required'.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@612 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 22:09:49 +00:00
hanna 1fe8155111 Some critical fixes for cases where argument values directly abut argument names
and for arguments with missing short names.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@610 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:47:34 +00:00