410 Commits (be31d7f4cc87b4ffd850cddb9c7c498afe9be3ac)

Author SHA1 Message Date
ebanks 1b214c0de5 Fixed logic: throw exception if contigs are NOT equal
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1827 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 02:48:44 +00:00
ebanks aeca14d052 On our side of 5CC, we spell multi M-U-L-T-I.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1826 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 01:41:25 +00:00
ebanks c9c8fd1fef Added the discovery LOD score to the meta data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1825 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-14 01:24:06 +00:00
ebanks b28446acac Multi-sample calls now have associated meta-data (SLOD, allele freq), which will soon actually be used...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1820 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 20:08:43 +00:00
aaron 77499e35ac fixes for GSA-199: Need easier way to write binary outputs to standard output. GLF and VCF now have stream constructors, and can get dumped to standard out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1818 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 15:50:20 +00:00
ebanks be92a1e603 Don't try to close if the lazy initialize hasn't triggered
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1815 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-13 01:20:25 +00:00
aaron 62c484b57a Fixes for GSA-201, where enumerated types in command line arguments had to be defined as all uppercase for the system to work.
Also a little playground walker that changes the sort order flag of a BAM file.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1805 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 18:11:32 +00:00
hanna 32d55eb2ff Fix issue Eric was seeing with java.lang.Error in unmap0.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1804 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 17:46:56 +00:00
ebanks 9f3482ef11 VCF is both a multi- and single- sample format, so we shouldn't be throwing an exception when used for SS
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1803 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 17:43:26 +00:00
hanna f4b6afb42c JVM issue id 5092131 (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5092131)
was causing OOM issues with the new mmapping fasta file reader during large jobs.
Temporarily reverting the reader until a workaround can be found.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1801 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-09 04:45:46 +00:00
aaron 98e3a0bf1a VCF can now be emitted from SSG. The basics are there (the genotype, read depth, our error estimate), but more fields need to be added for each record as necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1797 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 19:50:04 +00:00
hanna fcb6a992c8 Switched IndexedFastaSequenceFile over to use memory mapping to load data rather than
the loop-with-small block size.  Performance improvements in loading refs are extreme;
segments can be loaded in <1ms.  chr1 in its entirety can be loaded in 1.5sec (down
from 30sec).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1781 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-08 00:07:15 +00:00
ebanks f9a1598d75 Reformatting
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1778 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 20:03:34 +00:00
aaron f9a0eefe4b GELI_BINARY is now functional, and can be used as a variant type in SSG (-vf=GELI_BINARY). Also fixed the max mapping quality column in both GELI output formats, which we haven't been outputting correctly up until now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1774 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-07 18:20:34 +00:00
aaron 3aec76136f Removing the AllelicVariant interface, which is replaced by the Variation interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1770 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 17:44:24 +00:00
aaron 66fc8ea444 GSA-182: Adding support for BED interval files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1767 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-06 02:45:31 +00:00
aaron e885cc4b21 changes for corrected GLF likelihood output, along with better tests
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1754 348d0f76-0448-11de-a6fe-93d51630548a
2009-10-01 20:45:05 +00:00
aaron 2e4949c4d6 Rev'ing Picard, which includes the update to get all the reads in the query region (GSA-173). With it come a bunch of fixes, including retiring the FourBaseRecaller code, and updated md5 for some walker tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1751 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-30 20:37:59 +00:00
hanna 70e1aef550 Better integrate the @ArgumentCollection into the command-line argument parser. Walkers can now specify their own @ArgumentCollections. Also cleaned up a bit of the CommandLineProgram template method pattern to minimize duplicate code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1746 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 22:23:19 +00:00
ebanks 8bd345ba00 Generalized deletions in pileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1739 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-29 15:58:43 +00:00
andrewk 5662a88ee1 Cosmetic change to list sampling functions: the typical usage of n and k were reversed. No change in functionality of the classes has been made and unit tests still pass.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1736 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-28 18:12:32 +00:00
aaron 7ffc1d97ef Cut DeNovoSNPWalker over to the new Variation system, some renaming of methods on the Variation interface, and some corrections on the interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1724 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-25 04:35:52 +00:00
aaron d262cbd41c changes to add VCF to the rod system, fix VCF output in VariantsToVCF, and some other minor changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1715 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-24 15:16:11 +00:00
aaron eeb14ec717 a couple of light changes to GenomeLocSortedSet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1708 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-23 20:38:53 +00:00
aaron 11c32b588f fixing VariantEvalWalkerIntegrationTest md5 sums, a couple comment changes, and a little bit of cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1690 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 20:54:47 +00:00
aaron 83a9eebcc4 fixed a bug I checked in that Eric found, for intervals with no start or stop coordinate. Now I owe Eric a cookie, and Milk Street is so far away. Damn.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1679 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 04:34:18 +00:00
ebanks 5ce42cbab3 After thinking about this a bit more, it makes sense to pull this functionality out of my walker and into the GenomeLocParser where everyone else can benefit from it...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1677 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-22 01:32:35 +00:00
aaron 7bfb5fad27 fixing the dbSNP test. Also removing unnecessary comments from the GenomeLocParser, added some tests, and commented out the performance test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1676 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 23:32:24 +00:00
aaron 39a47491a9 changes to make GenomeLoc string parsing 25% faster
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1675 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-21 22:37:47 +00:00
hanna 7f5778c966 Update gsadevelopers -> gsahelp.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1663 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-20 23:36:54 +00:00
aaron 3a487dd64e little fixes; also fixed a tyPo
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1662 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:38:51 +00:00
aaron b6d7d6acc6 fix for the eval tests, and a change to the backedbygenotypes interface, more changes to come
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1661 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:25:16 +00:00
aaron 7b39aa4966 Adding the VCF ROD. Also changed the VCF objects to be much more user friendly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 20:19:34 +00:00
depristo 73bec6f36d Now uses expanding array list for coverage histograms. No hard limit on maximum depth now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1643 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 23:27:25 +00:00
ebanks a7c306f757 -deal with offsets that can be -1
-added option to have "D"s inserted for deleted bases in pileup strings


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1635 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 16:44:57 +00:00
aaron 542d817688 more cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1631 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:42:03 +00:00
aaron b401929e41 incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 04:48:42 +00:00
asivache a009592662 the life in the magical kingdom of fully spec-conforming SAM files would be so... magical. For now, however, there are plenty of ways to end up with inconsistent SAM records. For instance, a SAM file with missing header will result in SAM records with ref. name set, but getReferenceIndex() returning null. This, in turn, was tripping isReadUnmapped(). The method is now fixed, so that it suffices to have *either* reference name *or* reference index set for the read to be considered mapped (the flag is still checked)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1612 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 16:04:19 +00:00
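The mapped/unmapped rule described in commit a009592662 above can be sketched as follows. This is an illustrative Python sketch only, not the project's Java code: the `Read` class, its field names, and the treatment of `"*"` and a negative index as "not set" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "*" is the placeholder for "no reference name" and a negative
# index means "no reference index"; both are treated as "not set" here.
NO_REFERENCE_NAME = "*"


@dataclass
class Read:
    """Hypothetical stand-in for a SAM record; not a real library type."""
    unmapped_flag: bool
    reference_name: Optional[str] = None
    reference_index: Optional[int] = None


def is_read_unmapped(read: Read) -> bool:
    """Return True if the read should be treated as unmapped."""
    # The explicit flag always wins.
    if read.unmapped_flag:
        return True
    has_name = read.reference_name not in (None, NO_REFERENCE_NAME)
    has_index = read.reference_index is not None and read.reference_index >= 0
    # Either a reference name or a reference index is enough to count as mapped.
    return not (has_name or has_index)
```

With this rule, a record from a header-less file that carries a reference name but no index is still treated as mapped, which is the case the commit message describes.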
aaron e03fccb223 Changes to switch Variant Eval over to the new Variation system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 05:34:33 +00:00
depristo bd75a8d168 Unused code has been removed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1599 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:12:23 +00:00
depristo 3ad97e4ab4 Easier to print GenomeLoc compareTo()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1597 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:10:35 +00:00
depristo 17ab1d8b25 General purpose merging iterator implementation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1593 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:06:15 +00:00
aaron 296878e8e3 adding a basic implementation of the Variation interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1578 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 04:41:13 +00:00
aaron 5a64a80ab5 changes to the variation class, updates to SSG, updated tests based on changes to the SSGenotypeCall, and added the ability to run a single integration test using the build script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1577 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 04:31:33 +00:00
depristo 0093482c62 N reference base fix for SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1572 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 21:19:36 +00:00
depristo d9588e6083 bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:36:12 +00:00
kiran f963cfcb21 Made enum listing header fields public.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1545 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-08 06:12:59 +00:00
depristo 1c3d67f0f3 Improvements to the CountCovariates and TableRecalibrator, as well as regression tests for SLX and 454 data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
depristo 2b0d1c52b2 General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1538 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 19:13:37 +00:00
aaron 0cc634ed5d -Renamed rodVariants to RodGeliText
-Remove KGenomesSNPROD
-Remove rodFLT
-Renamed rodGFF to RodGenotypeChipAsGFF
-Fixed a problem in SSGenotypeCall
-Added basic SSGenotype Test class
-Make VCFHeader constructors public

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 18:40:43 +00:00
depristo a08c68362e Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls *AND* compares the geli MD5 sum to the expected one!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 12:39:06 +00:00
aaron 3c2ae55859 changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1529 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 05:31:15 +00:00
kiran c3aaca1262 Improvements to make this work with uncompressed fastq files. Pulled the fastq parser out into its own SAMFileReader-like entity.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1520 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 17:20:16 +00:00
asivache 58debd7e56 A convenience shortcut isReadUnmapped() added: thanks to SAM format specification, 'read unmapped' flag is not always required to be set for an unmapped read; this method checks both the flag and the alignment reference index/start (if those are set to '*' the flag is not required according to the spec!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1506 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 17:00:39 +00:00
aaron 4a1d79cd7b added a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads.
The user is warned if a locus exceeds this threshold, and no more reads are added.

Also CombineDup walker had an incorrect package name.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1496 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 04:21:58 +00:00
depristo 8e129d76fd Support for original quality scores OQ flag. pQ flag in TableRecalibration to preserve quality scores below a threshold (defaulting to 5)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1474 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 14:14:21 +00:00
depristo 4f7ed69242 toString() implemented
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1472 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 01:03:58 +00:00
depristo a639459112 Trivial consistency change from char in to char out, not char in to byte out
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1466 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 23:37:37 +00:00
chartl 8740124cda @ListUtils - Bugfix in getQScoreOrderStatistic: method would attempt to access an empty list fed into it. Now it checks for null pointers and returns 0.
@MathUtils - added a new method: cumBinomialProbLog which calculates a cumulant from any start point to any end point using the BinomProbabilityLog calculation.

@PoolUtils - added a new utility class specifically for items related to pooled sequencing. A major part of the power calculation is now to calculate powers
             independently by read direction. The only method in this class (currently) takes your reads and offsets, and splits them into two groups
             by read direction.

@CoverageAndPowerWalker - completely rewritten to split coverage, median qualities, and power by read direction. Makes use of cumBinomialProbLog rather than
                          doing that calculation within the object itself.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1462 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 19:31:53 +00:00
chartl 92ea947c33 Added binomProbabilityLog(int k, int n, double p) to MathUtils:
binomialProbabilityLog uses a log-space calculation of the
       binomial pmf to avoid the coefficient blowing up and thus
       returning Infinity or NaN (or in some very strange cases
       -Infinity). The log calculation compares very well, it seems
       with our current method. It's in MathUtils but could stand
       testing against rigorous truth data before becoming standard.

Added median calculator functions to ListUtils

getQScoreMedian is a new utility I wrote that given reads and
       offsets will find the median Q score. While I was at it, I wrote
       a similar method, getMedian, which will return the median of any
       list of Comparables, independent of initial order. These are in
       ListUtils.

Added a new poolseq directory and three walkers

CoverageAndPowerWalker is built on top of the PrintCoverage walker
       and prints out the power to detect a mutant allele in a pool of
       2*(number of individuals in the pool) alleles. It can be flagged
       either to do this by bootstrapping, or by pure math with a
       probability of error based on the median Q-score. This walker
       compiles, runs, and gives quite reasonable outputs that compare
       visually well to the power calculation computed by Syzygy.

ArtificialPoolWalker is designed to take multiple single-sample
       .bam files and create a (random) artificial pool. The coverage of
       that pool is a user-defined proportion of the total coverage over
       all of the input files. The output is not only a new .bam file,
       but also an auxiliary file that has for each locus, the genotype
       of the individuals, the confidence of that call, and that person's
       representation in the artificial pool .bam at that locus. This
       walker compiles and, uhh, looks pretty. Needs some testing.

AnalyzePowerWalker extends CoverageAndPowerWalker so that it can read previous power
calculations (e.g. from Syzygy) and print them to the output file as well for direct
downstream comparisons.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1460 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 21:27:50 +00:00
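The log-space binomial calculation mentioned in commit 92ea947c33 above can be illustrated with a short sketch. This is Python written for this listing, not the MathUtils code itself: the function names are hypothetical, and the use of natural logs and `lgamma` for the coefficient is an assumption, since the commit message does not say which base or method was used.

```python
import math


def binom_log_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(X = k) for X ~ Binomial(n, p), computed in log space."""
    if not 0 <= k <= n:
        return float("-inf")
    # Handle the degenerate probabilities explicitly to avoid log(0).
    if p == 0.0:
        return 0.0 if k == 0 else float("-inf")
    if p == 1.0:
        return 0.0 if k == n else float("-inf")
    # log C(n, k) via log-gamma, so the coefficient itself is never formed.
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial pmf recovered from the log-space value."""
    return math.exp(binom_log_pmf(k, n, p))
```

The point of working with logs is that the binomial coefficient alone exceeds the range of a double once n reaches the low thousands, while its logarithm stays small; only the final, well-scaled sum is exponentiated.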
aaron 811503d67b vcf changes from Richard's comments, fixed a test case
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1456 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 14:32:16 +00:00
hanna ccdb4a0313 General-purpose management of output streams.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-23 00:56:02 +00:00
aaron b316abd20f catch a malformed column header name more gracefully
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1453 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 21:05:28 +00:00
aaron 0364f8e989 added the ability of the VCFReader to take in compressed gzipped files natively, which is really useful for the validator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1452 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 18:40:38 +00:00
aaron 647a367680 Made the size zero interval file checker emit a warnUser if we're not in unsafe mode.
Also changed the default logger level from error to warn.  Does anyone object?  It makes sense for users to always get their warn user statements in the default logging level.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1451 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 14:40:57 +00:00
aaron df9133c90b the doc on File.length states it returns 0L if it doesn't exist, added a check to make sure it exists (and length < 1)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1450 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 05:55:17 +00:00
aaron cd711d7697 Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric.
also fixed some verbiage in the GAEngine.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1449 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 05:35:49 +00:00
aaron 6313c465fb we want the RMS of the read qualities, not the RMS of the RMS of the read qualities.
Also the VCF version tag seems to be standardized as VCR.  Updated the VCF code.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 21:56:29 +00:00
aaron 0386e110cf some documentation changes, add a couple of simple checks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1445 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 05:20:27 +00:00
aaron 5725de56dc fixes in VCF, some changes to get it ready to move out of the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1441 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 23:31:03 +00:00
aaron 0b927f44fa created a better separation between instantiation of a VCF object and the object itself
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1440 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 20:32:50 +00:00
hanna 21091b9839 Fix for invalid format error when outputting BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1438 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 19:42:39 +00:00
aaron 4cf9110468 Adding a lot of changes to the VCF code, plus a new basic validator. Also removing an extra copy of the Artificial SAM generator that got checked in at some point.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1437 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 05:08:28 +00:00
aaron 63d90702d6 another iteration of the VCFReader and VCFRecord, introducing the VCFWriter
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1429 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 22:17:34 +00:00
aaron 8403618846 the start to the VCF implementation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1425 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 04:34:15 +00:00
asivache 144b424933 Added : String reverse(String s) - reverses a string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1416 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-12 20:16:22 +00:00
depristo bbd7bec5db Continuing cleanup of SSG. GenotypeLikelihoods now have extensive testing routines. DiploidGenotype supports het, homref, etc calculations. SSG has been cleaned up to remove old garbage functionality. Also now supports output to standard output by simply omitting varout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1387 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 22:25:30 +00:00
hanna 48713e154c Windowed access to the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1383 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:29:15 +00:00
depristo 5487ab0ee6 Added several useful routines to MathUtils for summing and bounds checking of doubles
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1379 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:41:31 +00:00
hanna 21d1eba502 Cleaned division of responsibilities between arguments to map function. Reference has been changed
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:01:37 +00:00
depristo 4986b2abd6 Fixing bug in SSG -- genotyping and discovery were mixed up by name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1371 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 22:13:35 +00:00
depristo 3485397483 Reorganization of the genotyping system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1370 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:55:31 +00:00
ebanks 9f1d3aed26 -Output single filtration stats file with input from all filters
-move out isHet test to GenotypeUtils so all can use it


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1369 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:44:21 +00:00
depristo 880a01cb5d Slight reorganization of genotype interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1367 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:18:41 +00:00
depristo d840a47b11 Slight reorganization of genotype interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1366 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:17:15 +00:00
depristo 20986a03de cleanup before moving files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1365 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:08:24 +00:00
ebanks e3b08f245f Pull out RMS calculation into MathUtils for all to use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1364 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 17:00:20 +00:00
ebanks ba07f057ac finish the math for RMS
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1362 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 16:18:09 +00:00
aaron 9dfee7a75c the "-genotype" option now acts correctly as a discovery mode caller in SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1359 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 18:31:45 +00:00
aaron c2c80dd946 cleanup and moving some things around to more logical locations
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1358 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 16:28:39 +00:00
aaron 9a0761cd8f accidentally committed some debug code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1356 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 15:25:22 +00:00
aaron 2f2c8576a5 GLF output is now well validated, and some changes for new Genotypes interface code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1355 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-31 15:21:28 +00:00
aaron 2a7dfce9ae fix the header string mismatch that Andrew found
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1349 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 22:26:34 +00:00
aaron 0087234ed7 small code cleanup, a couple of little changes to SSGGenotypeCall
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1343 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 19:47:37 +00:00
aaron 4033c718d2 moving some code around for better organizations, some fixes to the fields out of SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1340 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 15:09:43 +00:00
aaron 9cd53d3273 some initial changes from the first review of the genotype redesign, more to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1338 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 07:04:05 +00:00
aaron bca894ebce Adding the initial changes for the new Genotyping interface. The bullet points are:
- SSG is much simpler now
- GeliText has been added as a GenotypeWriter
- AlleleFrequencyWalker will be deleted when I untangle the AlleleMetric's dependence on it
- GenotypeLikelihoods now implements GenotypeGenerator, but could still use cleanup

There is still a lot more work to do, but this is a good initial check-in.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1335 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 19:43:59 +00:00
hanna 7a13647c35 Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
ebanks 3c4410f104 -add basic indel metrics to variant eval
-variants need a length method (can't assume it's a SNP)!


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 03:25:03 +00:00
hanna 2024fb3e32 Better division of responsibilities between sources and type descriptors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1314 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:15:57 +00:00