gatk-3.8

Commit Graph

Author	SHA1	Message	Date
aaron	83a9eebcc4	fixed a bug I checked in that Eric found, for intervals with no start or stop coordinate. Now I owe Eric a cookie, and Milk Street is so far away. Damn. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1679 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-22 04:34:18 +00:00
ebanks	5ce42cbab3	After thinking about this a bit more, it makes sense to pull this functionality out of my walker and into the GenomeLocParser where everyone else can benefit from it... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1677 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-22 01:32:35 +00:00
aaron	7bfb5fad27	fixing the dbSNP test. Also removing unnecessary comments from the GenomeLocParser, added some tests, and commented out the performance test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1676 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 23:32:24 +00:00
aaron	39a47491a9	changes to make GenomeLoc string parsing 25% faster git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1675 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 22:37:47 +00:00
ebanks	b1dc6d65e4	interval merging is now blazingly fast git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1674 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 21:15:04 +00:00
asivache	15135788ca	OK, let's bite the bullet. Now rodDbSNP objects are 'isSNP()' only when they are annotated as 'exact', not a 'range'. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1673 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 19:25:16 +00:00
asivache	8ad181f46f	Note to myself: do 'ant clean' now and then or old versions of the code that suddenly became invalid will stick around. The world is not perfect, and neither is automatic dependency resolution. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1672 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 17:40:52 +00:00
asivache	fb09835ef8	Changed to accommodate new ROD system git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1671 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 17:10:56 +00:00
asivache	d2d1354199	Now uses BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1670 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 17:03:49 +00:00
asivache	f4d270cba4	These classes now use BrokenRODSimulator class to pass the test. CHANGE the code to use new ROD system directly and MODIFY MD5 in corresponding tests, since a few snps are seen differently now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1669 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 17:03:15 +00:00
asivache	29adc0ca1c	Little class that can be used to simulate the results returned by the old ROD system. This is needed to keep a couple of tests from breaking. All the code that uses this class must be changed urgently to accommodate the data as returned by new ROD system, and the corresponding tests (MD5 sums) have to be modified as well since some data as seen through the new ROD system is indeed different. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1668 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 16:58:56 +00:00
asivache	a6bd509593	Changing the carpet under your feet!! New incremental update to the ROD system has arrived. all the updated classes now make use of new SeekableRodIterator instead of RODIterator. RODIterator class deleted. This batch makes only trivial updates to tests dictated by the change in the ROD system interface. Few less trivial updates to follow. This is a partial commit; a few walkers also still need to be updated, hold on... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1667 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 16:55:22 +00:00
asivache	4c67a49ccb	Removed unused imports git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1666 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 16:45:22 +00:00
hanna	e7f44ada98	Make unpackList public static so that Doug can use it in the scatter/gather framework. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1665 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 15:32:49 +00:00
ebanks	7b627fd622	Check for empty interval lists to merge git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1664 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-21 04:34:26 +00:00
hanna	7f5778c966	Update gsadevelopers -> gsahelp. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1663 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-20 23:36:54 +00:00
aaron	3a487dd64e	little fixes; also fixed a tyPo git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1662 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 22:38:51 +00:00
aaron	b6d7d6acc6	fix for the eval tests, and a change to the backedbygenotypes interface, more changes to come git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1661 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 22:25:16 +00:00
depristo	4318f75910	tiny cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1660 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 21:04:25 +00:00
depristo	3a341b2f06	Fixes for VariantEval for genotyping mode git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 21:01:43 +00:00
aaron	7b39aa4966	Adding the VCF ROD. Also changed the VCF objects to be much more user friendly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 20:19:34 +00:00
sjia	83e6e5a3e4	Calculates Probability for each allele combination (using likelihood score and allele frequencies only) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1656 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 18:46:38 +00:00
ebanks	b19fd4d45c	Damn unit tests have a null Toolkit()... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1654 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 17:10:49 +00:00
ebanks	90626c843d	oops - we don't need reference bases, but we still need reference git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1653 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 16:24:45 +00:00
ebanks	2b2df4e1ba	- Fix the CleanedReadInjector to deal with -L intervals correctly. - Some walkers don't use the ref base, so speed up traversals by not requiring it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1652 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 16:17:58 +00:00
ebanks	7da9ff2a9e	Put back the check that both chip and variant are not null. Also, sanity check that ref is not 'N'. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1651 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 16:03:54 +00:00
asivache	94618044e8	Starting an update of ROD system. These basic classes will completely replace old ones, but with this update they are not linked to anything, so this checkpoint should be safe. The main reason for the change is that there can be (and are!) multiple RODs overlapping with a single reference base position in a single track. There can be two "trivial" RODs at the same location (e.g. samtools pileup will have two point-like records at putative indel sites: one for the reference, the other one for the indel itself). Or there can be one or more "extended" RODs (length >1), e.g. dbSNP can report an indel at Z:510-525 AND a SNP at Z:515. The ReferenceOrderedDatum object (and children) will not be changed, but it is now explicitly interpreted as a single data record, possibly out of many available from a given track for the current site. As long as single data record occupies one line in a data file, the new ROD system will take care of loading and keeping multiple records, including extended (length > 1) ones, and will automatically drop the records when they finally go out of scope. For one-line-per-record, multiple-records-per-site RODs, there is no need anymore for the hack used so far that involved passing ROD's own implementation of iterator through reflection mechanism (though it will still work) * RODRecordList: the ROD system (its iterators) will now always return a LIST of all RODs available at current position or at current query interval (see below). This class is a trivial wrapper for a list of ROD objects, with added location argument for the whole collection. The location of the RODRecordList is where the ROD system is currently sitting at: a single, current base on the reference (if next() traversal is performed), or the location of the query interval when returned by seekForward() (see below). The ROD objects themselves will have their locations set according to the original data in the file. Hence, perusing the above example of a dbSNP indel at Z:510-525 and SNP at Z:515, when moving to the position Z:515 the ROD system will return a RODRecordList with location Z:515, and with two ROD objects packaged inside, one with location Z:510-525, the other with Z:515. * RODRecordIterator: Almost identical to old SimpleRODIterator used by ReferenceOrderedData; this is a low-level iterator that walks over records in the data file (with a callback to ROD's ::parseLine() to parse real data) * SeekableRODIterator: a decorator class that wraps around Iterator<ROD> (such as RODRecordIterator) and makes the data traversable by reference position, rather than record by record. This is a reimplementation of the old RODIterator. SeekableRODIterator's ::next() moves to the next position on the ref and returns all RODs overlapping with that position (as a RODRecordList). This iterator also adds a seekForward(loc) operation, that allows fast forwarding to a specified position or interval. Length > 1 query arguments (extended intervals) are fully supported by seekForward(), the returned RODRecordList will contain all RODs overlapping with the specified interval, and the location of the returned RODRecordList object will be set to that query interval. NOTE: it is ILLEGAL to perform next() after a seekForward() query with length > 1 interval. seekForward() with point-like (length=1) interval reenables next(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1650 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 15:58:37 +00:00
ebanks	66a4de9a1d	Genotype check should be case-insensitive git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1649 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 03:23:30 +00:00
hanna	c186a49d55	Time for a reorganization. Repackage generally useful alignment classes lower in the package structure, and create a subpackage for bwa-specific code. Repackage BWA alignment code away from BWT representation. Isolate byte- and word-packing streams in another package that will ultimately be killed off en masse. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1648 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-17 23:28:47 +00:00
hanna	b4df089b59	Putting some of the required data structures together for imperfect lookup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1647 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-17 22:43:11 +00:00
hanna	355136928e	Play nice with other jobs in this VM -- don't close stdout / stderr. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1646 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-17 18:55:08 +00:00
sjia	0e73b2ba8e	Use population allele frequencies to distinguish between top candidates git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1645 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-17 15:49:19 +00:00
chartl	534486a254	Output formatting changed: - summary output now reported as a percentage rather than proportion; 2 sigfigs - fixed minor bug where FNR was calculated over total calls rather than total variant sites - column headers are_now_contiguous_strings - spacing fixed - "No Call" separated from "Ref Call" as its own column git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1644 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-17 14:00:25 +00:00
depristo	73bec6f36d	Now uses expanding array list for coverage histograms. No hard limit on maximum depth now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1643 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 23:27:25 +00:00
chartl	4ad46590a3	Changes to PooledGenotypeConcordance: Additional output & better output formatting. It has now undergone a good five hours of testing; and for pools of size 1 outputs exactly the same statistics as GenotypeConcordance (when GenotypeConcordance is modified to do nothing on reference='N'); and for pools of many sizes outputs close to the expected (by genetics) statistics. Looks like this is working properly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1642 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 21:45:01 +00:00
chartl	386a6442ba	Actually deleted now. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1641 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 20:28:06 +00:00
chartl	8fce376792	Changes: Deletion: PooledGenotypeConcordanceNew Rewrite: PooledGenotypeConcordance. It works, and is blazing fast compared to the earlier version (1 order of magnitude speedup)! And is now entirely non-hackey, as opposed to before when there were some hacky bits. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1640 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 20:22:16 +00:00
asivache	3e289fcaa4	A little piece that PairMaker needs in order to compile ;) Iterates synchronously over two (name-ordered) single-end alignment SAM files with, possibly, multiple alignments per read and for each read name encountered returns pairs<all alignments for end1, all alignments for end2> git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1639 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 19:17:40 +00:00
asivache	2f29cf59ba	Very early, half-baked version. All it can do right now is to take two SAM files with end1 and end2 individual single-end alignments from a pair-end run and spit out a "paired" BAM file that contains ONLY properly paired ends (both ends align uniquely && both ends align to the same chromosome && the ends align in proper orientation). Insert size is currently not used (and not set in the output). Unpaired/unmapped reads are NOT transferred into the output bam. For the pairs that do get written, the output is (should be) standard-conforming: all flags are properly set and mate pair information is correct. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1637 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 18:38:18 +00:00
ebanks	5d85bd9671	By default, VF should ask for deleted bases so that they show up in coverage. The Strand filter then needs to ignore those bases when determining bias. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1636 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 16:46:09 +00:00
ebanks	a7c306f757	-deal with offsets that can be -1 -added option to have "D"s inserted for deleted bases in pileup strings git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1635 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 16:44:57 +00:00
hanna	01a9b1c63b	Fix for problem where err stream remapped to output stream in certain cases, (hopefully) completing Matt's hat trick of fail. Thanks, unit tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1634 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 08:33:56 +00:00
aaron	eedf55e94d	temp fix for a broken test, we'll fix the test tomorrow. We promise, we're engineers, we love our tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1633 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 04:36:42 +00:00
chartl	f6bdb47bb6	Addition: @PooledGenotypeConcordanceNew - a new version of the pooled genotype concordance test for Variant Eval. Code altered to be more extensible, use a private class for handling the count tables so it doesn't gunk up the code in the test itself, and for easy debugging. The hackier methods from the original were rewritten properly. Currently computes more statistics than it outputs. Code compiles, is never called by anything, and breaks none of the tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1632 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 04:14:58 +00:00
aaron	542d817688	more cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1631 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 21:42:03 +00:00
hanna	9f7cf73411	Output stream management fixes. I completely screwed up the output stream management system, but cleverly masked this fact by breaking some other stream management functionality that masked the problem. Sigh. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1630 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 21:06:45 +00:00
hanna	17758b381c	Properly initialize redirected output streams in case of out and err. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1629 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 19:47:43 +00:00
andrewk	00dfe014b7	Added option to FastaReferenceWalker to change output FASTA file format's line width and to remove header lines; allows dumping raw sequence using intervals git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1628 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 18:00:30 +00:00
hanna	b69eb208a6	Always create output files, even if no output was written to them. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1627 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 17:58:14 +00:00
aaron	b401929e41	incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 04:48:42 +00:00
ebanks	6783fda42a	Updated unit test to reflect changes to vcf output git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1623 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 01:56:08 +00:00
andrewk	fb254759cb	Trivial: Don't print reduce result git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1621 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 23:42:20 +00:00
hanna	118071cfd8	Proof-of-concept perfect read aligner, implemented as described in sec 2.4 of BWA paper. Has successfully aligned a handful of reads. Requires significant cleanup and refactoring. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1617 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 21:54:56 +00:00
ebanks	01e7b39c8d	1. Don't print out values in filter field of the VCF. 2. Fix ratio printouts (for params file) 3. Rename ratio filter's get counts method to avoid confusion; more changes on the way this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1616 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 21:03:39 +00:00
ebanks	436f543b3b	I owe Doug a beer for finding this: don't print out intervals to be merged if they're not within the global -L intervals git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1615 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 20:22:30 +00:00
chartl	7d6d114ab5	Additions: @NQSMismatchCovariantWalker - Walks along the gene calculating the table # NQS # Q score # mismatches at non-dbsnp sites # total number of bases at non-dbsnp sites And prints it out at the end. Changes: @PooledGenotypeConcordance now works. Takes a path to a file listing a bunch of hapmap IDs in whatever pool we want to check, reads those in, and checks for concordance by name. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1614 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 20:12:04 +00:00
sjia	9be1832d7b	Phasing version 1 git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1613 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 16:10:37 +00:00
asivache	a009592662	the life in the magical kingdom of fully spec-conforming SAM files would be so... magical. For now, however, there are plenty of ways to end up with inconsistent SAM records. For instance, a SAM file with missing header will result in SAM records with ref. name set, but getReferenceIndex() returning null. This, in turn, was tripping isReadUnmapped(). The method is now fixed, so that it suffices to have either reference name or reference index set for the read to be considered mapped (the flag is still checked) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1612 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 16:04:19 +00:00
aaron	e03fccb223	Changes to switch Variant Eval over to the new Variation system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 05:34:33 +00:00
aaron	5b41ef5f70	rod DBSNP had a bug where the reference wasn't calculated correctly under certain conditions. Fixed getRefBasesFWD and getRefSnpFWD so that they were more in line with getAltBasesFWD and getAltSnpFWD. Also updated Variant Eval tests to reflect this change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1609 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 23:48:58 +00:00
chartl	5cf1d6c104	Bugfix - this walker was never changed to work with the new PoolUtils methods after those methods were changed to return ReadOffsetQuad objects rather than nested pairs. This broke the build :(. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1608 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 19:39:23 +00:00
ebanks	c669e8d5ad	Use constant seed in the random generator so we can be stable (and thus unit tests will work) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1607 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 17:40:56 +00:00
ebanks	15178977e1	Naive tool to convert from vcf to geli text git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1606 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 17:25:02 +00:00
chartl	794bd26b20	Changed some ShortNames so they made more sense. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1604 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 01:32:12 +00:00
chartl	b353bd6f81	Added a Quad toString() method. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1603 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 01:13:57 +00:00
chartl	2e237a12e9	This commit has a bunch to do with cleaning up the CoverageAndPowerWalker code: implementing some new printing options, but mostly altering the code so it's much more readable and understandable, and much less hacky-looking. ADDED: @Quad: This is just like Pair, except with four fields. In the original CoverageAndPowerWalker I often used a pair of pairs to hold things, which made the code nigh unreadable. @SQuad: An extension of Quad for when you want to store objects of the same type. Lets you simply declare new SQuad<X> rather than new Quad<X,X,X,X> @ReadOffsetQuad: An extension of Quad specifically for holding two lists of reads and two lists of offsets Supports construction from AlignmentContexts and conversion to AlignmentContexts (given a GenomeLoc). There are methods that make it very clear what the code is doing (getSecondRead() rather than the cryptic getThird() ) @PowerAndCoverageWalker: The new version of CoverageAndPowerWalker. If the tests all go well, then I'll remove the old version. New to this version is the ability to give an output file directly to the walker, so that locus information prints to the file, while the final reduce prints to standard out. Bootstrap iterations are now a command line argument rather than a final int; and users can instruct the walker to print out the coverage/power statistics for both the original reads, and those reads whose quality score exceeds a user-defined threshold. CHANGES: @PoolUtils: Altered methods to accept as arguments, and return, Quad objects. Added a random partition method for bootstrapping. @CoverageAndPowerWalker: Altered methods to work with the new PoolUtils methods. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1602 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 01:00:04 +00:00
depristo	6c7a300664	Missing file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1601 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:17:09 +00:00
depristo	6e13a36059	Framework for ROD walkers -- totally experimental and not working right now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:13:15 +00:00
depristo	bd75a8d168	Unused code has been removed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1599 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:12:23 +00:00
depristo	e8d544869d	Alignment context now supports the idea of skipped bases -- not currently in use git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1598 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:11:38 +00:00
depristo	3ad97e4ab4	Easier to print GenomeLoc compareTo() git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1597 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:10:35 +00:00
depristo	3949b4ac72	commented out version of next() and hasNext() that appear to be correct but are causing testing problems git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1596 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:09:21 +00:00
depristo	58105636c8	getBoundRods() convenience method git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1595 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:07:57 +00:00
depristo	4e1eded389	Fixed bad compareTo operator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1594 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:07:10 +00:00
depristo	17ab1d8b25	General purpose merging iterator implementation git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1593 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:06:15 +00:00
hanna	275707f5f6	Data structure for counts, to isolate the user from wonky 'sometimes counts are cumulative, other times base-by-base' gotchas. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1592 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 20:53:24 +00:00
depristo	7c8b17b456	fix for SSG with pl name git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1591 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 20:39:34 +00:00
andrewk	5354c1876c	De Novo SNP caller as presented at 1KG meeting on 9/10/09 with min LOD 5 calls required from both parents and a LOD 5 call in the daughter gold standard concordant call set. All SNP calls must be present as bound RODs. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1590 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 19:30:23 +00:00
hanna	0f3049652a	Start to build BWT abstractions, so we can present a reasonable facsimile of the BWT to the user no matter how it's represented on disk. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1589 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 18:23:15 +00:00
chartl	c3f77acd5e	Alteration to CoverageAndPowerWalker. It can now be flagged with -uc which will cause it to print not only the coverage on each strand that exceeds the quality score threshold, but also the total coverage on each strand as well. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1588 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 17:55:44 +00:00
chartl	d6a0b65ac9	Changes: Rollback of Variant-related changes of r1585, additional PGC code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1586 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 16:23:01 +00:00
chartl	0c54aba92a	Changes: @VariantEvalWalker - added a command line option to input a file path to a pooled call file for pooled genotype concordance checking. This string is to be passed to the PooledGenotypeConcordance object. @AllelicVariant - added a method isPooled() to distinguish pooled AllelicVariants from unpooled ones. @ all the rest - implemented isPooled(); for everything other than PooledEMSNProd it simply returns false, for PooledEMSNProd it returns true. Added: @PooledGenotypeConcordance - takes in a filepath to a pool file with the names of hapmap individuals for concordance checking with pooled calls and does said concordance checking over all pools. Commented out as all the methods are as yet unwritten. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1585 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 15:01:50 +00:00
ebanks	e24c8d00d5	So, the VCF spec allows for an optional meta field in the header representing the date. However, using this field means that integration tests run on the vcf file will fail the MD5 test (which is what happened to the VariantFiltration test this morning after working just fine yesterday). After consulting our resident expert (Aaron), we're going to (temporarily) remove the date from the vcf output until we can come up with a better solution. However, this shouldn't cause any short-term problems because the data truly is optional. VF test's MD5s are updated. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1580 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 14:28:43 +00:00
aaron	296878e8e3	adding a basic implementation of the Variation interface. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1578 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 04:41:13 +00:00
aaron	5a64a80ab5	changes to the variation class, updates to SSG, updated tests based on changes to the SSGenotypeCall, and added the ability to run a single integration test from using the build script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1577 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 04:31:33 +00:00
depristo	c988205884	Notes for Aaron in SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1576 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 03:18:51 +00:00
ebanks	1362a56227	Added fasta tests and small fix to cleaner test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1575 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 03:13:11 +00:00
ebanks	8ca89279aa	Added a test for VariantFiltration and the VECs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1574 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 02:21:14 +00:00
hanna	6de54dcd2a	Higher-level readers and writers for BWTs and suffix arrays. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1573 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 22:45:32 +00:00
depristo	0093482c62	N reference base fix for SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1572 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 21:19:36 +00:00
hanna	bc9fe31cf5	Cleanup of int-packed file readers / writers. All primitive writers for BWTs and SAs are in place; time to move on to compound reader / writers. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1571 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:36:39 +00:00
asivache	d9f3e9493f	Does not return 0-length cigar elements anymore (used to do so when previous cigar element ended exactly at the segment boundary) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1570 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:05:55 +00:00
ebanks	cb31d5a0ab	VariantFiltration now outputs VCF. Important changes: 1. VariantsToVCF can now be called statically to output VCF for a single ROD instance; this is temporary until we have a VCF ROD. 2. VariantFiltration now outputs only 2 files, both mandatory: all variants that pass filters in geli text, and all variants in VCF. If there are any problems, go find Aaron. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1569 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:04:32 +00:00
asivache	dd0085c428	1) now is tolerant to sloppy cigar strings with 0-length elements (at the price of extra recursive call) 2) when reads with deletions are requested, adds to the pile just those: reads with 'D' over the current reference base, but not 'N' 3) next() now implements a loop: recursive forward iteration calls to next() until ref. position with non-zero coverage is encountered were OK for (short) deletions, but with long stretches of N's they end up with stack overflow git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1568 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:04:04 +00:00
ebanks	542af6402e	output correct format for Sequenom SNPs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1567 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 19:21:53 +00:00
hanna	43d1c6741c	Cleanup. Separate common packing functionality into utils class. Make base packing utility as generic as possible. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1566 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 17:54:12 +00:00
kiran	3b1e966b4c	Lowercases the sequencing platform so that a difference in case doesn't lead to the failure to look up an entry in the hash. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1565 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 17:35:45 +00:00
kiran	d82d6c0665	Excludes variants that fall below a certain LOD that changes as a function of depth. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1564 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 17:34:16 +00:00
kiran	06eae52292	Throws an exception if you attempt to use a filter that doesn't exist. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1563 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 17:33:27 +00:00
asivache	1060b36288	Bug fix: 'N' cigar elements now treated properly; for all practical intents and purposes, N is the same as D and should be treated as such, the difference is only in logical interpretation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1562 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 17:08:35 +00:00
ebanks	bed646e4f6	Adding cleaner test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1561 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 16:05:56 +00:00
chartl	9c7f456510	Changed the short name on the PoolSize cmd line argument git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1560 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:53:22 +00:00
chartl	9d69bd2c84	Modifications: @CoverageAndPowerWalker - removed a hanging colon that was being printed after the reference position @VariantEvalWalker - added a command line argument for pool size for eventual use in doing pooled caller evaluations. As of now, the variable is unused. @AlignmentContext - altered the scope of class variables from private to protected in order that child objects might have access to them New Additions: Filtered Contexts Sometimes we want to filter or partition reads by some aspect (quality score, read direction, current base, whatever) and use only those reads as part of the alignment context. Prior to this I've been doing the split externally and creating a new AlignmentContext object. This new approach makes it a bit easier, as each of these objects is a child of AlignmentContext, and can be instantiated from a "raw" AlignmentContext. @FilteredAlignmentContext is an abstract class that defines the behavior. The abstract method 'filter' is called on the input AlignmentContext, filtering those reads and offsets by whatever you can think of. The filtered reads/offsets are then maintained in the reads and offsets fields. These classes can be passed around as AlignmentContexts themselves. Writing a new kind of read-filtered alignment context boils down to implementing the filter method. @ReverseReadsContext - a FilteredAlignmentContext that takes only reads in the reverse direction @ForwardReadsContext - a FilteredAlignmentContext that takes only reads in the forward direction @QualityScoreThresholdContext - a FilteredAlignmentContext that takes only reads above a given quality score threshold (defaults to 22 if none provided). A unit test bamfile and associated unit tests for these are in the works. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1559 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:49:52 +00:00
depristo	d9588e6083	bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:36:12 +00:00
asivache	0721c450c2	Bug fix: single unmapped read now keeps mapping qual 0 after remapping, not 37! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1557 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:29:34 +00:00
asivache	df11618092	Set default value of useLocusIteratorByHanger to FALSE. Otherwise the -LIBH flag is useless and there'd be no way to "unset" the 'true' value. Old version was (always) using LocusIteratorByHanger. Now default iterator is indeed LocusIteratorByState, and -LIBH will switch back to the old one git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1556 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:09:09 +00:00
aaron	0df6a9da5c	-Separating out normal (unit) tests and integration tests. From now on if your tests are more of an integration test (i.e. you're testing a walker and all the subunits it relies on) please name the test "______IntegrationTest.java" instead of "______Test.java". -Bamboo will now run the integration tests once a day, and the normal unit tests on each check-in. -Also added a bunch of unit tests for VariantEval walker git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1555 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:01:40 +00:00
depristo	eeb9b6eb13	GenotypeLikelihoods now support a cache per subclass, avoiding genotyping clashes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1554 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 10:39:14 +00:00
ebanks	0cc219c0df	-Added unit test for walkers dealing with intervals for cleaning -I also uncovered a corner case in the cleaner that for some reason was commented out but shouldn't have been. Hooray for unit tests! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1553 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 02:35:17 +00:00
depristo	ec0f6f23c7	LocusIteratorByState is now the system default. Fixed Aaron's build problem git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 01:28:05 +00:00
aaron	ea6ffd3796	initial VariantEvalWalker test. More to be added soon... Also fixed the case where MD5 sums had leading zeros clipped off git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1551 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 01:02:04 +00:00
hanna	adce3bd536	My reference implementation is now generating a BWT which matches BWT-SW's. Note to self: never give project status in an svn log. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1550 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 22:11:03 +00:00
hanna	f22f590192	Successfully writing .sa files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1549 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 17:34:34 +00:00
sjia	600c234643	Starting code on phasing git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1548 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 15:20:38 +00:00
aaron	3276e01e5f	fixing the build git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1546 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 13:13:55 +00:00
kiran	f963cfcb21	Made enum listing header fields public. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1545 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 06:12:59 +00:00
kiran	fd20f5c2e8	For a file or files backed by a ROD implementing AllelicVariant, outputs a VCF file summarizing the information. Metadata like Hapmap and dbSNP membership, genotype LOD, read depth, etc, are annotated appropriately. The results output by this program are equivalent to those given by Gelis2PopSNPs.py. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1544 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 06:12:18 +00:00
ebanks	4a95f2181d	print out the right variant git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1543 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-08 01:37:35 +00:00
sjia	5791da17ae	Updated to reference HLA database of unique 4 digit alleles git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1542 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-07 22:12:56 +00:00
ebanks	5dbba6711c	Lots of changes: (I'll send email out in a sec) 1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run through it). 2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mix up values (like I had been doing). 3) Have indel rod print samples git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-07 01:12:09 +00:00
depristo	1c3d67f0f3	Improvements to the CountCovariates and TableRecalibration, as well as regression tests for SLX and 454 data git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 22:26:57 +00:00
depristo	2b0d1c52b2	General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1538 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 19:13:37 +00:00
sjia	471ca8201e	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1537 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 19:12:46 +00:00
aaron	0cc634ed5d	-Renamed rodVariants to RodGeliText -Remove KGenomesSNPROD -Remove rodFLT -Renamed rodGFF to RodGenotypeChipAsGFF -Fixed a problem in SSGenotypeCall -Added basic SSGenotype Test class -Make VCFHeader constructors public git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 18:40:43 +00:00
ebanks	fd1c72c151	Fixed package name git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1535 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 15:40:06 +00:00
ebanks	6c476514f8	Moved to core. Wiki pages are going up; unit tests will be written soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1533 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 15:09:11 +00:00
ebanks	42c71b4382	Fix for Kris: now SNPs aren't masked by default (only when they come from a mask rod) and we can design Sequenom validation assays for them. I'll move this all to core in a bit... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1532 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 14:52:06 +00:00
ebanks	849dce799d	This rod was all wrong for generating the alternate snp alleles (it returned null or even the wrong value); fixed. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1531 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 14:21:46 +00:00
depristo	a08c68362e	Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls AND compares the geli MD5 sum to the expected one! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 12:39:06 +00:00
aaron	3c2ae55859	changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1529 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 05:31:15 +00:00
ebanks	2241173fff	In order to help learn python, I decided to convert Michael's DoC python script to Java; the CoverageHistogram now spits out standard deviations for a good Gaussian fit. This code eventually needs to end up in the VariantFiltration system - when we are ready to parameterize on the fly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1528 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 02:23:57 +00:00
chartl	544900aa99	Migration of some core calculations (log-likelihood probabilities, etc.) from CoverageAndPowerWalker into static methods in PoolUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1527 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 21:43:29 +00:00
chartl	93cedf4285	Added items: @/varianteval/PoolAnalysis Interface to identify variant analyses that are pool-specific. @/varianteval/BasicPoolVariantAnalysis Nearly the same as BasicVariantAnalysis with the addition of a protected integer (numIndividualsInPool) which holds the pool size. One soul-crushing change is that "protected String filename" needed to become "protected String[] filename" since now multiple truth files may be looked at. It was tempting to make the change in BasicVariantAnalysis with some default methods that would maintain usability of the remainder of the VariantAnalysis objects, but I decided to hold off. We can always merge these together later. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1526 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 21:26:04 +00:00
sjia	ee06c7f29f	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1525 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:41:12 +00:00
sjia	043c97eede	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1524 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:34:42 +00:00
aaron	c849282e44	reverting the HLA walker changes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1523 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:11:57 +00:00
asivache	5202d959bf	NM attribute changed in sam jdk (?) from Integer to Short, or maybe it is presented differently by the reader depending on whether SAM or BAM is processed; in any case, both Integer and Short are safe now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1522 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 19:03:32 +00:00
sjia	ada4c5a13c	Small change to debug printing code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1521 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 18:31:21 +00:00
kiran	c3aaca1262	Improvements to make this work with uncompressed fastq files. Pulled the fastq parser out into its own SAMFileReader-like entity. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1520 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 17:20:16 +00:00
asivache	499b3536a4	Changed to use AlignmentUtils.isReadUnmapped() for better consistency with SAM spec; also, it is now explicitly enforced that unmapped reads have <NO_...> values set for ref contig and start upon "remapping" git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1519 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 16:45:07 +00:00
ebanks	5bd99fc1c4	VariantFiltration moved to core. Another win for the team. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1517 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 15:41:41 +00:00
chartl	5130ca9b94	git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1516 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 15:17:02 +00:00
depristo	bdd0a6f9fa	change to make build work git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1511 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 13:43:10 +00:00
depristo	b01ac9de0c	High performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and more potentially) speed ups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still being tested but passes validating pileup. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1510 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 03:06:25 +00:00
jmaguire	e2780c17af	Checkin of the Multi-Sample SNP caller. Doesn't work yet; same command I used to use now causes GATK to throw an exception. Will check with Matt & Aaron tomorrow, then do a regression test. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1509 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-03 00:23:28 +00:00
hanna	e2a79c5cd9	Checkpoint. The BWT that we generate now matches the first 16% of the BWT that BWT-SW generates. Cleaned up output streams to separate the byte packing / word packing from the data structure generation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1508 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 22:18:17 +00:00
ebanks	3dfc77dc89	Add an indel rod which represents the initial point of the indel only (useful for alternate reference making) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1507 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 19:32:29 +00:00
asivache	58debd7e56	A convenience shortcut isReadUnmapped() added: thanks to SAM format specification, 'read unmapped' flag is not always required to be set for an unmapped read; this method checks both the flag and the alignment reference index/start (if those are set to '*' the flag is not required according to the spec!) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1506 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 17:00:39 +00:00
aaron	0e6feff8f2	fixed locus pile-up limiting problem git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1505 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 16:56:44 +00:00
hanna	d8aff9a925	Bug fixes. Was ignoring the '$' character in a few places where I shouldn't have been. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1504 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 16:27:31 +00:00
ebanks	55013eff78	Re-revert back to point estimation for now. We need to do this right, just not yet. Also, it's safer to let colt do the log factorial calculations for us. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1503 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 15:33:18 +00:00
hanna	1ada085970	Cruddy implementation of BWT creation, for understanding and testing purposes. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1501 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 02:16:56 +00:00
ebanks	24d809133d	Oops - comment out the printouts git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1500 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 01:45:56 +00:00
ebanks	91ccb0f8c5	Revert to having these filters use integration over binomial probs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1499 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-02 01:40:22 +00:00
aaron	05c164ec69	changing the default behavior to allow any sized read pile-up (which may exceed the memory limit); the user can then select their own read limit. The default of 100K was arbitrary. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1498 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-01 14:46:00 +00:00
ebanks	54c0b6c430	Allow this ROD to consist of just the positions git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1497 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-01 12:43:18 +00:00
aaron	4a1d79cd7b	added a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads. The user is warned if a locus exceeds this threshold, and no more reads are added. Also CombineDup walker had an incorrect package name. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1496 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-01 04:21:58 +00:00
ebanks	0addae967a	IndelArtifact filter can now handle filtering false SNPs that occur within the span of an indel but after the first position git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1495 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-01 03:34:39 +00:00
hanna	85ca68fab6	Initial version: creates a packed file from a fasta, suitable for consumption by BWT-SW. Works with E coli fasta, but will not work at this moment with multi-chr fastas. Will be made into a utility routine when BWA comes together. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1494 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 18:39:19 +00:00
asivache	591f8eedbb	Added setName() and getName() (however, not used anywhere yet). Now can set the name of the fasta record manually to whatever, however it will work only if done early enough. If the fasta record already started printing itself (i.e. the header line is already done), setName() will throw an exception. Could be too entangled, may reverse this back... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1493 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 18:09:55 +00:00
asivache	c9eb193c7f	Now recognizes a special name for a bound rod track: snpmask. If a rod with this name is bound, then ONLY snps from that track will be used (to set alt reference bases to N's), but indels will be ignored. This helps when an alt. ref has to be created for a set of indel calls, and another rod (e.g. dbSNP) is used to put N's in (for sequenom). If dbSNP rod is not marked as "snpmask", the indels reported there will make their way into the alt. reference output and mess it up. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1492 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 18:05:57 +00:00
ebanks	8e3c3324fa	Added filter for SNPs cleaned out by the realigner. It uses the realigner output for filtering; in addition, dbsnp indels partially work; IndelGenotyper calls don't yet work. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1489 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 04:32:32 +00:00
ebanks	8bc7afe781	Smarter SW penalties git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1488 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 04:29:19 +00:00
ebanks	463f80c03e	Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1487 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 03:37:24 +00:00
ebanks	1a299dd459	Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1486 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 03:31:37 +00:00
ebanks	e70101febc	Add a VEC filter for clustered SNP calls that takes advantage of the new windowed approach; delete the old standalone walker. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1485 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 03:14:42 +00:00
ebanks	215e908a11	Reworking of the VariantFiltration system to allow for a windowed view of variants and inclusion of more data to the various filters. This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1484 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-31 02:16:39 +00:00
depristo	813a4e838f	Removing old code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1482 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-30 19:27:11 +00:00
depristo	49a7babb2c	Better organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1481 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-30 19:16:30 +00:00
depristo	522e4a77ae	Caching support across multiple technologies git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1480 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-30 18:10:14 +00:00
depristo	5af4bb628b	Intermediate checking before code reorganization. Full blown support for empirical transition probs in SSG for all platforms. Support for defaultPlatform arg in SSG. Renaming classes for final cleanup git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1479 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-30 17:34:43 +00:00
depristo	6ab9ddf9f5	Significant output formatting improvements. SNPs as indels analysis. heterozygosity rate calculations git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1478 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-29 21:49:09 +00:00
depristo	bde67428fd	Better formatting of the code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1477 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-29 21:46:47 +00:00
aaron	8331c195fb	changed the full name of maximum_reads to maximum_iterations for consistency git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1475 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 16:03:46 +00:00
depristo	8e129d76fd	Support for original quality scores OQ flag. pQ flag in TableRecalibration to preserve quality scores below a threshold (defaulting to 5) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1474 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 14:14:21 +00:00
depristo	f0179109fa	Removing min confidence for on/off genotype git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1473 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 01:04:13 +00:00
depristo	4f7ed69242	toString() implemented git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1472 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 01:03:58 +00:00
depristo	dc9d40eb9a	Now requires a minimum genotype LOD before applying tests git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1471 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 00:19:23 +00:00
depristo	37a9b84276	corresponding test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1470 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 00:17:42 +00:00
depristo	bf60980653	Experimental support for empirical P(B_true | B_miscall). --useEmpiricalTransitions flag to SSG enables this support. Much better implementation of Genotype likelihoods -- the system should scream along now. Continuing progress towards deleting old model git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1469 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-28 00:17:24 +00:00
depristo	7cf9a54b64	change for new char/byte in BaseUtils git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1467 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 23:47:56 +00:00
depristo	a639459112	Trivial consistency change from char in to char out, not char in to byte out git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1466 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 23:37:37 +00:00
chartl	6012f7602b	@ minor fixes to CoverageAndPowerWalker and AnalyzePowerWalker (switching to By Reference traversal, spitting out Syzygy position for sanity check) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1465 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 21:44:18 +00:00
chartl	bd1e679bc5	@ Fixed issues with AnalyzePowerWalker which depended on CoverageAndPowerWalker. The latter was changed but not the former. Now fixed git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1464 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 20:23:41 +00:00
kiran	a17dad5fa9	Converts from fastq.gz to unaligned BAM format. Accepts a single fastq (for single-end run) or two fastqs (for paired-end run). Also allows you to set certain BAM metadata (read groups, etc.). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1463 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 20:20:09 +00:00
chartl	8740124cda	@ListUtils - Bugfix in getQScoreOrderStatistic: method would attempt to access an empty list fed into it. Now it checks for null pointers and returns 0. @MathUtils - added a new method: cumBinomialProbLog which calculates a cumulant from any start point to any end point using the BinomProbabilityLog calculation. @PoolUtils - added a new utility class specifically for items related to pooled sequencing. A major part of the power calculation is now to calculate powers independently by read direction. The only method in this class (currently) takes your reads and offsets, and splits them into two groups by read direction. @CoverageAndPowerWalker - completely rewritten to split coverage, median qualities, and power by read direction. Makes use of cumBinomialProbLog rather than doing that calculation within the object itself. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1462 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-27 19:31:53 +00:00
chartl	1da45cffb3	New: Minor changes to CoverageAndPowerWalker bootstrapping (faster selection of indices). Entirely new Artificial Pool Walker (ArtificialPoolWalkerMk2), will likely replace ArtificialPoolWalker on the next commit. Adapted the method of sampling, and added a helper context class: ArtificialPoolContext which carries much of the burden of calculation and data handling for the walker. The walker itself maps and reduces ArtificialPoolContexts. Cheers! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1461 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-26 21:42:35 +00:00
chartl	92ea947c33	Added binomProbabilityLog(int k, int n, double p) to MathUtils: binomialProbabilityLog uses a log-space calculation of the binomial pmf to avoid the coefficient blowing up and thus returning Infinity or NaN (or in some very strange cases -Infinity). The log calculation compares very well, it seems, with our current method. It's in MathUtils but could stand testing against rigorous truth data before becoming standard. Added median calculator functions to ListUtils: getQScoreMedian is a new utility I wrote that given reads and offsets will find the median Q score. While I was at it, I wrote a similar method, getMedian, which will return the median of any list of Comparables, independent of initial order. These are in ListUtils. Added a new poolseq directory and three walkers: CoverageAndPowerWalker is built on top of the PrintCoverage walker and prints out the power to detect a mutant allele in a pool of 2*(number of individuals in the pool) alleles. It can be flagged either to do this by bootstrapping, or by pure math with a probability of error based on the median Q-score. This walker compiles, runs, and gives quite reasonable outputs that compare visually well to the power calculation computed by Syzygy. ArtificialPoolWalker is designed to take multiple single-sample .bam files and create a (random) artificial pool. The coverage of that pool is a user-defined proportion of the total coverage over all of the input files. The output is not only a new .bam file, but also an auxiliary file that has for each locus, the genotype of the individuals, the confidence of that call, and that person's representation in the artificial pool .bam at that locus. This walker compiles and, uhh, looks pretty. Needs some testing. AnalyzePowerWalker extends CoverageAndPowerWalker so that it can read previous power calculations (e.g. from Syzygy) and print them to the output file as well for direct downstream comparisons. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1460 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:27:50 +00:00
kiran	478f426727	Fixed a missing method implementation in these two files. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1459 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:21:58 +00:00
kiran	f12ea3a27e	Added ability for all filters to return a probability for a given variant - interpreted as the probability that the given variant should be included in the final set. The joint probability of all the filters is computed to determine whether a variant should stay or go. At the moment, this is only visible in verbose mode (specify -V). Also removed 'learning mode'; now, filters emit important stats no matter what. Various code cleanups. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1458 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 21:17:56 +00:00
hanna	e5115409fa	Force columnSpacing to be at least one. We need a general-purpose, working tool for outputting columnar data to a PrintStream; will add JIRA. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1457 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 19:54:54 +00:00
aaron	811503d67b	vcf changes from Richard's comments, fixed a test case git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1456 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-25 14:32:16 +00:00
hanna	ccdb4a0313	General-purpose management of output streams. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-23 00:56:02 +00:00
aaron	b316abd20f	catch a malformed column header name more gracefully git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1453 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 21:05:28 +00:00
aaron	0364f8e989	added the ability of the VCFReader to take in compressed gzipped files natively, which is really useful for the validator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1452 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 18:40:38 +00:00
aaron	647a367680	Made the size zero interval file checker emit a warnUser if we're not in unsafe mode. Also changed the default logger level from error to warn. Does anyone object? It makes sense for users to always get their warn user statements in the default logging level. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1451 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 14:40:57 +00:00
aaron	df9133c90b	the doc on File.length states it returns 0L if it doesn't exist, added a check to make sure it exists (and length < 1) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1450 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 05:55:17 +00:00
aaron	cd711d7697	Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric. Also fixed some verbiage in the GAEngine. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1449 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-21 05:35:49 +00:00
asivache	0bdecd8651	A most stupid bug. In cases when more than one indel variant was present in cleaned bam file, the "consensus" (max. # of occurrences) call was computed incorrectly, and most of the time the call itself was not made at all. Fortunately, the locations where we see multiple indels are a minority, and many of them are suspicious anyway (manifestation of alignment problems?). Could change results of POOLED calls though. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1448 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 22:31:44 +00:00
aaron	6313c465fb	we want the RMS of the read qualities, not the RMS of the RMS of the read qualities. Also the VCF version tag seems to be standardized as VCR. Updated the VCF code. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a	2009-08-20 21:56:29 +00:00

1530 Commits (52d2e0ca0768b4b3b32a3e8a2d05ff3449dd5b4f)