gatk-3.8

Commit Graph

Author	SHA1	Message	Date
depristo	3a341b2f06	Fixes for VariantEval for genotyping mode git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 21:01:43 +00:00
aaron	7b39aa4966	Adding the VCF ROD. Also changed the VCF objects to much more user friendly. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 20:19:34 +00:00
ebanks	b19fd4d45c	Damn unit tests have a null Toolkit()... git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1654 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 17:10:49 +00:00
ebanks	90626c843d	oops - we don't need reference bases, but we still need reference git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1653 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 16:24:45 +00:00
ebanks	2b2df4e1ba	- Fix the CleanedReadInjector to deal with -L intervals correctly. - Some walkers don't use the ref base, so speed up traversals by not requiring it git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1652 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 16:17:58 +00:00
asivache	94618044e8	Starting an update of ROD system. These basic classes will completely replace old ones, but with this update they are not linked to anything, so this checkpoint should be safe. The main reason for the change is that there can be (and are!) multiple RODs overlapping with a single reference base position in a single track. There can be two "trivial" RODs at the same location (e.g. samtools pileup will have two point-like records at putative indel sites: one for the reference, the other one for the indel itself). Or there can be one or more "extended" RODs (length >1), eg. dbSNP can report an indel at Z:510-525 AND a SNP at Z:515. The ReferenceOrderedDatum object (and children) will not be changed, but it is now explicitly interpreted as a single data record, possibly out of many available from a given track for the current site. As long as single data record occupies one line in a data file, the new ROD system will take care of loading and keeping multiple records, including extended (length > 1) ones, and will automatically drop the records when they finally go out of scope. For one-line-per-record, multiple-records-per-site RODs, there is no need anymore for the hack used so far that involved passing ROD's own implementation of iterator through reflection mechanism (though it will still work) * RODRecordList: the ROD system (its iterators) will now always return a LIST of all RODs available at current position or at current query interval (see below). This class is a trivial wrapper for a list of ROD objects, with added location argument for the whole collection. The location of the RODRecordList is where the ROD system is currently sitting at: a single, current base on the reference (if next() traversal is performed), or the location of the query interval when returned by seekForward() (see below). The ROD objects themselves will have their locations set according to the original data in the file. Hence, perusing the above example of a dbSNP indel at Z:510-525 and SNP at Z:515, when moving to the position Z:515 the ROD system will return a RODRecorList with location Z:515, and with two ROD objects packaged inside, one with location Z:510-525, the other with Z:515. RODRecodIterator: Almost identical to old SimpleRODIterator used by ReferenceOrderedData; this is a low-level iterator that walks over records in the data file (with a callback to ROD's ::parseLine() to parse real data) SeekableRODIterator: a decorator class that wraps around Iterator<ROD> (such as RODRecordIterator) and makes the data traversable by reference position, rather than record by record. This is reimplementation of the old RODIterator. SeekableRODIterator's ::next() moves to the next position on the ref and returns all RODs overlapping with that position (as a RODRecordList). This iterator also adds a seekForward(loc) operation, that allows fast forwarding to a specified position or interval. Length > 1 query arguments (extended intervals) are fully supported by seekForward(), the returned RODRecordList wil contain all RODs overlapping with the specified interval, and the location of the returned RODRecordList object will be set to that query interval. NOTE: it is ILLEGAL to perform next() after a seekForward() query with length > 1 interval. seekForward() with point-like (length=1) interval reenables next(). git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1650 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-18 15:58:37 +00:00
hanna	355136928e	Play nice with other jobs in this VM -- don't close stdout / stderr. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1646 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-17 18:55:08 +00:00
ebanks	5d85bd9671	By default, VF should ask for deleted bases so that they show up in coverage. The Strand filter then needs to ignore those bases when determining bias. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1636 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 16:46:09 +00:00
hanna	01a9b1c63b	Fix for problem where err stream remapped to output stream in certain cases, (hopefully) completing Matt's hat trick of fail. Thanks, unit tests. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1634 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-16 08:33:56 +00:00
hanna	9f7cf73411	Output stream management fixes. I completely screwed up the output stream management system, but cleverly masked this fact by breaking some other stream management functionality that masked the problem. Sigh. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1630 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 21:06:45 +00:00
hanna	17758b381c	Properly initialize redirected output streams in case of out and err. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1629 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 19:47:43 +00:00
andrewk	00dfe014b7	Added option to FastaReferenceWalker to change output FASTA file format's line width and to remove header lines; allows dumping raw sequence using intervals git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1628 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 18:00:30 +00:00
hanna	b69eb208a6	Always create output files, even if no output was written to them. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1627 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 17:58:14 +00:00
aaron	b401929e41	incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-15 04:48:42 +00:00
ebanks	01e7b39c8d	1. Don't print out values in filter field of the VCF. 2. Fix ratio printouts (for params file) 3. Rename ratio filter's get counts method to avoid confusion; more changes on the way this week. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1616 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 21:03:39 +00:00
ebanks	436f543b3b	I owe Doug a beer for finding this: don't print out intervals to be merged if they're not within the global -L intervals git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1615 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 20:22:30 +00:00
aaron	e03fccb223	Changes to switch Variant Eval over to the new Variation system. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-14 05:34:33 +00:00
aaron	5b41ef5f70	rod DBSNP had a bug where the reference wasn't calculated correctly under certain conditions. Fixed getRefBasesFWD and getRefSnpFWD so that they were more in line with getAltBasesFWD and getAltSnpFWD. Also updated Variant Eval tests to reflect this change. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1609 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 23:48:58 +00:00
ebanks	c669e8d5ad	Use constant seed in the random generator so we can be stable (and thus unit tests will work) git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1607 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-13 17:40:56 +00:00
depristo	6c7a300664	Missing file git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1601 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:17:09 +00:00
depristo	6e13a36059	Framework for ROD walkers -- totally experiment and not working right now git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:13:15 +00:00
depristo	e8d544869d	Alignment context now supports the idea of skipped bases -- not currently in use git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1598 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:11:38 +00:00
depristo	3949b4ac72	commented out version of next() and hasNext() that appear to be correct but are causing testing problems git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1596 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:09:21 +00:00
depristo	58105636c8	getBoundRods() convenience method git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1595 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:07:57 +00:00
depristo	4e1eded389	Fixed bad compareTo operator git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1594 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-12 19:07:10 +00:00
depristo	7c8b17b456	fix for SSG with pl name git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1591 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 20:39:34 +00:00
chartl	d6a0b65ac9	Changes: Rollback of Variant-related changes of r1585, additional PGC code git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1586 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 16:23:01 +00:00
chartl	0c54aba92a	Changes: @VariantEvalWalker - added a command line option to input a file path to a pooled call file for pooled genotype concordance checking. This string is to be passed to the PooledGenotypeConcordance object. @AllelicVariant - added a method isPooled() to distinguish pooled AllelicVariants from unpooled ones. @ all the rest - implemented isPooled(); for everything other than PooledEMSNProd it simply returns false, for PooledEMSNProd it returns true. Added: @PooledGenotypeConcordance - takes in a filepath to a pool file with the names of hapmap individuals for concordance checking with pooled calls and does said concordance checking over all pools. Commented out as all the methods are as yet unwritten. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1585 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-11 15:01:50 +00:00
aaron	5a64a80ab5	changes to the variation class, updates to SSG, updated tests based on changes to the SSGenotypeCall, and added the ability to run a single integration test from using the build script. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1577 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 04:31:33 +00:00
depristo	c988205884	Notes for Aaron in SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1576 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 03:18:51 +00:00
ebanks	1362a56227	Added fasta tests and small fix to cleaner test git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1575 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-10 03:13:11 +00:00
depristo	0093482c62	N reference base fix for SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1572 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 21:19:36 +00:00
ebanks	cb31d5a0ab	VariantFiltration now outputs VCF. Important changes: 1. VariantsToVCF can now be called statically to output VCF for a single ROD instance; this is temporary until we have a VCF ROD. 2. VariantFiltration now outputs only 2 files, both mandatory: all variants that pass filters in geli text, and all variants in VCF. If there are any problems, go find Aaron. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1569 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:04:32 +00:00
asivache	dd0085c428	1) now is tolerant to sloppy cigar strings with 0-length elements (at the price of extra recursive call) 2) when reads with deletions are requested, adds to the pile just those: reads with 'D' over the current reference base, but not 'N' 3) next() now implements a loop: recursive forward iteration calls to next() until ref. position with non-zero coverage is encountered were OK for (short) deletions, but with long stretches of N's they end up with stack overflow git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1568 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 20:04:04 +00:00
ebanks	542af6402e	output correct format for Sequenom SNPs git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1567 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 19:21:53 +00:00
kiran	3b1e966b4c	Lowercases the sequencing platform so that a difference in case doesn't lead to the failure to look up an entry in the hash. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1565 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 17:35:45 +00:00
kiran	d82d6c0665	Excludes variants that fall below a certain LOD that changes as a function of depth. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1564 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 17:34:16 +00:00
kiran	06eae52292	Throws an exception if you attempt to use a filter that doesn't exist. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1563 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 17:33:27 +00:00
asivache	1060b36288	Bug fix: 'N' cigar elements now treated properly; for all practical intents and purposes, N is the same as D and should be treated as such, the difference is only in logical interpretation. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1562 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 17:08:35 +00:00
chartl	9d69bd2c84	Modifications: @CoverageAndPowerWalker - removed a hanging colon that was being printed after the reference position @VariantEvalWalker - added a command line argument for pool size for eventual use in doing pooled caller evaluations. As now, the variable is unused. @AlignmentContext - altered the scope of class variables from private to protected in order that child objects might have access to them New Additions: Filtered Contexts Sometimes we want to filter or partition reads by some aspect (quality score, read direction, current base, whatever) and use only those reads as part of the alignment context. Prior to this I've been doing the split externally and creating a new AlignmentContext object. This new approach makes it a bit easier, as each of these objects are children of AlignmentContext, and can be instantiated from a "raw" AlignmentContext. @FilteredAlignmentContext is an abstract class that defines the behavior. The abstract method 'filter' is called on the input AlignmentContext, filtering those reads and offsets by whatever you can think of. The filtered reads/offsets are then maintained in the reads and offsets fields. These classes can be passed around as AlignmentContexts themselves. Writing a new kind of read-filtered alignment context boils down to implementing the filter method. @ReverseReadsContext - a FilteredAlignmentContext that takes only reads in the reverse direction @ForwardReadsContext - a FilteredAlignmentContext that takes only reads in the forward direction @QualityScoreThresholdContext - a FilteredAlignmentContext that takes only reads above a given quality score threshold (defaults to 22 if none provided). A unit test bamfile and associated unit tests for these are in the works. git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1559 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:49:52 +00:00
depristo	d9588e6083	bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:36:12 +00:00
asivache	df11618092	Set default value of useLocusIteratorByHanger to FALSE. Otherwise the -LIBH flag is useless and there'd be no wayto "unset" the 'true' value. Old version was (always) using LocusIteratorByHanger. Now default iterator is indeed LocusIteratorByState, and -LIBH will switch back to the old one git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1556 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 15:09:09 +00:00
depristo	eeb9b6eb13	GenotypeLikelhoods now support a cache per subclass, avoiding genotyping clashes git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1554 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 10:39:14 +00:00
ebanks	0cc219c0df	-Added unit test for walkers dealing with intervals for cleaning -I also uncovered a corner case in the cleaner that for some reason was commented out but shouldn't have been. Hooray for unit tests! git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1553 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 02:35:17 +00:00
depristo	ec0f6f23c7	LocusIterationByState is now the system deafult. Fixed Aaron's build problem git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-09 01:28:05 +00:00
ebanks	5dbba6711c	Lots of changes: (I'll send email out in a sec) 1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run though it). 2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mox up values (like I had been doing). 3) Have indel rod print samples git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-07 01:12:09 +00:00
depristo	1c3d67f0f3	Improvements to the CountCovariates and TableRecablirator, as well as regression tests for SLX and 454 data git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 22:26:57 +00:00
depristo	2b0d1c52b2	General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1538 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 19:13:37 +00:00
aaron	0cc634ed5d	-Renamed rodVariants to RodGeliText -Remove KGenomesSNPROD -Remove rodFLT -Renamed rodGFF to RodGenotypeChipAsGFF -Fixed a problem in SSGenotypeCall -Added basic SSGenotype Test class -Make VCFHeader constructors public git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 18:40:43 +00:00
ebanks	fd1c72c151	Fixed package name git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1535 348d0f76-0448-11de-a6fe-93d51630548a	2009-09-04 15:40:06 +00:00

1 2 3 4 5 ...

546 Commits (3a341b2f06d1a39c0efef5a5991566f39c0649da)