Commit Graph

1850 Commits (da2e879bbc07033e7b5bd90e9f941fcaec937914)

Author SHA1 Message Date
hanna 7f5778c966 Update gsadevelopers -> gsahelp.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1663 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-20 23:36:54 +00:00
aaron 3a487dd64e little fixes; also fixed a tyPo
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1662 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:38:51 +00:00
aaron b6d7d6acc6 fix for the eval tests, and a change to the backedbygenotypes interface, more changes to come
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1661 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 22:25:16 +00:00
depristo 4318f75910 tiny cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1660 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:04:25 +00:00
depristo 3a341b2f06 Fixes for VariantEval for genotyping mode
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1659 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 21:01:43 +00:00
aaron 7b39aa4966 Adding the VCF ROD. Also changed the VCF objects to much more user friendly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1658 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 20:19:34 +00:00
ebanks b19fd4d45c Damn unit tests have a null Toolkit()...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1654 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 17:10:49 +00:00
ebanks 90626c843d oops - we don't need reference bases, but we still need reference
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1653 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:24:45 +00:00
ebanks 2b2df4e1ba - Fix the CleanedReadInjector to deal with -L intervals correctly.
- Some walkers don't use the ref base, so speed up traversals by not requiring it


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1652 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 16:17:58 +00:00
asivache 94618044e8 Starting an update of ROD system. These basic classes will completely replace old ones, but with this update they are not linked to anything, so this checkpoint should be safe.
The main reason for the change is that there can be (and are!) multiple RODs overlapping with a single reference base position in a single track. There can be two "trivial" RODs at the same location (e.g. samtools pileup will have two point-like records at putative indel sites: one for the reference, the other one for the indel itself). Or there can be one or more "extended" RODs (length >1), eg. dbSNP can report an indel at Z:510-525 AND a SNP at Z:515.

The ReferenceOrderedDatum object (and children) will not be changed, but it is now explicitly interpreted as a single data *record*, possibly out of many available from a given track for the current site. As long as single data record occupies one line in a data file, the new ROD system will take care of loading and keeping multiple records, including extended (length > 1) ones, and will automatically drop the records when they finally go out of scope. For one-line-per-record, multiple-records-per-site RODs, there is no need anymore for the hack used so far that involved passing ROD's own implementation of iterator through reflection mechanism (though it will still work)

* RODRecordList: 
the ROD system (its iterators) will now always return a LIST of all RODs available at current position or at current query interval (see below). This class is a trivial wrapper for a list of ROD objects, with added location argument for the whole collection. The location of the RODRecordList is where the ROD system is currently sitting at: a single, current base on the reference (if next() traversal is performed), or the location of the query interval when returned by seekForward() (see below). The ROD objects themselves will have their locations set according to the original data in the file. Hence, perusing the above example of a dbSNP indel at Z:510-525 and SNP at Z:515, when moving to the position Z:515 the ROD system will return a RODRecorList with location Z:515, and with two ROD objects packaged inside, one with location Z:510-525, the other with Z:515.

*RODRecodIterator:
Almost identical to old SimpleRODIterator used by ReferenceOrderedData; this is a low-level iterator that walks over records in the data file (with a callback to ROD's ::parseLine() to parse real data)

*SeekableRODIterator:
a decorator class that wraps around Iterator<ROD> (such as RODRecordIterator) and makes the data traversable by reference position, rather than record by record. This is reimplementation of the old RODIterator.  SeekableRODIterator's ::next() moves to the next position on the ref and returns all RODs overlapping with that position (as a RODRecordList). This iterator also adds a seekForward(loc) operation, that allows fast forwarding to a specified position or interval. Length > 1 query arguments (extended intervals) are fully supported by seekForward(), the returned RODRecordList wil contain all RODs overlapping with the specified interval, and the location of the returned RODRecordList object will be set to that query interval. NOTE: it is ILLEGAL to perform next() after a seekForward() query with length > 1 interval. seekForward() with point-like (length=1) interval reenables next().


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1650 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-18 15:58:37 +00:00
hanna 355136928e Play nice with other jobs in this VM -- don't close stdout / stderr.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1646 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-17 18:55:08 +00:00
ebanks 5d85bd9671 By default, VF should ask for deleted bases so that they show up in coverage.
The Strand filter then needs to ignore those bases when determining bias.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1636 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 16:46:09 +00:00
hanna 01a9b1c63b Fix for problem where err stream remapped to output stream in certain cases, (hopefully) completing Matt's hat trick of fail. Thanks, unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1634 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-16 08:33:56 +00:00
hanna 9f7cf73411 Output stream management fixes. I completely screwed up the output stream management system, but cleverly masked this fact by breaking some other stream management functionality that masked the problem.
Sigh.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1630 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 21:06:45 +00:00
hanna 17758b381c Properly initialize redirected output streams in case of out and err.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1629 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 19:47:43 +00:00
andrewk 00dfe014b7 Added option to FastaReferenceWalker to change output FASTA file format's line width and to remove header lines; allows dumping raw sequence using intervals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1628 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 18:00:30 +00:00
hanna b69eb208a6 Always create output files, even if no output was written to them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1627 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 17:58:14 +00:00
aaron b401929e41 incremental clean-up and changes for VariantEval, moved DiploidGenotype to a better home, and fixed a spelling error.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1624 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-15 04:48:42 +00:00
ebanks 01e7b39c8d 1. Don't print out values in filter field of the VCF.
2. Fix ratio printouts (for params file)
3. Rename ratio filter's get counts method to avoid confusion; more changes on the way this week.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1616 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 21:03:39 +00:00
ebanks 436f543b3b I owe Doug a beer for finding this:
don't print out intervals to be merged if they're not within the global -L intervals


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1615 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 20:22:30 +00:00
aaron e03fccb223 Changes to switch Variant Eval over to the new Variation system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1611 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-14 05:34:33 +00:00
aaron 5b41ef5f70 rod DBSNP had a bug where the reference wasn't calculated correctly under certain conditions. Fixed getRefBasesFWD and getRefSnpFWD so that they were more in line with getAltBasesFWD and getAltSnpFWD. Also updated Variant Eval tests to reflect this change.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1609 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 23:48:58 +00:00
ebanks c669e8d5ad Use constant seed in the random generator so we can be stable (and thus unit tests will work)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1607 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-13 17:40:56 +00:00
depristo 6c7a300664 Missing file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1601 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:17:09 +00:00
depristo 6e13a36059 Framework for ROD walkers -- totally experiment and not working right now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1600 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:13:15 +00:00
depristo e8d544869d Alignment context now supports the idea of skipped bases -- not currently in use
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1598 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:11:38 +00:00
depristo 3949b4ac72 commented out version of next() and hasNext() that appear to be correct but are causing testing problems
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1596 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:09:21 +00:00
depristo 58105636c8 getBoundRods() convenience method
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1595 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:07:57 +00:00
depristo 4e1eded389 Fixed bad compareTo operator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1594 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-12 19:07:10 +00:00
depristo 7c8b17b456 fix for SSG with pl name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1591 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 20:39:34 +00:00
chartl d6a0b65ac9 Changes:
Rollback of Variant-related changes of r1585, additional PGC code




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1586 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 16:23:01 +00:00
chartl 0c54aba92a Changes:
@VariantEvalWalker - added a command line option to input a file path to a pooled call file for pooled genotype concordance checking. This string is to be passed to the PooledGenotypeConcordance object.

@AllelicVariant - added a method isPooled() to distinguish pooled AllelicVariants from unpooled ones.

@ all the rest - implemented isPooled(); for everything other than PooledEMSNProd it simply returns false, for PooledEMSNProd it returns true.

Added:

@PooledGenotypeConcordance - takes in a filepath to a pool file with the names of hapmap individuals for concordance checking with pooled calls
 and does said concordance checking over all pools. Commented out as all the methods are as yet unwritten.




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1585 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-11 15:01:50 +00:00
aaron 5a64a80ab5 changes to the variation class, updates to SSG, updated tests based on changes to the SSGenotypeCall, and added the ability to run a single integration test from using the build script.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1577 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 04:31:33 +00:00
depristo c988205884 Notes for Aaron in SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1576 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:18:51 +00:00
ebanks 1362a56227 Added fasta tests and small fix to cleaner test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1575 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-10 03:13:11 +00:00
depristo 0093482c62 N reference base fix for SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1572 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 21:19:36 +00:00
ebanks cb31d5a0ab VariantFiltration now outputs VCF. Important changes:
1. VariantsToVCF can now be called statically to output VCF for a single ROD instance; this is temporary until we have a VCF ROD.
2. VariantFiltration now outputs only 2 files, both mandatory: all variants that pass filters in geli text, and all variants in VCF.
If there are any problems, go find Aaron.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1569 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:04:32 +00:00
asivache dd0085c428 1) now is tolerant to sloppy cigar strings with 0-length elements (at the price of extra recursive call)
2) when reads with deletions are requested, adds to the pile just those: reads with 'D' over the current reference base, but not 'N'
3) next() now implements a loop: recursive forward iteration calls to next() until ref. position with non-zero coverage is encountered were OK for (short) deletions, but with long stretches of N's they end up with stack overflow

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1568 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 20:04:04 +00:00
ebanks 542af6402e output correct format for Sequenom SNPs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1567 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 19:21:53 +00:00
kiran 3b1e966b4c Lowercases the sequencing platform so that a difference in case doesn't lead to the failure to look up an entry in the hash.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1565 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:35:45 +00:00
kiran d82d6c0665 Excludes variants that fall below a certain LOD that changes as a function of depth.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1564 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:34:16 +00:00
kiran 06eae52292 Throws an exception if you attempt to use a filter that doesn't exist.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1563 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:33:27 +00:00
asivache 1060b36288 Bug fix: 'N' cigar elements now treated properly; for all practical intents and purposes, N is the same as D and should be treated as such, the difference is only in logical interpretation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1562 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 17:08:35 +00:00
chartl 9d69bd2c84 Modifications:
@CoverageAndPowerWalker - removed a hanging colon that was being printed after the reference position

@VariantEvalWalker - added a command line argument for pool size for eventual use in doing pooled caller evaluations. As now, the variable is unused.

@AlignmentContext - altered the scope of class variables from private to protected in order that child objects might have access to them


New Additions:

Filtered Contexts

Sometimes we want to filter or partition reads by some aspect (quality score, read direction, current base, whatever) and use only those reads as
part of the alignment context. Prior to this I've been doing the split externally and creating a new AlignmentContext object. This new approach makes
it a bit easier, as each of these objects are children of AlignmentContext, and can be instantiated from a "raw" AlignmentContext.

@FilteredAlignmentContext is an abstract class that defines the behavior. The abstract method 'filter' is called on the input AlignmentContext, filtering
those reads and offsets by whatever you can think of. The filtered reads/offsets are then maintained in the reads and offsets fields. These classes can
be passed around as AlignmentContexts themselves. Writing a new kind of read-filtered alignment context boils down to implementing the filter method.

@ReverseReadsContext - a FilteredAlignmentContext that takes only reads in the reverse direction

@ForwardReadsContext - a FilteredAlignmentContext that takes only reads in the forward direction

@QualityScoreThresholdContext - a FilteredAlignmentContext that takes only reads above a given quality score threshold (defaults to 22 if none provided).

A unit test bamfile and associated unit tests for these are in the works.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1559 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:49:52 +00:00
depristo d9588e6083 bug fixes to LIBS and LIBH following ultra-aggressive regression testing across 454, solid, and solexa
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1558 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:36:12 +00:00
asivache df11618092 Set default value of useLocusIteratorByHanger to FALSE. Otherwise the -LIBH flag is useless and there'd be no wayto "unset" the 'true' value. Old version was (always) using LocusIteratorByHanger. Now default iterator is indeed LocusIteratorByState, and -LIBH will switch back to the old one
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1556 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 15:09:09 +00:00
depristo eeb9b6eb13 GenotypeLikelhoods now support a cache per subclass, avoiding genotyping clashes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1554 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 10:39:14 +00:00
ebanks 0cc219c0df -Added unit test for walkers dealing with intervals for cleaning
-I also uncovered a corner case in the cleaner that for some reason was commented out but shouldn't have been.  Hooray for unit tests!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1553 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 02:35:17 +00:00
depristo ec0f6f23c7 LocusIterationByState is now the system deafult. Fixed Aaron's build problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1552 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-09 01:28:05 +00:00
ebanks 5dbba6711c Lots of changes: (I'll send email out in a sec)
1) Moved various disparate concordance / set splitting functionalities to a new parent tool which works like VariantFiltration (i.e. people can write various modules that fit inside and can be run though it).
2) Fixed up argument parsing in VariantFiltration to use key=value format so we don't accidentally mox up values (like I had been doing).
3) Have indel rod print samples


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1540 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-07 01:12:09 +00:00
depristo 1c3d67f0f3 Improvements to the CountCovariates and TableRecablirator, as well as regression tests for SLX and 454 data
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1539 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 22:26:57 +00:00
depristo 2b0d1c52b2 General WalkerTest framework. Includes some minor changes to GATK core to enable creation of true command-line like GATK modules in the code. Extensive first-pass tests for SSG
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1538 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 19:13:37 +00:00
aaron 0cc634ed5d -Renamed rodVariants to RodGeliText
-Remove KGenomesSNPROD
-Remove rodFLT
-Renamed rodGFF to RodGenotypeChipAsGFF
-Fixed a problem in SSGenotypeCall
-Added basic SSGenotype Test class
-Make VCFHeader constructors public

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1536 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 18:40:43 +00:00
ebanks fd1c72c151 Fixed package name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1535 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 15:40:06 +00:00
ebanks 6c476514f8 Moved to core. Wiki pages are going up; unit tests will be written soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1533 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 15:09:11 +00:00
ebanks 849dce799d This rod was all wrong for generating the alternate snp alleles (it returned null or even the wrong value); fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1531 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 14:21:46 +00:00
depristo a08c68362e Renaming error to getNegLog10PError(); added Cached clearing method to GL; SSG now has a CallResult that counts calls; No more Adding class to System.out, now to logger.info; First major testing piece (and general approach too) to unit testing of a walker -- SingleSampleGenotyper now knows how many calls to make on a particular 1mb region on NA12878 for each call type and counts the number of calls *AND* the compares the geli MD5 sum to the expected one!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1530 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 12:39:06 +00:00
aaron 3c2ae55859 changes for the genotype overhaul. Lots of changes focusing on the output side, from single sample genotyper to the output file formats like GLF and geli. Of note the genotype formats are still emitting posteriors as likelihoods; this is the way we've been doing it but it may change soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1529 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-04 05:31:15 +00:00
ebanks 5bd99fc1c4 VariantFiltration moved to core.
Another win for the team.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1517 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 15:41:41 +00:00
depristo bdd0a6f9fa change to make build work
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1511 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 13:43:10 +00:00
depristo b01ac9de0c High performance LocusIterator implementation. Now with greatly reduced memory impact and 2x (and more potentially) speed ups of raw locus iteration. General performance improvements to SSG with empirical probs. You can enable high-performance locus iteration with the -LIBS arg. It's still testing but passes validing pileup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1510 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-03 03:06:25 +00:00
ebanks 3dfc77dc89 Add an indel rod which represents the initial point of the indel only
(useful for alternate reference making)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1507 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 19:32:29 +00:00
aaron 0e6feff8f2 fixed locus pile-up limiting problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1505 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-02 16:56:44 +00:00
aaron 05c164ec69 changing the default behavior to allow any sized read pile-up (which may exceed the memory limit); the user can then select their own read limit. The default of 100K was arbitrary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1498 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 14:46:00 +00:00
ebanks 54c0b6c430 Allow this ROD to consist of just the positions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1497 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 12:43:18 +00:00
aaron 4a1d79cd7b added a flag, maximum_reads_at_locus, shortName "mrl", which limits the number of reads we add to the locusByHanger. In some bam files misalignment produces pile-ups of 750K or more reads. We now limit this to the default of 100K reads.
The user is warned if a locus exceeds this threshold, and no more reads are added.

Also CombineDup walker had an incorrect package name.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1496 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 04:21:58 +00:00
ebanks 0addae967a IndelArtifact filter can now handle filtering false SNPs that occur within the span of an indel but after the first position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1495 348d0f76-0448-11de-a6fe-93d51630548a
2009-09-01 03:34:39 +00:00
ebanks 8e3c3324fa Added filter for SNPs cleaned out by the realigner.
It uses the realigner output for filtering; in addition, dbsnp indels partially work; IndelGenotyper calls don't yet work.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1489 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 04:32:32 +00:00
ebanks 8bc7afe781 Smarter SW penalties
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1488 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 04:29:19 +00:00
ebanks 1a299dd459 Require each filter or feature to declare whether or not they want mapping quality zero reads in the alignment context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1486 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 03:31:37 +00:00
ebanks 215e908a11 Reworking of the VariantFiltration system to allow for a windowed view of variants and inclusion of more data to the various filters.
This now allows us to incorporate both the clustered SNP filter and a SNP-near-indels filter, which otherwise wasn't possible.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1484 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-31 02:16:39 +00:00
depristo 813a4e838f Removing old code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1482 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 19:27:11 +00:00
depristo 49a7babb2c Better organization of Genotype likelihood calculations. NewHotness is now just GenotypeLikelihoods. There are 1, 3, and empirical base error models available as subclasses, along with a simple way to make this (see the factory).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1481 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 19:16:30 +00:00
depristo 522e4a77ae Caching support across multiple technologies
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1480 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 18:10:14 +00:00
depristo 5af4bb628b Intermediate checking before code reorganization. Full blown support for empirical transition probs in SSG for all platforms. Support for defaultPlatform arg in SSG. Renaming classes for final cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1479 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-30 17:34:43 +00:00
depristo bde67428fd Better formatting of the code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1477 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-29 21:46:47 +00:00
aaron 8331c195fb changed the full name of maximum_reads to maximum_iterations for consistancy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1475 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 16:03:46 +00:00
depristo 8e129d76fd Support for original quality scores OQ flag. pQ flag in TableRecalibation to preserve quality scores below a threshold (defaulting to 5)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1474 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 14:14:21 +00:00
depristo bf60980653 Experitmental support for empirical P(B_true | B_miscall). --useEmpiricalTransitions flag to SSG enables this support. Much better implementation of Genotype likelihoods -- the system should scream along now. Continuing progress towards deleting old model
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1469 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-28 00:17:24 +00:00
depristo 7cf9a54b64 change for new char/byte in BaseUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1467 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-27 23:47:56 +00:00
hanna e5115409fa Force columnSpacing to be at least one. We need a general-purpose, working tool for outputting columnar data to a PrintStream; will add JIRA.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1457 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-25 19:54:54 +00:00
hanna ccdb4a0313 General-purpose management of output streams.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1454 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-23 00:56:02 +00:00
aaron cd711d7697 Added detection of interval files with zero length to the GATK, and removed it from the interval merger walker: this was a critical blocking emergency issue for Eric.
also fixed some verbage in the GAEngine.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1449 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-21 05:35:49 +00:00
aaron 6313c465fb we want the RMS of the reads qualities not the RMS of the RMS of the read qualities.
Also the VCF version tag seems to be standardized as VCR.  Updated the VCF code.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1447 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-20 21:56:29 +00:00
ebanks ed8c92a12a make isReference do the right thing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1439 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-19 20:32:29 +00:00
ebanks b3fe566c0c Fix descriptions of walker args
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1436 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 19:46:48 +00:00
ebanks 53153fcd79 Allow RODs to specify that incomplete records are okay (i.e. that they allow optional fields)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1433 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 15:26:10 +00:00
ebanks b2a18a9d61 - first pass at a basic indel filter (for now, based on size and homopolymer runs)
- fix simple indel rod printout


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1431 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-18 03:04:12 +00:00
jmaguire 1e8b97b560 quietly skip empty intervals files rather than crash.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1428 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 20:19:14 +00:00
jmaguire 92c63fb530 It's just "lod" not discovery_lod now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1427 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 18:44:09 +00:00
ebanks df5744bcd3 update this walker so any variants can be passed in
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1426 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-17 16:30:39 +00:00
aaron d101c20b30 added the ability to pass in a csv file of ROD triplets (one triplet per line) to the -B option
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1412 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 22:10:20 +00:00
ebanks 2c3f56cb8d fix length calculation (it was including +/- char when it shouldn't)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1410 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 20:28:24 +00:00
aaron fc1c76f1d2 fixing a bug where reads in overlapping interval based locus traversals could get assigned to only one of two the regions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1407 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-11 17:50:16 +00:00
ebanks ecae619a1b warn user when dbSNP rod looks suspicious
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1400 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 20:20:20 +00:00
asivache 2841e151d0 javadoc comments only
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1399 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 18:44:35 +00:00
ebanks 02f1af0743 Don't die when a readgroup is absent from the covariates table - it could
happen when all reads are unmapped (or have MQ0); instead, just don't alter
the quals.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1394 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-10 03:10:33 +00:00
depristo 6d3ef73868 Now includes statistics on the allele agreement with dbSNP -- counts concordant calls as dbSNP = A/C and we say A/C, vs. we say A/T
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1392 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:37:07 +00:00
depristo 20baa80751 Updated polarized reference priors, need DiploidGenotypePriors class that is directly used by the NewHotness genotypelikelihoods, more bug fixes and refactoring, etc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1391 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-07 19:01:04 +00:00
depristo bbd7bec5db Continuing cleanup of SSG. GenotypeLikelihoods now have extensive testing routines. DiploidGenotype supports het, homref, etc calculations. SSG has been cleaned up to remove old garbage functionality. Also now supports output to standard output by simply omitting varout
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1387 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 22:25:30 +00:00
hanna 48713e154c Windowed access to the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1383 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 16:29:15 +00:00
depristo 65e9dcf5b7 Fully operational version of the new genotype likelihoods class. (1) Much cleaner interface. Now explicitly stores likelihoods, priors, and posteriors in separate arrays indexed by an enum, (2) no longer can be used to make calls, it relies on SSGGenotypeCall to order the likelihoods, calculate best to ref, etc, this is just for calculating genotype likelihoods now; (3) Now performs extensive error checking with validate() to ensure the system is behaving properly. (4) fixed incorrect treatment of N bases, which we being counted against everyone (5) likely found a stats bug in which heterozyosity was being applied incorrectly to the genotype priors
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1382 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 01:00:55 +00:00
depristo 4dc23f2763 Trivial formatting changes as I moved more legacy code into this system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1381 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:54:26 +00:00
depristo 34af669dbb Explicit ENUM representation of the diploid genotypes. Please use this from now on to represent strings like AA or AT
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1380 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-05 00:53:43 +00:00
hanna 21d1eba502 Cleaned division of responsibilities between arguments to map function. Reference has been changed
from an array of bases to an object (ReferenceContext), and LocusContext has been renamed to reflect
the fact that it contains contextual information only about the alignments, not the locus in general.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1376 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-04 21:01:37 +00:00
depristo 20ff603339 New hotness and old and Busted genotype likelihood objects are now in the code base as I work towards a bug-free SSG along with a cleaner interface to the genotype likelihood object
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1372 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 23:07:53 +00:00
depristo 4986b2abd6 Fixing bug in SSG -- genotyping and discovery were mixed up by name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1371 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 22:13:35 +00:00
depristo 3485397483 Reorganization of the genotyping system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1370 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 20:55:31 +00:00
depristo d840a47b11 Slight reorganization of genotype interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1366 348d0f76-0448-11de-a6fe-93d51630548a
2009-08-03 19:17:15 +00:00
ebanks 4366ce16e0 Made sure all RODs have a (good) toString() method - and use it in the Venn walker. (thanks, Mark)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1339 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 14:53:27 +00:00
ebanks feb7238f10 Wasn't always returning the correct alt base
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1337 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-30 03:08:04 +00:00
hanna 5429b4d4a8 A bit of reorganization to help with more flexible output streams. Pushed construction of data
sources and post-construction validation back into the GATKEngine, leaving the MicroScheduler
to just microschedule.  


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1336 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 23:00:15 +00:00
hanna 7a13647c35 Support for specifying SAMFileReaders and SAMFileWriters as @Arguments directly. *Very*
rough initial implementation, but should provide enough support so that people can stop
creating SAMFileWriters in reduceInit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1332 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-29 16:11:45 +00:00
ebanks 3c4410f104 -add basic indel metrics to variant eval
-variants need a length method (can't assume it's a SNP)!


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1324 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-28 03:25:03 +00:00
aaron f1109e9070 Added the interator to SAMDataSource to prevent seeing dupplicate reads, only in a byReads traversal. The iterator discards any reads in the current interval that would have been seen in the previous interval.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1317 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-25 22:36:29 +00:00
asivache a361e7b342 SAMDataSource is now exposed by GATK engine; SamFileHeaderMerger is exposed from Resources all the way up to SAMDataSource, so now we can see underlying individual readers should we need them; GATK engine has new methods getSamplesByReaders(), getLibrariesByReaders(), and getMergedReadGroupsByReaders(): each of these methods returns a list of sets, with each element (set) holding, respectively, samples, libraries, or (merged) read groups coming from an individual input bam file (so now when using multiple -I options we can still find out which of the input bams each read comes from)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1315 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:59:49 +00:00
hanna 2024fb3e32 Better division of responsibilities between sources and type descriptors.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1314 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 22:15:57 +00:00
ebanks 59f0c00d77 -set indel cleaning walkers to be in core package
-move Andrey's alignment utility classes to core


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1307 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-24 05:23:29 +00:00
aaron 0b16253db3 an iterator to fix the problem where read-based interval traversals are getting duplicate reads because reads span the two intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1305 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 23:59:48 +00:00
ebanks 477502338f moved major indel cleaning pieces to core (yippee!)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1301 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:59:51 +00:00
ebanks 4efe26c59a Major: allow genotyper to optionally output in 1KG format, including outputting the samples in which indels are found.
Minor: refactor 454 filtering


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1300 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 19:53:51 +00:00
ebanks ee8ed534e0 print full genotype for alt allele
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1297 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-23 01:35:23 +00:00
depristo 9c12c02768 AlleleBalance and on/off primary base filters -- version 0.0.1 -- for experimental use only
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1294 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-22 17:54:44 +00:00
hanna 6e4fd8db4a Better formatting of available walkers, and only output them along with help. Cleanup JVMUtils.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1290 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 22:23:28 +00:00
depristo 761d70faa1 Better printing of multiple rods -- now produces a comma-separated set of values
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1289 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 21:58:27 +00:00
depristo 8588f75eb6 Better printing with toSimpleString() -- now prints out chip-genotype string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1288 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 21:57:59 +00:00
hanna 1843684cd2 Cleanup: GATKEngine no longer needs to be lazy loaded, b/c the plugin directory no longer exists.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1287 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:50:51 +00:00
hanna b43925c01e Switched to Reflections (http://code.google.com/p/reflections/) project for
inspecting the source tree and loading walkers, rather than trying to roll
our own by hand.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1286 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 18:32:22 +00:00
kiran 436a196e2b Bug fixes to support hapmap genotyping concordance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1285 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 16:20:10 +00:00
aaron f13a1e8591 adding a couple of small changes to support contract with VariantEval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1283 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 03:49:15 +00:00
aaron b4adb5133a GLF rod as a AllelicVariant object.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1282 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-21 00:55:52 +00:00
ebanks 54fce98056 duh, don't print newline
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1280 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-20 03:04:27 +00:00
ebanks 1d2b545608 add FLT toString method (to be used in PrintRODs) and add it to ROD list
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1279 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-20 02:47:50 +00:00
ebanks 387316ebe1 added indel rod
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1276 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 16:05:51 +00:00
ebanks da4af3b620 print indels in the format required for 1KG submissions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1275 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 15:59:18 +00:00
ebanks d45c90b166 ROD to represent simple output from IndelGenotyper
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1274 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-17 14:36:12 +00:00
hanna df1c61e049 Re-add the plugin path.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1271 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 22:48:44 +00:00
hanna 7c30c30d26 Cleaned up some duplicate code in preparation for making plugin dir configurable.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1270 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 22:02:21 +00:00
depristo 107f42a01e Hacks for getting GLFs support in the Rod system working
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1268 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 21:03:47 +00:00
ebanks 88ffb08af4 Need to return real values for some of the AllelicVariant methods
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1264 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-16 02:31:10 +00:00
ebanks ba349e8d52 add FLT ROD
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1257 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 19:40:50 +00:00
ebanks 800f7e6360 make AllelicVariant extend ReferenceOrderedDatum (not Comparable) since ROD itself is Comparable. Then we can generalize RMD tags.
Blame Matt if this doesn't work - he said it wouldn't break anything.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1256 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 19:25:06 +00:00
ebanks 5be5e1d45f added conversion from iupac format and new rod to deal with FLT file format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1254 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 18:34:41 +00:00
aaron d36e232ed3 adding GLF rods to the module list
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1252 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 15:42:34 +00:00
aaron 9ecb3e0015 adding GLFRods with tests and some other code changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1251 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 15:30:19 +00:00
hanna c25f84a01c Regression: we lost our hack to work around BAM files with index problems (affects BAM files created before 23 Apr 2009 and traversed by interval). Added the hack back in, along with a much more explicit comment about why its there.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1248 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:41:37 +00:00
ebanks 513d43b5f3 now implements AllelicVariant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1246 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 14:06:25 +00:00
ebanks d369136bda depricate this ROD yet again
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1245 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-15 13:33:03 +00:00
ebanks efcbb16688 un-deprecate this ROD and make it implement Genotype
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1240 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 19:45:41 +00:00
depristo 84d407ff3f Fixing odd merge problem with VariantEval -- better cluster analysis (no cumsum), rodVariant is now an AllelicVariant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1239 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:53:27 +00:00
hanna 76b09a879b Display a more intelligent error message if the user runs a locus traversal across an unmapped reads file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1238 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 18:36:09 +00:00
hanna 99f9cd84ed Warning for possibly mismatched reads / reference was very aggressive. Relax
the criteria a bit.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1234 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:21:22 +00:00
hanna 12b5d9c70c The number of loci can easily overflow an int. Change reduce type to a Long.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1233 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 16:07:00 +00:00
depristo 5bf7647498 0.2.3 -- now preserves Q0 bases throughout the reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1232 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 12:27:31 +00:00
hanna 0f6bfaaf73 Skip validation in case of no reads aligning.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1230 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 02:03:36 +00:00
hanna bfe90af5e2 Some quick and dirty fixes to support querying unmapped BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1228 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-14 01:25:20 +00:00
hanna 9f0fb9f3aa Fix for GSA-90: GATK banner and error messages should point to the wiki website.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1226 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 21:56:41 +00:00
hanna b18caa2052 Fix for GSA-90: System isn't failing with an error when you use the wrong reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1225 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 20:42:12 +00:00
hanna 5c321f9630 Oops! Accidentally deactivated the ArgumentFactory, needed by the CleanedReadInjector, while refactoring last night.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1223 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-13 16:41:55 +00:00
ebanks 0070b8ea6a Until 454 goes far, far away, at least we can completely ignore it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1219 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 18:31:53 +00:00
asivache b08b121756 synchronyzing; debug statements commented out, so nothing changed really
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1215 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 16:38:33 +00:00
asivache a1eb128377 few more detailed debug printouts conditioned on if (DEBUG), so no real changes...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1214 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-10 16:36:57 +00:00
hanna 03e1713988 Better support for specifying read filters to apply directly from the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1212 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 23:59:53 +00:00
aaron ce08f5f0c3 Removed some unused variables, fixed some javadoc. The usual.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1211 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 22:10:22 +00:00
aaron 9cfd89c54f a small refactoring, and some documentation cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1210 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 22:03:45 +00:00
aaron d86717db93 Refactoring of the traversal engine base class, I removed a lot of old code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1209 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 21:57:00 +00:00
ebanks 3519323156 Output the correct geli text format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1208 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 19:45:18 +00:00
ebanks 99631cdaa1 fix and then deprecate the rodGELI class (GELIs suck)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1207 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 19:18:13 +00:00
hanna 5e26770634 Hack the MicroScheduler to be tolerant of RefWalkers. We need to implement a longer-term solution to make it easier for datasources to report problems they've encountered along the way (GSA-103).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1205 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 17:26:59 +00:00
ebanks 3fe7104963 Added walker to filter out clustered SNPs from a call set
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1203 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-09 03:16:27 +00:00
hanna 3f0304de5a Get rid of unused iterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1200 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:39:16 +00:00
hanna da4d26b1ea Enum support for command-line argument system, and some cleanup for hacks to the CleanedReadInjector that were required because Enum support was missing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1199 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:26:16 +00:00
ebanks aacec3aeb0 rod for binary GELI files (still needs to be tested)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1198 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 20:25:56 +00:00
hanna 433ad1f060 Cleanup...deprecate FastaSequenceFile2 in favor of IndexedFastaSequenceFile or ReferenceSequenceFile from Picard, depending on the application.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1196 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 18:49:08 +00:00
hanna d8fbb2b62c Refactoring; make a better home for the MalformedReadFilteringIterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1194 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-08 16:54:20 +00:00
hanna 4ba2194b5e Filter reads whose alignment starts past the end of the contig to which it allegedly aligns.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1188 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 22:27:44 +00:00
hanna 5d7393d7cb Temporary fix for Eric's problems with SOLiD reads: make sure the command-line argument system takes the --validation-strictness command-line argument into account when creating SAMFileReaders.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1183 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-07 15:18:05 +00:00
hanna 5735c87581 Basic infrastructure for filtering malformed reads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1178 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 22:50:22 +00:00
hanna 31313481f6 Temporary patch to filter out bad alignments that aren't quite fully reported as bad.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1176 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 18:41:55 +00:00
hanna d19366eaad Cleanup emergency fixes for out-of-bounds issues in reference retrieval. Fix spelling mistakes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1173 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-06 15:41:30 +00:00
jmaguire 4019cd2bd7 Added ROD for parsing hapmap3 genotype files.
Tweak to TabularROD to allow HapMapGenotypeROD to work.
Added HapMapGenotypeROD to list of RODs in ReferenceOrderedData.java.
Modified MultiSampleCaller to return a single object with most of the relvant information.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1169 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 16:28:24 +00:00
ebanks e5e249d4ac temporary fix to deal with screwy SOLiD reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1168 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-05 03:25:57 +00:00
depristo cf1854b339 Fix for monsterous problems with solid data -- now can dynamically expand recalibration tables on the fly as reads declare additional read groups -- use assumeFaultyHeader flag
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1167 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 17:15:49 +00:00
depristo bcda66d2db Simple performance improvements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1166 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 16:45:23 +00:00
hanna 0d00823332 Fix for performance bug in extending the read with X's in cases where the read is aligned off the end of the contig.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1165 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-03 16:17:38 +00:00
hanna 62807139fc Cleanup pileup and depth of coverage in preparation for release. Add pileup, depth of coverage, and print reads to package for distribution.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1159 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:54:01 +00:00
aaron 1c83b4d949 forgot to take out some test code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1157 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:18:37 +00:00
aaron bc17ff567a When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions. Now with a test case!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1156 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 14:15:50 +00:00
depristo 47cb9f169e Stable tool that's the reverse of merging -- splits a file into individual BAM files, one for each sample ID in the SAM header
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1155 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 12:56:46 +00:00
aaron bb92eb8b1c added a fix for overlapping reads in the locus context
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1153 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-02 02:08:59 +00:00
hanna 9b182e3063 Prep for documenting command-line arguments: delete some arguments that don't make sense any more given
the state of the traversals and GATK input requirements: all_loci (replaced by walker annotation), max
OTF sorts (bam files must be sorted and indexed), threaded io (replaced by data sharding framework).


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1144 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 18:23:35 +00:00
aaron d58eeb7539 Don't cry wolf: only one warning is now emitted, instead of tons of warnings.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1139 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:50:37 +00:00
hanna a3e0ec20c4 Kill the TraverseByLocusWindows traversal. TraverseLocusWindows will take its place.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1138 348d0f76-0448-11de-a6fe-93d51630548a
2009-07-01 13:46:35 +00:00
hanna e93f751bd7 First step in replacing the Hello, World! document. Revamped the HelloWalker and checked it into the source tree, created a special build file for it, and added it to the packaging tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1135 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 21:59:54 +00:00
aaron f5cba5a6bb Fixed genome loc to be immutable, the only way to now change it's values is through the GenomeLocParser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1132 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 19:17:24 +00:00
depristo 9fca79ed62 Read groups are now sorted in the output data, for convenience
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1129 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-30 16:50:44 +00:00
aaron 03f8177a53 When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1121 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 20:51:55 +00:00
depristo 7ecc43e9a7 Fixed subtle null ptr exception discovered by Kiran. Now deals with the rare situation where you have only say Q28 bases at dbSNP sites, so you fail in the Table recalibration step with a null pointer error into the data structure indexed by quality score. If you are Q score above those seen before you aren't modified in any way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1118 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 18:57:42 +00:00
ebanks 95e2ae0171 Deal with reads whose ends are aligned off the end of a chromosome.
Includes update to ignore non-ATCG bases (not just 'N')
(Also, create a BWA dir for future work)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1117 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:50:05 +00:00
jmaguire 65a788f18a Added a ROD (SangerSNP) for parsing the Sanger's chr20 pilot1 SNP calls.
Some doodling around with indel calling in an EM context.
 



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1116 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-29 16:32:12 +00:00
aaron d7d4298917 Some files to support generic genotype outputing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1112 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-26 15:43:41 +00:00
hanna 491ed70b44 TraverseByLocusWindow -- asstd bug fixes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1109 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:51:38 +00:00
depristo 5289230eb8 Version 0.2.1 (released) of the TableRecalibrator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1108 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 22:50:55 +00:00
hanna ad3a3aa350 First pass at passing lists of files / lists of interval arguments work. Note that the interval
ROD system will throw up its hands and not deal with intervals at all if multiple interval files 
are passed in (see JIRA GSA-95). 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1105 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:44:23 +00:00
ebanks 83816fb801 Stop using the annoying refIterator (temp change until new traversal is green lighted)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1103 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 20:05:39 +00:00
ebanks 0d9041380d remove printouts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1100 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:54:14 +00:00
hanna 102b38c055 Sketch of new version of TraverseByLocusWindow, and a flag to conditionally turn it on.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1097 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 18:20:56 +00:00
aaron 5b1c23a7f2 changes to fix and test the interval based traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1095 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 17:54:15 +00:00
ebanks 347608cfe0 remove hacked traversal in preparation for move to Matt's new one
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1091 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-25 14:32:05 +00:00
depristo 0a50f2e160 Updated and near final version of tabular recalibration system. Uses 'yates' correction for low-occupancy quality bins. Faster and more robust handling of input and output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1082 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-24 03:52:12 +00:00
hanna ef546868bf Pooling of unmapped reads -- improves runtime of files with tons of unmapped reads by an order of magnitude.
Desperately needs cleanup.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1080 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-23 23:48:06 +00:00
aaron 8b4d0412ca Changed the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1072 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 21:11:18 +00:00
aaron bcb64d92e9 Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, seperating the creation of a genome loc (with the reference setup) to a parser class. GenomeLoc now just represents the actual genomic postion. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1069 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-22 14:39:41 +00:00
depristo 26eb362f52 Added novel / known split to variant eval. That is, emits all of the standard analyses on SNP partitioned into those known in the provided known db and those novel. Also fixed problem with counting bases within subsets
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1068 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 21:27:40 +00:00
depristo d3f0c51944 longer update times so we don't overwhelm when running genome-wide
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1067 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-21 14:10:02 +00:00
depristo 9e26550b0d Apprach v2. Added python analysis script, so java no longer must be used to analyses quality score data. About to refactor out lots of unneeded code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1063 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-20 16:00:23 +00:00
hanna dde52e33eb Cleanup of the cleaned read injector based on Eric's feedback.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1062 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 22:04:47 +00:00
depristo 8ac40e8e2d Updated version of the recalibration tool
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1060 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-19 17:45:47 +00:00
depristo d748c85dc4 Cleaned code and reorganized -- moving in the right direction for v2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1052 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 22:28:34 +00:00
depristo 1bca144119 Moving things around
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1049 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:06:46 +00:00
depristo 3c40db260d Added REFERENCE_BASES required annotation for performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1047 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 21:03:57 +00:00
kiran 7a921c908c Can now adjust the genotype likelihoods of a variant returned from the rod. This automatically causes the lodBtr, lodBtnb, and genotype to be recomputed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1041 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-18 07:26:37 +00:00
kiran c4d9058f32 Added module rodVariants.class to the list of allowable RODs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1037 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 21:33:13 +00:00
kiran ab2a80f3ea A new ROD type that allows one to input a geli.calls file back into a walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1036 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 21:32:21 +00:00
hanna cba9025983 More package-level documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1030 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 16:28:45 +00:00
hanna 43a28750e0 Package level documentation -- helps new users get acclimated to the codebase more quickly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1029 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-17 16:27:48 +00:00
aaron 6ee64c7e43 added changes to support alec toUnmappedRead seek. Huge improvements (orders of magnitude) in unmapped read performance.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1021 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 22:15:56 +00:00
aaron 7db4497013 fixing the readTraversal output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1019 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 19:44:38 +00:00
ebanks 647b8a1ab0 Fix TabularROD printing and testing so Aaron stops nagging me.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1016 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-16 15:49:26 +00:00
aaron a0a549557f added a check of the sort ordering to the query methods, so that we detect if a file is unsorted much earlier. Also added some verbosity to the exception; it now contains an information about the raw attribute we saw for 'SO', the sort order of the bam file.
Also fixed a bunch of documentation

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1015 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 22:15:03 +00:00
ebanks 11aa715630 added capability for filtering by platform
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1011 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 19:19:50 +00:00
ebanks 8f4bc8cb6e Move filtering functionality into the PrintReadsWalker. More to come.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1010 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-15 16:38:08 +00:00
kiran 0583459839 Another formatting change to make Hapmap sites more clearly visible.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1004 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 19:53:21 +00:00
kiran e9be2a9c60 Changed a formatting issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@1002 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-12 19:40:32 +00:00
ebanks 032d0436e6 Added ROD for 1KG SNP calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@988 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 19:53:51 +00:00
ebanks ffffe3b2f6 -Support for 1KG SNP calls in RODs
-Minor bug fix


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@987 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:56:37 +00:00
aaron 63b5c12cbd Changed dataSources to datasources, to be consistant with the rest of our package names. Also, this makes me champion in the largest check-in contest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@985 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:13:22 +00:00
hanna 678ddd914f Stopgap fixes GFF, DbSNP being half-open rather than half-closed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@980 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 21:38:57 +00:00
aaron 94b0e46d12 checked in a sample xml file used to store the defaults for the SomaticCoverage tool, and added it to the SomaticCoverage.jar in build.sml. Also added a inputStream marshalling method to the GATKArgumentCollection.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@979 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:46:16 +00:00
aaron 72a81f8f25 removed the requirement that a bam file list be present in the XML version of the command line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@975 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:01:13 +00:00
aaron f304803811 initial check-in of an easy way to create command line tools based on the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@970 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:34:02 +00:00
kiran b0cc763eb5 Added some methods to format bases such that read bases on the forward strand are in uppercase, while those on the negative strand are lowercase. This does *not* affect the default functionality of the standard PileupWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@969 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:31:00 +00:00
hanna ad80894afa Bumped picard to latest svn version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@965 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:36:34 +00:00
aaron ec2f015447 fixed a bunch of comments and license headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@964 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:10:46 +00:00
hanna dc6a9ca196 Pooling resources to lower memory consumption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@962 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 13:39:32 +00:00
kiran 3adb4239e4 Same as regular Pileup, but also allows you to see flanking region around locus. This will be useful in determining that some SNPs are spurious due to being at the ends of homopolymer regions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@959 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:19:31 +00:00
depristo 7fa84ea157 10x speedup of recalibration walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 15:39:40 +00:00
aaron a62bc6b05d fixed some documentation and attached a correct license
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@953 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:44:27 +00:00
aaron bf6190b471 cleaned up the PrintReadsWalker, and added a lot of documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@952 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:28:32 +00:00
kiran fecba2cae5 Disabled option to show secondary quals as the definition has changed to conform to the spec and thus this printout is non-sensical.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@950 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 03:21:14 +00:00
aaron a8a2d0eab9 added support for the -M option in traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@935 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 15:12:24 +00:00
asivache 9f35a5aa32 Insidious bug: clipped sequences (S cigar elements) where a) processed incorrectly; b) sometimes caused IntervalCleaner to crash, if such sequence occured at the boundary of the interval. The following inconsistency occurs: LocusWindow traversal instantiates interval reference stretch up to rightmost read.getAlignmentEnd(), but this does not include clipped bases; then IntervalCleaner takes all read bases (as a string) and does not check if some of them were clipped. Inside the interval this would cause counting mismatches on clipped bases, at the boundary of the interval the clipped bases would stick outside the passed reference stretch and index-out-of-bound exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() is fixed, in that it does not count mismatches on clipped bases anymore; however, we do not attempt yet to realign only meaningful, unclipped part of the read; instead all reads that have clipped bases are assigned to the original reference and we do not attempt to realign them at all (we'd need to be careful to preserve the cigar if we wanted to do this)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@933 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 05:20:29 +00:00
depristo 98396732ba Bug fixes for Andrey
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@930 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 18:19:51 +00:00
depristo 819862e04e major restructuring of generalized variant analysis framework. Now trivally easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversional bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absense in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system autoatmically makese this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 23:34:37 +00:00
aaron 199be46c36 changed the warning that is outputted when the GenomeLoc constructor can't find the given contig in the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@913 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:49:03 +00:00
aaron b323c58ef2 add a place to store the walker return value, along with a method to retrieve it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@910 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 14:41:42 +00:00
ebanks 36fb6ca3c5 Allow user to specify the compression to be used when writing out BAM files.
Updated most of the walkers to reflect this change.
Now it won't take forever to write BAMs!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@909 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 08:48:34 +00:00
ebanks 45eeefbb80 Deal with randomly occurring unmapped reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@906 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 02:55:53 +00:00
aaron 109bef6c08 We're no longer in the read-dropping business.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@901 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:37:51 +00:00
depristo 13be846c2a qualsAsInt argument for Pileup -- fixing stupid bug [again]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@898 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:52:12 +00:00
depristo 97c8ff75dd qualsAsInt argument for Pileup -- fixing stupid bug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@897 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:51:17 +00:00
depristo 9de3e58aa8 qualsAsInt argument for Pileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@896 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:37:39 +00:00
asivache bcc7bacba1 added List<Transcript> getTranscripts(); also more comments added
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@894 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 16:25:14 +00:00
depristo b492192838 Pairwise SNP distance metrics now enabled
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@892 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 00:11:29 +00:00
hanna 6e60cddfed A fix for the 'rod blows up when it hits a GenomeLoc outside the reference' issu
e.  Really a stopgap; error handling in the RODs needs to be addressed in a more comprehensive way.  Right now, hasNext() isn't guaranteed to be correct.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@878 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 18:14:46 +00:00
aaron fc91e3e30e equals signs can be important
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@870 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 16:56:21 +00:00
aaron 4edb33788b added a fix for a bug Andrew found
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@869 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 16:53:56 +00:00
hanna b7defeae83 Fix bug in unit tests created by new filter in TraversalEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@868 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 15:50:44 +00:00
hanna fc7320133c Cleaned up error when fasta index is missing. Code still throws an exception, but the message is more direct (no more 'error while micromanaging') and tells the user to run 'samtools faidx' to fix the issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@867 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 15:34:38 +00:00
hanna c04b67c969 Basic instrumentation support for the hierarchical microscheduler.x
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@862 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 22:19:27 +00:00
asivache c252fec1bc synchronizing, no real changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@859 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 21:56:14 +00:00
asivache eafdba7300 more efficient implementation of line parsing, runs at least 1.5 times faster
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@858 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 21:09:06 +00:00
hanna 8761ab3aff Oops. IteratorPool was occasionally creating too many RODIterators in cases where some reference-ordered data was missing. Fixed by better tracking position of RODIterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@857 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 21:00:31 +00:00
hanna a1edb898ef Make criteria for determining whether to stop and merge inputs more sane.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@855 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 18:08:18 +00:00
depristo e0803eabd9 enabled underlying filtering of zero mapping quality reads, vastly improves system performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@853 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 14:51:08 +00:00
hanna 1f93545c70 Always opt to merge dictionaries when creating a SAMFileHeaderMerger.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@852 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 22:38:16 +00:00
hanna 0cf90b6f8a Tie into sequence merging code in the latest version of picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@851 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 21:48:35 +00:00
hanna 5e8c08ee63 Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
hanna aa17c4a468 Farewell, functionalj. You promised much, but you could not deliver.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@847 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 01:35:49 +00:00
depristo ce6a0f522b First incarnation of the population-based SNP analysis tool. Also bug fixes throughout the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@845 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 22:02:24 +00:00
hanna a11bf0f43e Basic unit tests for ReferenceOrderedView, ShardDataProvider. Addressing GSA-25.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@844 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 21:15:01 +00:00
aaron 5c6163ecbf Removing the old reads traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@842 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:36:11 +00:00
aaron c7b032cc88 missed a file in the add.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@841 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:27:38 +00:00
aaron 3c3cd5bb64 Moving some of the data sharding around. A new shard catagory now exits, INTERVAL. This saved a lot of code that was mirroring the same approach in both the read and locus shard strategies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@840 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:24:31 +00:00
asivache 99524ab6d0 package name corrected
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@839 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:20:43 +00:00
asivache b76f8c4eb5 moved from playground to gatk
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@838 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:18:33 +00:00
asivache ae0bac5696 'made public' implies the 'public' keyword, actually...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@835 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:57:01 +00:00
asivache 41c1a62ac4 formerly private class, factored out and made public. Represents a transcript annotation (transcript id, genomic location, genomic intervals for all exons present in this transcript, etc)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@834 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:52:38 +00:00
hanna 864a1e81e3 Delete stale class from previous rethink of the traversal engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@828 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 13:52:03 +00:00
hanna a488d2dbb2 Lazy creation of output streams. Only create output streams when absolutely necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@824 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:56:57 +00:00
asivache d73f2e95cc refseq added to the list of known rod types
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@820 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:06:44 +00:00
aaron d994544c47 Added back end code support for Sharding based on genomic location for reads. Changed the sharding
code to take GenomeLocSortedSet instead of a list<GenomeLoc>, and added a bunch of much simplier 
and cleaner test cases.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@816 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:57:46 +00:00
hanna 54bb643d19 Validated Mark's assertion that GSA-27 is fixed. Also did some cleanup on the pileup walker so that it doesn't output to System.out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@812 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 15:58:21 +00:00
hanna 008d677bea Fixed ValidatingPileup to work with Andrey's new rodSAMPileup -> GenotypeList type hierarchy.
Fixed reference-ordered data validation system to validate class hierarchies instead of specific class types.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@811 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-23 20:50:28 +00:00
hanna 34413362fd Bugfix: handle case where queue is empty.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@808 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:45:22 +00:00
hanna ec2e8d5726 Fixes for getting ValidatingPileup running in parallel.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@807 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:20:24 +00:00
hanna 2a5be1debe Cleanup in datasources.providers namespace. Make it easier for others writing traversal engines to use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@803 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:12:00 +00:00
asivache d5bb4d9ba9 Auxiliary class that can read one line from samtools pileup file. Used by rodSAMPileup to read pairs of lines as needed. NOTE: this class implements Genotype and (a trivial) GenotypeList, but it is NOT a rod!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@798 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:20:01 +00:00
asivache 732fed9aad ALERT, ALERT! rodSAMPileup is now a GenotypeList, not a Genotype! Now it can intelligently read full samtools pileup files (containing, in general, both point and indel genotypes at the same position). No need to split/synchronize pileups from different individuals anymore, hooray!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@797 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:17:59 +00:00
asivache 26633957d9 Genotype interface is extended: now it requires implementing object to be able to tell whether it isPointGenotype() or isIndelGenotype() (and the contract requires, e.g. alleles to be represented differently)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@796 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:14:46 +00:00
asivache 8773b3a430 a trivial wrapper interface for the objects capable of holding 'full' genotype, i.e. both point (as in ref/snp) and indel variants at the same reference position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@794 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:12:01 +00:00
depristo 7a979859a9 Intermediate checking for evaluation -- now supports transition / transversion evaluation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@793 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:05:06 +00:00
ebanks f2ea193149 For some reason the apostraphes in the comments were throwing annoying
compile-time warnings: "unmappable character for encoding UTF8"


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@792 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 14:07:07 +00:00
depristo 30c63daf89 More improvements to the duplicate quality combiner, making progress towards a clean system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@788 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:26:57 +00:00
depristo 65995887fc Releasable version of the Pileup walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@786 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:25:37 +00:00
hanna d61a5261c1 Better integration of reference-ordered data into the data sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@779 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:09:32 +00:00
andrewk 0219d33e10 QualityUtils: added reverse function to reverse an array of bytes (and not complement it), BaseUtils: split qualToProb into itself and qualToErrProb, CovariateCounterWalker and LogisticRecalibrationWalker: several changes including a properly acocunting (only partly complete) for reversing AND complementing bases that are negative strand, PrintReadsWalker: created option to output reads to a BAM file rather than just to the sceern (useful for creating a downsampled BAM file)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@770 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 18:30:45 +00:00
asivache 7e5e422591 ReferenceOrdereData now inspects the ROD class using reflection. If the ROD declares a static Iterator<ROD> createIterator(String rodName, File rodFile) factory method, it is wrapped and used by the ReferenceOrderedData to read records from rodFile. If the ROD does not provide such factory method, the old behavior is the default: ReferenceOrderedData uses its own simple default iterator to read the file line by line (assuming there is only one line per record/position).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@768 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 15:23:22 +00:00
hanna 26dd3cd50e Cleanup. Move filtering functions closer to where they're used.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@767 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 21:42:48 +00:00
hanna e7a6f8cdc4 Removed evidence of a previous incarnation of data sharding.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@766 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 20:48:33 +00:00
hanna 3cad580655 Catch and rethrow the walker's required argument, so that command-line arguments will be displayed when the GATK throws an argument exception.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@765 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 19:17:16 +00:00
hanna dc748d9c9c Integrate more feedback on command-line argument system. Focus on help
formatter: separate required from optional but otherwise keep ordering
the same, reorder GATK arguments by usage.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@764 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 19:01:25 +00:00
ebanks 57918de753 add the @Requires for this walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@762 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 17:03:12 +00:00
hanna 96e73e496a Delete deprecated old-school traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@758 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-20 14:57:17 +00:00
hanna 01a3cb27c7 @Required / @Allows flags for main arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@751 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-19 23:26:17 +00:00
hanna ff798fe483 Reintroduce support for interval-based traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@749 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 22:54:18 +00:00
hanna c10741e9f5 Rename TraverseLociByReference to TraverseLoci to represent its new function.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@743 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-18 01:31:57 +00:00
hanna 2c4de7b5c5 Switch TraverseByLoci over to new sharding system, and cleanup some code in passing read files along
the pathway from command line to traversal engine.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@727 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 21:02:12 +00:00
aaron 99d4ebc26d Added functionality to return the final accumulator of a traversal, so external tools can get the result of a walker.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@724 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-15 20:20:27 +00:00
depristo 7834b969b4 Better interface to the tabular ROD, now makes writing files easier. Also has corresponding test files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@719 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 23:20:11 +00:00
aaron 50f32b7f61 Added a shard strategy for the reduce-by-interval traversals. Also fixed bugs that I found along the way.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@718 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:20:18 +00:00
depristo 0f8e6061b6 Simple interface improvements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@717 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:08:09 +00:00
depristo 8e9e2f4502 Revised ROD system. Split the system in Basic type and interface. Enabled more control over rod accessing, including an initialize() function to fetch headers and other options from the file. Added general tabular rod, which has a named columns and supports a map<String,String> interface. Comes with shiny new Junit system for RODs. Also, added simple python script for accessing picard data.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@716 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 21:06:28 +00:00
aaron d8c1b010f1 Fixing the naming of the function I checked in earlier.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@713 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 19:27:10 +00:00
ebanks b62bddee42 The header was never being set.
Added this hack for now and will alert the authorities ASAP... 


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@708 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 17:18:51 +00:00
aaron 7aa90757ac Moved the iterators over to the StingSAMIterator interface. This will help us ensure that iterators that need to be closed get closed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@702 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:52:18 +00:00
aaron c3b2c66911 The GATK doesn't need the rest
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@698 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 16:20:45 +00:00
aaron 0215905bb6 Added an adapter class, that will adapt plain iterators and closeable iterators of SAMRecords into STingSAMIterators. Also unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@697 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 15:17:32 +00:00
hanna 80c13f7127 Added a getter for command-line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@695 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 13:55:52 +00:00
hanna 307c6e4ecf Oops. Forgot to add new file to svn.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@694 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-14 00:52:30 +00:00
hanna d14cab0be7 Added IterableLocusContextQueue and test. Cleaned up tests, adding BaseTest where it didn't exist. Enhanced test runner to run only classes ending in ...Test.java, so that utility classes can sit alongside the tests but won't be run by JUnit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@693 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 21:32:05 +00:00
hanna 12ae3a22b6 Break locus context data access providers into modular components in preparation for traverse by loci.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@689 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-13 18:51:16 +00:00
aaron 6e69193e3c Deprecated calls to getSamReader on both the GenomeAnalysisEngine and the TraversalEngine. This call fails in the new style traversals, but it won't disapear until the cut-over to the new traversals is complete.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@671 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 18:52:42 +00:00
ebanks 630066cc0a 1. Merge LocusWindows whose reads overlap.
2. Fix bug (we weren't clearing the "to emit" list)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@670 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 17:33:23 +00:00
aaron 9f942fdfa0 Added code to correct the violation of the parsing interface. Now the analysis type resides in the command line arg, but is stored into the argument collection before it's passed to the genomeAnalysisEngine.
Also fixed a bug where we'd exception-out if we didn't provide a interval region.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@669 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 15:33:55 +00:00
hanna ee9077fc69 LocusIterator iterated through LocusContexts, which was fine until now when we need something
that iterates through loci (GenomeLocs).  Rename LocusIterator to LocusContextIterator.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@662 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:54:57 +00:00
hanna 608948210c Check for a reference before extraction.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@661 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 13:29:44 +00:00
hanna 32696b13f5 Fixed method override issue with old-style traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@660 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:22:18 +00:00
hanna 862b8a6787 intervals_file + genome_loc => intervals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@659 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-12 01:04:18 +00:00
hanna 23e9e29964 Changed reads traversals from providing a LocusContext from which the reference sequence
could be extracted to a char[] containing the reference bases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@657 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:45:11 +00:00
hanna 052819bed5 Switched dependencies of GenomeAnalysisTK to depend on GenomeAnalysisEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@656 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:33:00 +00:00
aaron ff1b92acc4 Switch over to the GenomeAnalysisEngine/CommandLineGATK system from the GenomeAnalysisTK code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@655 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 22:05:58 +00:00
ebanks 009e71fcd9 We need to sort cleaned reads ourselves (instead of letting SAMFileWriter
do it) because the SAM headers are often screwed up and claim to be
"unsorted".  While here, I broke off the module from the SortSamIterator
in case someone else wants to use it.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@654 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 15:43:42 +00:00
aaron c735e1f627 small javadoc cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@653 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 03:44:21 +00:00
aaron e8b8ab5985 Added code to extend Matt's getReferenceBases out to the read walkers, so they can see the corresponding reference for each read.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@652 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 03:42:38 +00:00
aaron 898f65547e Added code to split GenomeAnalysisTK.java into an object concerned with loading command line args, and one that runs the engines. This will allow us to run the GATK from other tools (like Matlab). Also some cleanup to seperate out the legacy traversals and the new style traversals. This is not live yet, and any modifications you need should be made to GenomeAnalysisTK.java for now.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@650 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 02:07:20 +00:00
aaron 8d43ec3d7e a fix for a situation where a chromosome on the reference file contains no reads, and doesn't align to the bam file. This came up using reference 18, which has chomosomes like chr1_random that aren't in all BAM files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@649 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-11 01:39:25 +00:00
hanna 55c1b688bd Fix mediocre javadoc.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@646 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 22:31:16 +00:00
hanna 522f8b58be Added second method for getting large sequences of the reference for use in reads traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@645 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 22:18:04 +00:00
aaron 517f27f331 Added sharding strat. code that picks the right kind of shard, based on the traversal engine
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@644 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:55:10 +00:00
hanna 6e394490cb Cleanup in preparation for ByLoci traversal. Also did some work minimizing unit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@643 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 21:27:54 +00:00
hanna ee777c89de Change the default mechanism for adding ROD bindings to the new system. TODO: create a new object type for these triplets.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@642 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 18:43:00 +00:00
ebanks 3aabc144c6 Added functionality to allow for a contract between LocusWindowTraversalEngine and LocusWindowWalker which allows the Walker to act upon reads outside of the provided intervals.
(Really, all we want to do is spit out all reads, but this allows the Walker to do other things with the reads if it wants)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@641 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 17:28:16 +00:00
hanna de1c282e62 Reference-ordered data relies on bugs in the old command-line argument system to work. Update the ROD system to from -B track1 type1 file1 track2 type2 file2 to -B track1,type1,file1 -B track2,type2,file2.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@640 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 15:28:19 +00:00
hanna 483a58627b More cleanup -- pushing shared functions down into the traversal engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@639 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 14:12:45 +00:00
hanna 7a9cfe1f75 Push reduceInit down a level so that the walker can call into it without weird casts.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@638 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 13:46:28 +00:00
hanna a5154d99a3 Haven't heard any complaints, so I'm deleting the original implementation of TraverseByLociByReference. All TbyLbyR's will now go through the new sharding code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@637 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 13:37:00 +00:00
hanna 4c269b8496 Cleanup LinearMicroScheduler in preparation for TraverseByLoci inclusion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@634 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-08 00:58:37 +00:00
hanna 7f8850a8a2 Argument validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@631 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 20:28:56 +00:00
depristo 5a6892900e fixing oddities in duplicates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@628 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:55:45 +00:00
depristo 2204be43eb System for traversing duplicate reads, along with a walker to compute quality scores among duplicates and a smarter method to combine quality scores across duplicates -- v1
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@624 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 18:06:02 +00:00
hanna 752928df94 Switch to better mechanism for supplying a default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@615 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-07 01:22:01 +00:00
hanna dc944ec69b First stage of ROD plumbing for MicroScheduler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@614 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 23:26:21 +00:00
aaron 5136724884 Added code to the schedulers, one step closer to turning on the new reads traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@613 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 22:36:25 +00:00
aaron 0aba688e6f Added a interface that all our SAMRecord iterators should try to code to. This is in the effort to keep our code generic
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@609 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 21:40:41 +00:00
hanna 98716138e9 Cleanup: add support for non-public fields. Track matches as state of parsing engine as well as definitions.
Made fields of command-line argument system non-public by default.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@606 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 19:38:05 +00:00
aaron f5eae98af2 Fixed a bug where we could ask for a read when there were none in the pool (that's a bad thing).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@605 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:40:55 +00:00
hanna ef211f96b1 Remove old Apache CLI-based arg system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@604 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:37:51 +00:00
hanna 521aa40baa Bring new command-line argument parsing system live.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@603 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-06 18:16:11 +00:00
hanna 4ac9e72739 Migrate default and GATK arguments over to new attribute system in preparation for conversion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@600 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 23:57:48 +00:00
hanna b0cdba8bb3 Acting on Kiran's suggestion to make the doc tag in the @Argument annotation required.x
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@598 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 22:43:40 +00:00
aaron f5880109a7 Added TraverseReads test, some bug fixes discovered in the traversal test
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@594 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 20:36:00 +00:00
aaron daa2163ee8 Made the MergingSamIterator2 peekable. This iterator is being a ducktaped together swiss army knife, the iterators could use a redo soon.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@593 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 19:15:07 +00:00
aaron 09b0b6b57d Fixes to try and speed up unmapped read traversals. Still not nearly as fast as they should be, but the next step would be to modify samtools code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@592 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-05 18:17:07 +00:00
hanna 6e38966349 Rename some key classes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@587 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 22:01:04 +00:00
hanna 5bdf653919 Cleanup: prepare for better output handling.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@586 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 21:40:46 +00:00
hanna 9f5f6f9bc7 N-way parallelism. Works for small test cases. Untested for large test cases.
-Needs more comprehensive unit testing.
-Needs some basic refactoring.
-Needs rethink of interface boundaries.
-Needs to play more nicely in the /tmp sandbox.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@583 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-01 19:34:09 +00:00
depristo 84dae06d5a Initial version of ByDuplicates traversal, as well as a duplicate quality score estimator
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@576 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:16:21 +00:00
depristo ff420f5f6f Enabled iterator() function
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@575 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 22:15:14 +00:00
aaron 63403d32cd Changes to the interface to the simple data source rippled out to a bunch of files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@572 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 20:35:56 +00:00
hanna 7f173af2ea Encapsulate output tracking a bit.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@570 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-30 15:12:13 +00:00
hanna ba9a0b5da8 Break out some of the weird inner classes out of the HierachicalMicroScheduler.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@566 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 21:07:07 +00:00
hanna 95d10ba314 Sketch of hierarchical reduce process, with unit tests for some core classes. Requires breakout of inner classes, testing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@565 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 20:26:16 +00:00
ebanks 7de5da7065 Start getting the cleaner working in Walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@561 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 14:59:53 +00:00
hanna 6ecc43f385 Provide a default logger, some config settings, and some doc updates.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@557 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-29 02:06:05 +00:00
aaron b836761104 removed the test cases from the bottom of this file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@556 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 21:50:22 +00:00
aaron d4de68e260 added changes for the readsTraversal to accomidate design changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@553 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 19:49:58 +00:00
aaron b6874f30cb Added changes to bounded read iterator, it now explicitly takes a MSRI2 instead of the interfaces ClosableIterator<SAMRecord>. It would be good to fix this in the future with an interface that lets you get the (possibly merged) header.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@552 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 17:57:54 +00:00
aaron 395aaf48b0 Added the new by reads traversal, still needs to be sewn into the micromanager code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@551 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-28 17:55:08 +00:00
aaron a343f3eab7 Fixed bug where we weren't setting the reads group correctly. Also added code to set the printMetrics field of the singleSampleGenotyper from the Pool caller, it was null excepting out for me without that set.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@548 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-27 15:17:20 +00:00
hanna 9a8902571c Placeholder for parallel MicroManager.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@542 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 23:08:12 +00:00
hanna 1daa011387 Interval-based traversals were bleeding file handles. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@541 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 18:35:54 +00:00
hanna 1e2e78265d Inadvertently removed interval file support in new TbLbR. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@540 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 18:15:42 +00:00
hanna c9e9731495 More cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@539 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 17:46:52 +00:00
hanna 4036f24909 Documentation and cleanup work in preparation for parallelism.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@538 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 17:42:00 +00:00
ebanks 0c76a70313 Renamed traversal by "interval" to "locusWindow"
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@537 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-26 02:26:08 +00:00
depristo 9a299c11d3 Oops, typo and build problems. FYI, fixing typos is better than packing...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@536 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-25 01:37:17 +00:00
depristo ce470702fc consistency with java naming conventions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@535 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 21:44:48 +00:00
depristo bfce0c93ab removing bad file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@534 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 21:40:04 +00:00
depristo 05c6679321 Enabled ReduceByInterval
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@533 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 21:39:44 +00:00
hanna ee2f022c71 Make new TraverseByLociByReference the default.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@532 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:50:11 +00:00
hanna e50ae97fe1 Introduce new index-based fasta reader. Clean up MicroManager code, pushing necessary code back into TraversalEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@531 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 19:40:21 +00:00
jmaguire dd408a2a9a First draft of actual pooled EM caller.
Produces sane looking output on region of 1kG pilot1:

    CALL NA12813.SRP000031.2009_02.bam CC 0.609084 0.609084
    CALL NA12003.SRP000031.2009_02.bam CC 2.114234 2.114234 CCCCC
    CALL NA06994.SRP000031.2009_02.bam CC 0.910114 0.910114 C
    CALL NA18940.SRP000031.2009_02.bam CT 2.589749 0.910114 T
    CALL NA18555.SRP000031.2009_02.bam CC 0.609084 0.609084

Next up, eval vs. Baseline pilot1 calls and pilot3 deep-coverage truth.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@525 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 13:42:15 +00:00
ebanks 13d4692d2e 1. Added a by-interval traversal.
2. Added a shell for the indel cleaner walker (it's currently being used to test the interval traversal).
3. Fixed small bug in downsampling (make sure to downsample the offsets too)
4. GenomeAnalysisTK.execute => anyone object to my change to "instanceof" instead of trying to catch a ClassCastException (yuck)?



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@524 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 04:33:35 +00:00
aaron bd4cacb832 Added code to make a read group and sample name for BAM files that don't annotate them on reads. The defaults for both are now the filename, but this may be shortened in the future.
The sample name for a read can be retrieved with the command:

read.getAttribute(SAMTag.RG.toString());




git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@518 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-24 00:31:00 +00:00
aaron 635bfd8604 Added a little bit of hack to get the header back to the walker by initialization time, which was before sharding in the last version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@516 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 21:07:11 +00:00
aaron 0208d201c7 Forgot this in the last commit...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@515 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:47:22 +00:00
aaron 3dc2afd7ab Added the ability to get a merged header in a LociByReference traversal
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@514 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:34:52 +00:00
hanna 282f1d88b8 Make the operation 'read from the iterator and place on the queue' atomic with respect to hasNext(), next().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@513 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 20:16:26 +00:00
aaron 8c13940c5a A lot of changes to support by-read sharding and some from debugging of the by loci traversals
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@511 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 19:03:14 +00:00
hanna 3d7575bbb8 Oops...omitted walker.initialize().
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@504 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-23 17:35:28 +00:00
hanna 1bf4d040d8 Increase default shard size from 5 to 100000.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@494 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:29:44 +00:00
hanna 3af66a462e Make PrintLocusContextWalker less verbose.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@493 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 18:28:02 +00:00
hanna 4cafb95be8 TraverseByLoci / TraverseByLociByReference suffered from the same sam-triggered off-by-one (?) bug as TraverseByReference; it was just less obvious here because these versions don't shard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@491 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 15:48:20 +00:00
kcibul cb2f621d01 reverting accidental commit of change to shard size
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@490 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 00:33:28 +00:00
kcibul b820130dce * added ability to load multiple BAM files from command line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@489 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-22 00:28:08 +00:00
kiran 5abfc7d079 Added an argument ('extended' or 'ext') that outputs the four-base probs in a long format.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@485 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 22:27:26 +00:00
asivache 521e202a10 updated interface
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@482 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:07:20 +00:00
asivache 55ca272919 reimplemented; now implements Genotype interface instead of AllelicVariant
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@481 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-21 21:06:42 +00:00
hanna eafb4633ba Temporary workaround for samtools index bug: there seems to be an off-by-one error. Will file bug report.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@470 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 23:14:41 +00:00
asivache f2f9fa3ed4 doc added
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@464 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 16:43:25 +00:00
hanna d639ec3776 Remove some copied code to make sure the traversal engine stays in sync with the locus context provider.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@463 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 16:41:56 +00:00
depristo 50ae1763f7 Support for -continue_after_errors flag in the validating pileup walker in case you want to see errors as they arise, rather than aborting greedily
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@461 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 03:13:11 +00:00
depristo ee5ab9536f trivial checking / flagging issues to enable testing of merging iterator performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@460 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 03:11:59 +00:00
depristo dbf2344cef Fixes for including duplicate reads in the locus traversal; now checks that the ref arg is provided when needed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@459 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 01:27:36 +00:00
hanna 01be8f09e3 Exception cleanup. All our non-runtime exceptions should extend from StingException, StingException needs to be lower in the tree to build.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@457 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 22:17:25 +00:00
aaron e5c80e59dc fixed the case when you're not seeking, it didn't initalize
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@456 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 22:16:03 +00:00
hanna 165e504d1c Turn on new TraverseLociByReference is now only dependent on the -et flag. REGION_STR does not matter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@454 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 19:45:47 +00:00
aaron 12e1f192c4 Fixed a bug in this code where it would eat reads that didn't start at the beginning of the provided interval. This should fix / help fix Kristian problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@453 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 18:42:00 +00:00
asivache 835f1067d8 added isHom() and isHet() queries to the Genotype interface (with the obvious meaning)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@452 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 18:41:39 +00:00
kcibul d35a542bb9 * fixed bug where the merged header was not being set on the read (although the read group was)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@445 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 12:53:07 +00:00
asivache 0d324354ae separate interface for genotypes as opposed to (population) allelic variants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@443 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 03:55:16 +00:00
depristo 7261787b71 Fixed potential bug with next() operation returning empty contexts when a read contains a large deletion. We can now use the look ahead safely...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@438 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 21:38:28 +00:00
aaron e70aecf518 bug fix, but important
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@437 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 21:07:20 +00:00
aaron 67ea66c866 Bug fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@434 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 19:12:18 +00:00
depristo 1edfe48194 Better debugging output with .debug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@433 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 19:09:18 +00:00
depristo 9cc808104e Fixed subtle bug in permitting EXPAND_WINDOW to be > 1. We now use the right window size so we avoid including empty hangers. There's still a rare bug to sort out, which occurs in the case where a read with an indel can generate empty hangers.
Also cleaned up the debugging output.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@432 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 19:08:26 +00:00
aaron 180ff13290 Added a bunch of changes to support the new MicroManager code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@431 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 18:29:38 +00:00
aaron 12407b5b1a Deleted the old file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@427 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 13:55:01 +00:00
aaron 6db9127f90 Added changes to shattering, refactored SAMBAM into SAM
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@426 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 13:52:56 +00:00
depristo 24722a442e Slight code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@421 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:21:36 +00:00
aaron 13b0995d54 Adding an iterator that bounds the number of reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@419 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:18:31 +00:00
depristo 72a3d84ed2 General purpose pileup code -- you can use these features to obtain detailed pileup data from reads and offsets. Useful for all pileup based walkers. Expanded support for rodSAMPileup to enable the new ValidatingPileupWalker, which takes a samtools pileup output and checks that GATK gives identical output as samtools on a per base and per qual pileup. It's going to be a very useful validation tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@418 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:13:10 +00:00
hanna 0629f79049 Moved fasta support files into their own package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@408 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 18:13:23 +00:00
aaron eb4b4a053b A bunch of updates to the SAM/BAM data source, along with test cases for the merging of multiple files (it works!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@399 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:19:20 +00:00
kiran 30121534ed Outputs the secondary bases and quals (if available) in verbose mode. Prefixed with the tag 'SQ='.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@398 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 13:58:28 +00:00
depristo 8b2c2e677b Uses the cleaner new GenomeLoc(read) syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@396 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:55:43 +00:00
depristo 1cee7948ab Added lots of assertions to check for problems.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@395 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:55:19 +00:00
depristo 794360c410 Added verbose option to show mapping qualities and base qualities as ints!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@394 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:54:48 +00:00
depristo cc75e8f712 Uses the cleaner new GenomeLoc(read) syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@393 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:53:58 +00:00
asivache 8e6093d5a5 remove mom/dad/kid cmd line arguments that were needed for mendelian walker; now we can use generic track binding!!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@389 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:45:34 +00:00
aaron 887adcfc7f Some minor fixes to the last check-in
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@387 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:24:51 +00:00
aaron f2d0d73309 removed old shard strategy code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@386 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:13:45 +00:00
aaron dd604799dc Added some new code for shard support over reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@385 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:11:43 +00:00
hanna e91a429c58 A class to print out as much context about the given locus site as is possible. Useful for testing traversal engines -- run old and new code across a given region and diff the output to make sure they have the same context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@383 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:29:55 +00:00
asivache b4136b6d6e a few tweaks to make it more robust: ignore reads with cigars containing anything but I,D,M; don't set up contig ordering manually, rely upon reference sequence and its dictionary; don't die if a record does not have NM tag, but faal back to direct counting instead; now requires reference as a cmdline arg
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@378 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 04:49:19 +00:00
kiran 756e6c61d8 Strictness args are presented as lowercase in the help, but only accepted if uppercase. Changed help to list the valid arguments in uppercase.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@376 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:50:19 +00:00
kcibul c7777d46d6 * re-enabled setting of sequence dictionary information on GenomeLoc
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@366 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:44:14 +00:00
kcibul ce72932a45 * refactored GenomeLoc to use contigIndex internally for performance and fixed several calling classes
* added basic unit test for GenomeLoc
* fixed bug when parsing genome locations like chr1:5000 the start position was being left as maxint rather than being set to the same as the stop position.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@365 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:25:17 +00:00
hanna 608a66e6ab TbyLocibyRef previously didn't seem to support traversals with no interval specified. Put in a temporary fix until the threaded approach is in place.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@363 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 22:14:06 +00:00
hanna c2669021b8 Cleanup, and support either by-interval traversals or full traversals in data source-backed code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@362 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 22:09:01 +00:00
hanna 2322bb7d86 Workaround: use a single ReferenceIterator for an entire micromanaged traversal. We'll have to
do something about ReferenceIterator thread safety later.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@361 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 20:50:28 +00:00
hanna 95753e1b34 Should've been calling queryOverlapping in locus mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@360 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 20:22:04 +00:00
depristo 17b3d5b554 New ROD accessing system, including a generalized interface for binding ROD on the command line that doesn't require you to chance GenomeAnalysisTK.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@355 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 22:04:59 +00:00
hanna 0d825ccfc1 Oops. Fixed duplicate reference to the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@353 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:27:57 +00:00
aaron 9afa101465 Add interval support to the
.__            __    __                
  _____|  |__ _____ _/  |__/  |_  ___________ 
 /  ___/  |  \\__  \\   __\   __\/ __ \_  __ \
 \___ \|   Y  \/ __ \|  |  |  | \  ___/|  | \/
/____  >___|  (____  /__|  |__|  \___  >__|   
     \/     \/     \/                \/       

classes!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@352 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:23:43 +00:00
hanna 8a1207e4db Bringing up scaffolding for integration of locus traversals by reference with Aaron's data source code.
Reverts to original TraverseByLociByReference behavior unless a special combination of command-line flags are used.

Lightly tested at best, and major flaws include:
- MicroManager is not doing MicroScheduling right now; it's driving the traversals.
- New database-ish data providers imply by their interface that they're stateless, but they're highly stateful.
- Using static objects to circumvent encapsulation.
- Code duplication is rampant.
- Plus more!


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@346 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:28:17 +00:00
aaron 8e2f5471a1 Some cleanup to the data source, and another JUnit test case.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@344 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 14:58:05 +00:00
aaron d56193b6df Cleanup of a couple of output statements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@343 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 14:09:07 +00:00
aaron 12752cf893 Added a bunch of fixes: MSRI wasn't working, sharding had broken edge cases, and SAMBAM DS needed to close the file handles.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@341 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 00:20:15 +00:00
aaron d4ab95c098 Added a constructor, took out a copy constructor, and changed some SAMBAM code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@335 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 19:53:20 +00:00
kcibul 0b81a76420 added support for Picard IntervalList files to --interval_file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@334 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 16:49:43 +00:00
aaron 295c269a64 Remove the main() I put in for debugging
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@333 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 16:43:44 +00:00
aaron d517245beb Fixes for shattering, added JUnit test case
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@332 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 16:37:34 +00:00
asivache 453d13415d count variant as biallelic if it's just a non-ref homogeneous site!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@326 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 01:57:27 +00:00
depristo b49f713336 Enabled multiple argument for GATK driver; first step towards generalized -rods <name> <type> <file> argument structure
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@325 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 01:52:13 +00:00
asivache 1ade22121b cruel hack: new toolkit-wide optional cmdline arguments added to allow for loading trio genotyping tracks; to be moved back to walker when walkers can register their data needs with the toolkit
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@324 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:33:26 +00:00
asivache 8ec427ab66 latest version... still under dev/testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@323 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:31:06 +00:00
depristo 00722e19bc The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@319 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:19:54 +00:00
asivache b64e4d1a04 seekForwardOffset changed (improved?): first, compareContigs does *not*, in general, return -1,0 or 1 if no dictionary is available; second, be more flexible in trying to jump to the right contig (current implementation of FastaFile2 will still through an exception if there's no dictionary, but iterator itself behaves transparently)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@317 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 21:42:33 +00:00
aaron 2663ac3e4a documentation fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@316 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 21:39:50 +00:00
aaron 8a357a88a2 right...exponential should be exponential, so I might want to increment the exponent
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@315 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 20:12:05 +00:00
aaron 6ce9e0f941 delete the old strategy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@314 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 19:40:03 +00:00
aaron 08fddd43af -Replaced adaptive and linear strategies with an adaptive linear strategy
-Added the exponential growth strategy
-Added factory code that allows you to transitition between strategies, so if you want to move from linear to exp at a point, and then back when you've hit a runtime threshold, it will take care of it for you.
-Changed the code to return a Shard instead of a GenomeLoc

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@313 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 19:37:38 +00:00
aaron 6369d23b43 renamed; these files are more strategy than actual shards
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@312 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 16:50:56 +00:00
asivache e95f427965 Added isReference() to AllelicVariant and updated rodDbSNP accordingly
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@311 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 14:49:20 +00:00
aaron b42d8df646 the new shatter method, independent of the underlying data. The only thing needed to create a Shard is the reference seq, which may be a problem in reference less traversals, so the builder class is there so we can make different construction schemes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@308 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 00:32:57 +00:00
aaron 150bca30aa typO in the documentation...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@306 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 23:05:59 +00:00
aaron 4aa9c0d591 Matt make a good point that the Reference Iterator we were using wasn't bounded; The BoundedReferenceIterator takes a GenomeLoc to bound the iterations by
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@305 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 23:03:56 +00:00
aaron 0fc8a90553 removing some files from the old approach to dataSource
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@303 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 21:57:34 +00:00
aaron 5feb7ee627 temperary fix, relying on a old reference order data constructor
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@302 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 21:38:41 +00:00
aaron af5a443e5a add Synchronized to the has_next and next methods
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@301 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 21:17:11 +00:00
aaron 97d14abe85 Interface check-in for Matt
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@300 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 21:14:19 +00:00
asivache 0d25e71953 a declaration is made generic
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@295 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-04 21:55:02 +00:00
asivache 551ce9130f added isBiallelic() to the AllelicVariant interface and to rodDbSNP implementation. We probably don't really know how to deal with non-biallelic sites just as yet...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@294 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-04 21:31:16 +00:00
depristo 4eac3193f7 Added RefMetaDataTracker system as a replacement for the List<RefenenceOrderedData> going into walkers. This system allows you to more easily get a tracker for processing using the lookup(name, default) system. See Pileup for an example.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@292 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:54:54 +00:00
ebanks 42eb356782 1. modifed by read traversals with indexes to be more general
2. GenomeLocs for reads should have ends spanning the read
   (moved it to GenomeLoc from Utils)
3. Got rid of those stupid unmappable characters from comments in various files


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@289 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 18:24:08 +00:00
andrewk 86fc18e9fc Fixed merge bug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@288 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 17:41:58 +00:00
andrewk bef475778f - Updated --hapmap switch to --hapmap-chip to reflect the data being chip data for an individual rather than population allele frequency data in Hapmap
- Corrected some bugs to get metrics logging working
- Added a switch --force_1base_probs to ignore 4-base probalities if they exist


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@287 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 17:32:31 +00:00
depristo edc44807af rod's now have names. Use getName() to access it. Next step is better interface to accessing rods
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@286 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 16:41:33 +00:00
depristo f031d882c6 ByReference traversals!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@281 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 13:23:18 +00:00
asivache c6ab60ee04 change variable type to Boolean from boolean to make cmdline parser happy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@279 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:35:30 +00:00
asivache 16aa979e34 make -A a true flag not an argument that asks for 'true/false' value!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@278 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:23:46 +00:00
jmaguire b7a67da775 Expose the underlying SAM reader to the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@270 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 21:38:00 +00:00
asivache 5d9b068b8b generic declarations added here and there to eliminate a few annoying warnings; no consequential changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@268 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 20:53:01 +00:00
asivache 4bc035d919 half-way through making rodDbSNP implement AllelicVariant interface; does not work yet
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@267 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 20:48:59 +00:00
ebanks 4faa680887 *Massive* speed-up for interval-based by-read traversals.
[Could do more optimizing, but this simple fix was good enough for now]


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@266 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 20:19:39 +00:00
kcibul c192a95998 changes in three files to make the HapMap RODs work:
- HapMapAlleleFrequenciesROD.java - the referenceOrderedDatum implementation
 - PrepareROD.java - has a static block that loads the known ROD classes, had to add the above
 - GenomeAnalysisTK.java - when supplied a hapmap argument... loads the ROD

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@265 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 19:55:19 +00:00
asivache b4cdd1d9a1 correct package name
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@264 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 18:09:31 +00:00
depristo 93fc768c38 Fixing problems with SAMQueryIterator and reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@263 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 18:04:28 +00:00
ebanks 3248176118 Die with appropriate error message if we try to read past the end
of a chromosome.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@261 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:44:32 +00:00
depristo 24e8581c30 Slight improvements to allele caller interface; fixed problem with printing progress
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@260 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:44:12 +00:00
jmaguire 25ace306b9 GenomeAnalysisTK: better documentation of validation option.
AlleleFrequencyWalker: output the last reference interval if it's left hanging open.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@258 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 16:11:20 +00:00
asivache 816e768a74 move interface from playground
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@257 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 15:58:01 +00:00
depristo d952790258 GFF now parses attributes correctly and efficiently. Slightly better interface to Utils.join
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@253 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 22:54:38 +00:00
ebanks 6cc2fa24d5 Added ability to downsample to a particular coverage
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@250 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 20:27:06 +00:00
jmaguire bb3dbb5756 change default onTraversalDone to use the new output streams
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@249 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 19:50:31 +00:00
ebanks 6994cca988 added precision
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@246 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 16:21:29 +00:00
ebanks 3af4290a49 Added iterator to randomly downsample to a given fraction of the reads.
Also, updated sort iterator to allow user to input max sorts.
Put in placeholder for downsampling to given coverage.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@243 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 02:11:13 +00:00
depristo 385736469c High performance pileup code and utilities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@242 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 00:47:47 +00:00
aaron ad63633b1c forgot to change the chunks dir to shards before
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@241 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 20:28:20 +00:00
ebanks 8d601a6a42 unbox
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@239 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 15:51:59 +00:00
ebanks 234137dee8 use boolean instead of String for flag to suppress printing in map
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@236 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 15:14:00 +00:00
ebanks 3896cc8f17 Moved avg depth of coverage functionality into the core depth of coverage
walker.  Used new command line args for walkers.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@234 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 05:02:33 +00:00
ebanks 89c1762aa9 Apparently, no one else has tried to create a stateless walker over loci until
now, as this should have come up: make sure reduce sums get transferred to the
next reduce.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@232 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 02:31:51 +00:00
aaron ba99e9f648 checking in some of the more static Data Source dependent code at this point. They don't do much on their own, but are need for the base data source code I'm writing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@231 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-31 00:04:03 +00:00
hanna e812cfbf55 Refactor common functionality out of WalkerManager and into JVMUtils and PathUtils. Add support for loading walkers from a jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@229 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-30 23:20:55 +00:00
hanna 7c6455fe36 Handle the case where a walker is being run outside of the GATK framework, such as JUnit tests.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@222 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-29 01:50:27 +00:00
depristo d7c0bcc223 Reorganized GenomeLoc code to more clearly and better use the picard SequenceDictionary information.
All GenomeLoc[] are not ArrayList<GenomeLoc> for clarity and consistency
Parsing now recursively merges contiguous elements chr1:1-10;chr1:11-20 => chr1:1-20
Added support for TraversingByLoci over all reference positions specified by the provided location array.  System dynamically determines which traversal system to use.
Pileup now marks, very clearly, reference positions without covered reads.
Made changes around the codebase to deal with new GenomeLoc structure.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@218 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-28 20:37:27 +00:00
depristo c2ae6765a3 Removed unnecessary dependence on playground code...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@217 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 22:48:51 +00:00
depristo cfee59e0e6 New type hierarchy for Traversals. There's a new package to hold them (traversals) and an easy system to create new ones. We are now one step closer to supporting the execution manager (a totally non-functional version is included here) that actually executes walkers in parallel using N threads.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@214 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 15:40:45 +00:00
hanna 4a6be896b9 Provide out and err PrintStreams to the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@213 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 15:03:32 +00:00
depristo 826781a760 The traversal engine now passes the reduce result to OnTraversalDone() in the walker base class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@210 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 13:44:46 +00:00
aaron d115209e86 moved a bunch of files over to the logging system. In some cases I ballparked the severity level of an error, so if you see something wrong feel free to make changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@209 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 13:27:04 +00:00
depristo 3abaaa3cc3 Tried to add a poor man's version of seeing all reference sites in an interval, and failed. However, I did add the command line argument and a few pieces of useful code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@206 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 00:12:35 +00:00
hanna 53fe9acf65 Make command-line arguments available in walker constructor, provide back door from
walker into GATK itself, do some cleanup of output messages, and add some bug fixes.
Command-line arguments in walkers are now feature-complete, but still a bit messy.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@203 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 20:45:27 +00:00
hanna 5f9010116a Collapse the walker hierarchy, in preparation for in-walker output streams less hokey walker args.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@201 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 16:22:35 +00:00
depristo 7cad3acc61 Support for dynamically merging data files. Preliminary only -- everything in these systems is still being tested
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@200 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 14:40:50 +00:00
depristo d457778283 Unified byLoci and byLociByInterval traversals. It now figures out what to do for you based on the presence of an index and set of required locations to process.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@191 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 16:01:58 +00:00
depristo d11bb0fc64 Added xReadLines class to utils. It is a iterator<string> and iterable<string> so you can easily read all lines from a file. It's been used to simplify the code to process intervals, and will be used to add merging data support to the system...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@187 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 15:17:38 +00:00
depristo 8bdf49a01f added slightly more useful output to Depth of Coverage walker. (now prints number of loci). Traversal engine now actually prints the reduce result (key) and no longer prints millions of locus interval updates
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@183 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 03:12:54 +00:00
depristo ff98e28abf High-performance interval list implement -- uses StringBuilder to avoid n^2 calculation. Can handle millions of locations quickly now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@182 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 02:17:48 +00:00
andrewk 30babbf5b9 Restructured AlleleFrequencyMetricsWalker to correctly report Hapmap concordance numbers for genotyping and added reporting for Hapmap reference/variant calling. Also, tiny bugfix in interval code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@181 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 01:12:05 +00:00
hanna 9e2a373184 Prototype, buggy implementation of walker command-line arguments. Doesn't
(yet) deal elegantly with even simple cases.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@180 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 00:12:00 +00:00
depristo 919a86e876 Cleaned up code for by interval traversals for Jared. Initialization code refactored and made clear. by loci and by loci by interval use the same underlying code now. Everyone uses the same initialization code to set things up. It's a party in the TraversalEngine and everyone's invited...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@179 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 22:32:45 +00:00
depristo 6df19ab793 Support for byInterval traversals for Jared. Do not use them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@175 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 20:55:34 +00:00
depristo 9f500215da Support for reseting the system; Cleanup later
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@174 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 20:52:11 +00:00
andrewk 9dee9ab51c Added Hapmap data track (using rodGFF class for GFF file format) to toolkit as a command line option, Hapmap metrics to AlleleFrequencyMetricsWalker, and a python Geli2GFF file converter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@163 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 03:58:03 +00:00
hanna f7363cf935 Support for loading from either a jar or a class directory. Fixes troubles with IntelliJ debugging.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@162 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 03:56:49 +00:00
hanna 63cd1fe201 Push core / playground lower into the tree.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@160 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-23 23:19:54 +00:00