kiran
1fb16d54e0
For SAM files that have no alignments and when no reference is specified, contigInfo.getSequence() is null, causing an error when getSequenceName() is called on the resulting null pointer. Check for null and return null instead of barfing here.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@374 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:48:21 +00:00
kiran
5e96ab6161
Helpful functions for converting a base (char) to a base index (A:0, C:1, G:2, T:3), alphabetical and consistent with Illumina conventions to minimize confusion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@373 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:46:23 +00:00
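A minimal sketch of the base-to-index mapping described above (A:0, C:1, G:2, T:3). The class and method names are hypothetical, and accepting lowercase bases and returning -1 for anything else (N, gaps) are assumptions, not necessarily what the original helpers do.

```java
/** Hypothetical helper illustrating the alphabetical base index convention. */
final class BaseIndexSketch {
    // Index order is alphabetical: A:0, C:1, G:2, T:3
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    private BaseIndexSketch() {}

    /** Returns 0-3 for A/C/G/T (case-insensitive, an assumption) or -1 for any other character. */
    static int baseToIndex(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    /** Inverse mapping; rejects indices outside 0-3. */
    static char indexToBase(int index) {
        if (index < 0 || index >= BASES.length) {
            throw new IllegalArgumentException("base index out of range: " + index);
        }
        return BASES[index];
    }
}
```

Because both directions use the same alphabetical table, indexToBase(baseToIndex(b)) round-trips for every uppercase A, C, G or T.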
kcibul
ce72932a45
* refactored GenomeLoc to use contigIndex internally for performance and fixed several calling classes
* added basic unit test for GenomeLoc
* fixed bug when parsing genome locations like chr1:5000, where the start position was left as maxint rather than being set to the same value as the stop position.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@365 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:25:17 +00:00
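For the chr1:5000 parsing fix, a sketch of a location parser in which a single position yields start == stop instead of leaving either end at a maxint sentinel. LocParseSketch, its Loc type and the accepted grammar (contig:start or contig:start-stop, 1-based inclusive) are assumptions for illustration, not the real GenomeLoc parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocParseSketch {
    /** Hypothetical value type: contig plus 1-based, inclusive start/stop. */
    static final class Loc {
        final String contig;
        final long start;
        final long stop;

        Loc(String contig, long start, long stop) {
            this.contig = contig;
            this.start = start;
            this.stop = stop;
        }
    }

    // contig, a colon, a start position, and an optional "-stop"
    private static final Pattern LOC = Pattern.compile("^([^:]+):(\\d+)(?:-(\\d+))?$");

    static Loc parse(String text) {
        Matcher m = LOC.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("unparseable location: " + text);
        }
        long start = Long.parseLong(m.group(2));
        // Single position ("chr1:5000"): both ends are the same base, no sentinel value
        long stop = m.group(3) == null ? start : Long.parseLong(m.group(3));
        return new Loc(m.group(1), start, stop);
    }
}
```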
depristo
17b3d5b554
New ROD accessing system, including a generalized interface for binding ROD on the command line that doesn't require you to change GenomeAnalysisTK.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@355 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 22:04:59 +00:00
hanna
8a1207e4db
Bringing up scaffolding for integration of locus traversals by reference with Aaron's data source code.
Reverts to the original TraverseByLociByReference behavior unless a special combination of command-line flags is used.
Lightly tested at best, and major flaws include:
- MicroManager is not doing MicroScheduling right now; it's driving the traversals.
- New database-ish data providers imply by their interface that they're stateless, but they're highly stateful.
- Using static objects to circumvent encapsulation.
- Code duplication is rampant.
- Plus more!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@346 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:28:17 +00:00
kiran
59b2e6a90f
Added some stuff for retrieving the base index and probability of a compressed base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@329 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 15:52:58 +00:00
depristo
b49f713336
Enabled multiple arguments for the GATK driver; first step towards a generalized -rods <name> <type> <file> argument structure
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@325 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-08 01:52:13 +00:00
depristo
00722e19bc
The system now requires a dictionary file for a fasta file, or it throws an error. You can't just operate without a sequence dictionary any longer. We will transition to a GenomeLoc system that assumes a dictionary is available.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@319 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 22:19:54 +00:00
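A sketch of the fail-fast dictionary check described above. It assumes the Picard naming convention in which reference.fasta is paired with reference.dict in the same directory; the class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

final class DictionaryCheckSketch {
    /** Derives "ref.dict" from "ref.fasta" or "ref.fa" (assumed Picard-style naming). */
    static Path dictionaryPathFor(Path fasta) {
        String name = fasta.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        // resolveSibling keeps the dictionary in the same directory as the FASTA
        return fasta.resolveSibling(base + ".dict");
    }

    /** Throws instead of silently continuing without a sequence dictionary. */
    static Path requireDictionary(Path fasta) {
        Path dict = dictionaryPathFor(fasta);
        if (!Files.exists(dict)) {
            throw new IllegalStateException(
                    "No sequence dictionary found for " + fasta + "; expected " + dict);
        }
        return dict;
    }
}
```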
asivache
9c4fc633aa
Make it symmetric: if there is no sequence dictionary, also send a message to the logger, just like we do when we find the dict
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@318 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-07 21:44:39 +00:00
ebanks
d1c5e986d5
Another check to deal with bad reads (BWA output throws bad exceptions)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@298 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 04:58:22 +00:00
ebanks
3f75fc4e83
Unfortunately, because BWA occasionally outputs crazy reads, we need
to make sure not to have an ArrayIndexOutOfBoundsException thrown.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@297 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-06 03:51:35 +00:00
ebanks
2e89d5e46f
That was an annoying bug to find. Mark, I want a beer.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@293 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 20:05:24 +00:00
depristo
c1abcfb014
Fixed problem where we were considering reads out of order because their stop positions were out of order even though their starts were equal. This involved a change in the ordering feature of GenomeLoc, which no longer sorts by both start and stop: as long as the start positions are equal, things are considered "in order". Perhaps this isn't a good idea to change...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@291 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:53:33 +00:00
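A sketch of the start-only ordering described above: locations compare by contig, then by start, and the stop position is deliberately ignored, so two reads with equal starts but different stops count as "in order". Comparing on a contig index first is an assumption (it mirrors the contigIndex refactor in this log), and all names here are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

final class StartOnlyOrderSketch {
    /** Hypothetical minimal location: contig index plus 1-based start/stop. */
    static final class Loc {
        final int contigIndex;
        final long start;
        final long stop;

        Loc(int contigIndex, long start, long stop) {
            this.contigIndex = contigIndex;
            this.start = start;
            this.stop = stop;
        }
    }

    /** Orders by contig, then start; stop plays no part in the comparison. */
    static final Comparator<Loc> BY_START =
            Comparator.comparingInt((Loc l) -> l.contigIndex)
                      .thenComparingLong(l -> l.start);

    /** True if no element sorts strictly before its predecessor. */
    static boolean isInOrder(List<Loc> locs) {
        for (int i = 1; i < locs.size(); i++) {
            if (BY_START.compare(locs.get(i - 1), locs.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off the commit worries about is visible here: because BY_START treats equal starts as ties, it is not consistent with an equals() that also compares stop.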
kiran
ef06924f73
JavaDocs!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@290 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 19:19:17 +00:00
ebanks
42eb356782
1. modified by-read traversals with indexes to be more general
2. GenomeLocs for reads should have ends spanning the read
(moved it to GenomeLoc from Utils)
3. Got rid of those stupid unmappable characters from comments in various files
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@289 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 18:24:08 +00:00
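For item 2 above, one way to compute where a read's span ends on the reference. Whether the original code used the raw read length or the CIGAR is not stated; this sketch assumes the CIGAR, where (per the SAM specification) M, D, N, = and X consume reference bases and I, S, H and P do not. ReadSpanSketch and alignmentEnd are hypothetical names, and malformed CIGAR text is not validated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReadSpanSketch {
    // One CIGAR element: a length followed by an operation code
    private static final Pattern CIGAR_OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** 1-based inclusive reference position of the last base covered by the read. */
    static long alignmentEnd(long alignmentStart, String cigar) {
        long refLength = 0;
        Matcher m = CIGAR_OP.matcher(cigar);
        while (m.find()) {
            switch (m.group(2).charAt(0)) {
                case 'M': case 'D': case 'N': case '=': case 'X':
                    refLength += Long.parseLong(m.group(1));
                    break;
                default:
                    break; // I, S, H, P do not consume reference bases
            }
        }
        return alignmentStart + refLength - 1;
    }
}
```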
kiran
b854c24575
Oops. I gave this method the wrong name first time around.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@283 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-03 15:46:26 +00:00
kiran
a8a6c63a32
A class with some static methods that aid the manipulation of quality scores and probabilities (including a method to compress a base and quality score into a byte for SAM output).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@271 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 22:06:15 +00:00
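A sketch of packing a base and a quality score into one byte and reading both back out. The bit layout (base index 0-3 in the top 2 bits, Phred quality capped at 63 in the low 6 bits) is an assumption; the actual layout used by the class is not given here. The probability method uses the standard Phred definition, P(correct) = 1 - 10^(-Q/10).

```java
final class CompressedBaseSketch {
    private static final int QUAL_BITS = 6;
    private static final int QUAL_MASK = (1 << QUAL_BITS) - 1; // 63

    private CompressedBaseSketch() {}

    /** Base index (0-3) in the top 2 bits, quality clamped to 0-63 in the low 6 bits. */
    static byte compress(int baseIndex, int phredQual) {
        if (baseIndex < 0 || baseIndex > 3) {
            throw new IllegalArgumentException("base index must be 0-3: " + baseIndex);
        }
        int q = Math.max(0, Math.min(phredQual, QUAL_MASK));
        return (byte) ((baseIndex << QUAL_BITS) | q);
    }

    /** Masks to an unsigned value first, because Java bytes are signed. */
    static int baseIndex(byte compressed) {
        return (compressed & 0xFF) >>> QUAL_BITS;
    }

    static int quality(byte compressed) {
        return compressed & QUAL_MASK;
    }

    /** Phred: probability that the base call is correct. */
    static double probabilityCorrect(byte compressed) {
        return 1.0 - Math.pow(10.0, -quality(compressed) / 10.0);
    }
}
```

Note the clamping is lossy: any quality above 63 is stored as 63.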
ebanks
4faa680887
*Massive* speed-up for interval-based by-read traversals.
[Could do more optimizing, but this simple fix was good enough for now]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@266 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-02 20:19:39 +00:00
depristo
d952790258
GFF now parses attributes correctly and efficiently. Slightly better interface to Utils.join
...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@253 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 22:54:38 +00:00
hanna
ce57fed2fb
Hack to work around an Apache CLI bug, where core arguments couldn't be commingled with walker arguments. These arguments can commingle now. Everybody into the pool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@252 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 20:56:42 +00:00
hanna
16c2ea4673
Invalid arguments are not always flagged when stopAtNonOption is false. Make sure stopAtNonOption is true when we do final argument validation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@245 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 15:58:57 +00:00
hanna
7ee792df04
Print correct help if core arguments (--input-file et al) aren't correctly specified.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@244 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 15:16:49 +00:00
depristo
385736469c
High performance pileup code and utilities
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@242 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-01 00:47:47 +00:00
hanna
7fda409f4e
Fixed bug where read traversals would fail with an exception when not called with a genome_region (-L) argument. From TraversalEngine line 455, it looks like Mark intended an invariant where the list of locations is zero-length if not specified. Made GenomeLoc code compliant with that.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@230 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-30 23:43:12 +00:00
hanna
e812cfbf55
Refactor common functionality out of WalkerManager and into JVMUtils and PathUtils. Add support for loading walkers from a jar.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@229 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-30 23:20:55 +00:00
hanna
36f851362e
Oops. While writing command-line argument docs, I realized I introduced
a regression in default value handling.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@226 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-30 18:51:39 +00:00
depristo
d7c0bcc223
Reorganized GenomeLoc code to more clearly and better use the picard SequenceDictionary information.
All GenomeLoc[] are now ArrayList<GenomeLoc> for clarity and consistency
Parsing now recursively merges contiguous elements chr1:1-10;chr1:11-20 => chr1:1-20
Added support for TraversingByLoci over all reference positions specified by the provided location array. System dynamically determines which traversal system to use.
Pileup now marks, very clearly, reference positions without covered reads.
Made changes around the codebase to deal with new GenomeLoc structure.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@218 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-28 20:37:27 +00:00
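A sketch of merging contiguous locations such as chr1:1-10;chr1:11-20 => chr1:1-20. This swaps in a single iterative pass over an already-sorted list rather than the recursive merge the commit mentions, and it also folds overlapping neighbours together, which goes beyond the strictly contiguous case in the message. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeSketch {
    /** Hypothetical location type: contig name plus 1-based inclusive start/stop. */
    static final class Loc {
        final String contig;
        final long start;
        final long stop;

        Loc(String contig, long start, long stop) {
            this.contig = contig;
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return contig + ":" + start + "-" + stop;
        }
    }

    /**
     * Input must be sorted by contig and start. A location starting at or before
     * (previous stop + 1) on the same contig is folded into the previous one.
     */
    static List<Loc> mergeContiguous(List<Loc> sorted) {
        List<Loc> merged = new ArrayList<>();
        for (Loc next : sorted) {
            if (!merged.isEmpty()) {
                Loc last = merged.get(merged.size() - 1);
                if (last.contig.equals(next.contig) && next.start <= last.stop + 1) {
                    merged.set(merged.size() - 1,
                            new Loc(last.contig, last.start, Math.max(last.stop, next.stop)));
                    continue;
                }
            }
            merged.add(next);
        }
        return merged;
    }
}
```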
hanna
4a6be896b9
Provide out and err PrintStreams to the walkers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@213 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 15:03:32 +00:00
aaron
230c1ad161
moved a bunch of files over to the logging system. In some cases I ballparked the severity level of an error, so if you see something wrong feel free to make changes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@211 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 14:02:55 +00:00
aaron
935a4d81c9
fixed the problem where you could specify a logging level that didn't exist
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@208 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-27 04:29:27 +00:00
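A sketch of rejecting a logging level that doesn't exist instead of accepting it. The whitelist below is an assumption (it mirrors common log4j level names); the real set of accepted levels and the method name are not given in the commit.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class LogLevelSketch {
    // Hypothetical whitelist of level names; insertion order is kept for the error message
    private static final Set<String> LEVELS = new LinkedHashSet<>(
            Arrays.asList("OFF", "FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE", "ALL"));

    /** Normalizes the argument and throws if it is not a known level. */
    static String parseLevel(String arg) {
        String level = arg.trim().toUpperCase(Locale.ROOT);
        if (!LEVELS.contains(level)) {
            throw new IllegalArgumentException(
                    "Unknown logging level '" + arg + "'; expected one of " + LEVELS);
        }
        return level;
    }
}
```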
hanna
f7097c8ee7
Cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@205 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 21:24:12 +00:00
hanna
728f932ecf
Fix exclusive options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@204 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 20:59:32 +00:00
hanna
53fe9acf65
Make command-line arguments available in walker constructor, provide back door from
walker into GATK itself, do some cleanup of output messages, and add some bug fixes.
Command-line arguments in walkers are now feature-complete, but still a bit messy.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@203 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 20:45:27 +00:00
hanna
5f9010116a
Collapse the walker hierarchy, in preparation for in-walker output streams and less hokey walker args.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@201 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 16:22:35 +00:00
hanna
2808fd4bbd
Better support for required mutually exclusive options.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@199 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 03:22:30 +00:00
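A sketch of validating a group of mutually exclusive options, including the "required" case where exactly one must be supplied. This is not the project's argument parser; the map-of-supplied-options representation and the option names in the usage are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ExclusiveOptionsSketch {
    /**
     * Validates one exclusive group. Returns the supplied option's name,
     * or null if none was supplied and the group is optional.
     */
    static String validate(Map<String, String> supplied, List<String> group, boolean required) {
        List<String> present = new ArrayList<>();
        for (String name : group) {
            if (supplied.containsKey(name)) {
                present.add(name);
            }
        }
        if (present.size() > 1) {
            throw new IllegalArgumentException("Options " + present + " are mutually exclusive");
        }
        if (required && present.isEmpty()) {
            throw new IllegalArgumentException("Exactly one of " + group + " is required");
        }
        return present.isEmpty() ? null : present.get(0);
    }
}
```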
hanna
08ece8df79
Bug fixes and support for mutually exclusive options. Still a bit rough, but will
be easier to clean up after a walker refactoring.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@198 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-26 03:11:56 +00:00
hanna
4b7bfb284a
Support for more complex command-line types: arrays, untyped collections, typed collections, interfaces to typed and untyped collections.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@194 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 20:11:31 +00:00
depristo
c18f8fbf5f
Documentation and cleanup of xReadLines.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@190 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 15:36:21 +00:00
depristo
d11bb0fc64
Added xReadLines class to utils. It is an Iterator<String> and Iterable<String>, so you can easily read all lines from a file. It's been used to simplify the code to process intervals, and will be used to add merging data support to the system...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@187 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 15:17:38 +00:00
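A sketch of a line reader that is both an Iterator<String> and an Iterable<String>, so it can drive a for-each loop directly. LineIterableSketch is a hypothetical stand-in, not the actual xReadLines class; it takes a Reader (an assumption) so it works with files and in-memory text alike, and it is single-pass because iterator() returns the object itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class LineIterableSketch implements Iterable<String>, Iterator<String> {
    private final BufferedReader reader;
    private String nextLine; // one-line look-ahead; null once the input is exhausted

    LineIterableSketch(Reader source) {
        this.reader = new BufferedReader(source);
        this.nextLine = readOne();
    }

    private String readOne() {
        try {
            String line = reader.readLine();
            if (line == null) {
                reader.close(); // release the underlying file as soon as we hit EOF
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Iterator<String> iterator() {
        return this; // single pass: iterating twice does not rewind
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String current = nextLine;
        nextLine = readOne();
        return current;
    }
}
```

For a file on disk, something like `for (String line : new LineIterableSketch(new java.io.FileReader(path)))` reads every line in turn.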
depristo
ff98e28abf
High-performance interval list implementation -- uses StringBuilder to avoid an n^2 calculation. Can handle millions of locations quickly now
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@182 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 02:17:48 +00:00
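A sketch of why StringBuilder avoids the n^2 cost: repeated `s = s + sep + item` copies the whole accumulated string on every iteration, so building n items is quadratic, whereas appending into one growing buffer is linear in the output length. JoinSketch is a hypothetical helper, not the project's Utils.join.

```java
final class JoinSketch {
    private JoinSketch() {}

    /** Joins items with a separator using a single growing buffer. */
    static String join(String separator, Iterable<?> items) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (Object item : items) {
            if (!first) {
                sb.append(separator);
            }
            sb.append(item); // amortized O(length of item), no re-copy of earlier text
            first = false;
        }
        return sb.toString();
    }
}
```

Later Java versions (8+) also provide String.join(CharSequence, Iterable) for the same job when the elements are CharSequences.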
hanna
9e2a373184
Prototype, buggy implementation of walker command-line arguments. Doesn't
(yet) deal elegantly with even simple cases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@180 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-25 00:12:00 +00:00
aaron
c047b53d6b
added some cleanup of code, and new junit targets to the build file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@177 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 21:16:12 +00:00
depristo
6df19ab793
Support for byInterval traversals for Jared. Do not use them.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@175 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 20:55:34 +00:00
aaron
a3b8830855
need more access, found out in junit testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@165 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-24 13:22:09 +00:00
hanna
63cd1fe201
Push core / playground lower into the tree.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@160 348d0f76-0448-11de-a6fe-93d51630548a
2009-03-23 23:19:54 +00:00