Commit Graph

317 Commits (3cda85f2e34d7d6be69360a28700add2fdf9bc54)

Author SHA1 Message Date
asivache f2f9fa3ed4 doc added
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@464 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 16:43:25 +00:00
hanna d639ec3776 Remove some copied code to make sure the traversal engine stays in sync with the locus context provider.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@463 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 16:41:56 +00:00
asivache df5aae5ed4 got read of a couple of warnings and added percentage(x,base) methods
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@462 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 15:15:21 +00:00
depristo 50ae1763f7 Support for -continue_after_errors flag in the validating pileup walker in case you want to see errors as they arise, rather than aborting greedily
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@461 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 03:13:11 +00:00
depristo ee5ab9536f trivial checking / flagging issues to enable testing of merging iterator performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@460 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 03:11:59 +00:00
depristo dbf2344cef Fixes for including duplicate reads in the locus traversal; now checks that the ref arg is provided when needed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@459 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-17 01:27:36 +00:00
hanna 01be8f09e3 Exception cleanup. All our non-runtime exceptions should extend from StingException, StingException needs to be lower in the tree to build.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@457 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 22:17:25 +00:00
aaron e5c80e59dc fixed the case when you're not seeking, it didn't initalize
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@456 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 22:16:03 +00:00
depristo f47f640df6 Better debugging output and testing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@455 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 21:54:56 +00:00
hanna 165e504d1c Turn on new TraverseLociByReference is now only dependent on the -et flag. REGION_STR does not matter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@454 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 19:45:47 +00:00
aaron 12e1f192c4 Fixed a bug in this code where it would eat reads that didn't start at the beginning of the provided interval. This should fix / help fix Kristian problem
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@453 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 18:42:00 +00:00
asivache 835f1067d8 added isHom() and isHet() queries to the Genotype interface (with the obvious meaning)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@452 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 18:41:39 +00:00
asivache 55537c0d1e chnage class name, now it compiles...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@451 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 16:51:00 +00:00
asivache 4f9bc7206f some cleanup, also ensuring that all reads get written into output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@450 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 16:49:25 +00:00
asivache e8a6cdb386 renamed standalone main
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@449 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:56:46 +00:00
asivache 832afd3d60 renamed standalone main
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@448 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:56:27 +00:00
asivache 85308f4ddc resurrected indel tool's standalone main
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@447 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:55:52 +00:00
kcibul 6f56938d42 * added a bit more debugging output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@446 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 15:20:26 +00:00
kcibul d35a542bb9 * fixed bug where the merged header was not being set on the read (although the read group was)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@445 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 12:53:07 +00:00
asivache 240eb18564 fix a few related issues when not all the reads were written into the output files. now cleaned output still contains all reads either with modified alignments or untouched
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@444 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 03:56:47 +00:00
asivache 0d324354ae separate interface for genotypes as opposed to (population) allelic variants
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@443 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 03:55:16 +00:00
kcibul 7e05b43f40 * added some error checking for read groups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@442 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 03:22:49 +00:00
hanna 56f6847456 Changed interface from contig,pos,length to more common contig,start,stop interface.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@441 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-16 00:04:41 +00:00
depristo 7261787b71 Fixed potential bug with next() operation returning empty contexts when a read contains a large deletion. We can now use the look ahead safely...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@438 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 21:38:28 +00:00
aaron e70aecf518 bug fix, but important
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@437 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 21:07:20 +00:00
depristo 76d833e39b Meaningful assert messages
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@435 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 19:12:28 +00:00
aaron 67ea66c866 Bug fix
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@434 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 19:12:18 +00:00
depristo 1edfe48194 Better debugging output with .debug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@433 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 19:09:18 +00:00
depristo 9cc808104e Fixed subtle bug in permitting EXPAND_WINDOW to be > 1. We now use the right window size so we avoid including empty hangers. There's still a rare bug to sort out, which occurs in the case where a read with an indel can generate empty hangers.
Also cleaned up the debugging output.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@432 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 19:08:26 +00:00
aaron 180ff13290 Added a bunch of changes to support the new MicroManager code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@431 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 18:29:38 +00:00
hanna 339261c4a9 Load the dictionary and sanity check it against the index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@430 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 18:04:13 +00:00
hanna 26e84d7fd6 Added index iteration for ReferenceSequenceFile interface compatibility.
Added better error checking for querying past the end of a contig.
Lots more testing.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@429 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 17:17:11 +00:00
kcibul 3fda8613c3 * minor formatting changes
* support for "extended" output

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@428 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 15:11:05 +00:00
aaron 12407b5b1a Deleted the old file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@427 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 13:55:01 +00:00
aaron 6db9127f90 Added changes to shattering, refactored SAMBAM into SAM
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@426 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 13:52:56 +00:00
hanna 182626576f Basic indexed fasta POC in place. Requires a more complete implementation of the ReferenceSequenceFile interface,
and much more testing.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@425 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 13:46:56 +00:00
kiran 7949e377e4 Intermediate commit. Refactored some simple base manipulation stuff into BaseUtils.java. Generalized some likelihood computation logic to make future possible EM-ing easier.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@424 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 04:18:07 +00:00
kiran d0b8d311e6 Can now optionally print the read and the alignment region of the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@423 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 04:10:30 +00:00
kcibul d4aaa1bef4 * fixed (with Matt's help) the argument parsing
* outputting UCSC wiggle format

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@422 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-15 02:17:39 +00:00
depristo 24722a442e Slight code cleanup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@421 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:21:36 +00:00
aaron 13b0995d54 Adding an iterator that bounds the number of reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@419 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:18:31 +00:00
depristo 72a3d84ed2 General purpose pileup code -- you can use these features to obtain detailed pileup data from reads and offsets. Useful for all pileup based walkers. Expanded support for rodSAMPileup to enable the new ValidatingPileupWalker, which takes a samtools pileup output and checks that GATK gives identical output as samtools on a per base and per qual pileup. It's going to be a very useful validation tool.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@418 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 22:13:10 +00:00
asivache baae98c6d5 and don't allocate new 200M string every time please, just pass byte array!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@417 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 21:55:33 +00:00
asivache 9d56355abe bug fixed when reference name was passed as a string instead of actual reference bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@416 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 21:46:27 +00:00
kiran 222c4e5865 Commented out some debugging lines
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@415 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:15:41 +00:00
kiran 49d76014d1 Commented out a debugging line
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@414 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:15:11 +00:00
kiran b39e584787 Primary or secondary bases that got a quality score of literally zero led to unfortunate infinities. Added an epsilon (1e-5) to every prob.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@413 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 20:04:49 +00:00
jmaguire d28e9f9b98 search over q's for finding argmax[q] p(D|q)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@412 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:15:45 +00:00
ebanks 647827b18c Transitioned indel code to use GATK and Walkers
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@410 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:14:15 +00:00
ebanks b363eedd2c Deal with screwy reads by changing logic to determine whether we are
past the last interval


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@409 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 19:13:16 +00:00
hanna 0629f79049 Moved fasta support files into their own package.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@408 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 18:13:23 +00:00
hanna 186c799ffc Class to read an .fai file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@405 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 17:37:18 +00:00
jmaguire 961dbbd4ef Now output bases and qhat and qstar into the GFF.
Quals coming soon (four-base)

QHAT  : Most likely alt allele freq (unconstrained by number of chromosomes).
QSTAR : Most likely alt allele freq (constrained by number of chromosomes).





git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@402 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 15:23:00 +00:00
kiran dafdff1974 All bases are now indexed as A:0, C:1, G:2, T:3.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@401 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:49:43 +00:00
kiran 40ea22eb17 Added some methods to return the cross-talk partner base of a given base or base index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@400 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:49:12 +00:00
aaron eb4b4a053b A bunch of updates to the SAM/BAM data source, along with test cases for the merging of multiple files (it works!).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@399 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 14:19:20 +00:00
kiran 30121534ed Outputs the secondary bases and quals (if available) in verbose mode. Prefixed with the tag 'SQ='.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@398 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 13:58:28 +00:00
kiran 998fad76c6 Some utility methods for creating pileups of secondary bases and secondary quals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@397 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 13:57:54 +00:00
depristo 8b2c2e677b Uses the cleaner new GenomeLoc(read) syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@396 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:55:43 +00:00
depristo 1cee7948ab Added lots of assertions to check for problems.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@395 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:55:19 +00:00
depristo 794360c410 Added verbose option to show mapping qualities and base qualities as ints!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@394 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:54:48 +00:00
depristo cc75e8f712 Uses the cleaner new GenomeLoc(read) syntax
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@393 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:53:58 +00:00
depristo 11377ef390 Added lots of assertions to check for problems. The current GenomeLoc needs to be cleaned up and refactored but at least it runs. We need unit tests ASAP
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@392 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:53:08 +00:00
depristo bb666ce392 Added mappingQualPileup function for use in the verbose mode of Pileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@391 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:51:26 +00:00
asivache bc43c0eefc there are really cases when we can not merge until we get just two pilesant now we do not crash in those cases but print a warning and just show the resulting n piles even when n>2
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@390 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:45:47 +00:00
asivache 8e6093d5a5 remove mom/dad/kid cmd line arguments that were needed for mendelian walker; now we can use generic track binding!!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@389 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-14 00:45:34 +00:00
kiran f838a5e511 Changed some double comparisons of the form a == b to abs(a - b) <= precision. Now we shouldn't be passing or failing some if conditions due to floating-point precision.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@388 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 20:05:46 +00:00
aaron 887adcfc7f Some minor fixes to the last check-in
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@387 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:24:51 +00:00
aaron f2d0d73309 removed old shard strategy code
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@386 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:13:45 +00:00
aaron dd604799dc Added some new code for shard support over reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@385 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 18:11:43 +00:00
asivache d44c30154a added MAX_READ_LENGTH - now we can ignore long reads (454?); a bad idea in general, but the performance hit is to hard to take, at least for preliminary testing runs...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@384 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 16:53:12 +00:00
hanna e91a429c58 A class to print out as much context about the given locus site as is possible. Useful for testing traversal engines -- run old and new code across a given region and diff the output to make sure they have the same context.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@383 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:29:55 +00:00
jmaguire 6652f13a17 more verbose gff output!
EVEN MORE verbosity to come! 

Tremble in anticipation.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@382 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 15:21:23 +00:00
jmaguire 6e180ed44e Unified caller is go.
AlleleFrequencyWalker and related classes work equally well for 2 or 200 chromosomes. 

Single Sample Calling:

	Allele Frequency Metrics (LOD >= 5)
	-------------------------------------------------
	Total loci                            : 171575
	Total called with confidence          : 168615 (98.27%)
	Number of variants                    : 111 (0.07%) (1/1519)
	Fraction of variant sites in dbSNP    : 87.39%
	-------------------------------------------------
	
    Hapmap metrics are coming up all zero. Will fix.

Pooled Calling:

	AAF r-squared after EM is 0.99. 
    AAF r-squared after EM for alleles < 20% (in pools of ~100-200 chromosomes) is 0.95 (0.75 before EM)

    Still not using fractional genotype counts in EM. That should improve r-squared for low frequency alleles.


Chores still outstanding:
    - make a real pooled caller walker (as opposed to my experiment framework).
    - add fractional genotype counts to EM cycle.
    - add pool metrics to the metrics class? *shrug* we don't really have truth outside of a contrived experiment...



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@380 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:29:51 +00:00
jmaguire f39092526d Added function RandomSubset
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@379 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 12:14:53 +00:00
asivache b4136b6d6e a few tweaks to make it more robust: ignore reads with cigars containing anything but I,D,M; don't set up contig ordering manually, rely upon reference sequence and its dictionary; don't die if a record does not have NM tag, but faal back to direct counting instead; now requires reference as a cmdline arg
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@378 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 04:49:19 +00:00
kiran 756e6c61d8 Strictness args are presented as lowercase in the help, but only accepted if uppercase. Changed help to list the valid arguments in uppercase.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@376 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:50:19 +00:00
kiran c51f51f255 Make sure we always write at least 1000 points per base in each cycle's scatterplot. Print the disagreement rate between Bustard and FourBaseRecaller.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@375 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:49:41 +00:00
kiran 1fb16d54e0 For SAM files that have no alignments and when no reference is specified, contigInfo.getSequence() is null, causing an error when getSequenceName() is called on the resulting null pointer. Check for null instead and return that instead of barfing here.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@374 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:48:21 +00:00
kiran 5e96ab6161 Helpful functions for converting a base (char) to a base index (A:0, C:1, G:2, T:3, alphabetical and consistent with Illumina conventions to minimize confusion.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@373 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-13 00:46:23 +00:00
kiran 35fc002d5d Debugging information is now written in such a way to make it easier to import into R.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@372 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:45:33 +00:00
kiran 6ee4fe5a20 Fixed a Bustard/Firecrest file synchronization bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@371 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:44:07 +00:00
kiran 817278be46 If a SAMRecord is on the negative strand, reverse complement the SQ tag.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@370 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:42:24 +00:00
kiran 1d5a22cacf Extracts a Fastq file and the SQ tags to a separate file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@369 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:41:44 +00:00
kiran e410c005c0 A debugging tool to ensure the SQ tag in a four-prob SAM file matches the SAMRecord strand orientation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@368 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 19:40:42 +00:00
kcibul c7777d46d6 * re-enabled setting of sequence dictionary information on GenomeLoc
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@366 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:44:14 +00:00
kcibul ce72932a45 * refactored GenomeLoc to use contigIndex internally for performance and fixed several calling classes
* added basic unit test for GenomeLoc
* fixed bug when parsing genome locations like chr1:5000 the start position was being left as maxint rather than being set to the same as the stop position.

git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@365 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-12 02:25:17 +00:00
hanna 608a66e6ab TbyLocibyRef previously didn't seem to support traversals with no interval specified. Put in a temporary fix until the threaded approach is in place.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@363 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 22:14:06 +00:00
hanna c2669021b8 Cleanup, and support either by-interval traversals or full traversals in data source-backed code.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@362 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 22:09:01 +00:00
hanna 2322bb7d86 Workaround: use a single ReferenceIterator for an entire micromanaged traversal. We'll have to
do something about ReferenceIterator thread safety later.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@361 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 20:50:28 +00:00
hanna 95753e1b34 Should've been calling queryOverlapping in locus mode.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@360 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 20:22:04 +00:00
kiran 2b59110dca CombineSamAndFourProbs is better.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@358 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:19:53 +00:00
kiran 56aa98ad30 Ignore null values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@357 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:18:20 +00:00
kiran 2ef2c9e121 Fixed an issue wherein the SQ field was only being pulled from the first read of the pileup, no matter what. Fixed an issue wherein Andrew enumerates his bases as A:0, C:1, T:2, G:3, and Kiran's QualityUtils methods enumerate bases as A:0, C:1, G:2, T:3 (we should standardize this). Fixed an issue wherein the remaining probability was being divided by 3 rather than 2 when four-base probs are enabled.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@356 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-10 04:17:53 +00:00
depristo 17b3d5b554 New ROD accessing system, including a generalized interface for binding ROD on the command line that doesn't require you to chance GenomeAnalysisTK.java
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@355 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 22:04:59 +00:00
kiran f5cc2d8b0b Commented out import of IlluminaParser.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@354 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:30:29 +00:00
hanna 0d825ccfc1 Oops. Fixed duplicate reference to the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@353 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:27:57 +00:00
aaron 9afa101465 Add interval support to the
.__            __    __                
  _____|  |__ _____ _/  |__/  |_  ___________ 
 /  ___/  |  \\__  \\   __\   __\/ __ \_  __ \
 \___ \|   Y  \/ __ \|  |  |  | \  ___/|  | \/
/____  >___|  (____  /__|  |__|  \___  >__|   
     \/     \/     \/                \/       

classes!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@352 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 21:23:43 +00:00
kiran c5220c0822 Four-base probs are now decoded with the relevant method in QualityUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@351 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:52:17 +00:00
kiran 9bc763a835 A better (aka 'working') tool for combining four-base probs with an aligned sam file.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@350 348d0f76-0448-11de-a6fe-93d51630548a
2009-04-09 20:51:37 +00:00