Commit Graph

757 Commits (6bb7f7e9d815acfd6344e1c161f8ca32565535b2)

Author SHA1 Message Date
kiran 6bb7f7e9d8 Commented some stuff out so that things compile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@963 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:06:33 +00:00
hanna dc6a9ca196 Pooling resources to lower memory consumption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@962 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 13:39:32 +00:00
kiran 87ba8b3451 Removed some useless code. Don't apply second-base test if the coverage is too high, since the binomial probs explode and return NaN or Infinite values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@961 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:27:06 +00:00
kiran a12ed404ce Changed method name from applyFourBaseDistributionPrior to applySecondBaseDistributionPrior. 'Cause that's how I roll.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@960 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:21:22 +00:00
kiran 3adb4239e4 Same as the regular Pileup, but also shows the flanking region around the locus. This will be useful in determining that some SNPs are spurious because they sit at the ends of homopolymer regions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@959 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:19:31 +00:00
kiran 2b0e7f612b Handles bam pileups where some of the reads have SQ tags and some don't.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@958 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:17:15 +00:00
aaron 36c98b9d6c added tools to test read-based traversals using the artificial in-memory SAM file tools, and tests for the PrintReadsWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@957 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 01:52:25 +00:00
aaron eb962fe52a adding an artificial sam file writer, used to unit test some of the walkers (mainly the PrintReadsWalker)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@956 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 21:47:49 +00:00
hanna e77dfe9983 Allow script to be easily modified to support different platforms.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@955 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 16:06:57 +00:00
depristo 7fa84ea157 10x speedup of recalibration walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 15:39:40 +00:00
aaron a62bc6b05d fixed some documentation and attached a correct license
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@953 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:44:27 +00:00
aaron bf6190b471 cleaned up the PrintReadsWalker, and added a lot of documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@952 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:28:32 +00:00
ebanks b45b1d5f2b border case bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@951 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 04:33:15 +00:00
kiran fecba2cae5 Disabled the option to show secondary quals: their definition has changed to conform to the spec, so this printout is now nonsensical.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@950 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 03:21:14 +00:00
kiran e7f222108d More accessors. Can compute the sum of the quality scores in the read (useful for sorting) and can return a subset of itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@948 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:02:48 +00:00
kiran 6506504a60 Updates after seeing a certain number of reads, not a certain number of bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@947 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:01:36 +00:00
kiran 65d0675a4e Some changes regarding what to do when a cycle is completely busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@946 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:01:13 +00:00
kiran 0bd78d72d7 Some changes regarding what to do when a cycle is completely busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@945 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:00:33 +00:00
kiran af0b03a257 Added tests for mostFrequentBaseFraction() and reverseComplementString()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@944 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:53:45 +00:00
kiran 681e67c72c Added some methods to generate random bases or random base indexes, optionally disallowing the generation of a specified base or base index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@943 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:47:54 +00:00
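A minimal Java sketch of helpers like the ones this commit describes, assuming uppercase A/C/G/T bases indexed 0 to 3. The class and method names (`RandomBases`, `randomBaseIndex`, `randomBase`) are hypothetical, not the committed API. A base is excluded by drawing from the three remaining indices and skipping over the excluded one, which keeps the draw uniform without rejection sampling.

```java
import java.util.Random;

// Hypothetical helper; names and base ordering (A=0, C=1, G=2, T=3) are assumptions.
final class RandomBases {
    private static final String BASES = "ACGT";

    private RandomBases() {}

    /** Uniform random base index in [0, 4). */
    static int randomBaseIndex(Random rng) {
        return rng.nextInt(4);
    }

    /** Uniform random base index in [0, 4) that is never excludeIndex. */
    static int randomBaseIndex(Random rng, int excludeIndex) {
        if (excludeIndex < 0 || excludeIndex > 3) {
            throw new IllegalArgumentException("bad base index: " + excludeIndex);
        }
        int i = rng.nextInt(3);               // pick among the 3 allowed indices
        return i >= excludeIndex ? i + 1 : i; // skip over the excluded index
    }

    /** Uniform random base from A, C, G, T. */
    static char randomBase(Random rng) {
        return BASES.charAt(randomBaseIndex(rng));
    }

    /** Uniform random base other than excludeBase (case-insensitive). */
    static char randomBase(Random rng, char excludeBase) {
        int excludeIndex = BASES.indexOf(Character.toUpperCase(excludeBase));
        if (excludeIndex < 0) {
            throw new IllegalArgumentException("not a base: " + excludeBase);
        }
        return BASES.charAt(randomBaseIndex(rng, excludeIndex));
    }
}
```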
asivache 13eb868536 helper class. array-like random access and fast shift. good for sliding windows (e.g. keeping coverage over last 100 bases while sliding along the reference)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@942 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:11:57 +00:00
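A minimal sketch of this kind of structure, assuming a fixed-size window of int counts (e.g. per-base coverage). It is a circular buffer in which logical index 0 is the oldest position, so sliding one base along the reference costs O(1) instead of copying the array. `CircularWindow` and its methods are hypothetical names, not the committed class.

```java
// Hypothetical sketch: circular buffer with array-like access and O(1) shift.
final class CircularWindow {
    private final int[] data;
    private int start = 0; // physical slot holding logical index 0 (oldest position)

    CircularWindow(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        data = new int[size];
    }

    int size() {
        return data.length;
    }

    /** Value at logical index i, where 0 is the oldest position in the window. */
    int get(int i) {
        checkIndex(i);
        return data[(start + i) % data.length];
    }

    void set(int i, int value) {
        checkIndex(i);
        data[(start + i) % data.length] = value;
    }

    /** Slides the window one position right: drops index 0, appends a zero at the end. */
    void shift() {
        data[start] = 0;                     // vacated slot becomes the new last element
        start = (start + 1) % data.length;
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= data.length) throw new IndexOutOfBoundsException("index " + i);
    }
}
```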
asivache 3d6e738a60 still under development. does not genotype yet, but walks and talks (counts overall coverage and indel variant occurrences at every reference position)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@941 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:10:31 +00:00
ebanks 58f7ae8628 better filtering, plus deal with case where user doesn't input maxlength
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@939 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 18:44:29 +00:00
asivache ce431b5d2d added hashCode()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@937 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 16:52:02 +00:00
asivache b4ef16ced2 extractIndels() now should deal correctly with soft- and hard-clipped bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@936 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 16:04:49 +00:00
aaron a8a2d0eab9 added support for the -M option in traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@935 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 15:12:24 +00:00
hanna e2ed56dc96 Add a MAX_READ_GROUPS sanity parameter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@934 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 13:57:43 +00:00
asivache 9f35a5aa32 Insidious bug: clipped sequences (S cigar elements) were a) processed incorrectly and b) sometimes caused IntervalCleaner to crash if such a sequence occurred at the boundary of the interval. The inconsistency is as follows: the LocusWindow traversal instantiates the interval's reference stretch up to the rightmost read.getAlignmentEnd(), which does not include clipped bases; IntervalCleaner then takes all read bases (as a string) without checking whether some of them were clipped. Inside the interval this caused mismatches to be counted on clipped bases; at the boundary of the interval the clipped bases would stick out past the passed reference stretch and an index-out-of-bounds exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() no longer counts mismatches on clipped bases; however, we do not yet attempt to realign only the meaningful, unclipped part of the read. Instead, all reads that have clipped bases are assigned to the original reference and are not realigned at all (we'd need to be careful to preserve the cigar if we wanted to do this).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@933 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 05:20:29 +00:00
ebanks 3a8219a469 use knowledge from other reads to find a consensus
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@932 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 21:22:17 +00:00
hanna 596773e6c6 Cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@931 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 20:25:08 +00:00
depristo 98396732ba Bug fixes for Andrey
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@930 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 18:19:51 +00:00
asivache b48508a226 indelRealignment() signature changed. The only difference with consensus sequences is that they are passed along with alignment cigars that start inside the sequence, while for 'conventional' reads the cigar always starts at position 0 on the read. Logically, indelRealignment() should not need to know what a 'consensus' is. Instead, it now receives an additional int parameter: the start of the cigar on the 'read' sequence.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@929 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 17:42:19 +00:00
asivache 9eb38c0222 mostly synchronizing with the main branch. Based on anecdotal evidence (too few examples in the data), realignment (shifting indel left across a repeat) works correctly on non-homonucleotide repeats
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@928 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 16:39:16 +00:00
ebanks c6634e3121 cleaned up some code and minor bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@927 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 03:14:21 +00:00
asivache 99c105790b Now indelRealignment should be correct... The old version could only condense homonucleotide indels to the left. The new version should be able to detect and shift left arbitrary repeated sequences (e.g. a deletion of ATA after ATAATAATA will be shifted left to the first occurrence of ATA on the ref)! NOT THOROUGHLY TESTED YET, will test tonight.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@926 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 23:54:07 +00:00
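A common way to get the left-shifting behaviour described above (shown as a sketch, not the committed indelRealignment code) is to slide a deletion left one base at a time. Moving a deletion of ref[start, start + len) one position left leaves the resulting sequence unchanged exactly when ref[start - 1] equals ref[start + len - 1], so repeating the step until those bases differ reaches the leftmost equivalent position, for homonucleotide runs and longer repeat units alike. Coordinates are assumed 0-based; `IndelLeftAligner` and `leftAlignDeletion` are hypothetical names. On the commit's example, deleting the last ATA from CATAATAATAG (start 7, length 3) is moved to start 1, the first ATA.

```java
// Hypothetical sketch: left-align a deletion within a repeat (deletions only).
final class IndelLeftAligner {
    private IndelLeftAligner() {}

    /**
     * @param ref   reference bases
     * @param start 0-based start of the deleted segment in ref
     * @param len   number of deleted bases
     * @return the leftmost start that yields the same post-deletion sequence
     */
    static int leftAlignDeletion(String ref, int start, int len) {
        if (len <= 0 || start < 0 || start + len > ref.length()) {
            throw new IllegalArgumentException("deletion outside reference");
        }
        // Shifting left by one is sequence-preserving iff the base just before the
        // deletion equals the last deleted base.
        while (start > 0 && ref.charAt(start - 1) == ref.charAt(start + len - 1)) {
            start--;
        }
        return start;
    }
}
```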
asivache 3b4dc6e7b5 added sequencePeriod(String seq, int minPeriod) - finds smallest period equal to or greater than minPeriod for the specified text string seq; this is a trivial (hopefully correct) back-of-the-envelope implementation for a well-known and well-studied problem; there should be more efficient algorithms in the wild
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@925 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 23:05:24 +00:00
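The commit notes that more efficient algorithms exist. One standard option, swapped in here and not the committed implementation, is the KMP prefix (failure) function: the periods of a string of length n are exactly n minus each of its border lengths, so walking the border chain from longest to shortest yields the periods in increasing order, in O(n) total. The sketch assumes the usual definition of a period p (seq[i] == seq[i + p] for every valid i, with p not required to divide the length) and 1 <= minPeriod <= seq.length(); `SequencePeriodSketch` is a hypothetical name. For example, ATAATAATA has periods 3, 6, 8 and 9, so `sequencePeriod("ATAATAATA", 4)` returns 6.

```java
// Hypothetical sketch; uses the KMP prefix function instead of a naive search.
final class SequencePeriodSketch {
    private SequencePeriodSketch() {}

    /** Smallest p >= minPeriod with seq[i] == seq[i + p] for all valid i. */
    static int sequencePeriod(String seq, int minPeriod) {
        int n = seq.length();
        if (minPeriod < 1 || minPeriod > n) {
            throw new IllegalArgumentException("need 1 <= minPeriod <= seq.length()");
        }
        int[] pi = prefixFunction(seq);
        int border = pi[n - 1];            // longest proper border of seq
        while (n - border < minPeriod) {   // periods grow as borders shrink
            border = pi[border - 1];       // next shorter border (border > 0 here)
        }
        return n - border;
    }

    /** pi[i] = length of the longest proper border of seq[0..i]. */
    private static int[] prefixFunction(String s) {
        int[] pi = new int[s.length()];
        for (int i = 1; i < s.length(); i++) {
            int k = pi[i - 1];
            while (k > 0 && s.charAt(i) != s.charAt(k)) k = pi[k - 1];
            if (s.charAt(i) == s.charAt(k)) k++;
            pi[i] = k;
        }
        return pi;
    }
}
```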
hanna 40ac3b7816 Inject read group into covars_out file's toString output. Continue fixing systematic bug in the code where flattenData is not joined to the read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@924 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 20:43:28 +00:00
asivache 0bb4565798 added AlignmentUtils.getNumAlignmentBlocks(read) - a faster alternative to read.getAlignmentBlocks().size(); IntervalCleaner updated accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@923 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 19:35:21 +00:00
asivache 92b054b71b moved another variant of numMismatches to AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@922 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 18:07:48 +00:00
asivache 7018dd1469 moved another variant of numMismatches to AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@921 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 18:05:29 +00:00
hanna ac5b7dd453 Fixed order-of-operations bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@919 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 03:22:56 +00:00
depristo 819862e04e major restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversion bias, counting, dbSNP coverage, HWE violation, and selection of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also added the interval track: if you provide an interval list, the system automatically makes it available to you as a bound rod, so you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc., into population SNPs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 23:34:37 +00:00
asivache 400399f1b8 fixed (?) a bug in insertion realignment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@917 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 22:04:37 +00:00
hanna 34bb43a6c8 Saw that one of the offsets needed to be changed from -1 to -2 and changed the wrong damn offset. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@915 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 19:18:34 +00:00
ebanks 4623a34ad3 Fix bug in realigning insertion cigar strings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@914 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 18:46:41 +00:00
aaron 199be46c36 changed the warning that is outputted when the GenomeLoc constructor can't find the given contig in the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@913 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:49:03 +00:00
ebanks 092a754071 Make sure indel position from SW alignment is leftmost possible
(and improve printouts)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@912 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:36:10 +00:00
aaron 37efd78c7e fixed the logger call so we get output that indicates this class generated the message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@911 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:02:17 +00:00
aaron b323c58ef2 add a place to store the walker return value, along with a method to retrieve it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@910 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 14:41:42 +00:00
ebanks 36fb6ca3c5 Allow user to specify the compression to be used when writing out BAM files.
Updated most of the walkers to reflect this change.
Now it won't take forever to write BAMs!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@909 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 08:48:34 +00:00