Commit Graph

778 Commits (5440dd13dfd0bc9fae31bf99f9e21dddd2d75b8a)

Author SHA1 Message Date
aaron 63b5c12cbd Changed dataSources to datasources to be consistent with the rest of our package names. Also, this makes me champion in the largest check-in contest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@985 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:13:22 +00:00
aaron 195b4ea7b4 a rename for consistency of Sam to SAM, creating a genotype utils dir, and moving the GLF code into it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@984 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 17:46:06 +00:00
ebanks 599ceeddd8 Better method for downsampling deep regions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@983 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 16:57:40 +00:00
ebanks 4d9a88153a Update inferred insert size of cleaned reads when they are paired
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@982 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 16:29:13 +00:00
ebanks 3796654069 Added walker to emit intervals of clustered SNP calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@981 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 00:57:14 +00:00
hanna 678ddd914f Stopgap fixes GFF, DbSNP being half-open rather than half-closed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@980 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 21:38:57 +00:00
aaron 94b0e46d12 checked in a sample xml file used to store the defaults for the SomaticCoverage tool, and added it to the SomaticCoverage.jar in build.sml. Also added an inputStream marshalling method to the GATKArgumentCollection.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@979 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:46:16 +00:00
asivache 8d25f1a105 should be a little faster
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@978 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:33:45 +00:00
aaron 026f68fb41 a couple of quick name changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@976 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:02:52 +00:00
aaron 72a81f8f25 removed the requirement that a bam file list be present in the XML version of the command line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@975 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:01:13 +00:00
ebanks b1f90635c1 1. downsample when there are too many mismatching reads (needs perfecting)
2. allow user to specify that no reads be emitted
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@974 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 19:55:42 +00:00
asivache 39dcd4f11f an attempt to bail out when unmapped reads are reached at the end of the file(s). still testing...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@973 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 19:53:50 +00:00
asivache 030efc468f added naive ad-hoc cutoff for the pile size the cleaner will attempt to process; use --maxPileSize argument to force any pile larger than specified cutoff to be directly written to the output without cleaning
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@972 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:52:35 +00:00
ebanks f9be175f44 Be smart about trying alternate consensuses:
try prior indels first and only 1 instance of them
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@971 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:43:22 +00:00
aaron f304803811 initial check-in of an easy way to create command line tools based on the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@970 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:34:02 +00:00
kiran b0cc763eb5 Added some methods to format bases such that read bases on the forward strand are in uppercase, while those on the negative strand are lowercase. This does *not* affect the default functionality of the standard PileupWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@969 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:31:00 +00:00
depristo 9ebcd6546d Convenience printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@968 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:07:38 +00:00
asivache 06e5a765f8 now has two modes: one sample - just call indel sites; two samples - call somatic-looking variants only. Still uses heuristic count-based cutoffs, cutoffs are hardcoded and are pretty conservative...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@967 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 16:41:38 +00:00
ebanks 5451bbfd5a -move final vars to command-line args
-Per Andrey: ignore indels from aligner when testing against alt consensus
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@966 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 16:39:00 +00:00
hanna ad80894afa Bumped picard to latest svn version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@965 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:36:34 +00:00
aaron ec2f015447 fixed a bunch of comments and license headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@964 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:10:46 +00:00
kiran 6bb7f7e9d8 Commented some stuff out so that things compile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@963 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:06:33 +00:00
hanna dc6a9ca196 Pooling resources to lower memory consumption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@962 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 13:39:32 +00:00
kiran 87ba8b3451 Removed some useless code. Don't apply second-base test if the coverage is too high, since the binomial probs explode and return NaN or Infinite values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@961 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:27:06 +00:00
kiran a12ed404ce Changed method name from applyFourBaseDistributionPrior to applySecondBaseDistributionPrior. 'Cause that's how I roll.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@960 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:21:22 +00:00
kiran 3adb4239e4 Same as regular Pileup, but also allows you to see flanking region around locus. This will be useful in determining that some SNPs are spurious due to being at the ends of homopolymer regions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@959 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:19:31 +00:00
kiran 2b0e7f612b Handles bam pileups where some of the reads have SQ tags and some don't.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@958 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:17:15 +00:00
aaron 36c98b9d6c added tools to test read based traversals using the artificial in-memory SAM file tools, and testing of the PrintReadsWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@957 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 01:52:25 +00:00
aaron eb962fe52a adding an artificial sam file writer, used to unit test some of the walkers (mainly the PrintReadsWalker)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@956 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 21:47:49 +00:00
hanna e77dfe9983 Allow script to be easily modified to support different platforms.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@955 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 16:06:57 +00:00
depristo 7fa84ea157 10x speedup of recalibration walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 15:39:40 +00:00
aaron a62bc6b05d fixed some documentation and attached a correct license
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@953 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:44:27 +00:00
aaron bf6190b471 cleaned up the PrintReadsWalker, and added a lot of documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@952 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:28:32 +00:00
ebanks b45b1d5f2b border case bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@951 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 04:33:15 +00:00
kiran fecba2cae5 Disabled option to show secondary quals, as the definition has changed to conform to the spec and thus this printout is nonsensical.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@950 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 03:21:14 +00:00
kiran e7f222108d More accessors. Can compute the sum of the quality scores in the read (useful for sorting) and can return a subset of itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@948 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:02:48 +00:00
kiran 6506504a60 Updates after seeing a certain number of reads, not a certain number of bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@947 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:01:36 +00:00
kiran 65d0675a4e Some changes regarding what to do when a cycle is completely busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@946 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:01:13 +00:00
kiran 0bd78d72d7 Some changes regarding what to do when a cycle is completely busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@945 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:00:33 +00:00
kiran af0b03a257 Added tests for mostFrequentBaseFraction() and reverseComplementString()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@944 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:53:45 +00:00
kiran 681e67c72c Added some methods to generate random bases or random base indexes, optionally disallowing the generation of a specified base or base index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@943 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:47:54 +00:00
asivache 13eb868536 helper class. array-like random access and fast shift. good for sliding windows (e.g. keeping coverage over last 100 bases while sliding along the reference)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@942 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:11:57 +00:00
asivache 3d6e738a60 still under development. does not genotype yet, but walks and talks (counts overall coverage and indel variant occurrences at every reference position)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@941 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:10:31 +00:00
ebanks 58f7ae8628 better filtering, plus deal with case where user doesn't input maxlength
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@939 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 18:44:29 +00:00
asivache ce431b5d2d added hashCode()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@937 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 16:52:02 +00:00
asivache b4ef16ced2 extractIndels() now should deal correctly with soft- and hard-clipped bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@936 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 16:04:49 +00:00
aaron a8a2d0eab9 added support for the -M option in traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@935 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 15:12:24 +00:00
hanna e2ed56dc96 Add a MAX_READ_GROUPS sanity parameter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@934 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 13:57:43 +00:00
asivache 9f35a5aa32 Insidious bug: clipped sequences (S cigar elements) a) were processed incorrectly; b) sometimes caused IntervalCleaner to crash, if such a sequence occurred at the boundary of the interval. The following inconsistency occurs: LocusWindow traversal instantiates interval reference stretch up to rightmost read.getAlignmentEnd(), but this does not include clipped bases; then IntervalCleaner takes all read bases (as a string) and does not check if some of them were clipped. Inside the interval this would cause counting mismatches on clipped bases; at the boundary of the interval the clipped bases would stick outside the passed reference stretch and an index-out-of-bound exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() is fixed, in that it does not count mismatches on clipped bases anymore; however, we do not attempt yet to realign only the meaningful, unclipped part of the read; instead all reads that have clipped bases are assigned to the original reference and we do not attempt to realign them at all (we'd need to be careful to preserve the cigar if we wanted to do this)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@933 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 05:20:29 +00:00
ebanks 3a8219a469 use knowledge from other reads to find a consensus
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@932 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 21:22:17 +00:00