Commit Graph

907 Commits (d19366eaadf57ca545fe0e3496e16d4ee2192165)

Author SHA1 Message Date
kiran 41687d5237 Added accessors for the prior probabilities.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@994 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:16:10 +00:00
kiran 12dd18cdba Now aware of Hapmap and dbSNP sites. We *can* change the priors there, but we don't yet.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@993 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:15:34 +00:00
asivache d5cd883b99 bug fixed when a read with alignment end exactly at the window boundary and with last cigar element being an indel would cause index-out-of-bounds exception
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@992 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 21:03:15 +00:00
kiran a12009e9e7 Added a new constructor in which priors for hom-ref, het, and hom-var can be specified. Otherwise, it uses the default values of 0.999, 1e-3, and 1e-5 respectively.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@991 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:33:45 +00:00
kiran 909fefa40a Argumentized priors for hom-ref, het, and hom-var.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@990 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:32:44 +00:00
hanna 71e3825fa1 First pass of a walker for Eric that searches through an input BAM file for unclean reads, injecting the cleaned reads in their place and outputting the composite result.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@989 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 20:18:13 +00:00
ebanks 032d0436e6 Added ROD for 1KG SNP calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@988 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 19:53:51 +00:00
ebanks ffffe3b2f6 -Support for 1KG SNP calls in RODs
-Minor bug fix


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@987 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:56:37 +00:00
aaron 63b5c12cbd Changed dataSources to datasources, to be consistant with the rest of our package names. Also, this makes me champion in the largest check-in contest.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@985 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 18:13:22 +00:00
aaron 195b4ea7b4 a rename for consistancy of Sam to SAM, creating a genotype utils dir, and moving the GLF code into it.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@984 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 17:46:06 +00:00
ebanks 599ceeddd8 Better method for downsampling deep regions
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@983 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 16:57:40 +00:00
ebanks 4d9a88153a Update inferred insert size of cleaned reads when they are paired
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@982 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 16:29:13 +00:00
ebanks 3796654069 Added walker to emit intervals of clustered SNP calls
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@981 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-11 00:57:14 +00:00
hanna 678ddd914f Stopgap fixes GFF, DbSNP being half-open rather than half-closed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@980 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 21:38:57 +00:00
aaron 94b0e46d12 checked in a sample xml file used to store the defaults for the SomaticCoverage tool, and added it to the SomaticCoverage.jar in build.sml. Also added a inputStream marshalling method to the GATKArgumentCollection.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@979 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:46:16 +00:00
asivache 8d25f1a105 should be a little faster
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@978 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:33:45 +00:00
aaron 026f68fb41 a couple of quick name changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@976 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:02:52 +00:00
aaron 72a81f8f25 removed the requirement that a bam file list be present in the XML version of the command line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@975 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 20:01:13 +00:00
ebanks b1f90635c1 1. downsample when there are too many mismatching reads (needs perfecting)
2. allow user to specify that no reads be emitted


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@974 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 19:55:42 +00:00
asivache 39dcd4f11f an attempt to bail out when unmapped reads are reached at the end of the file(s). still testing...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@973 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 19:53:50 +00:00
asivache 030efc468f added naive ad-hoc cutoff for the pile size the cleaner will attempt to process; use --maxPileSize argument to force any pile larger than specified cutoff to be directly written to the output without cleaning
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@972 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:52:35 +00:00
ebanks f9be175f44 Be smart about trying alternate consenses:
try prior indels first and only 1 instance of them


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@971 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:43:22 +00:00
aaron f304803811 initial check-in of an easy way to create command line tools based on the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@970 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:34:02 +00:00
kiran b0cc763eb5 Added some methods to format bases such that read bases on the forward strand are in uppercase, while those on the negative strand are lowercase. This does *not* affect the default functionality of the standard PileupWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@969 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:31:00 +00:00
depristo 9ebcd6546d Convenience printing
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@968 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 17:07:38 +00:00
asivache 06e5a765f8 now has two modes: one sample - just call indel sites; two samples - call somatic-looking variants only. Still uses heuristic count-based cutoffs, cutoffs are hardcoded and are pretty conservative...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@967 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 16:41:38 +00:00
ebanks 5451bbfd5a -move final vars to command-line args
-Per Andrey: ignore indels from aligner when testing against alt consensus


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@966 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 16:39:00 +00:00
hanna ad80894afa Bumped picard to latest svn version.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@965 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:36:34 +00:00
aaron ec2f015447 fixed a bunch of comments and license headers.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@964 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:10:46 +00:00
kiran 6bb7f7e9d8 Commented some stuff out so that things compile.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@963 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 14:06:33 +00:00
hanna dc6a9ca196 Pooling resources to lower memory consumption.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@962 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 13:39:32 +00:00
kiran 87ba8b3451 Removed some useless code. Don't apply second-base test if the coverage is too high, since the binomial probs explode and return NaN or Infinite values.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@961 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:27:06 +00:00
kiran a12ed404ce Changed method name from applyFourBaseDistributionPrior to applySecondBaseDistributionPrior. 'Cause that's how I roll.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@960 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:21:22 +00:00
kiran 3adb4239e4 Same as regular Pileup, but also allows you to see flanking region around locus. This will be useful in determining that some SNPs are spurious due to being at the ends of homopolymer regions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@959 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:19:31 +00:00
kiran 2b0e7f612b Handles bam pileups where some of the reads have SQ tags and some don't.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@958 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 08:17:15 +00:00
aaron 36c98b9d6c added tools to test read based traversals using the artificial in-memory SAM file tools, and testing of the PrintReadsWalker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@957 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-10 01:52:25 +00:00
aaron eb962fe52a adding an artificial sam file writer, used to unit test some of the walkers (mainly the PrintReadsWalker)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@956 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 21:47:49 +00:00
hanna e77dfe9983 Allow script to be easily modified to support different platforms.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@955 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 16:06:57 +00:00
depristo 7fa84ea157 10x speedup of recalibration walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@954 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 15:39:40 +00:00
aaron a62bc6b05d fixed some documentation and attached a correct license
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@953 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:44:27 +00:00
aaron bf6190b471 cleaned up the PrintReadsWalker, and added a lot of documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@952 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 14:28:32 +00:00
ebanks b45b1d5f2b border case bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@951 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 04:33:15 +00:00
kiran fecba2cae5 Disabled option to show secondary quals as the definition has changed to conform to the spec and thus this printout is non-sensical.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@950 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 03:21:14 +00:00
kiran e7f222108d More accessors. Can compute the sum of the quality scores in the read (useful for sorting) and can return a subset of itself.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@948 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:02:48 +00:00
kiran 6506504a60 Updates after seeing a certain number of reads, not a certain number of bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@947 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:01:36 +00:00
kiran 65d0675a4e Some changes regarding what to do when a cycle is completely busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@946 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:01:13 +00:00
kiran 0bd78d72d7 Some changes regarding what to do when a cycle is completely busted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@945 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 01:00:33 +00:00
kiran 681e67c72c Added some methods to generate random bases or random base indexes, optionally disallowing the generation of a specified base or base index.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@943 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:47:54 +00:00
asivache 13eb868536 helper class. array-like random access and fast shift. good for sliding windows (e.g. keeping coverage over last 100 bases while sliding along the reference)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@942 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:11:57 +00:00
asivache 3d6e738a60 still under development. does not genotype yet, but walks and talks (counts overal coverage and indel variant occurences at every reference position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@941 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-09 00:10:31 +00:00
ebanks 58f7ae8628 better filtering, plus deal with case where user doesn't input maxlength
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@939 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 18:44:29 +00:00
asivache ce431b5d2d added hashCode()
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@937 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 16:52:02 +00:00
asivache b4ef16ced2 extractIndels() now should deal correctly with soft- and hard-clipped bases
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@936 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 16:04:49 +00:00
aaron a8a2d0eab9 added support for the -M option in traversals.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@935 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 15:12:24 +00:00
hanna e2ed56dc96 Add a MAX_READ_GROUPS sanity parameter.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@934 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 13:57:43 +00:00
asivache 9f35a5aa32 Insidious bug: clipped sequences (S cigar elements) where a) processed incorrectly; b) sometimes caused IntervalCleaner to crash, if such sequence occured at the boundary of the interval. The following inconsistency occurs: LocusWindow traversal instantiates interval reference stretch up to rightmost read.getAlignmentEnd(), but this does not include clipped bases; then IntervalCleaner takes all read bases (as a string) and does not check if some of them were clipped. Inside the interval this would cause counting mismatches on clipped bases, at the boundary of the interval the clipped bases would stick outside the passed reference stretch and index-out-of-bound exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() is fixed, in that it does not count mismatches on clipped bases anymore; however, we do not attempt yet to realign only meaningful, unclipped part of the read; instead all reads that have clipped bases are assigned to the original reference and we do not attempt to realign them at all (we'd need to be careful to preserve the cigar if we wanted to do this)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@933 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-08 05:20:29 +00:00
ebanks 3a8219a469 use knowledge from other reads to find a consensus
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@932 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 21:22:17 +00:00
hanna 596773e6c6 Cleanup.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@931 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 20:25:08 +00:00
depristo 98396732ba Bug fixes for Andrey
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@930 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 18:19:51 +00:00
asivache b48508a226 indelRealignment() signature changed. The only difference about consensus sequences is that they are passed along with alignment cigars that start inside the sequence, while for 'conventional' reads cigar always starts at position 0 on the read. Logically, indelRealignment() should not know what 'consensus' is. Instead, now it receives an additional int parameter, start of the cigar on the 'read' sequence
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@929 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 17:42:19 +00:00
asivache 9eb38c0222 mostly synchronizing with the main branch. Based on anecdotal evidence (too few examples in the data), realignment (shifting indel left across a repeat) works correctly on non-homonucleotide repeats
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@928 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 16:39:16 +00:00
ebanks c6634e3121 cleaned up some code and minor bug fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@927 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-07 03:14:21 +00:00
asivache 99c105790b Now indelRealignment should be correct... The old version could only condense to the left homo-nucleotide indels. New version should be able to detect and shift left arbitrary repeated sequence (e.g. deletion of ATA after ATAATAATA will be shifted left to the first occurence of ATA on the ref! NOT THOROUGHLY TESTED YET, will test tonight../somaticIndels.pl --dir . --cutoff 100 -filter EXON --mode SOMATIC --condense 5 --format bed > 0883.indel.somatic.exon.100.bed
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@926 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 23:54:07 +00:00
asivache 3b4dc6e7b5 added sequencePeriod(String seq, int minPeriod) - finds smallest period equal to or greater than minPeriod for the specified text string seq; this is a trivial (hopefully correct) back-of-the-envelope implementation for a well-known and well-studied problem; there should be more efficient algorithms in the wild
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@925 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 23:05:24 +00:00
hanna 40ac3b7816 Inject read group into covars_out file's toString output. Continue fixing systematic bug in the code where flattenData is not joined to the read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@924 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 20:43:28 +00:00
asivache 0bb4565798 added AlignmentUtils.getNumAlignmentBlocks(read) - a faster alternative to read.getAlignmentBlocks().size(); IntervalCleaner updated accordingly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@923 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 19:35:21 +00:00
asivache 92b054b71b moved another variant of numMismatches to AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@922 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 18:07:48 +00:00
asivache 7018dd1469 moved another variant of numMismatches to AlignmentUtils
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@921 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 18:05:29 +00:00
hanna ac5b7dd453 Fixed order-of-operations bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@919 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-06 03:22:56 +00:00
depristo 819862e04e major restructuring of generalized variant analysis framework. Now trivally easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversional bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absense in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system autoatmically makese this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@918 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 23:34:37 +00:00
asivache 400399f1b8 fixed (?) a bug in insertion realignment
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@917 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 22:04:37 +00:00
hanna 34bb43a6c8 Saw that one of the offsets needed to be changed from - 1 to -2 and changed the wrong damn offset. Fixed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@915 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 19:18:34 +00:00
ebanks 4623a34ad3 Fix bug in realigning insertion cigar strings
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@914 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 18:46:41 +00:00
aaron 199be46c36 changed the warning that is outputted when the GenomeLoc constructor can't find the given contig in the reference.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@913 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:49:03 +00:00
ebanks 092a754071 Make sure indel position from SW alignment is leftmost possible
(and improve printouts)


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@912 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:36:10 +00:00
aaron 37efd78c7e fixed the logger call so we get output that indicates this class generated the message
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@911 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 15:02:17 +00:00
aaron b323c58ef2 add a place to store the walker return value, along with a method to retrieve it
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@910 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 14:41:42 +00:00
ebanks 36fb6ca3c5 Allow user to specify the compression to be used when writing out BAM files.
Updated most of the walkers to reflect this change.
Now it won't take forever to write BAMs!



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@909 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 08:48:34 +00:00
ebanks c1792de44f First pass at fixing the incorrect border-case behavior of the cleaner
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@908 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 07:55:06 +00:00
hanna 9da04fd9ac Cleaned up error warning in case no PL groups are present.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@907 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 03:14:17 +00:00
ebanks 45eeefbb80 Deal with randomly occurring unmapped reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@906 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 02:55:53 +00:00
hanna fdfc3abf80 Better handling for case where PL attribute is missing.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@905 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-05 02:52:30 +00:00
hanna 9689bb3331 Very early draft of script integrating the covariant counting / logistic regression. Deleted some unused code and spurious debug info.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@902 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:52:11 +00:00
aaron 109bef6c08 We're no longer in the read-dropping business.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@901 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 22:37:51 +00:00
ebanks 4d880477d6 Deal with ends of contigs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@900 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 20:09:53 +00:00
hanna 40bc4ae39a The building blocks for segmenting covariate counting data by read group.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@899 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 19:55:24 +00:00
depristo 13be846c2a qualsAsInt argument for Pileup -- fixing stupid bug [again]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@898 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:52:12 +00:00
depristo 97c8ff75dd qualsAsInt argument for Pileup -- fixing stupid bug
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@897 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:51:17 +00:00
depristo 9de3e58aa8 qualsAsInt argument for Pileup
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@896 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:37:39 +00:00
asivache 4d654f30d4 slightly improved error message printed upon failure to parse interval list file
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@895 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 18:24:43 +00:00
asivache bcc7bacba1 added List<Transcript> getTranscripts(); also more comments added
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@894 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 16:25:14 +00:00
depristo b492192838 Pairwise SNP distance metrics now enabled
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@892 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-04 00:11:29 +00:00
hanna 8672ae6019 Now seeing results from the training data. There are still some critical problems in the quality of the output, but we're at least getting training output.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@891 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 20:41:07 +00:00
ebanks 4e41646c88 print out stats for Andrey
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@890 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 17:45:35 +00:00
andrewk dfe464cd81 Updated CovariateCounterWalker to be read group aware
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@889 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 10:06:06 +00:00
aaron 40af4f085c Adding some utilities to test unmapped reads
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@887 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-03 07:40:34 +00:00
hanna fa93661133 Eric wins the prize for pointing out that doubles weren't valid command-line arguments. Made all primitive types parseable as command-line arguments.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@884 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 22:41:10 +00:00
aaron 107b5d73b5 The flagStatReadWalker generates the exact same statistical output as the samtools flagstat command, so the two outputs can be diff'ed.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@883 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 21:23:56 +00:00
kcibul a1218ef508 changed default value for failure output
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@880 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 19:32:29 +00:00
depristo 7e7c83ddca fixing insidious bugs
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@879 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 18:33:45 +00:00
hanna 6e60cddfed A fix for the 'rod blows up when it hits a GenomeLoc outside the reference' issu
e.  Really a stopgap; error handling in the RODs needs to be addressed in a more comprehensive way.  Right now, hasNext() isn't guaranteed to be correct.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@878 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 18:14:46 +00:00
kcibul ad5b057140 parameterized a bit more
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@877 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 17:58:26 +00:00
andrewk 587d07da00 Merged functionality of two python scripts into LogRegression.py, some clarity updates to covariate and regression java files.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@876 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 16:55:05 +00:00
aaron 82aa0533b8 added some more documentation to the GLF writer and it's supporting classes, and some other fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@875 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 14:53:58 +00:00
kcibul c4cb867d74 basic clustering of reads to reduce artifacts
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@873 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-02 02:54:21 +00:00
aaron e712d69382 GLF writing support
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@872 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 21:30:18 +00:00
jmaguire 417f5b145e Strand test and misc touch-ups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@871 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 17:13:21 +00:00
aaron fc91e3e30e equals signs can be important
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@870 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 16:56:21 +00:00
aaron 4edb33788b added a fix for a bug Andrew found
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@869 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 16:53:56 +00:00
hanna b7defeae83 Fix bug in unit tests created by new filter in TraversalEngine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@868 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 15:50:44 +00:00
hanna fc7320133c Cleaned up error when fasta index is missing. Code still throws an exception, but the message is more direct (no more 'error while micromanaging') and tells the user to run 'samtools faidx' to fix the issue.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@867 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 15:34:38 +00:00
depristo f19d7abba9 Added geli compatibility mode to SingleSampleGenotyper, to enable easy linking to the geli2popsnps.py script
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@866 348d0f76-0448-11de-a6fe-93d51630548a
2009-06-01 14:32:12 +00:00
kcibul 4d6398cef9 a lot of people have been asking me for the equivalent of the old "PrintCoverage" command from Arachne. Even though I show them the pileup, and they agree that's more accurate/complete, they don't want to modify their scripts and/or write a translator. It was simple enough to write, so here it is.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@863 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-31 01:45:23 +00:00
hanna c04b67c969 Basic instrumentation support for the hierarchical microscheduler.x
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@862 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 22:19:27 +00:00
asivache c8347c3c94 set proper package name (...walkers.indels), remove couple of unused import statements
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@861 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 22:02:14 +00:00
asivache c549c34caa still in development and testing; kinda works
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@860 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 21:59:03 +00:00
asivache c252fec1bc synchronizing, no real changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@859 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 21:56:14 +00:00
asivache eafdba7300 more efficient implementation of line parsing, runs at least 1.5 times faster
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@858 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 21:09:06 +00:00
hanna 8761ab3aff Oops. IteratorPool was occasionally creating too many RODIterators in cases where some reference-ordered data was missing. Fixed by better tracking position of RODIterator.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@857 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 21:00:31 +00:00
asivache d601548d53 added reallocate(int[] orig_array, int new_size) and int[] indexOfAll(String s, int ch); the former is self-explanatory, while the latter returns array of indices of all occurences of ch in the specified string
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@856 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 20:15:00 +00:00
hanna a1edb898ef Make criteria for determining whether to stop and merge inputs more sane.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@855 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 18:08:18 +00:00
asivache fe3b843b65 intercept NullPointerException and rethrow it with (marginally) comprehensible error message when an attempt to get class source code location fails
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@854 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 15:56:55 +00:00
depristo e0803eabd9 enabled underlying filtering of zero mapping quality reads, vastly improves system performance
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@853 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-29 14:51:08 +00:00
hanna 1f93545c70 Always opt to merge dictionaries when creating a SAMFileHeaderMerger.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@852 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 22:38:16 +00:00
hanna 0cf90b6f8a Tie into sequence merging code in the latest version of picard.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@851 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 21:48:35 +00:00
aaron b43deda6c9 iterative changes to GLF files; also a test of checking-in over sshfs.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@850 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:24:30 +00:00
hanna 5e8c08ee63 Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@849 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 20:13:01 +00:00
ebanks 19f9ac2b05 Realign existing indels (from the aligner) to leftmost position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@848 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 04:56:51 +00:00
hanna aa17c4a468 Farewell, functionalj. You promised much, but you could not deliver.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@847 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-28 01:35:49 +00:00
aaron d275c18e58 adding some objects we need for the GLF format.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@846 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 22:32:25 +00:00
depristo ce6a0f522b First incarnation of the population-based SNP analysis tool. Also bug fixes throughout the GATK
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@845 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 22:02:24 +00:00
hanna a11bf0f43e Basic unit tests for ReferenceOrderedView, ShardDataProvider. Addressing GSA-25.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@844 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 21:15:01 +00:00
ebanks e533c64b8f Walker to pull out the reference for given intervals and emit them in fasta format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@843 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:39:09 +00:00
aaron 5c6163ecbf Removing the old reads traversal.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@842 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:36:11 +00:00
aaron c7b032cc88 missed a file in the add.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@841 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:27:38 +00:00
aaron 3c3cd5bb64 Moving some of the data sharding around. A new shard catagory now exits, INTERVAL. This saved a lot of code that was mirroring the same approach in both the read and locus shard strategies.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@840 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:24:31 +00:00
asivache 99524ab6d0 package name corrected
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@839 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:20:43 +00:00
asivache b76f8c4eb5 moved from playground to gatk
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@838 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:18:33 +00:00
asivache c3678c7bb9 moved from playground to gatk
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@837 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:18:08 +00:00
asivache 5b310e48f5 changed to use factored out Transcript class; some docs added (not much)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@836 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 18:17:23 +00:00
asivache ae0bac5696 'made public' implies the 'public' keyword, actually...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@835 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:57:01 +00:00
asivache 41c1a62ac4 formerly private class, factored out and made public. Represents a transcript annotation (transcript id, genomic location, genomic intervals for all exons present in this transcript, etc)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@834 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:52:38 +00:00
ebanks 9bd6489f8e Output indels in the format appropriate for low-coverage indel submission
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@832 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 17:32:15 +00:00
ebanks 919e995b7f -Moved my walkers to indels directory
-Removed entropy walker and replaced it with mismatch (column) walker
-Some improvements to the cleaner (more to come)



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@830 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 16:34:24 +00:00
hanna 864a1e81e3 Delete stale class from previous rethink of the traversal engine.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@828 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-27 13:52:03 +00:00
aaron 6fab1a64fa Started work on GLF input / output basics. Do not use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@827 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 22:49:59 +00:00
asivache b81135c606 bug fixed; this rod seems to work now...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@826 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 22:25:34 +00:00
hanna a488d2dbb2 Lazy creation of output streams. Only create output streams when absolutely necessary.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@824 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:56:57 +00:00
asivache ab7bb5800a forgot to remove debug print statement
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@823 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:38:27 +00:00
asivache 568a0d3c27 exon coordinates are now parsed correctly (?). IF DELIMITER IS THE LAST CHARACTER IN A STRING, String.split() DOES NOT return empty field as the last one; instead, the last field returned will be the one immediately before such delimiter! Wicked.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@822 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:36:50 +00:00
asivache f4119c17de still working on it...
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@821 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:07:38 +00:00
asivache d73f2e95cc refseq added to the list of known rod types
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@820 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:06:44 +00:00
asivache 23b7a28015 simple walker that works off pre-computed tumor/normal genotyping calls (e.g. samtools pileup). Collects overal stats and also writes somatic variants into IGV-compatible bed file if asked to. NOT finished. NOT tested
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@819 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:05:47 +00:00
asivache 8f1cabd33d cmd line args changed - again; internally uses VariantType enum
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@818 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:58 +00:00
asivache 9ef1a21112 minor changes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@817 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 21:03:06 +00:00
aaron d994544c47 Added back end code support for Sharding based on genomic location for reads. Changed the sharding
code to take GenomeLocSortedSet instead of a list<GenomeLoc>, and added a bunch of much simplier 
and cleaner test cases.



git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@816 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:57:46 +00:00
asivache 4edcdffe45 refseq annotation track: should be able to provide (multiple) transcript annotations available over a given genomic position. NOT finished and NOT tested!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@815 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 20:07:15 +00:00
andrewk 149cc9989b spaces!!!!!!!!!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@814 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 19:40:25 +00:00
ebanks c2df35b7fe - get leftmost position of indel correct
- don't try to clean reads with mapping quality of 0
- un-deprecate


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@813 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 17:24:58 +00:00
hanna 54bb643d19 Validated Mark's assertion that GSA-27 is fixed. Also did some cleanup on the pileup walker so that it doesn't output to System.out.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@812 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-26 15:58:21 +00:00
hanna 008d677bea Fixed ValidatingPileup to work with Andrey's new rodSAMPileup -> GenotypeList type hierarchy.
Fixed reference-ordered data validation system to validate class hierarchies instead of specific class types.


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@811 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-23 20:50:28 +00:00
aaron d056f9f3e8 Changed the name to reflect the sorted nature of the set, added some fixes
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@810 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 22:34:24 +00:00
aaron 831d430025 Added a collection for storing GenomeLocs, that also has functions for removing by genomic region (that may span multiple GenomeLoc's in the collection), and adding regions, which are then merged with any overlapping regions.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@809 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:52:40 +00:00
hanna 34413362fd Bugfix: handle case where queue is empty.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@808 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:45:22 +00:00
hanna ec2e8d5726 Fixes for getting ValidatingPileup running in parallel.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@807 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 21:20:24 +00:00
kiran cd80e3f372 Replaced dumb training function with a version that creates a training set slightly more sensibly.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@806 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:34:33 +00:00
kiran 02c0afdb85 Added the ability to specify the sorted, unaligned bam and/or the sorted, aligned bam such that broken computations can be restarted.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@805 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:33:34 +00:00
kiran 454a6d1df7 Fixed an egregious error in simpleReverseComplement wherein the RC'd string would be composed entirely of the last base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@804 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:32:20 +00:00
hanna 2a5be1debe Cleanup in datasources.providers namespace. Make it easier for others writing traversal engines to use.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@803 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 19:12:00 +00:00
asivache 02fc4f145f refactoring: a couple of general purpose (hopefully useful?) methods/classes extracted into a standalone utils class
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@802 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 18:54:40 +00:00
asivache 4b718688d5 no changes, really, just synchronizing (instead of reversing) to increase the amount of entropy
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@801 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:27:28 +00:00
asivache 893f1b6427 updated
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@800 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:25:50 +00:00
asivache a9dfbfb309 internal changes and some refactoring. slightly different final report. Now can take tracks that implement either Genotype or GenotypeList; takes an arg specifying what variants to look for (POINT - aka snp - or INDEL); takes an arg specifying whether default ref/ref call of one type (INDEL/POINT) should be implicitly assumed if another call (POINT/INDEL respectively) was made at the same position [this is probably most useful for indels and only (?) for sam pileups: if we have only point mutation call at a given position, it does mean that we do have coverage, and that there was no evidence whatsoever for an indel, so we have an implicit 'no-indel' call]
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@799 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:25:09 +00:00
asivache d5bb4d9ba9 Auxiliary class that can read one line from samtools pileup file. Used by rodSAMPileup to read pairs of lines as needed. NOTE: this class implements Genotype and (a trivial) GenotypeList, but it is NOT a rod!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@798 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:20:01 +00:00
asivache 732fed9aad ALERT, ALERT! rodSAMPileup is now a GenotypeList, not a Genotype! Now it can intelligently read full samtools pileup files (containing, in general, both point and indel genotypes at the same position). No need to split/synchronize pileups from different individuals anymore, hooray!
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@797 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:17:59 +00:00
asivache 26633957d9 Genotype interface is extended: now it requires implementing object to be able to tell whether it isPointGenotype() or isIndelGenotype() (and the contract requires, e.g. alleles to be represented differently)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@796 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:14:46 +00:00
depristo d9fc84f1e3 actually checking in the first pass
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@795 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:13:27 +00:00
asivache 8773b3a430 a trivial wrapper interface for the objects capable of holding 'full' genotype, i.e. both point (as in ref/snp) and indel variants at the same reference position
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@794 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:12:01 +00:00
depristo 7a979859a9 Intermediate checking for evaluation -- now supports transition / transversion evaluation
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@793 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 17:05:06 +00:00
ebanks f2ea193149 For some reason the apostraphes in the comments were throwing annoying
compile-time warnings: "unmappable character for encoding UTF8"


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@792 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 14:07:07 +00:00
jmaguire 9902ce8073 properly flush the gzip output stream. this was a subtle inheritance bug.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@791 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 13:57:58 +00:00
asivache 63caca31bf minor update in report printout format
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@790 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-22 13:56:09 +00:00
asivache 7afc10fd6f updated, reports more stuff now, including stats for external consistency checks
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@789 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:28:18 +00:00
depristo 30c63daf89 More improvements to the duplicate quality combiner, making progress towards a clean system
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@788 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:26:57 +00:00
depristo 65995887fc Releasable version of the Pileup walker
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@786 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:25:37 +00:00
depristo dc17a5661d Better accessors for dealing with second base prob pileups
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@785 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:25:16 +00:00
depristo d261459c48 Useful function to create a string with N copies of a same char
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@784 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 22:23:52 +00:00
kiran 287bb52e81 Refreshes the mount points that we'll be using (so that the program will play nicely with LSF).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@783 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:36:12 +00:00
jmaguire b5ad5176f7 stick headers on the output tables
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@782 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:50 +00:00
kiran 83e1454a11 Added a method to determine the fraction of a sequence that's taken up by the most frequent base.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@781 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:35:31 +00:00
hanna d61a5261c1 Better integration of reference-ordered data into the data sharding system.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@779 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 20:09:32 +00:00
ebanks 0d58e4ccc9 -check original alignments for indels when computing mismatch score
-move logging to debug


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@778 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:55:42 +00:00
kiran 5f67914b08 Added loads of documentation.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@777 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:40:47 +00:00
kiran 1a9d5cea29 Added a method to reverse-complement a String object, preserving 'N' and '.' bases.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@776 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:39:39 +00:00
kiran a687c6bc03 Added a method to refresh an NFS mount point (necessary to prevent NFS flakiness when running on the LSF farm.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@774 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:31:54 +00:00
aaron 8515247575 Adding some functions I keep reinventing, especially for testing purposes.
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@772 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:30:44 +00:00
ebanks e6200fe5b5 don't ignore reads when maxReadLength isn't set
also, print out LOD score for cleaning


git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@771 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 19:24:10 +00:00
andrewk 0219d33e10 QualityUtils: added reverse function to reverse an array of bytes (and not complement it), BaseUtils: split qualToProb into itself and qualToErrProb, CovariateCounterWalker and LogisticRecalibrationWalker: several changes including a properly acocunting (only partly complete) for reversing AND complementing bases that are negative strand, PrintReadsWalker: created option to output reads to a BAM file rather than just to the sceern (useful for creating a downsampled BAM file)
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@770 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 18:30:45 +00:00
asivache 7e77c62b49 auxiliary class, a simple struct to keep together info like numbers of covered, assessed, ref/variant bases across the sample
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@769 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 16:30:16 +00:00
asivache 7e5e422591 ReferenceOrdereData now inspects the ROD class using reflection. If the ROD declares a static Iterator<ROD> createIterator(String rodName, File rodFile) factory method, it is wrapped and used by the ReferenceOrderedData to read records from rodFile. If the ROD does not provide such factory method, the old behavior is the default: ReferenceOrderedData uses its own simple default iterator to read the file line by line (assuming there is only one line per record/position).
git-svn-id: file:///humgen/gsa-scr1/gsa-engineering/svn_contents/trunk@768 348d0f76-0448-11de-a6fe-93d51630548a
2009-05-21 15:23:22 +00:00