Commit Graph

  • 7a921c908c Can now adjust the genotype likelihoods of a variant returned from the rod. This automatically causes the lodBtr, lodBtnb, and genotype to be recomputed. kiran 2009-06-18 07:26:37 +0000
  • 9a7cec7d2e Directory to house variant calling and filtration tools. kiran 2009-06-18 07:20:38 +0000
  • 5992d88409 skip N's in the reference (rather than crash. doh!) jmaguire 2009-06-17 23:22:35 +0000
  • f45d5a73a5 Package annotator for Alec. hanna 2009-06-17 22:40:40 +0000
  • c4d9058f32 Added module rodVariants.class to the list of allowable RODs. kiran 2009-06-17 21:33:13 +0000
  • ab2a80f3ea A new ROD type that allows one to input a geli.calls file back into a walker. kiran 2009-06-17 21:32:21 +0000
  • 9ef391706c Added outputting of genotype posteriors to geli.calls file. kiran 2009-06-17 21:31:46 +0000
  • 615572ea06 output to out... not System.out... kcibul 2009-06-17 20:43:10 +0000
  • b947fd586f Fixed a nasty bug in GenomeLoc compareContigs; we were using '==' to compare Integer contig IDs. The surprising thing is that it actually works for Integers > -127 and < 128 (they're cached by the JVM, so it's actually comparing the underlying ints). Switched over GenomeLoc contigs to int based. aaron 2009-06-17 20:19:47 +0000
  • ed7fac1c90 Add bcel and cleanup. hanna 2009-06-17 19:28:04 +0000
  • 87d1c11ed7 Delete lingering empty directory. hanna 2009-06-17 18:33:03 +0000
  • cba9025983 More package-level documentation. hanna 2009-06-17 16:28:45 +0000
  • 43a28750e0 Package level documentation -- helps new users get acclimated to the codebase more quickly. hanna 2009-06-17 16:27:48 +0000
  • 673205ed5f additional output tweaking kcibul 2009-06-17 15:37:38 +0000
  • 7d281296a7 Finishing checking for building depristo 2009-06-17 14:12:40 +0000
  • d1e25bfe88 Intermediate checkin for safety -- now compiles depristo 2009-06-17 13:16:55 +0000
  • 2250769a42 Intermediate checkin for safety -- do not use depristo 2009-06-17 13:07:19 +0000
  • 86c8c08375 Intermediate checkin for safety -- do not use depristo 2009-06-17 13:06:24 +0000
  • e2ccea4883 Cleanup. Move output of packaging to dist directory. Don't always create resources directory. Make jar take on the package name. hanna 2009-06-16 22:47:23 +0000
  • 78b7fb25c7 allow contig names to have spaces in the fai. This is not yet supported by samtools fai generator (which truncates at the first space), but we might as well fix it on our side. aaron 2009-06-16 22:23:12 +0000
  • 6ee64c7e43 added changes to support alec toUnmappedRead seek. Huge improvements (orders of magnitude) in unmapped read performance. aaron 2009-06-16 22:15:56 +0000
  • 4f6d26849f Behold MultiSampleCaller! jmaguire 2009-06-16 20:03:24 +0000
  • 7db4497013 fixing the readTraversal output aaron 2009-06-16 19:44:38 +0000
  • b11c5a7cd5 doing some read validation aaron 2009-06-16 19:25:43 +0000
  • 010304fe44 bug: printing incorrect coordinates into output, finally fixed (?) asivache 2009-06-16 18:08:56 +0000
  • 647b8a1ab0 Fix TabularROD printing and testing so Aaron stops nagging me. ebanks 2009-06-16 15:49:26 +0000
  • a0a549557f added a check of the sort ordering to the query methods, so that we detect if a file is unsorted much earlier. Also added some verbosity to the exception; it now contains information about the raw attribute we saw for 'SO', the sort order of the bam file. aaron 2009-06-15 22:15:03 +0000
  • 2259dc3a8f added filtering out indels with large levels of noise (mismatches) remaining in the close proximity; also a bug in recording deletion coordinates is fixed asivache 2009-06-15 21:13:28 +0000
  • a6477df6d1 Now optionally outputs whether "SNPs" are maintained/cleaned out/introduced by cleaning ebanks 2009-06-15 20:02:02 +0000
  • 29df74ae23 Plumbed packaging support into build.xml and added package for GATK. hanna 2009-06-15 19:41:16 +0000
  • 11aa715630 added capability for filtering by platform ebanks 2009-06-15 19:19:50 +0000
  • 8f4bc8cb6e Move filtering functionality into the PrintReadsWalker. More to come. ebanks 2009-06-15 16:38:08 +0000
  • 161c74716c Forgot to change some direct references to variables in SSG. Fixed. kiran 2009-06-15 14:16:18 +0000
  • 9eeb5f79d4 Various refactoring to achieve hapmap and dbsnp awareness, the ability to set pop-gen and secondary base priors from the command-line, and general code cleanup. kiran 2009-06-15 07:21:08 +0000
  • f2946fa3e8 Various refactoring to achieve hapmap and dbsnp awareness, the ability to set pop-gen and secondary base priors from the command-line, and general code cleanup. kiran 2009-06-15 07:20:22 +0000
  • f6af190b74 ignore clipped reads for realigning indel positions ebanks 2009-06-15 01:01:27 +0000
  • 93dc2cdc70 Start of a 'package' format for xml files which should be distributed together. Uses xslt scripts to transform packages into build scripts. hanna 2009-06-15 00:52:48 +0000
  • 0583459839 Another formatting change to make Hapmap sites more clearly visible. kiran 2009-06-12 19:53:21 +0000
  • 811f560efb add refseq annotations to single sample calls asivache 2009-06-12 19:43:30 +0000
  • e9be2a9c60 Changed a formatting issue. kiran 2009-06-12 19:40:32 +0000
  • ca09a10b76 refseq annotation rod is now manually bound to tell coding indels from non-coding ones asivache 2009-06-12 19:27:37 +0000
  • 260fd0dc45 Trivial change depristo 2009-06-12 19:11:28 +0000
  • 5859948e80 Fixed bugs in CleanedReadInjector arising from integration testing. hanna 2009-06-12 17:37:33 +0000
  • fb7ba47fff Now does real neighbor distance calculation, as well as true snp cluster counting depristo 2009-06-12 16:29:26 +0000
  • dbf2cc037c don't have a null-pointer hissy fit when the reference is N. jmaguire 2009-06-12 13:59:16 +0000
  • 1fb241a8b8 Now supports resume and dry running RecalQual.py depristo 2009-06-11 23:31:59 +0000
  • 4eda040e0f what used to be internal cutoff values are now exposed as cmdline parameters: minCoverage, minNormalCoverage, minFraction, minConsensusFraction asivache 2009-06-11 21:22:52 +0000
  • 41687d5237 Added accessors for the prior probabilities. kiran 2009-06-11 21:16:10 +0000
  • 12dd18cdba Now aware of Hapmap and dbSNP sites. We *can* change the priors there, but we don't yet. kiran 2009-06-11 21:15:34 +0000
  • d5cd883b99 bug fixed when a read with alignment end exactly at the window boundary and with last cigar element being an indel would cause index-out-of-bounds exception asivache 2009-06-11 21:03:15 +0000
  • a12009e9e7 Added a new constructor in which priors for hom-ref, het, and hom-var can be specified. Otherwise, it uses the default values of 0.999, 1e-3, and 1e-5 respectively. kiran 2009-06-11 20:33:45 +0000
  • 909fefa40a Argumentized priors for hom-ref, het, and hom-var. kiran 2009-06-11 20:32:44 +0000
  • 71e3825fa1 First pass of a walker for Eric that searches through an input BAM file for unclean reads, injecting the cleaned reads in their place and outputting the composite result. hanna 2009-06-11 20:18:13 +0000
  • 032d0436e6 Added ROD for 1KG SNP calls ebanks 2009-06-11 19:53:51 +0000
  • ffffe3b2f6 -Support for 1KG SNP calls in RODs -Minor bug fix ebanks 2009-06-11 18:56:37 +0000
  • 5440dd13df Preparation for point release of read calibrator: no artificial heap size limit, no duplicate dbsnp records. hanna 2009-06-11 18:39:33 +0000
  • 63b5c12cbd Changed dataSources to datasources, to be consistent with the rest of our package names. Also, this makes me champion in the largest check-in contest. aaron 2009-06-11 18:13:22 +0000
  • 195b4ea7b4 a rename for consistency of Sam to SAM, creating a genotype utils dir, and moving the GLF code into it. aaron 2009-06-11 17:46:06 +0000
  • 599ceeddd8 Better method for downsampling deep regions ebanks 2009-06-11 16:57:40 +0000
  • 4d9a88153a Update inferred insert size of cleaned reads when they are paired ebanks 2009-06-11 16:29:13 +0000
  • 3796654069 Added walker to emit intervals of clustered SNP calls ebanks 2009-06-11 00:57:14 +0000
  • 678ddd914f Stopgap fixes GFF, DbSNP being half-open rather than half-closed. hanna 2009-06-10 21:38:57 +0000
  • 94b0e46d12 checked in a sample xml file used to store the defaults for the SomaticCoverage tool, and added it to the SomaticCoverage.jar in build.xml. Also added an inputStream marshalling method to the GATKArgumentCollection. aaron 2009-06-10 20:46:16 +0000
  • 8d25f1a105 should be a little faster asivache 2009-06-10 20:33:45 +0000
  • 3a340ca887 adding the SomaticCoverage.jar to the list of generated jars, at least for now. aaron 2009-06-10 20:05:54 +0000
  • 026f68fb41 a couple of quick name changes aaron 2009-06-10 20:02:52 +0000
  • 72a81f8f25 removed the requirement that a bam file list be present in the XML version of the command line arguments. aaron 2009-06-10 20:01:13 +0000
  • b1f90635c1 1. downsample when there are too many mismatching reads (needs perfecting) 2. allow user to specify that no reads be emitted ebanks 2009-06-10 19:55:42 +0000
  • 39dcd4f11f an attempt to bail out when unmapped reads are reached at the end of the file(s). still testing... asivache 2009-06-10 19:53:50 +0000
  • 030efc468f added naive ad-hoc cutoff for the pile size the cleaner will attempt to process; use --maxPileSize argument to force any pile larger than specified cutoff to be directly written to the output without cleaning asivache 2009-06-10 17:52:35 +0000
  • f9be175f44 Be smart about trying alternate consensuses: try prior indels first and only 1 instance of them ebanks 2009-06-10 17:43:22 +0000
  • f304803811 initial check-in of an easy way to create command line tools based on the GATK aaron 2009-06-10 17:34:02 +0000
  • b0cc763eb5 Added some methods to format bases such that read bases on the forward strand are in uppercase, while those on the negative strand are lowercase. This does *not* affect the default functionality of the standard PileupWalker kiran 2009-06-10 17:31:00 +0000
  • 9ebcd6546d Convenience printing depristo 2009-06-10 17:07:38 +0000
  • 06e5a765f8 now has two modes: one sample - just call indel sites; two samples - call somatic-looking variants only. Still uses heuristic count-based cutoffs, cutoffs are hardcoded and are pretty conservative... asivache 2009-06-10 16:41:38 +0000
  • 5451bbfd5a -move final vars to command-line args -Per Andrey: ignore indels from aligner when testing against alt consensus ebanks 2009-06-10 16:39:00 +0000
  • ad80894afa Bumped picard to latest svn version. hanna 2009-06-10 14:36:34 +0000
  • ec2f015447 fixed a bunch of comments and license headers. aaron 2009-06-10 14:10:46 +0000
  • 6bb7f7e9d8 Commented some stuff out so that things compile. kiran 2009-06-10 14:06:33 +0000
  • dc6a9ca196 Pooling resources to lower memory consumption. hanna 2009-06-10 13:39:32 +0000
  • 87ba8b3451 Removed some useless code. Don't apply second-base test if the coverage is too high, since the binomial probs explode and return NaN or Infinite values. kiran 2009-06-10 08:27:06 +0000
  • a12ed404ce Changed method name from applyFourBaseDistributionPrior to applySecondBaseDistributionPrior. 'Cause that's how I roll. kiran 2009-06-10 08:21:22 +0000
  • 3adb4239e4 Same as regular Pileup, but also allows you to see flanking region around locus. This will be useful in determining that some SNPs are spurious due to being at the ends of homopolymer regions. kiran 2009-06-10 08:19:31 +0000
  • 2b0e7f612b Handles bam pileups where some of the reads have SQ tags and some don't. kiran 2009-06-10 08:17:15 +0000
  • 36c98b9d6c added tools to test read based traversals using the artificial in-memory SAM file tools, and testing of the PrintReadsWalker aaron 2009-06-10 01:52:25 +0000
  • eb962fe52a adding an artificial sam file writer, used to unit test some of the walkers (mainly the PrintReadsWalker) aaron 2009-06-09 21:47:49 +0000
  • e77dfe9983 Allow script to be easily modified to support different platforms. hanna 2009-06-09 16:06:57 +0000
  • 7fa84ea157 10x speedup of recalibration walker depristo 2009-06-09 15:39:40 +0000
  • a62bc6b05d fixed some documentation and attached a correct license aaron 2009-06-09 14:44:27 +0000
  • bf6190b471 cleaned up the PrintReadsWalker, and added a lot of documentation. aaron 2009-06-09 14:28:32 +0000
  • b45b1d5f2b border case bug fixes ebanks 2009-06-09 04:33:15 +0000
  • fecba2cae5 Disabled option to show secondary quals as the definition has changed to conform to the spec and thus this printout is non-sensical. kiran 2009-06-09 03:21:14 +0000
  • 5fa3f7ed3a Added absolute path bug fix for Mark. hanna 2009-06-09 02:25:17 +0000
  • e7f222108d More accessors. Can compute the sum of the quality scores in the read (useful for sorting) and can return a subset of itself. kiran 2009-06-09 01:02:48 +0000
  • 6506504a60 Updates after seeing a certain number of reads, not a certain number of bases. kiran 2009-06-09 01:01:36 +0000
  • 65d0675a4e Some changes regarding what to do when a cycle is completely busted. kiran 2009-06-09 01:01:13 +0000
  • 0bd78d72d7 Some changes regarding what to do when a cycle is completely busted. kiran 2009-06-09 01:00:33 +0000
  • af0b03a257 Added tests for mostFrequentBaseFraction() and reverseComplementString() kiran 2009-06-09 00:53:45 +0000
  • 681e67c72c Added some methods to generate random bases or random base indexes, optionally disallowing the generation of a specified base or base index. kiran 2009-06-09 00:47:54 +0000
  • 13eb868536 helper class. array-like random access and fast shift. good for sliding windows (e.g. keeping coverage over last 100 bases while sliding along the reference) asivache 2009-06-09 00:11:57 +0000
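
Note on commit b947fd586f: the '==' pitfall that commit describes can be reproduced in a few lines. The sketch below is illustrative only and is not the GenomeLoc code; the class and method names are hypothetical. It relies on the Java rule that boxing an int between -128 and 127 always yields the same cached Integer object, so reference comparison appears to work for small contig IDs. For larger values, boxing normally produces distinct objects on a default JVM, and '==' then compares references rather than values.

```java
public class IntegerCompareDemo {

    // Reference comparison: true only when both arguments are the same object.
    static boolean sameByIdentity(Integer a, Integer b) {
        return a == b;
    }

    // Value comparison on boxed integers.
    static boolean sameByValue(Integer a, Integer b) {
        return a.equals(b);
    }

    // Primitive comparison, the approach the commit describes switching to.
    static boolean samePrimitive(int a, int b) {
        return a == b;
    }

    public static void main(String[] args) {
        Integer small1 = 100, small2 = 100;     // inside the guaranteed boxing cache
        Integer large1 = 1000, large2 = 1000;   // outside it on a default JVM

        System.out.println("identity, 100:   " + sameByIdentity(small1, small2));
        // Typically false on a default JVM, since each boxing creates a new object.
        System.out.println("identity, 1000:  " + sameByIdentity(large1, large2));
        System.out.println("equals, 1000:    " + sameByValue(large1, large2));
        System.out.println("primitive, 1000: " + samePrimitive(large1, large2));
    }
}
```

Storing the IDs as primitive int, as the commit message says was done, removes the ambiguity entirely because '==' on primitives always compares values.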