Commit Graph

  • 3d6e738a60 still under development. does not genotype yet, but walks and talks (counts overall coverage and indel variant occurrences at every reference position) asivache 2009-06-09 00:10:31 +0000
  • 127c321d0a Cut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary. hanna 2009-06-08 21:11:44 +0000
  • 58f7ae8628 better filtering, plus deal with case where user doesn't input maxlength ebanks 2009-06-08 18:44:29 +0000
  • f6e985d97f Documentation for read quality recalibrator. We have to spend some time rethinking how to organize these mini-releases. hanna 2009-06-08 16:54:39 +0000
  • ce431b5d2d added hashCode() asivache 2009-06-08 16:52:02 +0000
  • b4ef16ced2 extractIndels() now should deal correctly with soft- and hard-clipped bases asivache 2009-06-08 16:04:49 +0000
  • a8a2d0eab9 added support for the -M option in traversals. aaron 2009-06-08 15:12:24 +0000
  • e2ed56dc96 Add a MAX_READ_GROUPS sanity parameter. hanna 2009-06-08 13:57:43 +0000
  • 9f35a5aa32 Insidious bug: clipped sequences (S cigar elements) were a) processed incorrectly; b) sometimes caused IntervalCleaner to crash, if such a sequence occurred at the boundary of the interval. The following inconsistency occurs: LocusWindow traversal instantiates interval reference stretch up to rightmost read.getAlignmentEnd(), but this does not include clipped bases; then IntervalCleaner takes all read bases (as a string) and does not check if some of them were clipped. Inside the interval this would cause counting mismatches on clipped bases; at the boundary of the interval the clipped bases would stick outside the passed reference stretch and an index-out-of-bound exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() is fixed, in that it does not count mismatches on clipped bases anymore; however, we do not attempt yet to realign only the meaningful, unclipped part of the read; instead all reads that have clipped bases are assigned to the original reference and we do not attempt to realign them at all (we'd need to be careful to preserve the cigar if we wanted to do this) asivache 2009-06-08 05:20:29 +0000
  • 3a8219a469 use knowledge from other reads to find a consensus ebanks 2009-06-07 21:22:17 +0000
  • 596773e6c6 Cleanup. hanna 2009-06-07 20:25:08 +0000
  • 98396732ba Bug fixes for Andrey depristo 2009-06-07 18:19:51 +0000
  • b48508a226 indelRealignment() signature changed. The only difference about consensus sequences is that they are passed along with alignment cigars that start inside the sequence, while for 'conventional' reads cigar always starts at position 0 on the read. Logically, indelRealignment() should not know what 'consensus' is. Instead, now it receives an additional int parameter, start of the cigar on the 'read' sequence asivache 2009-06-07 17:42:19 +0000
  • 9eb38c0222 mostly synchronizing with the main branch. Based on anecdotal evidence (too few examples in the data), realignment (shifting indel left across a repeat) works correctly on non-homonucleotide repeats asivache 2009-06-07 16:39:16 +0000
  • c6634e3121 cleaned up some code and minor bug fixes ebanks 2009-06-07 03:14:21 +0000
  • 99c105790b Now indelRealignment should be correct... The old version could only condense to the left homo-nucleotide indels. New version should be able to detect and shift left arbitrary repeated sequence (e.g. deletion of ATA after ATAATAATA will be shifted left to the first occurrence of ATA on the ref!) NOT THOROUGHLY TESTED YET, will test tonight. ./somaticIndels.pl --dir . --cutoff 100 -filter EXON --mode SOMATIC --condense 5 --format bed > 0883.indel.somatic.exon.100.bed asivache 2009-06-06 23:54:07 +0000
  • 3b4dc6e7b5 added sequencePeriod(String seq, int minPeriod) - finds smallest period equal to or greater than minPeriod for the specified text string seq; this is a trivial (hopefully correct) back-of-the-envelope implementation for a well-known and well-studied problem; there should be more efficient algorithms in the wild asivache 2009-06-06 23:05:24 +0000
  • 40ac3b7816 Inject read group into covars_out file's toString output. Continue fixing systematic bug in the code where flattenData is not joined to the read group. hanna 2009-06-06 20:43:28 +0000
  • 0bb4565798 added AlignmentUtils.getNumAlignmentBlocks(read) - a faster alternative to read.getAlignmentBlocks().size(); IntervalCleaner updated accordingly. asivache 2009-06-06 19:35:21 +0000
  • 92b054b71b moved another variant of numMismatches to AlignmentUtils asivache 2009-06-06 18:07:48 +0000
  • 7018dd1469 moved another variant of numMismatches to AlignmentUtils asivache 2009-06-06 18:05:29 +0000
  • e6aa058ec4 Tighten up error handling a bit. hanna 2009-06-06 03:40:50 +0000
  • ac5b7dd453 Fixed order-of-operations bug. hanna 2009-06-06 03:22:56 +0000
  • 819862e04e major restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversion bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system automatically makes this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps. depristo 2009-06-05 23:34:37 +0000
  • 400399f1b8 fixed (?) a bug in insertion realignment asivache 2009-06-05 22:04:37 +0000
  • 050d55cdb0 Basic graph support for testing. hanna 2009-06-05 21:04:01 +0000
  • 34bb43a6c8 Saw that one of the offsets needed to be changed from - 1 to -2 and changed the wrong damn offset. Fixed. hanna 2009-06-05 19:18:34 +0000
  • 4623a34ad3 Fix bug in realigning insertion cigar strings ebanks 2009-06-05 18:46:41 +0000
  • 199be46c36 changed the warning that is outputted when the GenomeLoc constructor can't find the given contig in the reference. aaron 2009-06-05 15:49:03 +0000
  • 092a754071 Make sure indel position from SW alignment is leftmost possible (and improve printouts) ebanks 2009-06-05 15:36:10 +0000
  • 37efd78c7e fixed the logger call so we get output that indicates this class generated the message aaron 2009-06-05 15:02:17 +0000
  • b323c58ef2 add a place to store the walker return value, along with a method to retrieve it aaron 2009-06-05 14:41:42 +0000
  • 36fb6ca3c5 Allow user to specify the compression to be used when writing out BAM files. Updated most of the walkers to reflect this change. Now it won't take forever to write BAMs! ebanks 2009-06-05 08:48:34 +0000
  • c1792de44f First pass at fixing the incorrect border-case behavior of the cleaner ebanks 2009-06-05 07:55:06 +0000
  • 9da04fd9ac Cleaned up error warning in case no PL groups are present. hanna 2009-06-05 03:14:17 +0000
  • 45eeefbb80 Deal with randomly occurring unmapped reads ebanks 2009-06-05 02:55:53 +0000
  • fdfc3abf80 Better handling for case where PL attribute is missing. hanna 2009-06-05 02:52:30 +0000
  • 2035d7dfd3 Revert some debug code in RecalQual.py. Make LogisticRegression easier to Ctrl-C out of. hanna 2009-06-05 01:53:48 +0000
  • 61ae00c7bf Lots of cleanup. hanna 2009-06-05 01:26:10 +0000
  • 9689bb3331 Very early draft of script integrating the covariant counting / logistic regression. Deleted some unused code and spurious debug info. hanna 2009-06-04 22:52:11 +0000
  • 109bef6c08 We're no longer in the read-dropping business. aaron 2009-06-04 22:37:51 +0000
  • 4d880477d6 Deal with ends of contigs ebanks 2009-06-04 20:09:53 +0000
  • 40bc4ae39a The building blocks for segmenting covariate counting data by read group. hanna 2009-06-04 19:55:24 +0000
  • 13be846c2a qualsAsInt argument for Pileup -- fixing stupid bug [again] depristo 2009-06-04 18:52:12 +0000
  • 97c8ff75dd qualsAsInt argument for Pileup -- fixing stupid bug depristo 2009-06-04 18:51:17 +0000
  • 9de3e58aa8 qualsAsInt argument for Pileup depristo 2009-06-04 18:37:39 +0000
  • 4d654f30d4 slightly improved error message printed upon failure to parse interval list file asivache 2009-06-04 18:24:43 +0000
  • bcc7bacba1 added List<Transcript> getTranscripts(); also more comments added asivache 2009-06-04 16:25:14 +0000
  • 67112c79a1 More robust individual genotypes to population script depristo 2009-06-04 00:12:31 +0000
  • b492192838 Pairwise SNP distance metrics now enabled depristo 2009-06-04 00:11:29 +0000
  • 8672ae6019 Now seeing results from the training data. There are still some critical problems in the quality of the output, but we're at least getting training output. hanna 2009-06-03 20:41:07 +0000
  • 4e41646c88 print out stats for Andrey ebanks 2009-06-03 17:45:35 +0000
  • dfe464cd81 Updated CovariateCounterWalker to be read group aware andrewk 2009-06-03 10:06:06 +0000
  • 7755476d36 Updated converter to reflect change in contig ordering in Geli files andrewk 2009-06-03 10:05:28 +0000
  • 40af4f085c Adding some utilities to test unmapped reads aaron 2009-06-03 07:40:34 +0000
  • 080af519cb Added R script and uncommented a line in recal_qual.py andrewk 2009-06-03 03:15:45 +0000
  • b2eb724456 First commit of recalibration master control script for recalibrating quality scores. andrewk 2009-06-03 02:17:10 +0000
  • fa93661133 Eric wins the prize for pointing out that doubles weren't valid command-line arguments. Made all primitive types parseable as command-line arguments. hanna 2009-06-02 22:41:10 +0000
  • 107b5d73b5 The flagStatReadWalker generates the exact same statistical output as the samtools flagstat command, so the two outputs can be diff'ed. aaron 2009-06-02 21:23:56 +0000
  • 056fcdc31c Adding a script for diff'ing the output of samtools and the GATK for the whole genome and each individual chromosome. aaron 2009-06-02 21:19:39 +0000
  • 3998085e4b more and better python scripts for dealing with calls depristo 2009-06-02 20:37:19 +0000
  • a1218ef508 changed default value for failure output kcibul 2009-06-02 19:32:29 +0000
  • 7e7c83ddca fixing insidious bugs depristo 2009-06-02 18:33:45 +0000
  • 6e60cddfed A fix for the 'rod blows up when it hits a GenomeLoc outside the reference' issue. Really a stopgap; error handling in the RODs needs to be addressed in a more comprehensive way. Right now, hasNext() isn't guaranteed to be correct. hanna 2009-06-02 18:14:46 +0000
  • ad5b057140 parameterized a bit more kcibul 2009-06-02 17:58:26 +0000
  • 587d07da00 Merged functionality of two python scripts into LogRegression.py, some clarity updates to covariate and regression java files. andrewk 2009-06-02 16:55:05 +0000
  • 82aa0533b8 added some more documentation to the GLF writer and its supporting classes, and some other fixes aaron 2009-06-02 14:53:58 +0000
  • ae2eddec2d Improving, yet again, the merging of bam files depristo 2009-06-02 13:31:12 +0000
  • c4cb867d74 basic clustering of reads to reduce artifacts kcibul 2009-06-02 02:54:21 +0000
  • e712d69382 GLF writing support aaron 2009-06-01 21:30:18 +0000
  • 417f5b145e Strand test and misc touch-ups jmaguire 2009-06-01 17:13:21 +0000
  • fc91e3e30e equals signs can be important aaron 2009-06-01 16:56:21 +0000
  • 4edb33788b added a fix for a bug Andrew found aaron 2009-06-01 16:53:56 +0000
  • b7defeae83 Fix bug in unit tests created by new filter in TraversalEngine. hanna 2009-06-01 15:50:44 +0000
  • fc7320133c Cleaned up error when fasta index is missing. Code still throws an exception, but the message is more direct (no more 'error while micromanaging') and tells the user to run 'samtools faidx' to fix the issue. hanna 2009-06-01 15:34:38 +0000
  • f19d7abba9 Added geli compatibility mode to SingleSampleGenotyper, to enable easy linking to the geli2popsnps.py script depristo 2009-06-01 14:32:12 +0000
  • 543c68cdd8 First version of individual geli files to population SNPS depristo 2009-05-31 15:29:10 +0000
  • 6adef28b97 Now supports automatic merging by population depristo 2009-05-31 15:28:44 +0000
  • 4d6398cef9 a lot of people have been asking me for the equivalent of the old "PrintCoverage" command from Arachne. Even though I show them the pileup, and they agree that's more accurate/complete, they don't want to modify their scripts and/or write a translator. It was simple enough to write, so here it is. kcibul 2009-05-31 01:45:23 +0000
  • c04b67c969 Basic instrumentation support for the hierarchical microscheduler. hanna 2009-05-29 22:19:27 +0000
  • c8347c3c94 set proper package name (...walkers.indels), remove couple of unused import statements asivache 2009-05-29 22:02:14 +0000
  • c549c34caa still in development and testing; kinda works asivache 2009-05-29 21:59:03 +0000
  • c252fec1bc synchronizing, no real changes asivache 2009-05-29 21:56:14 +0000
  • eafdba7300 more efficient implementation of line parsing, runs at least 1.5 times faster asivache 2009-05-29 21:09:06 +0000
  • 8761ab3aff Oops. IteratorPool was occasionally creating too many RODIterators in cases where some reference-ordered data was missing. Fixed by better tracking position of RODIterator. hanna 2009-05-29 21:00:31 +0000
  • d601548d53 added reallocate(int[] orig_array, int new_size) and int[] indexOfAll(String s, int ch); the former is self-explanatory, while the latter returns array of indices of all occurrences of ch in the specified string asivache 2009-05-29 20:15:00 +0000
  • a1edb898ef Make criteria for determining whether to stop and merge inputs more sane. hanna 2009-05-29 18:08:18 +0000
  • fe3b843b65 intercept NullPointerException and rethrow it with (marginally) comprehensible error message when an attempt to get class source code location fails asivache 2009-05-29 15:56:55 +0000
  • e0803eabd9 enabled underlying filtering of zero mapping quality reads, vastly improves system performance depristo 2009-05-29 14:51:08 +0000
  • 1f93545c70 Always opt to merge dictionaries when creating a SAMFileHeaderMerger. hanna 2009-05-28 22:38:16 +0000
  • 0cf90b6f8a Tie into sequence merging code in the latest version of picard. hanna 2009-05-28 21:48:35 +0000
  • b43deda6c9 iterative changes to GLF files; also a test of checking-in over sshfs. aaron 2009-05-28 20:24:30 +0000
  • 5e8c08ee63 Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard... hanna 2009-05-28 20:13:01 +0000
  • 19f9ac2b05 Realign existing indels (from the aligner) to leftmost position ebanks 2009-05-28 04:56:51 +0000
  • aa17c4a468 Farewell, functionalj. You promised much, but you could not deliver. hanna 2009-05-28 01:35:49 +0000
  • d275c18e58 adding some objects we need for the GLF format. aaron 2009-05-27 22:32:25 +0000
  • ce6a0f522b First incarnation of the population-based SNP analysis tool. Also bug fixes throughout the GATK depristo 2009-05-27 22:02:24 +0000
  • a11bf0f43e Basic unit tests for ReferenceOrderedView, ShardDataProvider. Addressing GSA-25. hanna 2009-05-27 21:15:01 +0000
  • e533c64b8f Walker to pull out the reference for given intervals and emit them in fasta format ebanks 2009-05-27 18:39:09 +0000
  • 5c6163ecbf Removing the old reads traversal. aaron 2009-05-27 18:36:11 +0000
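
Several of the messages above (3b4dc6e7b5, 99c105790b, d601548d53) describe small string utilities: finding the smallest period of a sequence, shifting a deletion left across a repeat, and collecting every index of a character. The following is a minimal sketch of what such helpers might look like; it is not the code from those commits. The class name RepeatSketch and the method leftShiftDeletion are hypothetical, and the bodies of sequencePeriod and indexOfAll are assumptions that follow only the behaviour stated in the messages (including the choice to return the full length when no shorter period exists).

```java
import java.util.Arrays;

// Illustrative sketch only; names and bodies are assumptions, not the committed code.
final class RepeatSketch {

    // Smallest period p >= minPeriod such that seq[i] == seq[i + p] for every valid i.
    // Returns seq.length() when no shorter period exists (an assumed convention).
    static int sequencePeriod(String seq, int minPeriod) {
        for (int p = Math.max(1, minPeriod); p < seq.length(); p++) {
            boolean periodic = true;
            for (int i = 0; i + p < seq.length(); i++) {
                if (seq.charAt(i) != seq.charAt(i + p)) {
                    periodic = false;
                    break;
                }
            }
            if (periodic) {
                return p;
            }
        }
        return seq.length();
    }

    // Indices of all occurrences of ch in s, in increasing order.
    static int[] indexOfAll(String s, int ch) {
        int[] hits = new int[s.length()];
        int n = 0;
        for (int i = s.indexOf(ch); i >= 0; i = s.indexOf(ch, i + 1)) {
            hits[n++] = i;
        }
        return Arrays.copyOf(hits, n);
    }

    // Hypothetical helper: slide a deletion of `length` reference bases starting at
    // `start` (0-based) to its leftmost equivalent position. Moving one base left is
    // allowed when the base entering the deleted window equals the base leaving it.
    static int leftShiftDeletion(String ref, int start, int length) {
        while (start > 0 && ref.charAt(start - 1) == ref.charAt(start + length - 1)) {
            start--;
        }
        return start;
    }

    public static void main(String[] args) {
        System.out.println(sequencePeriod("ATAATAATA", 1));                // 3
        System.out.println(Arrays.toString(indexOfAll("ATAATA", 'A')));    // [0, 2, 3, 5]
        // Deleting the last ATA (start 11) of GG + ATAATAATAATA + CC shifts to start 2.
        System.out.println(leftShiftDeletion("GGATAATAATAATACC", 11, 3));  // 2
    }
}
```

The period search is the quadratic back-of-the-envelope approach that 3b4dc6e7b5 alludes to; linear-time alternatives based on a prefix function exist but are not shown here.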