3d6e738a60still under development. does not genotype yet, but walks and talks (counts overall coverage and indel variant occurrences at every reference position)
asivache
2009-06-09 00:10:31 +0000
127c321d0aCut over to 1kG version of fasta / reference. Updated doc with latest version of tool summary.
hanna
2009-06-08 21:11:44 +0000
58f7ae8628better filtering, plus deal with case where user doesn't input maxlength
ebanks
2009-06-08 18:44:29 +0000
f6e985d97fDocumentation for read quality recalibrator. We have to spend some time rethinking how to organize these mini-releases.
hanna
2009-06-08 16:54:39 +0000
b4ef16ced2extractIndels() now should deal correctly with soft- and hard-clipped bases
asivache
2009-06-08 16:04:49 +0000
a8a2d0eab9added support for the -M option in traversals.
aaron
2009-06-08 15:12:24 +0000
e2ed56dc96Add a MAX_READ_GROUPS sanity parameter.
hanna
2009-06-08 13:57:43 +0000
9f35a5aa32Insidious bug: clipped sequences (S cigar elements) were a) processed incorrectly; b) sometimes caused IntervalCleaner to crash, if such a sequence occurred at the boundary of the interval. The following inconsistency occurs: LocusWindow traversal instantiates interval reference stretch up to rightmost read.getAlignmentEnd(), but this does not include clipped bases; then IntervalCleaner takes all read bases (as a string) and does not check if some of them were clipped. Inside the interval this would cause counting mismatches on clipped bases; at the boundary of the interval the clipped bases would stick outside the passed reference stretch and an index-out-of-bounds exception would be thrown. THIS IS A PARTIAL, TEMPORARY FIX of the problem: mismatchQualitySum() is fixed, in that it does not count mismatches on clipped bases anymore; however, we do not attempt yet to realign only the meaningful, unclipped part of the read; instead all reads that have clipped bases are assigned to the original reference and we do not attempt to realign them at all (we'd need to be careful to preserve the cigar if we wanted to do this)
asivache
2009-06-08 05:20:29 +0000
3a8219a469use knowledge from other reads to find a consensus
ebanks
2009-06-07 21:22:17 +0000
596773e6c6Cleanup.
hanna
2009-06-07 20:25:08 +0000
98396732baBug fixes for Andrey
depristo
2009-06-07 18:19:51 +0000
b48508a226indelRealignment() signature changed. The only difference about consensus sequences is that they are passed along with alignment cigars that start inside the sequence, while for 'conventional' reads cigar always starts at position 0 on the read. Logically, indelRealignment() should not know what 'consensus' is. Instead, now it receives an additional int parameter, start of the cigar on the 'read' sequence
asivache
2009-06-07 17:42:19 +0000
9eb38c0222mostly synchronizing with the main branch. Based on anecdotal evidence (too few examples in the data), realignment (shifting indel left across a repeat) works correctly on non-homonucleotide repeats
asivache
2009-06-07 16:39:16 +0000
c6634e3121cleaned up some code and minor bug fixes
ebanks
2009-06-07 03:14:21 +0000
99c105790bNow indelRealignment should be correct... The old version could only condense to the left homo-nucleotide indels. New version should be able to detect and shift left arbitrary repeated sequence (e.g. deletion of ATA after ATAATAATA will be shifted left to the first occurrence of ATA on the ref)! NOT THOROUGHLY TESTED YET, will test tonight../somaticIndels.pl --dir . --cutoff 100 -filter EXON --mode SOMATIC --condense 5 --format bed > 0883.indel.somatic.exon.100.bed
asivache
2009-06-06 23:54:07 +0000
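Note on 99c105790b: the left-shifting described in that message can be illustrated with a minimal sketch. This is not the committed indelRealignment() code; the class and method names below are hypothetical, and the sketch assumes a deletion given as a 0-based start offset and a length on the reference string. A deletion can move one base to the left whenever the base just before it equals its own last base, because deleting either window leaves the same sequence.

```java
public final class IndelShiftSketch {
    /**
     * Returns the leftmost 0-based start of a deletion of `length` bases
     * that is equivalent to the deletion starting at `start` on `ref`.
     * Hypothetical helper for illustration only.
     */
    public static int leftShiftDeletion(String ref, int start, int length) {
        // Slide left while the base entering the window on the left
        // matches the base leaving the window on the right.
        while (start > 0 && ref.charAt(start - 1) == ref.charAt(start + length - 1)) {
            start--;
        }
        return start;
    }

    public static void main(String[] args) {
        // ATA deleted after ATAATAATA; the flanking GG and CC are made-up context.
        String ref = "GGATAATAATAATACC";
        System.out.println(leftShiftDeletion(ref, 11, 3));
    }
}
```

In this example the deletion of ATA at offset 11 slides back to offset 2, the first occurrence of ATA in the repeat.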
3b4dc6e7b5added sequencePeriod(String seq, int minPeriod) - finds smallest period equal to or greater than minPeriod for the specified text string seq; this is a trivial (hopefully correct) back-of-the-envelope implementation for a well-known and well-studied problem; there should be more efficient algorithms in the wild
asivache
2009-06-06 23:05:24 +0000
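Note on 3b4dc6e7b5: a brute-force version of the period search described in that message might look like the sketch below. It is not the committed implementation. The class name is hypothetical, and returning seq.length() when no shorter period exists is an assumption, since the message does not say what the original returns in that case.

```java
public final class PeriodSketch {
    /**
     * Finds the smallest p >= minPeriod such that seq[i] == seq[i + p]
     * for every valid i. Falls back to seq.length() when no such p
     * below the sequence length exists (assumed convention).
     */
    public static int sequencePeriod(String seq, int minPeriod) {
        int n = seq.length();
        for (int p = Math.max(1, minPeriod); p < n; p++) {
            boolean periodic = true;
            for (int i = 0; i + p < n; i++) {
                if (seq.charAt(i) != seq.charAt(i + p)) {
                    periodic = false;
                    break;
                }
            }
            if (periodic) {
                return p;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(sequencePeriod("ATAATAATA", 1));
        System.out.println(sequencePeriod("ATAATAATA", 4));
    }
}
```

This runs in quadratic time in the worst case; linear-time alternatives exist (for example, via the KMP failure function), which matches the message's remark that more efficient algorithms are available.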
40ac3b7816Inject read group into covars_out file's toString output. Continue fixing systematic bug in the code where flattenData is not joined to the read group.
hanna
2009-06-06 20:43:28 +0000
0bb4565798added AlignmentUtils.getNumAlignmentBlocks(read) - a faster alternative to read.getAlignmentBlocks().size(); IntervalCleaner updated accordingly.
asivache
2009-06-06 19:35:21 +0000
92b054b71bmoved another variant of numMismatches to AlignmentUtils
asivache
2009-06-06 18:07:48 +0000
7018dd1469moved another variant of numMismatches to AlignmentUtils
asivache
2009-06-06 18:05:29 +0000
e6aa058ec4Tighten up error handling a bit.
hanna
2009-06-06 03:40:50 +0000
ac5b7dd453Fixed order-of-operations bug.
hanna
2009-06-06 03:22:56 +0000
819862e04emajor restructuring of generalized variant analysis framework. Now trivially easy to add additional analyses. Easy partitioning of all analyses by features, such as singleton status. Now has transition/transversion bias, counting, dbSNP coverage, HWE violation, selecting of variants by presence/absence in dbs. Also restructured the ROD system to make it easier to add tracks. Also, added the interval track -- if you provide an interval list, then the system automatically makes this available to you as a bound rod -- you can always find out where you are in the interval at every site. Python scripts improved to handle more merging, etc, into population snps.
depristo
2009-06-05 23:34:37 +0000
400399f1b8fixed (?) a bug in insertion realignment
asivache
2009-06-05 22:04:37 +0000
050d55cdb0Basic graph support for testing.
hanna
2009-06-05 21:04:01 +0000
34bb43a6c8Saw that one of the offsets needed to be changed from - 1 to -2 and changed the wrong damn offset. Fixed.
hanna
2009-06-05 19:18:34 +0000
199be46c36changed the warning that is outputted when the GenomeLoc constructor can't find the given contig in the reference.
aaron
2009-06-05 15:49:03 +0000
092a754071Make sure indel position from SW alignment is leftmost possible (and improve printouts)
ebanks
2009-06-05 15:36:10 +0000
37efd78c7efixed the logger call so we get output that indicates this class generated the message
aaron
2009-06-05 15:02:17 +0000
b323c58ef2add a place to store the walker return value, along with a method to retrieve it
aaron
2009-06-05 14:41:42 +0000
36fb6ca3c5Allow user to specify the compression to be used when writing out BAM files. Updated most of the walkers to reflect this change. Now it won't take forever to write BAMs!
ebanks
2009-06-05 08:48:34 +0000
c1792de44fFirst pass at fixing the incorrect border-case behavior of the cleaner
ebanks
2009-06-05 07:55:06 +0000
9da04fd9acCleaned up error warning in case no PL groups are present.
hanna
2009-06-05 03:14:17 +0000
45eeefbb80Deal with randomly occurring unmapped reads
ebanks
2009-06-05 02:55:53 +0000
fdfc3abf80Better handling for case where PL attribute is missing.
hanna
2009-06-05 02:52:30 +0000
2035d7dfd3Revert some debug code in RecalQual.py. Make LogisticRegression easier to Ctrl-C out of.
hanna
2009-06-05 01:53:48 +0000
61ae00c7bfLots of cleanup.
hanna
2009-06-05 01:26:10 +0000
9689bb3331Very early draft of script integrating the covariant counting / logistic regression. Deleted some unused code and spurious debug info.
hanna
2009-06-04 22:52:11 +0000
109bef6c08We're no longer in the read-dropping business.
aaron
2009-06-04 22:37:51 +0000
4d880477d6Deal with ends of contigs
ebanks
2009-06-04 20:09:53 +0000
40bc4ae39aThe building blocks for segmenting covariate counting data by read group.
hanna
2009-06-04 19:55:24 +0000
9de3e58aa8qualsAsInt argument for Pileup
depristo
2009-06-04 18:37:39 +0000
4d654f30d4slightly improved error message printed upon failure to parse interval list file
asivache
2009-06-04 18:24:43 +0000
bcc7bacba1added List<Transcript> getTranscripts(); also more comments added
asivache
2009-06-04 16:25:14 +0000
67112c79a1More robust individual genotypes to population script
depristo
2009-06-04 00:12:31 +0000
b492192838Pairwise SNP distance metrics now enabled
depristo
2009-06-04 00:11:29 +0000
8672ae6019Now seeing results from the training data. There are still some critical problems in the quality of the output, but we're at least getting training output.
hanna
2009-06-03 20:41:07 +0000
4e41646c88print out stats for Andrey
ebanks
2009-06-03 17:45:35 +0000
dfe464cd81Updated CovariateCounterWalker to be read group aware
andrewk
2009-06-03 10:06:06 +0000
7755476d36Updated converter to reflect change in contig ordering in Geli files
andrewk
2009-06-03 10:05:28 +0000
40af4f085cAdding some utilities to test unmapped reads
aaron
2009-06-03 07:40:34 +0000
080af519cbAdded R script and uncommented a line in recal_qual.py
andrewk
2009-06-03 03:15:45 +0000
b2eb724456First commit of recalibration master control script for recalibrating quality scores.
andrewk
2009-06-03 02:17:10 +0000
fa93661133Eric wins the prize for pointing out that doubles weren't valid command-line arguments. Made all primitive types parseable as command-line arguments.
hanna
2009-06-02 22:41:10 +0000
107b5d73b5The flagStatReadWalker generates the exact same statistical output as the samtools flagstat command, so the two outputs can be diff'ed.
aaron
2009-06-02 21:23:56 +0000
056fcdc31cAdding a script for diff'ing the output of samtools and the GATK for the whole genome and each individual chromosome.
aaron
2009-06-02 21:19:39 +0000
3998085e4bmore and better python scripts for dealing with calls
depristo
2009-06-02 20:37:19 +0000
a1218ef508changed default value for failure output
kcibul
2009-06-02 19:32:29 +0000
6e60cddfedA fix for the 'rod blows up when it hits a GenomeLoc outside the reference' issue. Really a stopgap; error handling in the RODs needs to be addressed in a more comprehensive way. Right now, hasNext() isn't guaranteed to be correct.
hanna
2009-06-02 18:14:46 +0000
ad5b057140parameterized a bit more
kcibul
2009-06-02 17:58:26 +0000
587d07da00Merged functionality of two python scripts into LogRegression.py, some clarity updates to covariate and regression java files.
andrewk
2009-06-02 16:55:05 +0000
82aa0533b8added some more documentation to the GLF writer and its supporting classes, and some other fixes
aaron
2009-06-02 14:53:58 +0000
ae2eddec2dImproving, yet again, the merging of bam files
depristo
2009-06-02 13:31:12 +0000
c4cb867d74basic clustering of reads to reduce artifacts
kcibul
2009-06-02 02:54:21 +0000
e712d69382GLF writing support
aaron
2009-06-01 21:30:18 +0000
417f5b145eStrand test and misc touch-ups
jmaguire
2009-06-01 17:13:21 +0000
fc91e3e30eequals signs can be important
aaron
2009-06-01 16:56:21 +0000
4edb33788badded a fix for a bug Andrew found
aaron
2009-06-01 16:53:56 +0000
b7defeae83Fix bug in unit tests created by new filter in TraversalEngine.
hanna
2009-06-01 15:50:44 +0000
fc7320133cCleaned up error when fasta index is missing. Code still throws an exception, but the message is more direct (no more 'error while micromanaging') and tells the user to run 'samtools faidx' to fix the issue.
hanna
2009-06-01 15:34:38 +0000
f19d7abba9Added geli compatibility mode to SingleSampleGenotyper, to enable easy linking to the geli2popsnps.py script
depristo
2009-06-01 14:32:12 +0000
543c68cdd8First version of individual geli files to population SNPS
depristo
2009-05-31 15:29:10 +0000
6adef28b97Now supports automatic merging by population
depristo
2009-05-31 15:28:44 +0000
4d6398cef9a lot of people have been asking me for the equivalent of the old "PrintCoverage" command from Arachne. Even though I show them the pileup, and they agree that's more accurate/complete, they don't want to modify their scripts and/or write a translator. It was simple enough to write, so here it is.
kcibul
2009-05-31 01:45:23 +0000
c04b67c969Basic instrumentation support for the hierarchical microscheduler.
hanna
2009-05-29 22:19:27 +0000
c8347c3c94set proper package name (...walkers.indels), remove couple of unused import statements
asivache
2009-05-29 22:02:14 +0000
c549c34caastill in development and testing; kinda works
asivache
2009-05-29 21:59:03 +0000
c252fec1bcsynchronizing, no real changes
asivache
2009-05-29 21:56:14 +0000
eafdba7300more efficient implementation of line parsing, runs at least 1.5 times faster
asivache
2009-05-29 21:09:06 +0000
8761ab3affOops. IteratorPool was occasionally creating too many RODIterators in cases where some reference-ordered data was missing. Fixed by better tracking position of RODIterator.
hanna
2009-05-29 21:00:31 +0000
d601548d53added reallocate(int[] orig_array, int new_size) and int[] indexOfAll(String s, int ch); the former is self-explanatory, while the latter returns array of indices of all occurrences of ch in the specified string
asivache
2009-05-29 20:15:00 +0000
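Note on d601548d53: the two helpers named in that message could be sketched as follows. Only the signatures come from the message. The class name, the initial buffer size, and the zero-padding or truncating behavior of reallocate are assumptions, and the bodies are illustrative rather than the committed code.

```java
import java.util.Arrays;

public final class ArraySketch {
    /** Copies origArray into a new array of newSize, truncating or zero-padding. */
    public static int[] reallocate(int[] origArray, int newSize) {
        return Arrays.copyOf(origArray, newSize);
    }

    /** Returns the indices of every occurrence of ch in s, in ascending order. */
    public static int[] indexOfAll(String s, int ch) {
        int[] positions = new int[4]; // arbitrary initial capacity
        int count = 0;
        for (int i = s.indexOf(ch); i >= 0; i = s.indexOf(ch, i + 1)) {
            if (count == positions.length) {
                // Grow the buffer when full.
                positions = reallocate(positions, positions.length * 2);
            }
            positions[count++] = i;
        }
        // Trim the buffer to the number of hits actually found.
        return reallocate(positions, count);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(indexOfAll("a,b,,c", ',')));
    }
}
```

A helper like indexOfAll is a natural fit for splitting delimited lines without regular expressions, which may be related to the faster line parsing mentioned in eafdba7300, though the log does not say so.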
a1edb898efMake criteria for determining whether to stop and merge inputs more sane.
hanna
2009-05-29 18:08:18 +0000
fe3b843b65intercept NullPointerException and rethrow it with (marginally) comprehensible error message when an attempt to get class source code location fails
asivache
2009-05-29 15:56:55 +0000
e0803eabd9enabled underlying filtering of zero mapping quality reads, vastly improves system performance
depristo
2009-05-29 14:51:08 +0000
1f93545c70Always opt to merge dictionaries when creating a SAMFileHeaderMerger.
hanna
2009-05-28 22:38:16 +0000
0cf90b6f8aTie into sequence merging code in the latest version of picard.
hanna
2009-05-28 21:48:35 +0000
b43deda6c9iterative changes to GLF files; also a test of checking-in over sshfs.
aaron
2009-05-28 20:24:30 +0000
5e8c08ee63Update to latest version of picard. Change imports in all classes dependent on picard public from import edu.mit.broad.picard... to import net.sf.picard...
hanna
2009-05-28 20:13:01 +0000
19f9ac2b05Realign existing indels (from the aligner) to leftmost position
ebanks
2009-05-28 04:56:51 +0000
aa17c4a468Farewell, functionalj. You promised much, but you could not deliver.
hanna
2009-05-28 01:35:49 +0000
d275c18e58adding some objects we need for the GLF format.
aaron
2009-05-27 22:32:25 +0000
ce6a0f522bFirst incarnation of the population-based SNP analysis tool. Also bug fixes throughout the GATK
depristo
2009-05-27 22:02:24 +0000
a11bf0f43eBasic unit tests for ReferenceOrderedView, ShardDataProvider. Addressing GSA-25.
hanna
2009-05-27 21:15:01 +0000
e533c64b8fWalker to pull out the reference for given intervals and emit them in fasta format
ebanks
2009-05-27 18:39:09 +0000
5c6163ecbfRemoving the old reads traversal.
aaron
2009-05-27 18:36:11 +0000