891f4c2bd9up on the wiki, removing it from the repo. Ignore my last commit
aaron
2009-07-01 15:42:02 +0000
05c5659053This document is now up as content on the wiki, so I'm removing it from svn.
aaron
2009-07-01 15:40:32 +0000
d58eeb7539Don't cry wolf: only one warning is now emitted, instead of tons of warnings.
aaron
2009-07-01 13:50:37 +0000
a3e0ec20c4Kill the TraverseByLocusWindows traversal. TraverseLocusWindows will take its place.
hanna
2009-07-01 13:46:35 +0000
74e9bb46b4Contents of the Hello World doc are now in the wiki.
hanna
2009-06-30 22:32:56 +0000
93da64db10Update naming for consistency.
hanna
2009-06-30 22:03:21 +0000
e93f751bd7First step in replacing the Hello, World! document. Revamped the HelloWalker and checked it into the source tree, created a special build file for it, and added it to the packaging tool.
hanna
2009-06-30 21:59:54 +0000
8d3dc57c3dCommit to emit in sorted order so we don't have to use /tmp
ebanks
2009-06-30 19:47:15 +0000
f5cba5a6bbFixed genome loc to be immutable; the only way to change its values now is through the GenomeLocParser.
aaron
2009-06-30 19:17:24 +0000
455275996fAdded contents to the wiki.
hanna
2009-06-30 18:29:46 +0000
177d6d00b8added setContigIndex(). NOTE: both setContig() and setContigIndex() are UNSAFE, as calling one does not automatically update the other, and there's also no validation
asivache
2009-06-30 17:40:37 +0000
9fca79ed62Read groups are now sorted in the output data, for convenience
depristo
2009-06-30 16:50:44 +0000
fe421e5712All IntelliJ best practices info is now on the wiki.
hanna
2009-06-30 16:45:52 +0000
08df4771c8count X/N/etc. as mismatches for the NM attribute in the BAMs
ebanks
2009-06-30 16:08:55 +0000
d412c5dc2fUpdated to use SecondaryBaseAnnotator class.
kiran
2009-06-30 16:08:43 +0000
e3cdf7ef4bA single class that can be handed reads for training and basecalling. When in training mode, we accumulate no more than 10000 reads and always replace the lowest-quality reads with superior quality reads. Thus, the training set always contains 10000 of the best reads available. After training is complete, the class can be interrogated to return the SQ tag for a given RawRead object.
kiran
2009-06-30 16:03:15 +0000
74cc7136f7All info from the user manual is now in the wiki. Deleting.
hanna
2009-06-30 15:29:59 +0000
ddf4003536Updates to picard public / private and sam.
hanna
2009-06-30 14:50:55 +0000
8aa3b65e7ffix to guarantee emission in sorted order
ebanks
2009-06-30 13:48:41 +0000
03f8177a53When you get the reference string for a read that is mapped partially off the end of a contig, the string is masked with X's for base positions without corresponding reference positions.
aaron
2009-06-29 20:51:55 +0000
1dcababad1a fix to make the test run
aaron
2009-06-29 20:24:32 +0000
a17bf145f6fix to respond to the change in IndelLikelihood constructor.
jmaguire
2009-06-29 19:05:33 +0000
7ecc43e9a7Fixed a subtle null pointer exception discovered by Kiran. Now deals with the rare situation where you have only, say, Q28 bases at dbSNP sites, so that the Table recalibration step fails with a null pointer error in the data structure indexed by quality score. Bases with a Q score above those seen before are not modified in any way.
depristo
2009-06-29 18:57:42 +0000
95e2ae0171Deal with reads whose ends are aligned off the end of a chromosome. Includes update to ignore non-ATCG bases (not just 'N') (Also, create a BWA dir for future work)
ebanks
2009-06-29 16:50:05 +0000
65a788f18aAdded a ROD (SangerSNP) for parsing the Sanger's chr20 pilot1 SNP calls. Some doodling around with indel calling in an EM context.
jmaguire
2009-06-29 16:32:12 +0000
ceeeec13b8Computes a vector of numbers of reads falling into successive intervals of specified length (e.g. numbers of reads per every 1Mbase)
asivache
2009-06-29 16:12:21 +0000
3bacb3db03updated some defaults
ebanks
2009-06-26 19:28:05 +0000
eb74b16e39updated what constitutes removing entropy
ebanks
2009-06-26 18:29:00 +0000
d7d4298917Some files to support generic genotype outputting
aaron
2009-06-26 15:43:41 +0000
1a97c86f95don't crash when an unmapped read is encountered, just write it into the output file, it should be ok
asivache
2009-06-26 15:33:59 +0000
491ed70b44TraverseByLocusWindow -- asstd bug fixes.
hanna
2009-06-25 22:51:38 +0000
5289230eb8Version 0.2.1 (released) of the TableRecalibrator
depristo
2009-06-25 22:50:55 +0000
73caf5db15This is, strictly speaking, NOT a GATK module. Standalone, picard-level executable, except that it uses a couple of gatk utils (GenomeLoc). Remaps alignments from a custom reference (such as transcriptome, hyb-sel etc) onto the 'master' reference
asivache
2009-06-25 22:04:18 +0000
ee2af3b423I committed this too soon... reverting...
kiran
2009-06-25 20:49:12 +0000
ad3a3aa350First pass at passing lists of files / lists of interval arguments work. Note that the interval ROD system will throw up its hands and not deal with intervals at all if multiple interval files are passed in (see JIRA GSA-95).
hanna
2009-06-25 20:44:23 +0000
23680a9a16Replaced an expensive sort with an inexpensive direct computation.
kiran
2009-06-25 20:25:12 +0000
83816fb801Stop using the annoying refIterator (temp change until new traversal is green lighted)
ebanks
2009-06-25 20:05:39 +0000
0c3aabd1c5logger output should be less verbose by default. Also fixed a printout in my read validation walker
aaron
2009-06-25 19:47:29 +0000
11d83ac7d0pushing up to test on unix box
kcibul
2009-06-25 19:00:48 +0000
0a16519aa2a couple of additions to the tests, plus a change to the artificial resource pool to support the queryContained flag
aaron
2009-06-25 18:30:32 +0000
2c97c5e873Compute a simple histogram of depth of coverage.
jmaguire
2009-06-25 18:30:11 +0000
102b38c055Sketch of new version of TraverseByLocusWindow, and a flag to conditionally turn it on.
hanna
2009-06-25 18:20:56 +0000
4e04370f14forgot a file
aaron
2009-06-25 17:56:17 +0000
5b1c23a7f2changes to fix and test the interval based traversals
aaron
2009-06-25 17:54:15 +0000
3b24264c2bincorporating skew check, further output of metrics
kcibul
2009-06-25 16:01:07 +0000
ea2426dcd0one more change needed to commit
ebanks
2009-06-25 15:09:53 +0000
347608cfe0remove hacked traversal in preparation for move to Matt's new one
ebanks
2009-06-25 14:32:05 +0000
940d75171aBig cleaner changes: 1. Added a Walker to merge intervals before cleaning 2. (Almost) all Walkers can filter out 454 reads (and do by default) 3. Got rid of -all command and related pieces (time to switch to CleanedReadsInjector)
ebanks
2009-06-25 14:31:24 +0000
3cb6d7048edon't freak out if two reference intervals a custom contig is built of are strictly adjacent; instead politely warn user that her data suck and proceed
asivache
2009-06-24 19:08:10 +0000
d4f3ca1a10A utility class for keeping the mapping from 'custom' reference (e.g. transcriptome) onto the 'master' reference (e.g. whole genome), and for remapping SAM records from the former onto the latter. It's Arachne's BaitMultiMap, pretty much
asivache
2009-06-24 18:16:15 +0000
69dc502174I forgot that this depends on BoundedScoringSet.
kiran
2009-06-24 17:18:53 +0000
a9c30c5fccadded -nosort cmdline flag; if specified, the output writer does not attempt to sort reads on the fly (sorting involves use of sorting collection backed up by temporary disk storage and can lead to crashes if temp size is low and/or filesystem is not behaving). Output can be later sorted externally by samtools
asivache
2009-06-24 15:58:00 +0000
7b5d8d7604Changed the intensities array order from cycle,channel to channel,cycle. This, I'm told, is a far more efficient allocation strategy.
kiran
2009-06-24 15:41:06 +0000
3112302ec9A priority-queue-like container that allows you to add a specified number of elements. When the limit has been reached, new additions replace the lower scoring elements.
kiran
2009-06-24 15:39:47 +0000
0a50f2e160Updated and near-final version of tabular recalibration system. Uses 'Yates' correction for low-occupancy quality bins. Faster and more robust handling of input and output
depristo
2009-06-24 03:52:12 +0000
caf5aef0f8Much improved python analysis routines, as well as easier / more correct merging utility. Better R scripts, which now close recalibration data by the confidence of the quality score itself
depristo
2009-06-24 01:12:35 +0000
ef546868bfPooling of unmapped reads -- improves runtime of files with tons of unmapped reads by an order of magnitude. Desperately needs cleanup.
hanna
2009-06-23 23:48:06 +0000
dfa2efbcf5not crashing when refseq annotation track is not requested is a nice added feature
asivache
2009-06-23 22:52:40 +0000
1339f3f3e3make refseq annotation file an optional argument; if specified, indels will be annotated as genomic/utr/intron/coding (accidentally appearing 'unknowns' probably mean that there's something wrong with refseq annotations?)
asivache
2009-06-23 18:17:03 +0000
9c0dba6979Some quick documentation and typo changes
aaron
2009-06-23 13:40:13 +0000
6b5560e1e9Fairly detailed first pass at a README for the MSA realigner
ebanks
2009-06-23 01:54:38 +0000
630d9e6a37Fixed a typo.
kiran
2009-06-22 21:37:46 +0000
8b4d0412caChanged the duplicate traversal over to the new style of traversal and plumbed into the genome analysis engine. Also added a CountDuplicates walker, to validate the engine.
aaron
2009-06-22 21:11:18 +0000
4a92a999a0made the constructors protected. Protected also means package-protected, so other methods in the utils class can call these constructors (mainly the parser), as well as any inheriting classes. Also fixed some IntelliJ-suggested clean-ups and documentation
aaron
2009-06-22 16:01:59 +0000
9e25229014use better entropy threshold and don't print out "new" SNPs (since they're just an artifact of the low (arbitrary) threshold)
ebanks
2009-06-22 15:30:08 +0000
bcb64d92e9Aaron: 1, GenomeLoc: 0. I changed our GenomeLoc class, separating the creation of a genome loc (with the reference setup) into a parser class. GenomeLoc now just represents the actual genomic position. The constructors are now package-protected (to enforce using the parser), but we may want to expose some constructors in the future.
aaron
2009-06-22 14:39:41 +0000
26eb362f52Added novel / known split to variant eval. That is, emits all of the standard analyses on SNP partitioned into those known in the provided known db and those novel. Also fixed problem with counting bases within subsets
depristo
2009-06-21 21:27:40 +0000
d3f0c51944longer update times so we don't overwhelm when running genome-wide
depristo
2009-06-21 14:10:02 +0000
a21c2a7e48don't make mapping quality too high
ebanks
2009-06-21 04:51:42 +0000
686c8133edmassive change in the way the cleaner works, mostly revolving around the fact that we no longer trust indels from the alignments (although we do use it as a good alternate consensus possibility). Other changes include better "greedy mode" performance and allowing the user to have just the cleaned reads themselves be printed out (mostly for Matt's CleanedReadInjector).
ebanks
2009-06-21 03:56:59 +0000
c9e6cb72e1Major improvements to python analysis code -- now computes a host of statistics about quality scores from the recal_data.csv file emitted by countcovariates. Includes average Q scores, medians, modes, stdev, coefficients of variation, RMSE, and % bases > q10, q20, q30. Can finally quantify and compare the improvement of quality score recalibration.
depristo
2009-06-20 19:50:37 +0000
9e26550b0dApproach v2. Added python analysis script, so java no longer must be used to analyze quality score data. About to refactor out lots of unneeded code
depristo
2009-06-20 16:00:23 +0000
dde52e33ebCleanup of the cleaned read injector based on Eric's feedback.
hanna
2009-06-19 22:04:47 +0000
a0a3cf2f9fVariantFiltrationWalker can now apply specified exclusion tests after the feature tests. For a given variant, all reasons for exclusions are printed to screen.
kiran
2009-06-19 21:12:01 +0000
8ac40e8e2dUpdated version of the recalibration tool
depristo
2009-06-19 17:45:47 +0000
9a0151b7e1Added an option to list all available feature classes and exit.
kiran
2009-06-19 00:00:12 +0000
ed7afd8b70Added javadocs. Now throws an exception if an unknown feature is specified. General cleanup.
kiran
2009-06-18 23:28:38 +0000
284fd6a5fbVariantFiltrationWalker now inspects its parent package and determines the list of features that can be applied. Command-line specification of filters to run looks at the simple names of these features and does a case-insensitive match to determine which features to apply. A new verbose mode allows the user to see how the likelihoods are changing with the application of each subsequent feature.
kiran
2009-06-18 22:45:36 +0000
0a0ef573f7Methods for finding classes given a path and finding classes that implement a given interface. This stuff was mostly copied from private methods in WalkerManager, so there's some code redundancy. At some point, those calls could be replaced with these.
kiran
2009-06-18 22:43:19 +0000
d748c85dc4Cleaned code and reorganized -- moving in the right direction for v2
depristo
2009-06-18 22:28:34 +0000
9fe330cdf1Bump picard and sam to latest version.
hanna
2009-06-18 21:59:40 +0000
af7a759ba4Convert the somatic coverage tool to output from the packaging tool rather than from the dist target.
hanna
2009-06-18 21:29:30 +0000
1bca144119Moving things around
depristo
2009-06-18 21:06:46 +0000
ca8a3bd85eAnother temp checkin for rearranging things
depristo
2009-06-18 21:04:36 +0000
3c40db260dAdded REFERENCE_BASES required annotation for performance
depristo
2009-06-18 21:03:57 +0000
03fe166994Wrote a public static version of loadFirstNReasonableReadsTrainingSet() so Alec can call it.
kiran
2009-06-18 20:18:17 +0000
a4fa02f11cMoved output outside of for loop so I don't have 10 different versions of the same variant (though, now that I think of it, that's not necessarily a terrible thing for debugging...)
kiran
2009-06-18 19:59:26 +0000
768a16e791An experimental, tile-parallel version of the secondary base annotator.
kiran
2009-06-18 19:58:09 +0000
e26df45e8eDifferent features can now be specified by repeatedly supplying the -F "featurename:arguments" option.
kiran
2009-06-18 18:45:03 +0000
17a5b50ea4Script that aligns paired-end BAMs using BWA.
andrewk
2009-06-18 18:14:58 +0000