* Added reference coordinate based hard clipping functions. This allows you to set a hard cut on where you need the read to be trimmed despite indels.
* soft clipping was messing up cigar string if there was already a hard clip at the beginning of the read. Fixed.
* hard clipping now works with previously hard clipped reads.
* Hard clipping was wrongfully hard clipping unmapped reads while soft clipping then hard clipping mapped reads. Now we throw exception if we try to hard/soft clip unmapped reads and use the soft->hard clip procedure fore every mapped read.
* Interval containment needed a <= and >= to make sure it caught the borders right.
reads are clipped in map() and now we cover almost all cases. Left behind the case where the read stretches through two intervals. This will need special treatment later.
-- DocString function for types that create default outputs "stdout"
-- RodBinding now creates a makeUnbound default value automatically for you if your RodBinding isn't required
-- Removed warning about sparse help from TextFormattingUtils
New class handles (vastly more cleanly) the db of tribble codecs, features, and names for use throughout the GATK.
Added SelfScopingFeatureCodec interface that allows a FeatureCodec to examine a file and determine if the file can be parsed. This is the first step towards allowing the GATK to dynamically determine the type of a RodBinding.
It is outputting a VCF with the 'second best guess' for the alternate allele correctly. Annotations are added at the pool level, but may get overwritten at the lane and site level. Still need to implement the merging of the the annotations at higher levels.
-- Cleaning up old interface to RMDT, docs and contracts added
-- Proper type checking for RodBinding for cases where the Tribble type isn't found or is the wrong type
- Floating point column widths are measured correctly
- Using fixed width columns instead of white space separated which allows spaces embedded in cell values
- Legacy support for parsing white space separated v0.1 tables where the columns may not be fixed width
- Enforcing that table descriptions do not contain newlines so that tables can be parsed correctly
Replaced GATKReportTableParser with existing functionality in GATKReport
-- support for explicit naming of bindings (-X:name,type x)
-- support for automatic naming of bindings in lists (-X:vcf foo.vcf -X:vcf bar.vcf will generate internal names X and X2)
-- ParserEngineUnitTest expanded to cover all of the Rodbinding cases
-- RodBindingUnitTest tests all of the low-level accessors
-- Parsing engine throws UserExceptions when bad bindings are provided on the command line
initial size for a Java hash table (HashMap, HashSet, etc.) given an expected
maximum number of elements. The optimum size is the smallest size that's
guaranteed not to result in any rehash / table-resize operations.
Example Usage:
Map<String, Object> hash = new HashMap<String, Object>(Utils.optimumHashSize(expectedMaxElements));
I think we're paying way too heavy a price in unnecessary rehash operations across
the GATK. If you don't specify an initial size, you get a table of size 16 that gets
completely rehashed and doubles in size every time it becomes 75% full. This means you
do at least twice as much work as you need to in order to populate your table:
(n + n/2 + n/4 + ... 16 ~= (1 + 1/2 + 1/4...) * n ~= 2 * n
Calls are now made based on the likelihood AC model. Two filters are applied: minQual and minPower. Output is now a VCF file with the variant context. It's now called the gatk's PoolCaller, no longer Replication Validation framework. Lots of testing ensue....
*New style.css, new template for the walker auto-generated html. Short description is no longer repeated in the long description of the walker.
*Updated DiffObjectsWalker and ContigStatsWalker as "reference" documented walkers.