-- This new algorithm is essential to properly handle activity profiles that have many large active regions generated from lots of dense variant events. The new algorithm passes unit tests and passes visualize visual inspection of both running on 1000G and NA12878
-- Misc. commenting of the code
-- Updated ActiveRegionExtension to include a min active region size
-- Renamed ActiveRegionExtension to ActiveRegionTraversalParameters, as it carries more than just the traversal extension now
-- Previously we allowed band pass filter size to be specified along with the sigma. But now that sigma is controllable from walkers and from the command line, we instead compute the filter size given the kernel from the sigma, including all kernel points with p > 1e-5 in the kernel. This means that if you use a smaller kernel you get a small band size and therefore faster ART
-- Update, as discussed with Ryan, the sigma and band size to 17 bp for HC (default ART wide) and max band size of 50 bp
- walker simulates sequencing with different lengths to evaluate mapping/alignment biases relative to read length
- split : splits reads n-ways generating 2^n reads for each read of the same length.
- chop : chops the right end tail of the read creating 1 smaller read as if the sequencer stopped short.
- mate information is preserved for chopped reads, and re-indexed for split reads so that each split still points at the corresponding split on the mate.
- added systematic unit tests
GSATDG-23
-- Based on the new incremental activity profile
-- Unit Tested! Fixed a few bugs with the old band pass filter
-- Expand IncrementalActivityProfileUnitTest to test the band pass filter as well for basic properties
-- Add new UnitTest for BandPassIncrementalActivityProfile
-- Added normalizeFromRealSpace to MathUtils
-- Cleanup unused code in new activity profiles
-- The incremental version now processes active regions as soon as they are ready to be processed, instead of waiting until the end of the shard as in the previous version. This means that ART walkers will now take much less memory than previously. On chr20 of NA12878 the majority of regions are processed with as few as 500 reads in memory. Over the whole chr20 only 5K reads were ever held in ART at one time.
-- Fixed bug in the way active regions worked with shard boundaries. The new implementation no longer see shard boundaries in any meaningful way, and that uncovered a problem that active regions were always being closed across shard boundaries. This behavior was actually encoded in the unit tests, so those needed to be updated as well.
-- Changed the way that preset regions work in ART. The new contract ensures that you get exactly the regions you requested. the isActive function is still called, but its result has no impact on the regions. With this functionality is should be possible to use the HC as a generic assembly by forcing it to operate over very large regions
-- Added a few misc. useful functions to IncrementalActivityProfile
-- Required before I jump in an redo the entire activity profile so it's can be run imcrementally
-- This restructuring makes the differences between the two functionalities clearer, as almost all of the functionality is in the base class. The only functionality provided by the BandPassActivityProfile is isolated to a finalizeProfile function overloaded from the base class.
-- Renamed ActivityProfileResult to ActivityProfileState, as this is a clearer indication of its actual functionality. Almost all of the misc. walker changes are due to this name update
-- Code cleanup and docs for TraverseActiveRegions
-- Expanded unit tests for ActivityProfile and ActivityProfileState
- checkAllLicenses was concatenating all tests in one big && statement which is good for coding style, but bad for shell blog size limitation.
GSATDG-9
- CheckLicense.py script checks license of sources against a provided license file
- checkAllLicenses.csh script runs CheckLicense for all files in the repo with the appropriate license for each
GSATDG-9