-- Based on the new incremental activity profile
-- Unit Tested! Fixed a few bugs with the old band pass filter
-- Expand IncrementalActivityProfileUnitTest to test the band pass filter as well for basic properties
-- Add new UnitTest for BandPassIncrementalActivityProfile
-- Added normalizeFromRealSpace to MathUtils
-- Cleanup unused code in new activity profiles
-- The incremental version now processes active regions as soon as they are ready to be processed, instead of waiting until the end of the shard as in the previous version. This means that ART walkers will now take much less memory than previously. On chr20 of NA12878 the majority of regions are processed with as few as 500 reads in memory. Over the whole chr20 only 5K reads were ever held in ART at one time.
-- Fixed bug in the way active regions worked with shard boundaries. The new implementation no longer see shard boundaries in any meaningful way, and that uncovered a problem that active regions were always being closed across shard boundaries. This behavior was actually encoded in the unit tests, so those needed to be updated as well.
-- Changed the way that preset regions work in ART. The new contract ensures that you get exactly the regions you requested. the isActive function is still called, but its result has no impact on the regions. With this functionality is should be possible to use the HC as a generic assembly by forcing it to operate over very large regions
-- Added a few misc. useful functions to IncrementalActivityProfile
-- Required before I jump in an redo the entire activity profile so it's can be run imcrementally
-- This restructuring makes the differences between the two functionalities clearer, as almost all of the functionality is in the base class. The only functionality provided by the BandPassActivityProfile is isolated to a finalizeProfile function overloaded from the base class.
-- Renamed ActivityProfileResult to ActivityProfileState, as this is a clearer indication of its actual functionality. Almost all of the misc. walker changes are due to this name update
-- Code cleanup and docs for TraverseActiveRegions
-- Expanded unit tests for ActivityProfile and ActivityProfileState
- checkAllLicenses was concatenating all tests in one big && statement which is good for coding style, but bad for shell blog size limitation.
GSATDG-9
- CheckLicense.py script checks license of sources against a provided license file
- checkAllLicenses.csh script runs CheckLicense for all files in the repo with the appropriate license for each
GSATDG-9
- Your commit will now fail gracefully with an error message if you mess up the license system
- Your file will be preserved (unmodified) if you fail the commit process
- Error message should be indicative of the error you need to fix (usually missing package information)
Set your pre-commit hook as a symlink to be automatically updated by new pushes with :
ln -s private/shell/pre-commit .git/hooks/
GSATDG-18 #resolve
This hook will automatically add / fix the license information in all files you commit to the repo.
To activate it, copy it to your hooks directory :
cp private/shell/pre-commit .git/hook/
Now everytime you commit, you will have all your java and scala files automatically updated.
GSATDG-5 GSATDG-7 GSATDG-8 #resolve
Testing for moltenized output, and for genotype-level filtering. This tool is now fully functional. There are three todo items:
1) Docs
2) An additional output table that gives concordance proportions normalized by records in both eval and comp (not just total in eval or total in comp)
3) Code cleanup for table creation (putting a table together the way I do takes -way- too many lines of code)
2. While making the previous fix and unifying FS for SNPs and indels, I noticed that FS was slightly broken in the general case for indels too; fixed.
3. I also fixed a minor bug in the Allele Biased Downsampling code for reduced reads.
I've resigned myself instead to create a mapping from Allele to Haplotype. It's cheap so not a big deal, but really shouldn't be necessary.
Ryan and I are talking about refactoring for GATK2.5.