Generic infrastructure for quantizing quality scores
-- Just infrastructure at this point (but with UnitTests!).
-- Capable of taking a histogram of quality scores and a target number of levels (8, for example), and mapping the full range of input quality scores down to only that many distinct values.
-- The selected quality scores are chosen to minimize the miscalibration rate of the resulting bins. I believe this adaptive approach is vastly better than the current systems being developed by EBI and NCBI.
-- This infrastructure is designed to work with BQSRv2. I envision a system where we feed in the projected empirical quality score distribution from the BQSRv2 table, compute the required deleveling for each of the B, I, and D qualities, and emit calibrated, compressed quality scores on the fly.
-- Note that the current algorithm for determining the best intervals is both greedy (i.e., it may miss the best overall choice) and potentially extremely slow. But it is enough for me to play with.
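The idea described above can be illustrated with a minimal sketch. This is not the code in this commit: the function names (`error_rate`, `summarize`, `quantize`) and the penalty definition (observation-weighted distance between each original score and the merged bin's empirical quality) are assumptions made for illustration only. It follows the greedy strategy the message mentions, merging adjacent intervals one pair at a time until the target number of levels remains.

```python
import math


def error_rate(q):
    """Phred-scaled quality -> error probability."""
    return 10.0 ** (-q / 10.0)


def summarize(hist, start, stop):
    """Return (representative quality, penalty) for scores start..stop inclusive.

    The representative is the quality implied by the pooled error rate of
    the bin. The penalty is an assumed measure of miscalibration: how far,
    weighted by observation count, each original score sits from it.
    """
    n_obs = sum(hist[start:stop + 1])
    if n_obs == 0:
        return float(start), 0.0  # empty bin: nothing to miscalibrate
    n_err = sum(hist[q] * error_rate(q) for q in range(start, stop + 1))
    rep = -10.0 * math.log10(n_err / n_obs)
    penalty = sum(hist[q] * abs(q - rep) for q in range(start, stop + 1))
    return rep, penalty


def quantize(hist, n_levels):
    """Greedily map quality scores 0..len(hist)-1 onto at most n_levels values.

    hist[q] is the number of observations with quality q. Returns a list
    where entry q is the quantized quality that replaces q.
    """
    if n_levels < 1:
        raise ValueError("n_levels must be at least 1")
    intervals = [(q, q) for q in range(len(hist))]
    while len(intervals) > n_levels:
        # Greedy step: merge the adjacent pair whose merged bin is cheapest.
        # This rescans every pair on every pass, so it is slow, and it can
        # miss the globally best set of intervals.
        best = min(
            range(len(intervals) - 1),
            key=lambda i: summarize(hist, intervals[i][0], intervals[i + 1][1])[1],
        )
        merged = (intervals[best][0], intervals[best + 1][1])
        intervals[best:best + 2] = [merged]
    mapping = [0] * len(hist)
    for start, stop in intervals:
        rep, _ = summarize(hist, start, stop)
        for q in range(start, stop + 1):
            mapping[q] = round(rep)
    return mapping


if __name__ == "__main__":
    # Made-up histogram over qualities 0..40, for demonstration only.
    example = [5 * q for q in range(41)]
    print(quantize(example, 8))
```

Because each merged bin takes the quality implied by its pooled error rate, bins dominated by low-quality observations are pulled toward the low end, which is the adaptive behavior the message describes.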
parent ba71b0aee4
commit 914c23da51