* Implements a transition-aware alignment scoring scheme and configuration presets for ICLR
* Fix to enable use of general scoring matrix in ksw as suggested by lh3
---------
Co-authored-by: koadman <>
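For illustration, a transition-aware scoring matrix of the kind described above could be filled as follows. This is a minimal sketch, not the code from this change: the function name fill_ts_matrix, its parameter order, the base encoding A=0, C=1, G=2, T=3, N=4, and the convention of passing penalties as positive numbers are all assumptions.

```c
#include <stdint.h>

#define N_SYMBOLS 5 /* A, C, G, T, N (assumed encoding 0..4) */

/* Hypothetical helper: fill a 5x5 scoring matrix in which transitions
 * (A<->G, C<->T) are penalized differently from transversions.
 * Penalties are passed as positive numbers and stored negated. */
static void fill_ts_matrix(int8_t *mat, int8_t match, int8_t mismatch,
                           int8_t transition, int8_t ambiguous)
{
    int i, j;
    for (i = 0; i < N_SYMBOLS - 1; ++i) {
        for (j = 0; j < N_SYMBOLS - 1; ++j) {
            if (i == j)
                mat[i * N_SYMBOLS + j] = match;
            else if ((i ^ j) == 2) /* 0^2 (A/G) and 1^3 (C/T) are transitions */
                mat[i * N_SYMBOLS + j] = (int8_t)-transition;
            else
                mat[i * N_SYMBOLS + j] = (int8_t)-mismatch;
        }
        mat[i * N_SYMBOLS + N_SYMBOLS - 1] = (int8_t)-ambiguous; /* base vs N */
    }
    for (j = 0; j < N_SYMBOLS; ++j) /* N vs anything */
        mat[(N_SYMBOLS - 1) * N_SYMBOLS + j] = (int8_t)-ambiguous;
}
```

A matrix built this way is a general (non-uniform) scoring matrix, which is why the aligner must index the matrix per base pair rather than assume a single mismatch score.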
Give the CIGAR constants names to clarify the code. So that ksw2.h
remains self-contained, define KSW_* versions of the CIGAR operators
it needs for use within ksw2.h. Other code should in general use the
full set of MM_CIGAR_* constants in minimap.h.
Define MM_CIGAR_STR as the full string of CIGAR operators (including
the 'B' operator) and use it throughout the C code.
It would be possible to use it from the Cython code too, but it's easier
to keep that as a Cython string literal to avoid adding extra runtime
code to handle locale conversion.
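The naming scheme described above might look roughly like the sketch below. The numeric values follow the SAM/BAM operator order; the individual MM_CIGAR_* and KSW_CIGAR_* names other than MM_CIGAR_STR, and the helper cigar_op_char, are assumptions made for illustration rather than a copy of the headers.

```c
#include <stdint.h>

/* Full set, intended for general code (names are assumed). */
#define MM_CIGAR_MATCH      0
#define MM_CIGAR_INS        1
#define MM_CIGAR_DEL        2
#define MM_CIGAR_N_SKIP     3
#define MM_CIGAR_SOFTCLIP   4
#define MM_CIGAR_HARDCLIP   5
#define MM_CIGAR_PADDING    6
#define MM_CIGAR_EQ_MATCH   7
#define MM_CIGAR_X_MISMATCH 8
#define MM_CIGAR_STR "MIDNSHP=XB" /* index 9 is the 'B' operator */

/* Minimal subset so that the aligner header stays self-contained (assumed). */
#define KSW_CIGAR_MATCH  0
#define KSW_CIGAR_INS    1
#define KSW_CIGAR_DEL    2
#define KSW_CIGAR_N_SKIP 3

/* Hypothetical helper: decode the operator character from a packed
 * CIGAR entry (length in the upper 28 bits, operator in the low 4 bits). */
static char cigar_op_char(uint32_t cigar)
{
    uint32_t op = cigar & 0xf;
    return op < sizeof(MM_CIGAR_STR) - 1 ? MM_CIGAR_STR[op] : '?';
}
```

Indexing a single shared string avoids repeating literals such as "MIDNSHP=X" at each call site, which is where the operator order could otherwise drift apart.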
We may use a large --end-bonus to mimic end-to-end alignment. In the short-read
mode, the candidate alignment region may be out of the band, which leads to
truncated alignment.
Fix the logic that calculates the number of CIGAR entries when
match "M" entries are expanded into "=" and "X". The number
of entries depends not on the number of mismatches but rather
on the number of transitions between "=" and "X".
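To make the counting rule concrete, here is a minimal sketch that counts the entries produced by expanding a single "M" block. The function name count_eqx_runs and its signature are hypothetical and are not taken from the patched code.

```c
/* Hypothetical helper: number of "=" / "X" entries needed to replace one
 * "M" block of length len, where q and t are the query and target bases
 * covered by that block. One entry is needed for the first column, plus
 * one for every switch between a matching and a mismatching column. */
static int count_eqx_runs(const char *q, const char *t, int len)
{
    int i, n_runs;
    if (len <= 0) return 0;
    n_runs = 1;
    for (i = 1; i < len; ++i)
        if ((q[i] == t[i]) != (q[i - 1] == t[i - 1]))
            ++n_runs;
    return n_runs;
}
```

For example, ACGTACGT against ACGAACGT has one mismatch and expands to 3= 1X 4= (three entries), while AAAA against TTTT has four mismatches but expands to a single 4X entry.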
When aligning long reads, we would prefer to align through low-quality
regions. This requires a large Z-drop threshold. However, to find small
inversions, we need to use a small Z-drop. This commit addresses this
conflict with two Z-drop thresholds. When Z-drop exceeds the smaller
threshold, we perform a local alignment to check if there is a potential
inversion. If there is one, we break the alignment; otherwise we break
the alignment only if Z-drop exceeds the larger threshold.
This commit also fixes a bug that reported wrong coordinates when the
inversion is on the forward strand (#112).
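The two-threshold rule described above can be summarized as a small decision function. This is a sketch under stated assumptions: the names zdrop_decide, zdrop_small, zdrop_large and the enum values are hypothetical, and the inversion test is passed in as a precomputed flag, whereas a real implementation would only run the local alignment once the smaller threshold has been exceeded.

```c
enum zdrop_action {
    ZD_CONTINUE,        /* keep extending the alignment */
    ZD_BREAK_INVERSION, /* break: a potential inversion was found */
    ZD_BREAK            /* break: drop exceeds the larger threshold */
};

/* Hypothetical decision rule.
 * drop          : observed Z-drop at the current position
 * zdrop_small   : smaller threshold that triggers the inversion check
 * zdrop_large   : larger threshold that breaks the alignment outright
 * has_inversion : nonzero if the local alignment suggests an inversion */
static enum zdrop_action zdrop_decide(int drop, int zdrop_small,
                                      int zdrop_large, int has_inversion)
{
    if (drop <= zdrop_small)
        return ZD_CONTINUE;        /* below both thresholds */
    if (has_inversion)
        return ZD_BREAK_INVERSION; /* small threshold exceeded, inversion seen */
    if (drop > zdrop_large)
        return ZD_BREAK;           /* no inversion, but the drop is too large */
    return ZD_CONTINUE;            /* align through the low-quality region */
}
```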