163 Commits (ad177f21659e6f7c88005215fdcbb3e4174210f3)

Author SHA1 Message Date
zzh ad177f2165 Some automatic formatting changes, added some comments, etc. 2023-08-10 15:56:06 +08:00
Aaron Darling ace990c381
Illumina Complete Long Read presets (#1069)
* Implements a transition-aware alignment scoring scheme and configuration presets for ICLR

* Fix to enable use of general scoring matrix in ksw as suggested by lh3

---------

Co-authored-by: koadman <>
2023-06-04 11:06:15 -04:00
Heng Li 5e7242303c r1164: changed the syntax of -J 2023-04-07 22:54:33 -04:00
Heng Li 6c2cbf7903 miniprot-like splice model
slightly worse on iso-seq and slightly better on direct-RNA
2022-10-06 09:10:17 -04:00
Heng Li 8a1d52bcbe r1094: for --split-prefix update max_dp at the end 2021-08-06 21:40:43 -04:00
Heng Li 1a8373bb84 dev-r1080: fixed negative dp_max 2021-07-18 21:07:14 -04:00
Heng Li 161ae7ff73 dev-r1079: per-read error rate
more tuning needed
2021-07-18 20:38:53 -04:00
Heng Li 8a6edab847 dev-r1078: decoupling ranking penalty 2021-07-18 16:22:48 -04:00
Heng Li 2546999639 dev-r1076: log gap penalty 2021-07-17 18:23:59 -04:00
Heng Li 5f449c5cae fixed potential integer overflows 2021-07-16 17:20:05 -04:00
Heng Li b046052d82 Merge branch 'master' into utec 2021-07-16 13:32:47 -04:00
John Marshall 260a68d232 Use #defines for CIGAR operators in C code
Give the CIGAR constants names to clarify the code. So that ksw2.h
remains self-contained, define KSW_* versions of the CIGAR operators
it needs for use within ksw2.h. Other code should in general use the
full set of MM_CIGAR_* constants in minimap.h.
2021-07-02 13:03:03 -04:00
John Marshall 177eef259d Use the full MIDNSHP=X string whenever printing CIGAR strings
Define MM_CIGAR_STR to the full string of CIGAR operators (including
the 'B' operator as well) and use it throughout the C code.

It would be possible to use it from the Cython code too, but it's easier
to keep that as a Cython string literal to avoid adding extra runtime
code to handle locale conversion.
2021-07-02 13:03:03 -04:00
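The two commits above name the CIGAR operators and print them through one shared operator string. A minimal sketch of that idea follows. The `EX_*` names and `ex_format_cigar()` are hypothetical stand-ins, not the project's identifiers, and the sketch assumes the BAM-style packing in which each entry stores the length in the upper 28 bits and the operator code in the low 4 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for illustration; the operator order follows the
 * "MIDNSHP=X" string from the commit message, with 'B' appended. */
#define EX_CIGAR_STR "MIDNSHP=XB"
enum { EX_CIGAR_MATCH = 0, EX_CIGAR_INS = 1, EX_CIGAR_DEL = 2,
       EX_CIGAR_EQ = 7, EX_CIGAR_X = 8 };

/* Write packed CIGAR entries (len << 4 | op, an assumed encoding) as text.
 * Returns the number of characters written, or -1 on an unknown operator
 * or if the buffer is too small. */
int ex_format_cigar(const uint32_t *cigar, int n_cigar, char *buf, size_t cap)
{
    size_t off = 0;
    int i;
    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    for (i = 0; i < n_cigar; ++i) {
        uint32_t op = cigar[i] & 0xf, len = cigar[i] >> 4;
        int n;
        if (op >= sizeof(EX_CIGAR_STR) - 1) return -1; /* code outside the string */
        n = snprintf(buf + off, cap - off, "%u%c", (unsigned)len, EX_CIGAR_STR[op]);
        if (n < 0 || (size_t)n >= cap - off) return -1; /* output was truncated */
        off += (size_t)n;
    }
    return (int)off;
}
```

Indexing a single string by operator code keeps every printing site consistent, which is the benefit the commit message describes.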
Heng Li 4f91558160 r1048: rescue long gaps 2021-05-24 16:09:09 -04:00
Heng Li 827ca4b461 r1012: fixed an off-by-one bug; resolves #489 2021-04-07 23:31:31 -04:00
Armin Töpfer c9874e2dc5 Initialize r->p if ez->zdropped 2020-06-12 09:22:18 -04:00
Heng Li da7109fd29 r985: optionally report cs/cg on the query strand
PAF only; not well tested
2020-04-21 12:37:35 -04:00
Heng Li a7a01fe5bd r973: fixed compiling errors caused 2020-01-21 10:43:31 -05:00
Heng Li eb3ed6993d support ALT mapping 2020-01-21 09:17:50 -05:00
Heng Li 69af86657e r935: fixed a cigar like 5I6D7I; resolved #392 2019-04-30 21:35:24 -04:00
Heng Li be171aa2dc implemented in exts; testing is the next 2019-04-28 16:47:12 -04:00
Heng Li cf2bae6e9b r904: fixed a corner-case segfault. Resolves #307. 2019-01-10 09:57:05 -05:00
Heng Li 83a8ee7038 r888: fixed incorrect CIGAR when --eqx in use
This was caused by mm_fix_cigar() which may change query/target offset in very
rare cases. Generating EQX has to beware of this change.

Resolves #266
2018-11-18 14:22:29 -05:00
Heng Li 88c421e8de r881: a recent change reduces sr accuracy 2018-11-05 22:03:59 -05:00
Heng Li 13981404e2 r876: skip DP if taking too much RAM (#259) 2018-11-05 11:43:10 -05:00
Heng Li 377c7099a8 r858: fixed a bug; resolves #254 2018-10-22 22:47:11 -04:00
Heng Li d04ac068fd r852: a minor when large --end-bonus is in use
We may use a large --end-bonus to mimic end-to-end alignment. In the short-read
mode, the candidate alignment region may be out of the band, which leads to
truncated alignment.
2018-10-15 21:28:27 -04:00
Heng Li 5ab6538757 r822: added option --no-end-flt 2018-08-05 19:42:12 -04:00
Heng Li 4b707aac92 working with toy examples 2018-07-15 10:55:00 -04:00
Heng Li 951c0d1d35 apparently mm_append_cigar() wastes some memory 2018-07-14 23:47:44 -04:00
Heng Li 66674afd09 r794: fixed a bug in seed filtering 2018-06-20 10:26:29 -04:00
Heng Li 7e6e8ca73f r792: fixed -Wextra warnings and resolved #184 2018-06-19 15:26:58 -04:00
Aaron Wenger 3d3bcc29a8 Fix CIGAR reallocation with --eqx
Fix the logic that calculates the number of CIGAR entries when
match "M" entries are expanded into "=" and "X". The number
of entries depends not on the number of mismatches but rather
on the number of transitions between "=" and "X".
2018-06-19 14:37:41 -04:00
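The counting rule in the commit above can be illustrated with a short sketch. `ex_count_eqx_runs()` is a hypothetical helper, not the function that was patched, and it assumes the query and target bases covered by one "M" entry are available as plain character arrays of equal length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many "="/"X" entries does one M run of length
 * `len` expand to? Each maximal stretch of matches or of mismatches becomes
 * one entry, so the count is (number of =/X transitions) + 1 for len > 0. */
size_t ex_count_eqx_runs(const char *q, const char *t, size_t len)
{
    size_t i, n_runs = 0;
    assert(len == 0 || (q != NULL && t != NULL));
    for (i = 0; i < len; ++i) {
        int is_match = (q[i] == t[i]);
        /* a new entry starts at the first column and at every =/X switch */
        if (i == 0 || is_match != (q[i - 1] == t[i - 1])) ++n_runs;
    }
    return n_runs;
}
```

For example, `ATTA` against `AGGA` and `ATAT` against `AGAG` both contain two mismatches, yet the first expands to three entries (1=2X1=) and the second to four (1=1X1=1X), so sizing the buffer from the mismatch count alone would be wrong.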
Heng Li 154d2caf5b r784: support the =/X CIGAR operators (#156) 2018-05-30 16:11:22 -04:00
Heng Li a3afeec0b2 r783: reverted to r781 (#155) 2018-05-30 15:25:34 -04:00
Heng Li 9f4309c376 r777: avoid skipping too many seeds 2018-05-11 10:25:18 -04:00
Heng Li 881b4ca3a2 r774: Merge branch 'hot-fix' into fix-long-gap 2018-05-11 10:02:17 -04:00
Heng Li e61812ee55 reduced gap len to trigger bad seed filtering 2018-05-01 16:17:21 -04:00
Heng Li 734ac379bb r770: matching N bases not working properly (#155) 2018-04-30 19:55:23 -04:00
Heng Li 759f8e4ac9 r769: filter out seeds breaking long gaps 2018-04-24 15:37:37 -04:00
Heng Li 83c57a9d98 r719: fixed bad memory access 2018-02-23 17:27:41 -05:00
Heng Li a0d62519c1 r710: fixed incorrect inversion coordinate (#112) 2018-02-15 14:23:42 -05:00
Heng Li 1372977a37 r708: implemented double Z-drop thresholds (#112)
When aligning long reads, we would prefer to align through low-quality
regions. This requires a large Z-drop threshold. However, to find small
inversions, we need to use a small Z-drop. This commit addresses this
conflict with two Z-drop thresholds. When Z-drop exceeds the smaller
threshold, we perform a local alignment to check if there is a potential
inversion. If there is one, we break the alignment; otherwise we break
the alignment only if Z-drop exceeds the larger threshold.

This commit also fixes a bug that reported wrong coordinates when the
inversion is on the forward strand (#112).
2018-02-15 10:50:49 -05:00
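The decision rule described in the commit above can be sketched as follows. All names here (`ex_should_break`, `ex_inv_check_fn`, and the two stub callbacks) are hypothetical, and the inversion test is reduced to a callback standing in for the local alignment; this is an illustration of the rule, not the code the commit introduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns nonzero if a local alignment of the
 * region suggests a potential inversion. */
typedef int (*ex_inv_check_fn)(void *ctx);

/* Stub callbacks used only for demonstration. */
int ex_inversion_found(void *ctx) { (void)ctx; return 1; }
int ex_no_inversion(void *ctx) { (void)ctx; return 0; }

/* Decide whether to break the alignment given the observed score drop.
 * z_small triggers the inversion check; z_large forces a break. */
int ex_should_break(int drop, int z_small, int z_large,
                    ex_inv_check_fn check, void *ctx)
{
    assert(z_small <= z_large);
    if (drop <= z_small) return 0;                   /* below both thresholds: extend */
    if (check != NULL && check(ctx)) return 1;       /* potential inversion: break */
    return drop > z_large;                           /* otherwise break only past the larger threshold */
}
```

The check is invoked only once the smaller threshold is exceeded, so the extra local alignment is not paid for on reads that align cleanly.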
Heng Li c0e0d5d84b r707: bugfix for inversions on rev strand (#112) 2018-02-14 14:09:03 -05:00
Heng Li 7ef5490884 r703: added --max-clip-ratio
still testing the option
2018-02-12 13:29:18 -05:00
Heng Li a8d476c6ad r686: end seed trimming doesn't go over long join 2018-02-06 11:31:32 -05:00
Heng Li 29b4a1786c r685: tune end seed filter again 2018-02-05 11:48:22 -05:00
Heng Li dbf284b2d9 r684: separate end score from min_chain_score 2018-02-05 11:40:38 -05:00
Heng Li 35d3e064bf r677: reduce the chance of missing hits
that are close to end of alignments. It is still possible to create examples
that fail the heuristic.
2018-02-02 10:35:33 -05:00
Heng Li 12a5a5fa3c r669: improved self chain extension (#10)
This has not fully resolved #10, only alleviated the issue.
2018-01-30 20:05:02 -05:00