Commit Graph

515 Commits (c2f07ff2ac8bdc5c6768e63191e614ea9012bd5d)

Author SHA1 Message Date
Heng Li eba237f39d r910: meaningful error message (#320)
when minimap2 fails to create temporary files
2019-01-29 10:29:27 -05:00
Heng Li 597212b9f3 r908: added an assertion to detect a potential bug
as in #311
2019-01-23 11:18:50 -05:00
Heng Li 48e230f40d r906: de tag is wrongly calculated given "N"
Resolves #309
2019-01-11 19:39:09 -05:00
Heng Li c404f49569 Release minimap2-2.15 (r905) 2019-01-10 12:34:45 -05:00
Heng Li cf2bae6e9b r904: fixed a corner-case segfault. Resolves #307. 2019-01-10 09:57:05 -05:00
Heng Li ea2b1c5b2a r894: added --max-qlen to filter out long query 2018-12-12 12:27:32 -05:00
Heng Li 2c52364527 r892: avoid de:f:0.0000 2018-11-24 21:54:28 -05:00
Heng Li 128476efc9 r891: compute gap-compressed divergence 2018-11-24 21:50:49 -05:00
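A minimal sketch of gap-compressed divergence, assuming the definition de = 1 - M/(M+X+G). Here M and X count matching and mismatching bases, and G counts gaps, with each insertion or deletion run counted once regardless of its length. The function name and the reliance on an =/X CIGAR (as produced with --eqx) are assumptions made for illustration. This is not minimap2's implementation.

```python
import re


def gap_compressed_divergence(cigar: str) -> float:
    """Gap-compressed per-base divergence from an extended (=/X) CIGAR string.

    Hypothetical helper: assumes de = 1 - M / (M + X + G), where each
    insertion or deletion run contributes 1 to G regardless of its length.
    """
    matches = mismatches = gap_opens = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "=":
            matches += n
        elif op == "X":
            mismatches += n
        elif op in "ID":
            gap_opens += 1  # a gap of any length counts once
        elif op == "M":
            # M mixes matches and mismatches; splitting them needs the sequences
            raise ValueError("CIGAR uses M; an =/X CIGAR is required")
        # S, H, N (intron) and P do not contribute
    total = matches + mismatches + gap_opens
    return (mismatches + gap_opens) / total if total else 0.0
```

For "10=1X2I5=3D2=" this counts M=17, X=1 and G=2, which gives 3/20 = 0.15. Counting the five gap bases individually would give 6/23 instead, so long indels weigh much less under the gap-compressed definition.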
Heng Li 1b3a6a0fe5 r890: removed "register" (#261) 2018-11-19 13:57:31 -05:00
Heng Li 83a8ee7038 r888: fixed incorrect CIGAR when --eqx in use
This was caused by mm_fix_cigar() which may change query/target offset in very
rare cases. Generating EQX must account for this change.

Resolves #266
2018-11-18 14:22:29 -05:00
Heng Li 91f548b497 r886: fixed two minor typos
Resolves #264
Resolves #265
2018-11-08 12:04:14 -05:00
Heng Li 6596c63dcd r884: for C++ compatibility (#261) 2018-11-06 22:07:11 -05:00
Heng Li 59f23f7579 Release minimap2-2.14 (r883) 2018-11-06 00:03:16 -05:00
Heng Li 5e55e397e9 r882: guard against -E0 (#263) 2018-11-05 23:36:12 -05:00
Heng Li 88c421e8de r881: a recent change reduces sr accuracy 2018-11-05 22:03:59 -05:00
Heng Li 3db5bfe6e5 r880: fixed false wrong FASTA/Q alert 2018-11-05 20:52:07 -05:00
Heng Li 1ede8ca170 r877: renamed cap-sw-mat to cap-sw-mem 2018-11-05 11:46:38 -05:00
Heng Li 13981404e2 r876: skip DP if taking too much RAM (#259) 2018-11-05 11:43:10 -05:00
Heng Li fd64dd26f6 r875: warn given incorrect FASTA/Q
resolves #252
resolves #255
2018-11-05 10:02:44 -05:00
Heng Li 24df95e4b8 r874: don't call x86_simd() so often
This takes a few percent of time in profiler.
2018-11-05 09:20:35 -05:00
Heng Li a8ee48c2ce r873: conforming to C99/C11; resolves #261 2018-11-05 08:25:07 -05:00
Heng Li 42baf287a4 r866: fixed a typo; resolves #262 2018-10-30 09:11:55 -04:00
Heng Li 9ed56b4a25 r860: MD/cs not working with --eqx 2018-10-26 23:23:53 -04:00
Heng Li 377c7099a8 r858: fixed a bug; resolves #254 2018-10-22 22:47:11 -04:00
Heng Li 7b0a49732e r856: wrongly reported for an unrecognized option
Resolved #250
2018-10-19 20:07:14 -04:00
Heng Li d04ac068fd r852: a minor bug when large --end-bonus is in use
We may use a large --end-bonus to mimic end-to-end alignment. In the short-read
mode, the candidate alignment region may be out of the band, which leads to
truncated alignment.
2018-10-15 21:28:27 -04:00
Heng Li 5d5d392c02 Release minimap2-2.13 (r850) 2018-10-11 13:18:31 -04:00
Heng Li 170863e553 r849: option -P doesn't work
I don't know why I didn't find it at the beginning.
2018-10-04 16:11:59 -04:00
Heng Li 97f97306a4 r847: guard against -N0 2018-09-27 15:13:44 -04:00
Heng Li 1077b7ddc8 r846: added --hard-mask-level for #244 2018-09-27 14:46:26 -04:00
Heng Li c57b59f02f r845: log peak memory 2018-09-23 20:27:49 -04:00
Heng Li c63a33904f r836: fixed an integer overflow
Forgot this one.
2018-09-14 23:29:31 -04:00
Heng Li 70b0fede64 r835: improved help message. Resolved #232 2018-09-14 22:29:25 -04:00
Heng Li 7d80d6de4a r832: fixed outdated -L. Resolved #231 and #233 2018-09-14 22:21:33 -04:00
Heng Li 7998fe9906 r829: replaced musl's getopt with ketopt 2018-09-01 21:18:02 -04:00
Heng Li 3a119d606f r828: --MD to support spliced alignment 2018-08-22 10:47:45 -04:00
Heng Li a5eafb75f9 Release minimap2-2.12 (r827) 2018-08-06 12:44:39 -04:00
Heng Li b0f39a1a61 r823: mappy to index a single sequence 2018-08-05 20:57:05 -04:00
Heng Li 5ab6538757 r822: added option --no-end-flt 2018-08-05 19:42:12 -04:00
Heng Li b32296e18f r821: fixed memory when -y is used 2018-07-31 15:14:37 -04:00
Heng Li ff9917a1c4 r819: mappy to support cs/MD 2018-07-24 23:29:55 -04:00
Heng Li 395c8d678a r815: fixed a memory leak 2018-07-15 22:11:32 -04:00
Heng Li 830da7fa27 r814: resumed versioning 2018-07-15 11:48:14 -04:00
Heng Li a655cbef86 print SAM header; remove tmp files 2018-07-15 11:03:18 -04:00
Heng Li e5277dbf5c code backup 2018-07-14 22:52:36 -04:00
Heng Li 1a55227d5a write hits to tmp files (unfinished) 2018-07-14 12:15:10 -04:00
Heng Li a609a07f8c optionally output unmapped query in PAF 2018-07-07 10:26:08 -05:00
Heng Li 0517972d02 Release minimap2-2.11 (r797) 2018-06-21 00:04:08 -04:00
Heng Li d46e68e6ad r796: don't use ssize_t 2018-06-20 12:45:27 -04:00
Heng Li 2584a4149a r795: use -r2000 for ava-ont, NOT for ava-pb 2018-06-20 12:24:43 -04:00
Heng Li 66674afd09 r794: fixed a bug in seed filtering 2018-06-20 10:26:29 -04:00
Heng Li 7e6e8ca73f r792: fixed -Wextra warnings and resolved #184 2018-06-19 15:26:58 -04:00
Heng Li 154d2caf5b r784: support the =/X CIGAR operators (#156) 2018-05-30 16:11:22 -04:00
Heng Li a3afeec0b2 r783: reverted to r781 (#155) 2018-05-30 15:25:34 -04:00
Heng Li 3573784b4d r782: no mask a chain having long ref ovlp (#155) 2018-05-30 13:53:45 -04:00
Heng Li 872f300955 r781: fixed the buggy heapmerge (resolves #166) 2018-05-30 11:55:14 -04:00
Heng Li 9f4309c376 r777: avoid skipping too many seeds 2018-05-11 10:25:18 -04:00
Heng Li 881b4ca3a2 r774: Merge branch 'hot-fix' into fix-long-gap 2018-05-11 10:02:17 -04:00
Heng Li 10c6dd2551 r773: fixed an integer overflow 2018-05-11 10:01:23 -04:00
Heng Li 7ec6721c44 r772: option -Y not working 2018-05-11 10:00:11 -04:00
Heng Li 734ac379bb r770: matching N bases not working properly (#155) 2018-04-30 19:55:23 -04:00
Heng Li 759f8e4ac9 r769: filter out seeds breaking long gaps 2018-04-24 15:37:37 -04:00
Heng Li aef7b0744c r768: shortened preset; added dv tag (#25)
Also added asm20 to command line help (#151)
2018-04-24 12:48:54 -04:00
Heng Li 372c90ceb5 r764: fixed incorrect inversion mapq (#148) 2018-04-10 09:11:49 -04:00
Heng Li ee4cd089f7 r763: fine control long join flank len (#128) 2018-03-29 14:16:58 -04:00
Heng Li 2d7ec75d50 Release minimap2-2.10 (r761) 2018-03-27 11:45:44 -04:00
Heng Li 5ef9580b17 r753: change bandwidth in ava-ont to 2000bp 2018-03-23 10:15:23 -04:00
Heng Li 08bd2123b6 r752: option to copy comments to output (#136) 2018-03-23 10:04:33 -04:00
Heng Li 8766d286df r751: optionally output MD (#118) 2018-03-22 14:15:33 -04:00
Heng Li 623b5d9d48 r750: check puts() return (#132 & #103) 2018-03-22 11:31:58 -04:00
Heng Li 18659118cd r749: don't print version etc. at low verbosity 2018-03-22 11:10:55 -04:00
Heng Li d1050f4eaf r748: optionally use system getopt() (#134) 2018-03-19 11:18:26 -04:00
Heng Li bdc615c1d4 r741: added --min-occ-floor to improve #107 2018-03-12 14:32:27 -04:00
Heng Li eeb314edd6 Release minimap2-2.9 (r720) 2018-02-24 09:31:09 -05:00
Heng Li 83c57a9d98 r719: fixed bad memory access 2018-02-23 17:27:41 -05:00
Heng Li 24a4808826 r718: retrieve sequence from the index 2018-02-23 10:18:26 -05:00
Heng Li 8fc5f8dc90 r711: assign proper mapq to primary inversions 2018-02-15 14:34:59 -05:00
Heng Li a0d62519c1 r710: fixed incorrect inversion coordinate (#112) 2018-02-15 14:23:42 -05:00
Heng Li 1372977a37 r708: implemented double Z-drop thresholds (#112)
When aligning long reads, we would prefer to align through low-quality
regions. This requires a large Z-drop threshold. However, to find small
inversions, we need to use a small Z-drop. This commit addresses this
conflict with two Z-drop thresholds. When Z-drop exceeds the smaller
threshold, we perform a local alignment to check if there is a potential
inversion. If there is one, we break the alignment; otherwise we break
the alignment only if Z-drop exceeds the larger threshold.

This commit also fixes a bug that reported wrong coordinates when the
inversion is on the forward strand (#112).
2018-02-15 10:50:49 -05:00
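The two-threshold rule above can be sketched as follows. The sketch assumes a BWA-MEM-style Z-drop: the score drop from the best-scoring cell, minus a gap-extension allowance for the shift between diagonals. The names `zdrop`, `should_break` and the `has_inversion` callback are hypothetical. In minimap2 the inversion test is a local alignment, which the callback merely stands in for here.

```python
from typing import Callable


def zdrop(best_score: int, best_i: int, best_j: int,
          cur_score: int, cur_i: int, cur_j: int, gap_ext: int) -> int:
    """Score drop from the best cell (best_i, best_j) to the current cell.

    The drop is discounted by gap_ext per base of diagonal shift, so a fall
    that a single long gap could explain is not over-penalized.
    """
    diag_shift = abs((cur_i - best_i) - (cur_j - best_j))
    return best_score - cur_score - gap_ext * diag_shift


def should_break(drop: int, z_small: int, z_large: int,
                 has_inversion: Callable[[], bool]) -> bool:
    """Two-threshold Z-drop decision, following the r708 description."""
    if drop <= z_small:
        return False          # small drop: keep extending
    if has_inversion():       # hypothetical local-alignment inversion test
        return True           # break so the inversion can be reported
    return drop > z_large     # otherwise break only on a large drop
```

With illustrative thresholds of 200 and 400, a drop of 250 breaks the alignment only if `has_inversion()` returns True, while a drop of 450 breaks it either way. Low-quality regions can therefore be aligned through without losing small inversions.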
Heng Li c0e0d5d84b r707: bugfix for inversions on rev strand (#112) 2018-02-14 14:09:03 -05:00
Heng Li b328795051 r706: don't segfault upon wrong FASTA/Q (#111)
The lack of robustness cost me several hours to identify.
2018-02-13 10:00:22 -05:00
Heng Li 7ef5490884 r703: added --max-clip-ratio
still testing the option
2018-02-12 13:29:18 -05:00
Heng Li a8d476c6ad r686: end seed trimming doesn't go over long join 2018-02-06 11:31:32 -05:00
Heng Li 29b4a1786c r685: tune end seed filter again 2018-02-05 11:48:22 -05:00
Heng Li dbf284b2d9 r684: separate end score from min_chain_score 2018-02-05 11:40:38 -05:00
Heng Li 35d3e064bf r677: reduce the chance of missing hits
that are close to the ends of alignments. It is still possible to create examples
that fail the heuristic.
2018-02-02 10:35:33 -05:00
Heng Li 53ce317e59 Release minimap2-2.8 (r672) 2018-02-01 12:50:20 -05:00
Heng Li da6947cfa3 r671: cleanup command line options 2018-01-31 13:59:52 -05:00
Heng Li 46d6349af4 r670: added PE support to mappy
and minor code cleanup
2018-01-31 11:33:08 -05:00
Heng Li 12a5a5fa3c r669: improved self chain extension (#10)
This has not fully resolved #10, only alleviated the issue.
2018-01-30 20:05:02 -05:00
Heng Li 43bfa6199d r667: warn if one query file has fewer records #92 2018-01-28 17:36:21 -05:00
Heng Li 72b9b0e3b6 r666: report if >=3 query files in SR mode #92 2018-01-28 17:15:57 -05:00
Heng Li d676a5314b r664: use --heat-sort for sr by default 2018-01-26 12:25:42 -05:00
Heng Li 123bc1d91d put option operations in another file 2018-01-26 08:38:37 -05:00
Heng Li 543fa12e68 r659: for C++ compatibility 2018-01-19 10:40:18 -05:00
Heng Li af1a871270 r658: gives a warning if -N0 is used 2018-01-19 08:33:20 -05:00
Heng Li 2b71181a37 r657: check -p (#96)
Well, in principle, every option should be checked. Will do when someone raises
issues...
2018-01-19 01:03:38 -05:00
Heng Li 33f8157961 r655: options to map to one strand of the ref #91 2018-01-16 10:34:30 -05:00
Heng Li eecc06086f Released minimap2-2.7 (r654) 2018-01-09 13:16:00 -05:00
Heng Li dfea113f28 r653: the last change may write "N" wrongly 2018-01-08 11:33:53 -05:00
Heng Li f5cfd439ee r651: incorrectly treat introns as deletions
This happened when the last operation during backtracking is an intron.
2018-01-07 19:42:50 -05:00
Heng Li dc9e3dcf4a r639: changed -O/-E validation 2017-12-30 20:39:29 -05:00
Heng Li cc75c12905 r638: disabled scoring checking
I haven't figured out the exact bounds...
2017-12-30 07:50:40 -05:00
Heng Li e420b17496 r629: API to construct index from strings 2017-12-18 22:29:46 -05:00
Heng Li ab345e600b r626: function to check incorrect scoring system 2017-12-13 12:23:43 -05:00
Heng Li d003a00d71 r625: HPC sketch still has one minor issue 2017-12-13 09:40:42 -05:00
Heng Li eb819c29e8 Release minimap2-2.6 (r623) 2017-12-12 11:09:59 -05:00
Heng Li fb630de40a r622: fixed bug in sdust due to recent refactor 2017-12-11 15:32:28 -05:00
Heng Li 43960a8ca7 r621: --print-qname also shows kalloc status 2017-12-11 12:30:08 -05:00
Heng Li f6608fe99c r620: revamped thread-local memory management
* Don't preallocate sdust_buf or minimizer list. kalloc should be fast enough -
  benchmarks needed to confirm.

* Fixed a memory leak caused by divergence estimate (post v2.5)

* Reset the kalloc buffer after mapping a long query. This reduces peak memory
  when large chunks of memory are allocated, at the cost of performance, though.
2017-12-11 12:11:10 -05:00
Heng Li 98a6e52c06 r618: heuristics to avoid tiny terminal exons 2017-12-11 00:57:55 -05:00
Heng Li 824712a4ee r617: removed some unused code 2017-12-10 17:54:50 -05:00
Heng Li 98a999fe44 r611: added pseudocount when est divergence 2017-12-08 12:57:57 -05:00
Heng Li fec7bd713f r610: warning if db sequence has zero length (#69) 2017-12-07 21:05:39 -05:00
Heng Li 2f693e8ca4 r609: bugfix - SDUST masking not working 2017-12-07 11:45:38 -05:00
Heng Li 704ff9f4c6 r607: estimate sequence divergence
Currently using the simplest method. There may be a more accurate estimate.
2017-12-06 16:14:39 -05:00
Heng Li 68c63f2d68 r606: fixed a sketch bug for long 256bp k-mer
sketch() writes {-1,-1} to the output array.
2017-12-06 16:13:29 -05:00
Heng Li 984f7846c0 r601: bugfix - a similar issue to r600
This bug unsets the alignment score of suboptimal alignments.
2017-11-30 11:51:34 -05:00
Heng Li af1d6afba9 r600: bugfix - missing secondary alignments (#71)
This should very rarely happen to typical data, but has a higher chance in
artifactual data.
2017-11-30 11:34:10 -05:00
Heng Li 131cfc6938 r574: build index without sequences 2017-11-11 21:38:38 -05:00
Heng Li 2f463b1db0 r573: prepare to generalize index 2017-11-11 19:54:06 -05:00
Heng Li 3b518271ee Release minimap2-2.5 (r572) 2017-11-11 11:29:28 -05:00
Heng Li d7a31e40e6 r569: last commit is buggy 2017-11-09 23:20:41 -05:00
Heng Li dd18cd75de r568: revert - don't take max(dp_max, dp_score) 2017-11-09 23:12:48 -05:00
Heng Li 99a2709913 r567: minor change to #56 2017-11-09 19:17:45 -05:00
mvdbeek 1cb0bf4bef Implement -Y for soft clipping of supp. alignments
I tried to base this on bwa-mem and it seems to work for sam alignments.
2017-11-09 19:22:36 +01:00
Heng Li a7b38f6900 r562: fixed a severe bug: wrong query start 2017-11-08 22:31:05 -05:00
Heng Li e896c9ec05 r559: prefer a chain involving more segments 2017-11-08 13:22:16 -05:00
Heng Li 98ba8928c6 r558: dp_max no less than dp_score 2017-11-08 10:06:10 -05:00
Heng Li b24d68ae9f r557: fixed another mapq underestimate
When a chain is split during base-level alignment, its chaining score is
reduced. However, the chaining score of its suboptimal chain remains the same.
This leads to underestimated mapping quality.
2017-11-07 23:20:49 -05:00
Heng Li 65deedfa96 r556: bugfix - underestimate mapq for split aln 2017-11-07 22:37:12 -05:00
Heng Li 21a46ba652 Release minimap2-2.4 (r555) 2017-11-06 12:54:02 -05:00
Heng Li fa5a645ca5 r552: fixed a tiny typo on struct packing
The old packing wastes memory, though very little.
2017-11-05 08:27:26 -05:00
Heng Li a3f0aa1d5b r550: fixed -L issues with secondary and supp aln 2017-11-04 12:13:38 -04:00
Heng Li 22290db3e4 r546: minor mapQ tuning 2017-11-01 13:20:39 -04:00
Heng Li cd24dc8834 r545: removed option -i, not working well 2017-10-31 22:23:27 -04:00
Heng Li b8e758df0f r544: increased PE mapQ 2017-10-31 16:55:02 -04:00
Heng Li 311fa90030 r543: applied some sr mapq changes to long reads 2017-10-31 15:24:05 -04:00
Heng Li fb8a1b5536 r542: tuning mapQ calculation 2017-10-31 14:25:09 -04:00
Heng Li 285eb0da05 r540: removed a buggy debugging line 2017-10-29 00:02:41 -04:00
Heng Li 192217a10c r539: use --splice-flank=yes by default
In human/mouse, the GTr..yAG pattern occurs in 91%/92% of all GT-AG introns.
Modeling r..y clearly leads to higher accuracy. However, in SIRV, this
percentage is reduced to ~60%, so the default "--splice --splice-flank=yes"
leads to lower accuracy there. If someone benchmarks minimap2 on SIRV, this would
be bad, but minimap2 is developed for practical applications, not for benchmarks.
I will live with that.
2017-10-28 22:29:55 -04:00
Heng Li f22a94e868 r538: fixed a long existing bug in HPC k-mer (#47)
This bug may lead to a wrong minimizer when an HPC k-mer is longer than 256bp.
When there is a seed match involving this wrong HPC k-mer, the correct seed
sequences do not match in fact. This violates the assumption in align.c and
subsequently causes a segfault, which is what #47 has caught. This bug lurked
in the earliest piece of code and affected all released minimap2 versions so
far. It is extremely rare and does not affect the prebuilt GRCh37/38 indices.
2017-10-28 19:21:10 -04:00
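Homopolymer compression (HPC) explains how an HPC k-mer can exceed 256bp: k compressed bases can cover an arbitrarily long stretch of the original sequence. The sketch below shows this under stated assumptions. The function names are hypothetical, and it does not reproduce minimap2's sketching code. Any field that assumes a span below 256bp (for example an 8-bit width, an assumption here about the failure mode) would be wrong for such a k-mer.

```python
from typing import List, Tuple


def hpc_compress(seq: str) -> Tuple[str, List[int]]:
    """Collapse homopolymer runs; return compressed bases and run lengths."""
    bases: List[str] = []
    runs: List[int] = []
    for b in seq:
        if bases and bases[-1] == b:
            runs[-1] += 1      # extend the current run
        else:
            bases.append(b)    # start a new run
            runs.append(1)
    return "".join(bases), runs


def hpc_kmer_span(runs: List[int], start: int, k: int) -> int:
    """Original-sequence length covered by the HPC k-mer starting at run `start`."""
    return sum(runs[start:start + k])


compressed, runs = hpc_compress("A" * 300 + "CG")
span = hpc_kmer_span(runs, 0, 3)   # the HPC 3-mer "ACG"
print(compressed, span, span > 255)
```

Here the compressed 3-mer "ACG" covers 302bp of the original sequence, which a span limited to 255 could not represent.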
Heng Li 79b0caca95 r537: model the next base to GT/AG
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost for GTr..yAG, but we pay half of the cost
if there is no r or y. This improves the junction accuracy when mapping to
human and mouse and decreases the accuracy when mapping to SIRV. My guess is
that SIRV does not honor this trend. Need to investigate in future.

Also in this commit, --cost-non-gt-ag is aliased to -C. The default is changed
to 9 instead of 5. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis on SIRV.
2017-10-28 00:25:01 -04:00
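The flank model above can be sketched as follows, for the forward strand only. The sketch makes three assumptions: the non-GT-AG cost (-C, default 9 per this commit) applies separately at the donor and acceptor sites; "half of the cost" means half of that penalty, rounded down here; and disabling --splice-flank drops the r/y term. The function names and this per-site split are assumptions, not minimap2's ksw2 code.

```python
def donor_cost(intron_start: str, noncan: int = 9, flank: bool = True) -> int:
    """Penalty for a donor site given the first three intron bases (forward strand)."""
    s = intron_start.upper()
    if len(s) < 3:
        raise ValueError("need the first three intron bases")
    if s[:2] != "GT":
        return noncan                  # non-canonical donor
    if not flank or s[2] in "AG":      # GTr, or flank modeling disabled
        return 0
    return noncan // 2                 # GT not followed by a purine


def acceptor_cost(intron_end: str, noncan: int = 9, flank: bool = True) -> int:
    """Penalty for an acceptor site given the last three intron bases (forward strand)."""
    s = intron_end.upper()
    if len(s) < 3:
        raise ValueError("need the last three intron bases")
    if s[-2:] != "AG":
        return noncan                  # non-canonical acceptor
    if not flank or s[-3] in "CT":     # yAG, or flank modeling disabled
        return 0
    return noncan // 2                 # AG not preceded by a pyrimidine
```

Under these assumptions, an intron of the form GTA...TAG costs 0 at both ends. A GTC...GAG intron costs 4 at each end with the default of 9, and a site lacking GT or AG pays the full 9.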
Heng Li afc2f2e84b r536: removed an unnecessary assert() 2017-10-24 21:08:54 -04:00
Heng Li d4b5dfc297 r533: added --no-pairing
to prevent the use of any pairing information for paired-end reads.
2017-10-23 14:09:32 -04:00
Heng Li 306e4541f8 Released minimap2-2.3 (r531) 2017-10-22 23:13:35 -04:00
Heng Li beeb806829 r526: fixed a bug when HPC is in use
It happened when the query HPC minimizer is longer than the reference HPC
minimizer close to the beginning of a contig. We may get a negative coordinate,
which causes an assertion failure.
2017-10-21 19:54:04 -04:00
Heng Li be7f3c4ffe r525: fixed a bug in chaining; handle ovlp ends 2017-10-20 21:34:52 -04:00
Heng Li bd04372873 r524: reverted to bwa-mem end bonus
and reduced the cost of clipping when filtering by identity
2017-10-20 16:57:31 -04:00
Heng Li 15ed0712c2 r523: fixed a performance bug in ksw2_ll
Won't affect accuracy.
2017-10-20 13:00:10 -04:00