Commit Graph

515 Commits (c2f07ff2ac8bdc5c6768e63191e614ea9012bd5d)

Author SHA1 Message Date
Heng Li 66674afd09 r794: fixed a bug in seed filtering 2018-06-20 10:26:29 -04:00
Heng Li 7e6e8ca73f r792: fixed -Wextra warnings and resolved #184 2018-06-19 15:26:58 -04:00
Heng Li 154d2caf5b r784: support the =/X CIGAR operators (#156) 2018-05-30 16:11:22 -04:00
Heng Li a3afeec0b2 r783: reverted to r781 (#155) 2018-05-30 15:25:34 -04:00
Heng Li 3573784b4d r782: no mask a chain having long ref ovlp (#155) 2018-05-30 13:53:45 -04:00
Heng Li 872f300955 r781: fixed the buggy heapmerge (resolves #166) 2018-05-30 11:55:14 -04:00
Heng Li 9f4309c376 r777: avoid skipping too many seeds 2018-05-11 10:25:18 -04:00
Heng Li 881b4ca3a2 r774: Merge branch 'hot-fix' into fix-long-gap 2018-05-11 10:02:17 -04:00
Heng Li 10c6dd2551 r773: fixed an integer overflow 2018-05-11 10:01:23 -04:00
Heng Li 7ec6721c44 r772: option -Y not working 2018-05-11 10:00:11 -04:00
Heng Li 734ac379bb r770: matching N bases not working properly (#155) 2018-04-30 19:55:23 -04:00
Heng Li 759f8e4ac9 r769: filter out seeds breaking long gaps 2018-04-24 15:37:37 -04:00
Heng Li aef7b0744c r768: shortened preset; added dv tag (#25)
Also added asm20 to command line help (#151)
2018-04-24 12:48:54 -04:00
Heng Li 372c90ceb5 r764: fixed incorrect inversion mapq (#148) 2018-04-10 09:11:49 -04:00
Heng Li ee4cd089f7 r763: fine control long join flank len (#128) 2018-03-29 14:16:58 -04:00
Heng Li 2d7ec75d50 Release minimap2-2.10 (r761) 2018-03-27 11:45:44 -04:00
Heng Li 5ef9580b17 r753: change bandwidth in ava-ont to 2000bp 2018-03-23 10:15:23 -04:00
Heng Li 08bd2123b6 r752: option to copy comments to output (#136) 2018-03-23 10:04:33 -04:00
Heng Li 8766d286df r751: optionally output MD (#118) 2018-03-22 14:15:33 -04:00
Heng Li 623b5d9d48 r750: check puts() return (#132 & #103) 2018-03-22 11:31:58 -04:00
Heng Li 18659118cd r749: don't print version etc at low verbose 2018-03-22 11:10:55 -04:00
Heng Li d1050f4eaf r748: optionally to use system getopt() (#134) 2018-03-19 11:18:26 -04:00
Heng Li bdc615c1d4 r741: added --min-occ-floor to improve #107 2018-03-12 14:32:27 -04:00
Heng Li eeb314edd6 Release minimap2-2.9 (r720) 2018-02-24 09:31:09 -05:00
Heng Li 83c57a9d98 r719: fixed bad memory access 2018-02-23 17:27:41 -05:00
Heng Li 24a4808826 r718: retrieve sequence from the index 2018-02-23 10:18:26 -05:00
Heng Li 8fc5f8dc90 r711: assign proper mapq to primary inversions 2018-02-15 14:34:59 -05:00
Heng Li a0d62519c1 r710: fixed incorrect inversion coordinate (#112) 2018-02-15 14:23:42 -05:00
Heng Li 1372977a37 r708: implemented double Z-drop thresholds (#112)
When aligning long reads, we would prefer to align through low-quality
regions. This requires a large Z-drop threshold. However, to find small
inversions, we need to use a small Z-drop. This commit addresses this
conflict with two Z-drop thresholds. When Z-drop exceeds the smaller
threshold, we perform a local alignment to check if there is a potential
inversion. If there is one, we break the alignment; otherwise we break
the alignment only if Z-drop exceeds the larger threshold.

This commit also fixes a bug that reported wrong coordinates when the
inversion is on the forward strand (#112).
2018-02-15 10:50:49 -05:00
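The two-threshold rule described in r708 can be sketched roughly as below. This is only an illustration of the decision order stated in the commit message, not minimap2's actual code: the function name `should_break`, the parameter names, and the `has_inversion` callback (standing in for the local-alignment check) are all assumptions.

```python
def should_break(zdrop, zdrop_small, zdrop_large, has_inversion):
    """Decide whether to break an alignment given the observed Z-drop.

    has_inversion is a hypothetical zero-argument callable that stands in
    for the local alignment used to look for a potential inversion.
    """
    if zdrop <= zdrop_small:
        # Below the smaller threshold: keep aligning through the region.
        return False
    if has_inversion():
        # Smaller threshold exceeded and an inversion looks likely: break.
        return True
    # No inversion: break only if the larger threshold is also exceeded.
    return zdrop > zdrop_large
```

The inversion check is passed as a callable so that the comparatively expensive local alignment runs only after the smaller threshold has been exceeded.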
Heng Li c0e0d5d84b r707: bugfix for inversions on rev strand (#112) 2018-02-14 14:09:03 -05:00
Heng Li b328795051 r706: don't segfault upon wrong FASTA/Q (#111)
The lack of robustness cost me several hours to identify.
2018-02-13 10:00:22 -05:00
Heng Li 7ef5490884 r703: added --max-clip-ratio
still testing the option
2018-02-12 13:29:18 -05:00
Heng Li a8d476c6ad r686: end seed trimming don't go over long join 2018-02-06 11:31:32 -05:00
Heng Li 29b4a1786c r685: tune end seed filter again 2018-02-05 11:48:22 -05:00
Heng Li dbf284b2d9 r684: separate end score from min_chain_score 2018-02-05 11:40:38 -05:00
Heng Li 35d3e064bf r677: reduce the chance of missing hits
that are close to end of alignments. It is still possible to create examples
that fail the heuristic.
2018-02-02 10:35:33 -05:00
Heng Li 53ce317e59 Release minimap2-2.8 (r672) 2018-02-01 12:50:20 -05:00
Heng Li da6947cfa3 r671: cleanup command line options 2018-01-31 13:59:52 -05:00
Heng Li 46d6349af4 r670: added PE support to mappy
and minor code cleanup
2018-01-31 11:33:08 -05:00
Heng Li 12a5a5fa3c r669: improved self chain extension (#10)
This has not fully resolved #10, only alleviated the issue.
2018-01-30 20:05:02 -05:00
Heng Li 43bfa6199d r667: warn if one query file has fewer records #92 2018-01-28 17:36:21 -05:00
Heng Li 72b9b0e3b6 r666: report if >=3 query files in SR mode #92 2018-01-28 17:15:57 -05:00
Heng Li d676a5314b r664: use --heap-sort for sr by default 2018-01-26 12:25:42 -05:00
Heng Li 123bc1d91d put option operations in another file 2018-01-26 08:38:37 -05:00
Heng Li 543fa12e68 r659: for C++ compatibility 2018-01-19 10:40:18 -05:00
Heng Li af1a871270 r658: gives a warning if -N0 is used 2018-01-19 08:33:20 -05:00
Heng Li 2b71181a37 r657: check -p (#96)
Well, in principle, every option should be checked. Will do when someone raises
issues...
2018-01-19 01:03:38 -05:00
Heng Li 33f8157961 r655: options to map to one strand of the ref #91 2018-01-16 10:34:30 -05:00
Heng Li eecc06086f Released minimap2-2.7 (r654) 2018-01-09 13:16:00 -05:00
Heng Li dfea113f28 r653: the last change may write "N" wrongly 2018-01-08 11:33:53 -05:00
Heng Li f5cfd439ee r651: incorrectly treat introns as deletions
This happened when the last operation during backtracking is an intron.
2018-01-07 19:42:50 -05:00
Heng Li dc9e3dcf4a r639: changed -O/-E validation 2017-12-30 20:39:29 -05:00
Heng Li cc75c12905 r638: disabled scoring checking
I haven't figured out the exact bounds...
2017-12-30 07:50:40 -05:00
Heng Li e420b17496 r629: API to construct index from strings 2017-12-18 22:29:46 -05:00
Heng Li ab345e600b r626: function to check incorrect scoring system 2017-12-13 12:23:43 -05:00
Heng Li d003a00d71 r625: HPC sketch still has one minor issue 2017-12-13 09:40:42 -05:00
Heng Li eb819c29e8 Release minimap2-2.6 (r623) 2017-12-12 11:09:59 -05:00
Heng Li fb630de40a r622: fixed bug in sdust due to recent refactor 2017-12-11 15:32:28 -05:00
Heng Li 43960a8ca7 r621: --print-qname also shows kalloc status 2017-12-11 12:30:08 -05:00
Heng Li f6608fe99c r620: revamped thread-local memory management
* Don't preallocate sdust_buf or minimizer list. kalloc should be fast enough -
  benchmarks needed to confirm.

* Fixed a memory leak caused by divergence estimate (post v2.5)

* Reset the kalloc buffer after mapping a long query. This reduces peak memory
  when large chunks of memory are allocated, at the cost of performance, though.
2017-12-11 12:11:10 -05:00
Heng Li 98a6e52c06 r618: heuristics to avoid tiny terminal exons 2017-12-11 00:57:55 -05:00
Heng Li 824712a4ee r617: removed some unused code 2017-12-10 17:54:50 -05:00
Heng Li 98a999fe44 r611: added pseudocount when est divergence 2017-12-08 12:57:57 -05:00
Heng Li fec7bd713f r610: warning if db sequence is 0-lengthed (#69) 2017-12-07 21:05:39 -05:00
Heng Li 2f693e8ca4 r609: bugfix - SDUST masking not working 2017-12-07 11:45:38 -05:00
Heng Li 704ff9f4c6 r607: estimate sequence divergence
Currently using the simplest method. There may be a more accurate estimate.
2017-12-06 16:14:39 -05:00
Heng Li 68c63f2d68 r606: fixed a sketch bug for long 256bp k-mer
sketch() writes {-1,-1} to the output array.
2017-12-06 16:13:29 -05:00
Heng Li 984f7846c0 r601: bugfix - a similar issue to r600
This bug unsets the alignment score of suboptimal alignments.
2017-11-30 11:51:34 -05:00
Heng Li af1d6afba9 r600: bugfix - missing secondary alignments (#71)
This should very rarely happen to typical data, but has a higher chance in
artifactual data.
2017-11-30 11:34:10 -05:00
Heng Li 131cfc6938 r574: build index without sequences 2017-11-11 21:38:38 -05:00
Heng Li 2f463b1db0 r573: prepare to generalize index 2017-11-11 19:54:06 -05:00
Heng Li 3b518271ee Release minimap2-2.5 (r572) 2017-11-11 11:29:28 -05:00
Heng Li d7a31e40e6 r569: last commit is buggy 2017-11-09 23:20:41 -05:00
Heng Li dd18cd75de r568: revert - don't take max(dp_max, dp_score) 2017-11-09 23:12:48 -05:00
Heng Li 99a2709913 r567: minor change to #56 2017-11-09 19:17:45 -05:00
mvdbeek 1cb0bf4bef Implement -Y for soft clipping of supp. alignments
I tried to base this on bwa-mem and it seems to work for sam alignments.
2017-11-09 19:22:36 +01:00
Heng Li a7b38f6900 r562: fixed a severe bug: wrong query start 2017-11-08 22:31:05 -05:00
Heng Li e896c9ec05 r559: prefer a chain involving more segments 2017-11-08 13:22:16 -05:00
Heng Li 98ba8928c6 r558: dp_max no less than dp_score 2017-11-08 10:06:10 -05:00
Heng Li b24d68ae9f r557: fixed another mapq underestimate
When a chain is split during base-level alignment, its chaining score is
reduced. However, the chaining score of its suboptimal chain remains the same.
This leads to underestimated mapping quality.
2017-11-07 23:20:49 -05:00
Heng Li 65deedfa96 r556: bugfix - underestimate mapq for split aln 2017-11-07 22:37:12 -05:00
Heng Li 21a46ba652 Release minimap2-2.4 (r555) 2017-11-06 12:54:02 -05:00
Heng Li fa5a645ca5 r552: fixed a tiny typo on struct packing
The old packing wastes memory, though the amount is very small.
2017-11-05 08:27:26 -05:00
Heng Li a3f0aa1d5b r550: fixed -L issues with secondary and supp aln 2017-11-04 12:13:38 -04:00
Heng Li 22290db3e4 r546: minor mapQ tuning 2017-11-01 13:20:39 -04:00
Heng Li cd24dc8834 r545: removed option -i, not working well 2017-10-31 22:23:27 -04:00
Heng Li b8e758df0f r544: increased PE mapQ 2017-10-31 16:55:02 -04:00
Heng Li 311fa90030 r543: applied some sr mapq changes to long reads 2017-10-31 15:24:05 -04:00
Heng Li fb8a1b5536 r542: tuning mapQ calculation 2017-10-31 14:25:09 -04:00
Heng Li 285eb0da05 r540: removed a buggy debugging line 2017-10-29 00:02:41 -04:00
Heng Li 192217a10c r539: use --splice-flank=yes by default
In human/mouse, the GTr..yAG pattern occurs in 91/92% of all GT-AG introns.
Modeling r..y clearly leads to higher accuracy. However, in SIRV, this
percentage is reduced to ~60%. The default "--splice --splice-flank=yes"
leads to lower accuracy. If someone benchmarks minimap2 on SIRV, this would be
bad, but minimap2 is developed for practical applications, not for benchmarks.
I will live with that.
2017-10-28 22:29:55 -04:00
Heng Li f22a94e868 r538: fixed a long existing bug in HPC k-mer (#47)
This bug may lead to a wrong minimizer when a HPC k-mer is longer than 256bp.
When there is a seed match involving this wrong HPC k-mer, the correct seed
sequences do not match in fact. This violates the assumption in align.c and
subsequently causes a segfault, which is what #47 has caught. This bug lurked
in the earliest piece of code and affected all released minimap2 versions so
far. It is extremely rare and does not affect the prebuilt GRCh37/38 indices.
2017-10-28 19:21:10 -04:00
Heng Li 79b0caca95 r537: model the next base to GT/AG
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost to GTr..yAG, but we pay half of the cost
if there is no r or y. This improves the junction accuracy when mapping to
human and mouse and decreases the accuracy when mapping to SIRV. My guess is
that SIRV does not honor this trend. Need to investigate in future.

Also in this commit, --cost-non-gt-ag is aliased to -C. The default is changed
to 9 instead of 5. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis on SIRV.
2017-10-28 00:25:01 -04:00
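The GTr..yAG cost model from r537 can be illustrated with the small sketch below. It follows only what the commit message states (no cost for GTr or yAG, half the cost when the flanking r or y is missing). The function names, the 3-base string interface, the choice to charge the full cost to any other signal, and the unrounded half cost are assumptions; the default of 9 is the -C default mentioned in the message.

```python
def donor_penalty(first3, non_canonical_cost=9):
    """Penalty for the first three intron bases, e.g. 'GTA'.

    Assumes exactly three bases are given. GT followed by A or G (r) is
    free; GT without r pays half; anything else pays the full cost.
    """
    s = first3.upper()
    if s[:2] != "GT":
        return non_canonical_cost
    return 0 if s[2] in ("A", "G") else non_canonical_cost / 2


def acceptor_penalty(last3, non_canonical_cost=9):
    """Penalty for the last three intron bases, e.g. 'CAG'.

    Assumes exactly three bases are given. AG preceded by C or T (y) is
    free; AG without y pays half; anything else pays the full cost.
    """
    s = last3.upper()
    if s[1:] != "AG":
        return non_canonical_cost
    return 0 if s[0] in ("C", "T") else non_canonical_cost / 2
```

For instance, `donor_penalty("GTA")` costs nothing, while `donor_penalty("GTC")` pays half of the configured cost.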
Heng Li afc2f2e84b r536: removed an unnecessary assert() 2017-10-24 21:08:54 -04:00
Heng Li d4b5dfc297 r533: added --no-pairing
to prevent the use of any pairing information for paired-end reads.
2017-10-23 14:09:32 -04:00
Heng Li 306e4541f8 Released minimap2-2.3 (r531) 2017-10-22 23:13:35 -04:00
Heng Li beeb806829 r526: fixed a bug when HPC is in use
It happened when the query HPC minimizer is longer than the reference HPC
minimizer close to the beginning of a contig. We may get a negative coordinate,
which causes an assertion failure.
2017-10-21 19:54:04 -04:00
Heng Li be7f3c4ffe r525: fixed a bug in chaining; handle ovlp ends 2017-10-20 21:34:52 -04:00
Heng Li bd04372873 r524: reverted to bwa-mem end bonus
and reduced the cost of clipping when filtering by identity
2017-10-20 16:57:31 -04:00
Heng Li 15ed0712c2 r523: fixed a performance bug in ksw2_ll
Won't affect accuracy.
2017-10-20 13:00:10 -04:00
Heng Li 4683da2455 r520: added option -L to write long cigar to CG 2017-10-17 17:32:44 -04:00
Heng Li ffd953029f r519: fixed a severe bug that misses long alns 2017-10-17 15:52:36 -04:00
Heng Li 04cf4ebf5e r518: increased the default -K to 500M
This helps multi-thread performance for ultra-long reads.
2017-10-17 13:21:29 -04:00
Heng Li 25ffd72690 r517: replaced --print-2nd with --secondary 2017-10-17 11:41:56 -04:00
Heng Li aa2d9d4e1b r516: throw a warning if -N0 is used 2017-10-16 14:55:35 -04:00
Heng Li addb61bcb2 r515: more conservative hit exclusion
When a hit covers a long query subsequence that has not been covered by better
primary hits, this hit is more likely to become a new primary hit.
2017-10-16 13:58:01 -04:00
Heng Li adf6cd7f52 r513: merged pre- and post-cigar blen and mlen
This saves a bit memory and is cleaner.
2017-10-16 10:55:18 -04:00
Heng Li e6f525edaf r512: option to filter poorly aligned reads 2017-10-16 10:38:22 -04:00
Heng Li 858213d513 r511: fixed wrong primary sam record 2017-10-12 23:02:18 -04:00
Heng Li dea3b60918 r510: fixed an off-by-1 bug for unmapped mate 2017-10-12 17:31:13 -04:00
Heng Li 7c555f9b7e r508: use two I/O threads for mapping
-x sr applies this option by default
2017-10-12 14:56:01 -04:00
Heng Li 2801ed9b4b r507: -K not working as is intended (#36) 2017-10-12 14:16:05 -04:00
Heng Li ce06188203 r506: fixed a memory leak 2017-10-12 10:12:22 -04:00
Heng Li 9862a75cd3 r505: a bit code simplification 2017-10-11 21:54:32 -04:00
Heng Li 3073f4a758 r504: better heuristics to reduce excessive ext 2017-10-11 21:42:11 -04:00
Heng Li 9364bc64d7 r501: added end_bonus to extz2 2017-10-11 09:39:41 -04:00
Heng Li 65abdb8f3c r500: temporarily disabled region trunc
because it is causing other problems.
2017-10-11 00:16:04 -04:00
Heng Li 7345621759 r499: end bonus working; DP region needs improve! 2017-10-11 00:14:25 -04:00
Heng Li ca632f907b r498: fixed a bug when merging like "4I5I" 2017-10-10 21:22:37 -04:00
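The r498 message does not say what the bug was, so the sketch below shows only what merging adjacent identical CIGAR operations such as "4I5I" means in general. The function name `merge_adjacent_ops` and the string-based interface are assumptions made for illustration; minimap2 itself stores CIGARs in a packed integer form.

```python
import re


def merge_adjacent_ops(cigar):
    """Collapse runs of the same CIGAR operation, e.g. '4I5I' -> '9I'."""
    merged = []  # list of [length, op] pairs in order
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if merged and merged[-1][1] == op:
            # Same operation as the previous entry: add the lengths.
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)
```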
Heng Li 6c78a980b6 r497: the previous change not working at the ends 2017-10-10 17:32:28 -04:00
Heng Li c217eecdb7 r496: avoid DP extending into another chain
When deciding the region for DP, exclude regions in the adjacent chain
2017-10-10 17:25:12 -04:00
Heng Li 13b66aad4d r495: fix improper CIGAR
1. Not left aligned
2. In one case, 50M24D50M becomes 24D100M. The leading D needs to be removed.
3. Avoid identical hits after DP
2017-10-10 11:59:44 -04:00
Heng Li 46fa520db9 r494: simpler and better SR gap filling
Still one thing to do: left alignment
2017-10-09 22:02:30 -04:00
Heng Li 1e53610fb4 r493: reduced calling extd2 for ungapped aln
Still need to improve in case of 3I5M3D
2017-10-09 21:13:34 -04:00
Heng Li 9396d9e11b r452: typo in the last commit 2017-10-09 10:05:32 -04:00
Heng Li 198849a716 r491: an ambiguous base costs the same as gap ext 2017-10-09 09:59:42 -04:00
Heng Li 9fea4d16b3 r490: improved short-read extension heuristic
Now we find the best scoring ungapped seeded segment and then extend from it.
There is no gap filling for short reads.
2017-10-08 21:36:34 -04:00
Heng Li f9415628a8 r489: don't use approximate zdrop
it doesn't work well
2017-10-08 19:29:09 -04:00
Heng Li 61e56c941d r488: parameter to control max fragment length 2017-10-07 23:54:32 -04:00
Heng Li f150257a0d r487: demote "map10k"; improved README 2017-10-07 19:19:40 -04:00
Heng Li bf2d4f7aec r486: treat "U" as "T" for RNA reads (#33) 2017-10-07 18:53:25 -04:00
Heng Li c6384ed2c8 r482: increased short-read bandwidth to 100
This has very minor effect on speed.
2017-10-06 10:20:32 -04:00
Heng Li e0baf1ad54 r479: a bit code cleanup 2017-10-05 16:15:14 -04:00
Heng Li f266092699 r478: simplified useless code, a tiny bit 2017-10-05 15:56:00 -04:00
Heng Li 9c5767f9ed r477: renamed multi_seg to frag_mode 2017-10-05 15:48:17 -04:00
Heng Li ae2adf04d4 r476: multi-file fragment mode working 2017-10-05 15:39:26 -04:00
Heng Li b839758335 r475: added --cs=none; updated manpage 2017-10-05 15:27:37 -04:00
Heng Li f4a5d3a692 r474: replaced -S and --cs-no-equal with --cs 2017-10-05 15:03:03 -04:00
Heng Li 3ff6eda3a4 r473: don't count introns into blen 2017-10-05 14:37:21 -04:00
Heng Li 1a90bc8603 r472: fixed a bug when printing MAPQ/CIGAR 2017-10-05 12:46:11 -04:00
Heng Li abf2a90363 r471: all SAM features implemented; more tests! 2017-10-05 12:37:30 -04:00
Heng Li 7cc4f6f965 r469: first step towards PE SAM 2017-10-05 10:38:09 -04:00
Heng Li 16e6e589a8 r468: replaced ^ with ~ in cs 2017-10-04 22:17:12 -04:00
Heng Li 9aba11769c r467: added : (equal length) and ^ (intron) ops 2017-10-04 21:55:37 -04:00
Heng Li 7d50e646dd r466: detect multi-part index more smartly
though it might not work in an extremely rare case: the end of a sequence ends
at X*16384 and it is the last sequence in a batch. This can be resolved by
never letting the kstream_t buffer empty.
2017-10-04 17:32:58 -04:00
Heng Li 1554149158 r465: apply option -x before other options 2017-10-04 13:52:28 -04:00
Heng Li 19c39e704f r464: fixed a bug in pairing, due to randomization 2017-10-04 13:37:40 -04:00
Heng Li 2581c44a21 r463: optionally disable secondary hits 2017-10-04 13:24:41 -04:00
Heng Li 5babf41a38 r462: SAM primary flag not properly set 2017-10-04 13:11:29 -04:00
Heng Li 2a1e738a94 r461: randomize repetitive hits 2017-10-04 13:05:18 -04:00
Heng Li cf55c84056 r460: added option --no-long-join 2017-10-04 12:08:44 -04:00
Heng Li 841763ec24 Merge branch 'master' into sr 2017-10-04 11:42:44 -04:00
Heng Li 95eb1dec36 r458: fixed wrong chr for inversion aln (#30) 2017-10-04 11:32:06 -04:00
Heng Li 0fd0f2aed1 r457: fixed a bug on parsing -f 2017-09-30 00:00:44 -04:00
Heng Li ee9b2773a8 r456: min chain score should >k-mer length
or chain_dp() wastes time on unnecessarily sorting chains with one k-mer.
2017-09-29 22:33:55 -04:00
Heng Li 340483821e r455: set max_occ on command line 2017-09-29 22:18:43 -04:00
Heng Li 04fb2c2ec0 r454: rechain with higher max_occ if no good chain 2017-09-29 19:24:32 -04:00
Heng Li 0d4ecd19ee r453: avoid duplicated strcmp() for ava 2017-09-28 15:52:05 -04:00
Heng Li 0c63325985 r452: fixed - -G not working with -x sr 2017-09-28 14:28:12 -04:00
Heng Li 2a554a92e9 r451: changed rep_len mapq heuristic 2017-09-28 14:23:14 -04:00
Heng Li 935a6e6064 r450: differentiate exact repeats via mapq 2017-09-27 23:51:05 -04:00
Heng Li 8301222174 r448: fixed a bug when computing PE quality 2017-09-27 21:54:07 -04:00
Heng Li 7e0d70bfd3 r445: pair coordinate adjustment working
Next: mapq adjustment, which will be tricky...
2017-09-27 15:38:18 -04:00
Heng Li a349d85280 r444: changed the way orientation is specified
The old model doesn't work with RF or RR orientation. The new model only works
with paired-end reads. For >2 segments, only FF is supported.
2017-09-27 12:33:10 -04:00
Heng Li f611edf6f2 r443: don't filter small cm for split seg 2017-09-26 16:17:58 -04:00
Heng Li 1b1dd0cd57 r442: default max_gap to 200 in the sr mode 2017-09-26 13:31:01 -04:00
Heng Li 55d1e4f638 r440: better chain filtering for PE reads 2017-09-26 11:03:36 -04:00
Heng Li 64c0ad6b35 r439: use splice-like chain gap cost between segs
This improves accuracy
2017-09-25 16:04:38 -04:00
Heng Li 9538c985aa r438: fixed a rare case that leads to missing hits
It is a bug in chaining.
2017-09-25 14:59:34 -04:00
Heng Li 8f25cfa36e r437: fixed uninitialized memory on rep_len 2017-09-25 14:22:45 -04:00
Heng Li 81008dd371 r436: working on short reads
The result is mixed - lots of room for tuning
2017-09-25 14:06:29 -04:00
Heng Li 3bb66e1ed3 multi-seg working on toy examples 2017-09-25 13:42:04 -04:00
Heng Li 5b39a1b34b Merge branch 'master' into sr 2017-09-20 12:24:08 -04:00
Heng Li e3b5802b2e r424: reduce memory for long query seqs 2017-09-20 12:22:13 -04:00
Heng Li 645db3350e Merge branch 'master' into sr 2017-09-20 11:15:14 -04:00
Heng Li 75e6bbc9f6 r421: removed the MM_F_SPLICE_BOTH mode
In the default splice mode, minimap2 applies two rounds of spliced alignment:
first assuming GT-AG to be the splice signal across all splicing sites and then
assuming CT-AC to be the signal. This is the ideal strategy.

In the MM_F_SPLICE_BOTH mode, minimap2 applies one round of spliced alignment,
assuming GT-AG and CT-AC to be the splice signals AT THE SAME TIME. This will
be faster but less accurate. I don't think anyone would like to run minimap2 in
this mode, so I am removing it for clarity.
2017-09-20 11:11:53 -04:00
Heng Li 7a9b4db874 replaced --approx-ext with --sr
--sr disables Z-drop and may come with other heuristics
2017-09-20 10:51:18 -04:00
Heng Li b99c22840f r414: avoid assertion failure for 0-length reads 2017-09-19 22:21:27 -04:00
Heng Li 11081c6c27 r411: refactored kalloc for clarity
The new version is closer to K&R's original implementation.
2017-09-18 19:49:15 -04:00
Heng Li ea5a0cd17d Release minimap2-2.2 (r409) 2017-09-17 20:08:47 -04:00
Heng Li e9c57f6d8b r402: exposed kseq (for API in mappy later) 2017-09-17 13:09:16 -04:00
Heng Li c07f9f9a49 r372: default mm_verbose to 1, and change in main 2017-09-16 09:14:34 -04:00
Heng Li 14b853499f r369: updated example with the latest API 2017-09-14 22:44:10 -04:00
Heng Li 75ff7ceec5 r368: API documentation 2017-09-14 22:23:04 -04:00
Heng Li e2823d4aee r367: index reader optionally writes index 2017-09-14 21:18:13 -04:00
Heng Li eb00521d9b redesigned indexing and option APIs 2017-09-14 17:02:01 -04:00
Heng Li 0f7455cefa r365: documented the "sr" preset 2017-09-14 12:57:21 -04:00
Heng Li 4d3768bf26 r364: improved the mapq heuristics
* use repetitive seed lengths, not counts
* compute n_sub to higher accuracy
* use bwa-mem mapq heuristic as a backup

For short single-end reads, minimap2's ROC is not as good as bwa-mem's, but is
close.
2017-09-14 12:37:03 -04:00
Heng Li 47e9d76ca1 further mapq tuning 2017-09-14 10:46:14 -04:00
Heng Li f4a8766283 r362: fixed overestimated chaining score
Caused by ilog2_32(0)=-1. This bug was fixed once and reoccurred as I was
tuning the score function but forgot to apply the fix.
2017-09-14 10:15:22 -04:00
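The r362 message gives ilog2_32(0)=-1 as the cause. One plausible reading is that a log-scaled penalty turns negative for a zero argument, so it adds to the score instead of reducing it. The sketch below illustrates that reading only: `ilog2_32` here is a Python stand-in for an integer floor-log2, and `log_penalty` with its factor of one half is a hypothetical penalty term, not minimap2's scoring function.

```python
def ilog2_32(v):
    """Floor of log2(v) for a 32-bit unsigned value; gives -1 when v == 0."""
    return (v & 0xFFFFFFFF).bit_length() - 1


def log_penalty(gap):
    """Hypothetical log-scaled penalty that guards the zero case.

    Without the guard, gap == 0 would yield -0.5, i.e. a bonus rather
    than a penalty, which would inflate the score.
    """
    return ilog2_32(gap) / 2 if gap > 0 else 0
```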
Heng Li 6a82a21dee r361: improved mapq for short reads 2017-09-13 15:32:39 -04:00
Heng Li 3c91d652dd r360: allow to set integer max occ 2017-09-13 11:37:00 -04:00
Heng Li d7f2ac1d4f better parameters for short reads
It turns out the key problem is not the minimizer density. It is the max
occurrence that tends to affect results more, especially sensitivity. There is
still lots of work to do, but for now, it seems a good start.
2017-09-12 16:11:23 -04:00
Heng Li eea9e851d8 Merge branch 'dev' into short 2017-09-11 09:32:28 -04:00
Heng Li c7c3585531 r347: merged mm_map_frag() into mm_map()
mm_map_frag() was separated due to an earlier design that has been rejected.
2017-09-10 15:02:55 -04:00
Heng Li 87a278d06a Merge branch 'dev' into short 2017-09-09 08:49:58 -04:00
Heng Li f422175e4e r344: avoid unnecessary refName retrieval 2017-09-08 22:44:14 -04:00
Heng Li 709b6ec1f1 increase seed occurrences 2017-09-08 22:42:39 -04:00
Heng Li 0031158936 Merge branch 'master' into short 2017-09-07 11:41:32 -04:00
Heng Li ef3f7ea2f2 Release minimap2-2.1.1 (r341) 2017-09-06 13:46:51 -04:00