307 Commits (43960a8ca7fee69d7bbe1f35e8da2df36c0100ab)

Author SHA1 Message Date
Heng Li 858213d513 r511: fixed wrong primary sam record 2017-10-12 23:02:18 -04:00
Heng Li dea3b60918 r510: fixed an off-by-1 bug for unmapped mate 2017-10-12 17:31:13 -04:00
Heng Li 7c555f9b7e r508: use two I/O threads for mapping
-x sr applies this option by default
2017-10-12 14:56:01 -04:00
Heng Li 2801ed9b4b r507: -K not working as is intended (#36) 2017-10-12 14:16:05 -04:00
Heng Li ce06188203 r506: fixed a memory leak 2017-10-12 10:12:22 -04:00
Heng Li 9862a75cd3 r505: a bit of code simplification 2017-10-11 21:54:32 -04:00
Heng Li 3073f4a758 r504: better heuristics to reduce excessive ext 2017-10-11 21:42:11 -04:00
Heng Li 9364bc64d7 r501: added end_bonus to extz2 2017-10-11 09:39:41 -04:00
Heng Li 65abdb8f3c r500: temporarily disabled region trunc
because it is causing other problems.
2017-10-11 00:16:04 -04:00
Heng Li 7345621759 r499: end bonus working; DP region needs improvement! 2017-10-11 00:14:25 -04:00
Heng Li ca632f907b r498: fixed a bug when merging ops like "4I5I" 2017-10-10 21:22:37 -04:00
Heng Li 6c78a980b6 r497: the previous change not working at the ends 2017-10-10 17:32:28 -04:00
Heng Li c217eecdb7 r496: avoid DP extending into another chain
When deciding the region for DP, exclude regions in the adjacent chain
2017-10-10 17:25:12 -04:00
Heng Li 13b66aad4d r495: fix improper CIGAR
1. Not left aligned
2. In one case, 50M24D50M becomes 24D100M. The leading D needs to be removed.
3. Avoid identical hits after DP
2017-10-10 11:59:44 -04:00
Heng Li 46fa520db9 r494: simpler and better SR gap filling
Still one thing to do: left alignment
2017-10-09 22:02:30 -04:00
Heng Li 1e53610fb4 r493: reduced calling extd2 for ungapped aln
Still need to improve in case of 3I5M3D
2017-10-09 21:13:34 -04:00
Heng Li 9396d9e11b r452: typo in the last commit 2017-10-09 10:05:32 -04:00
Heng Li 198849a716 r491: an ambiguous base costs the same as gap ext 2017-10-09 09:59:42 -04:00
Heng Li 9fea4d16b3 r490: improved short-read extension heuristic
Now we find the best scoring ungapped seeded segment and then extend from it.
There is no gap filling for short reads.
2017-10-08 21:36:34 -04:00
Heng Li f9415628a8 r489: don't use approximate zdrop
it doesn't work well
2017-10-08 19:29:09 -04:00
Heng Li 61e56c941d r488: parameter to control max fragment length 2017-10-07 23:54:32 -04:00
Heng Li f150257a0d r487: demote "map10k"; improved README 2017-10-07 19:19:40 -04:00
Heng Li bf2d4f7aec r486: treat "U" as "T" for RNA reads (#33) 2017-10-07 18:53:25 -04:00
Heng Li c6384ed2c8 r482: increased short-read bandwidth to 100
This has very minor effect on speed.
2017-10-06 10:20:32 -04:00
Heng Li e0baf1ad54 r479: a bit of code cleanup 2017-10-05 16:15:14 -04:00
Heng Li f266092699 r478: simplified useless code, a tiny bit 2017-10-05 15:56:00 -04:00
Heng Li 9c5767f9ed r477: renamed multi_seg to frag_mode 2017-10-05 15:48:17 -04:00
Heng Li ae2adf04d4 r476: multi-file fragment mode working 2017-10-05 15:39:26 -04:00
Heng Li b839758335 r475: added --cs=none; updated manpage 2017-10-05 15:27:37 -04:00
Heng Li f4a5d3a692 r474: replaced -S and --cs-no-equal with --cs 2017-10-05 15:03:03 -04:00
Heng Li 3ff6eda3a4 r473: don't count introns into blen 2017-10-05 14:37:21 -04:00
Heng Li 1a90bc8603 r472: fixed a bug when printing MAPQ/CIGAR 2017-10-05 12:46:11 -04:00
Heng Li abf2a90363 r471: all SAM features implemented; more tests! 2017-10-05 12:37:30 -04:00
Heng Li 7cc4f6f965 r469: first step towards PE SAM 2017-10-05 10:38:09 -04:00
Heng Li 16e6e589a8 r468: replaced ^ with ~ in cs 2017-10-04 22:17:12 -04:00
Heng Li 9aba11769c r467: added : (equal length) and ^ (intron) ops 2017-10-04 21:55:37 -04:00
Heng Li 7d50e646dd r466: detect multi-part index more smartly
though it might not work in an extremely rare case: a sequence ends
at X*16384 and is the last sequence in a batch. This can be resolved by
never letting the kstream_t buffer empty.
2017-10-04 17:32:58 -04:00
Heng Li 1554149158 r465: apply option -x before other options 2017-10-04 13:52:28 -04:00
Heng Li 19c39e704f r464: fixed a bug in pairing, due to randomization 2017-10-04 13:37:40 -04:00
Heng Li 2581c44a21 r463: optionally disable secondary hits 2017-10-04 13:24:41 -04:00
Heng Li 5babf41a38 r462: SAM primary flag not properly set 2017-10-04 13:11:29 -04:00
Heng Li 2a1e738a94 r461: randomize repetitive hits 2017-10-04 13:05:18 -04:00
Heng Li cf55c84056 r460: added option --no-long-join 2017-10-04 12:08:44 -04:00
Heng Li 841763ec24 Merge branch 'master' into sr 2017-10-04 11:42:44 -04:00
Heng Li 95eb1dec36 r458: fixed wrong chr for inversion aln (#30) 2017-10-04 11:32:06 -04:00
Heng Li 0fd0f2aed1 r457: fixed a bug on parsing -f 2017-09-30 00:00:44 -04:00
Heng Li ee9b2773a8 r456: min chain score should be > k-mer length
otherwise chain_dp() wastes time unnecessarily sorting chains with one k-mer.
2017-09-29 22:33:55 -04:00
Heng Li 340483821e r455: set max_occ on command line 2017-09-29 22:18:43 -04:00
Heng Li 04fb2c2ec0 r454: rechain with higher max_occ if no good chain 2017-09-29 19:24:32 -04:00
Heng Li 0d4ecd19ee r453: avoid duplicated strcmp() for ava 2017-09-28 15:52:05 -04:00
Heng Li 0c63325985 r452: fixed - -G not working with -x sr 2017-09-28 14:28:12 -04:00
Heng Li 2a554a92e9 r451: changed rep_len mapq heuristic 2017-09-28 14:23:14 -04:00
Heng Li 935a6e6064 r450: differentiate exact repeats via mapq 2017-09-27 23:51:05 -04:00
Heng Li 8301222174 r448: fixed a bug when computing PE quality 2017-09-27 21:54:07 -04:00
Heng Li 7e0d70bfd3 r445: pair coordinate adjustment working
Next: mapq adjustment, which will be tricky...
2017-09-27 15:38:18 -04:00
Heng Li a349d85280 r444: changed the way orientation is specified
The old model doesn't work with RF or RR orientation. The new model only works
with paired-end reads. For >2 segments, only FF is supported.
2017-09-27 12:33:10 -04:00
Heng Li f611edf6f2 r443: don't filter small cm for split seg 2017-09-26 16:17:58 -04:00
Heng Li 1b1dd0cd57 r442: default max_gap to 200 in the sr mode 2017-09-26 13:31:01 -04:00
Heng Li 55d1e4f638 r440: better chain filtering for PE reads 2017-09-26 11:03:36 -04:00
Heng Li 64c0ad6b35 r439: use splice-like chain gap cost between segs
This improves accuracy
2017-09-25 16:04:38 -04:00
Heng Li 9538c985aa r438: fixed a rare case that leads to missing hits
It is a bug in chaining.
2017-09-25 14:59:34 -04:00
Heng Li 8f25cfa36e r437: fixed uninitialized memory on rep_len 2017-09-25 14:22:45 -04:00
Heng Li 81008dd371 r436: working on short reads
The result is mixed - lots of room for tuning
2017-09-25 14:06:29 -04:00
Heng Li 3bb66e1ed3 multi-seg working on toy examples 2017-09-25 13:42:04 -04:00
Heng Li 5b39a1b34b Merge branch 'master' into sr 2017-09-20 12:24:08 -04:00
Heng Li e3b5802b2e r424: reduce memory for long query seqs 2017-09-20 12:22:13 -04:00
Heng Li 645db3350e Merge branch 'master' into sr 2017-09-20 11:15:14 -04:00
Heng Li 75e6bbc9f6 r421: removed the MM_F_SPLICE_BOTH mode
In the default splice mode, minimap2 applies two rounds of spliced alignment:
first assuming GT-AG to be the splice signal across all splicing sites and then
assuming CT-AC to be the signal. This is the ideal strategy.

In the MM_F_SPLICE_BOTH mode, minimap2 applies one round of spliced alignment,
assuming GT-AG and CT-AC to be the splice signals AT THE SAME TIME. This will
be faster but less accurate. I don't think anyone would like to run minimap2 in
this mode, so I am removing it for clarity.
2017-09-20 11:11:53 -04:00
Heng Li 7a9b4db874 replaced --approx-ext with --sr
--sr disables Z-drop and may come with other heuristics
2017-09-20 10:51:18 -04:00
Heng Li b99c22840f r414: avoid assertion failure for 0-length reads 2017-09-19 22:21:27 -04:00
Heng Li 11081c6c27 r411: refactored kalloc for clarity
The new version is closer to K&R's original implementation.
2017-09-18 19:49:15 -04:00
Heng Li ea5a0cd17d Release minimap2-2.2 (r409) 2017-09-17 20:08:47 -04:00
Heng Li e9c57f6d8b r402: exposed kseq (for API in mappy later) 2017-09-17 13:09:16 -04:00
Heng Li c07f9f9a49 r372: default mm_verbose to 1, and change in main 2017-09-16 09:14:34 -04:00
Heng Li 14b853499f r369: updated example with the latest API 2017-09-14 22:44:10 -04:00
Heng Li 75ff7ceec5 r368: API documentation 2017-09-14 22:23:04 -04:00
Heng Li e2823d4aee r367: index reader optionally writes index 2017-09-14 21:18:13 -04:00
Heng Li eb00521d9b redesigned indexing and option APIs 2017-09-14 17:02:01 -04:00
Heng Li 0f7455cefa r365: documented the "sr" preset 2017-09-14 12:57:21 -04:00
Heng Li 4d3768bf26 r364: improved the mapq heuristics
* use repetitive seed lengths, not counts
* compute n_sub to higher accuracy
* use bwa-mem mapq heuristic as a backup

For short single-end reads, minimap2's ROC is not as good as bwa-mem's, but is
close.
2017-09-14 12:37:03 -04:00
Heng Li 47e9d76ca1 further mapq tuning 2017-09-14 10:46:14 -04:00
Heng Li f4a8766283 r362: fixed overestimated chaining score
Caused by ilog2_32(0)=-1. This bug was fixed once and recurred as I was
tuning the score function but forgot to apply the fix.
2017-09-14 10:15:22 -04:00
Heng Li 6a82a21dee r361: improved mapq for short reads 2017-09-13 15:32:39 -04:00
Heng Li 3c91d652dd r360: allow setting an integer max occ 2017-09-13 11:37:00 -04:00
Heng Li d7f2ac1d4f better parameters for short reads
It turns out the key problem is not the minimizer density. It is the max
occurrence that tends to affect results more, especially sensitivity. There is
still lots of work to do, but for now, it seems a good start.
2017-09-12 16:11:23 -04:00
Heng Li eea9e851d8 Merge branch 'dev' into short 2017-09-11 09:32:28 -04:00
Heng Li c7c3585531 r347: merged mm_map_frag() into mm_map()
mm_map_frag() was separated due to an earlier design that has been rejected.
2017-09-10 15:02:55 -04:00
Heng Li 87a278d06a Merge branch 'dev' into short 2017-09-09 08:49:58 -04:00
Heng Li f422175e4e r344: avoid unnecessary refName retrieval 2017-09-08 22:44:14 -04:00
Heng Li 709b6ec1f1 increase seed occurrences 2017-09-08 22:42:39 -04:00
Heng Li 0031158936 Merge branch 'master' into short 2017-09-07 11:41:32 -04:00
Heng Li ef3f7ea2f2 Release minimap2-2.1.1 (r341) 2017-09-06 13:46:51 -04:00
Heng Li 8b9f2aaf04 r339: improved SIMD detection
old code does not check AVX2
2017-09-05 13:10:30 -04:00
Heng Li 46e8b6a4f9 r338: portable CPU dispatch, which is the default
working with gcc, icc, clang and msvc.
2017-09-03 20:29:24 -04:00
Heng Li 3c997ca016 r337: support CPU dispatch for gcc-4.8+
using __builtin_cpu_supports()
2017-09-03 14:29:49 -04:00
Heng Li f9ccc522cd Merge branch 'master' into short 2017-09-03 11:58:15 -04:00
Heng Li 101b8bb97d r335: report an error if query can't be opened 2017-09-03 11:54:38 -04:00
Heng Li 0a3ebdc916 for better windows compatibility 2017-09-02 17:52:33 -04:00
Heng Li 743d26eab0 Merge pull request #20 from nanoporetech/msvc14
ONT source code changes to compile with MSVC 14
2017-09-02 14:35:02 -07:00
Heng Li 62535ecd7f Merge branch 'dev' 2017-09-01 10:06:21 -07:00