Commit Graph

283 Commits (2191ac58ad6cb147e5b67652525bc5e2ec84bf7a)

Author SHA1 Message Date
Heng Li fa5a645ca5 r552: fixed a tiny typo on struct packing
The old packing wastes memory, thought very small.
2017-11-05 08:27:26 -05:00
Heng Li a3f0aa1d5b r550: fixed -L issues with secondary and supp aln 2017-11-04 12:13:38 -04:00
Heng Li 22290db3e4 r546: minor mapQ tuning 2017-11-01 13:20:39 -04:00
Heng Li cd24dc8834 r545: removed option -i, not working well 2017-10-31 22:23:27 -04:00
Heng Li b8e758df0f r544: increased PE mapQ 2017-10-31 16:55:02 -04:00
Heng Li 311fa90030 r543: applied some sr mapq changes to long reads 2017-10-31 15:24:05 -04:00
Heng Li fb8a1b5536 r542: tuning mapQ calculation 2017-10-31 14:25:09 -04:00
Heng Li 285eb0da05 r540: removed a buggy debugging line 2017-10-29 00:02:41 -04:00
Heng Li 192217a10c r539: use --splice-flank=yes by default
In human/mouse, the GTr..yAG pattern occurs to 91/92% of all GT-AG introns.
Modeling r..y clearly leads to higher accuracy. However, in SIRV, this
percentage is reduced to ~60%. The default "--splice --splice-flank=yes"
leads to lower accuracy. If someone benchmark minimap2 on SIRV, this would be
bad, but minimap2 is developed for practical applications, not for benchmarks.
I will live with that.
2017-10-28 22:29:55 -04:00
Heng Li f22a94e868 r538: fixed a long existing bug in HPC k-mer (#47)
This bug may lead to a wrong minimizer when a HPC k-mer is longer than 256bp.
When there is a seed match involving this wrong HPC k-mer, the correct seed
sequences do not match in fact. This violates the assumption in align.c and
subsequently causes a segfault, which is what #47 has caught. This bug lurked
in the earliest piece of code and affected all released minimap2 versions so
far. It is extremely rare and does not affect the prebuilt GRCh37/38 indices.
2017-10-28 19:21:10 -04:00
Heng Li 79b0caca95 r537: model the next base to GT/AG
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceeding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost to GTr..yAG, but we pay half of the cost
if there is no r or y. This improves the junction accuracy when mapping to
human and mouse and decreases the accuacy when mapping to SIRV. My guess is
that SIRV does not honor this trend. Need to investigate in future.

Also in this commit, --cost-non-gt-ag is aliased to -C. The default is changed
to 9 instead of 5. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis on SIRV.
2017-10-28 00:25:01 -04:00
Heng Li afc2f2e84b r536: removed an unnecessary assert() 2017-10-24 21:08:54 -04:00
Heng Li d4b5dfc297 r533: added --no-pairing
to prevent the use of any pairing information for paired-end reads.
2017-10-23 14:09:32 -04:00
Heng Li 306e4541f8 Released minimap2-2.3 (r531) 2017-10-22 23:13:35 -04:00
Heng Li beeb806829 r526: fixed a bug when HPC is in use
It happened when the query HPC minimizer is longer than the reference HPC
minimizer close to the beginning of a contig. We may get a negative coordinate,
which causes an assertion failure.
2017-10-21 19:54:04 -04:00
Heng Li be7f3c4ffe r525: fixed a bug in chaining; handle ovlp ends 2017-10-20 21:34:52 -04:00
Heng Li bd04372873 r524: reverted to bwa-mem end bonus
and reduced the cost of clipping when filtering by identity
2017-10-20 16:57:31 -04:00
Heng Li 15ed0712c2 r523: fixed a performance bug in ksw2_ll
Wont' affect accuracy.
2017-10-20 13:00:10 -04:00
Heng Li 4683da2455 r520: added option -L to write long cigar to CG 2017-10-17 17:32:44 -04:00
Heng Li ffd953029f r519: fixed a severe bug that misses long alns 2017-10-17 15:52:36 -04:00
Heng Li 04cf4ebf5e r518: increased the default -K to 500M
This helps multi-thread performance for ultra-long reads.
2017-10-17 13:21:29 -04:00
Heng Li 25ffd72690 r517: replaced --print-2nd with --secondary 2017-10-17 11:41:56 -04:00
Heng Li aa2d9d4e1b r516: throw a warning if -N0 is used 2017-10-16 14:55:35 -04:00
Heng Li addb61bcb2 r515: more conservative hit exclusion
When a hit covers a long query subsequence that has not been covered by better
primary hits, this hit is more likely to become a new primary hit.
2017-10-16 13:58:01 -04:00
Heng Li adf6cd7f52 r513: merged pre- and post-cigar blen and mlen
This saves a bit memory and is cleaner.
2017-10-16 10:55:18 -04:00
Heng Li e6f525edaf r512: option to filter poorly aligned reads 2017-10-16 10:38:22 -04:00
Heng Li 858213d513 r511: fixed wrong primary sam record 2017-10-12 23:02:18 -04:00
Heng Li dea3b60918 r510: fixed an off-by-1 bug for unmapped mate 2017-10-12 17:31:13 -04:00
Heng Li 7c555f9b7e r508: use two I/O threads for mapping
-x sr applies this option by default
2017-10-12 14:56:01 -04:00
Heng Li 2801ed9b4b r507: -K not working as is intended (#36) 2017-10-12 14:16:05 -04:00
Heng Li ce06188203 r506: fixed a memory leak 2017-10-12 10:12:22 -04:00
Heng Li 9862a75cd3 r505: a bit code simplification 2017-10-11 21:54:32 -04:00
Heng Li 3073f4a758 r504: better heuristics to reduce excessive ext 2017-10-11 21:42:11 -04:00
Heng Li 9364bc64d7 r501: added end_bonus to extz2 2017-10-11 09:39:41 -04:00
Heng Li 65abdb8f3c r500: temporarily disabled region trunc
because it is causing other problems.
2017-10-11 00:16:04 -04:00
Heng Li 7345621759 r499: end bonus working; DP region needs improve! 2017-10-11 00:14:25 -04:00
Heng Li ca632f907b r498: fixed a bug when merging like "4I5I" 2017-10-10 21:22:37 -04:00
Heng Li 6c78a980b6 r497: the previous change not working at the ends 2017-10-10 17:32:28 -04:00
Heng Li c217eecdb7 r496: avoid DP extending into another chain
When deciding the region for DP, exclude regions in the adjacent chain
2017-10-10 17:25:12 -04:00
Heng Li 13b66aad4d r495: fix impropriate CIGAR
1. Not left aligned
2. In one case, 50M24D50M becomes 24D100M. The leading D needs to be removed.
3. Avoid identical hits after DP
2017-10-10 11:59:44 -04:00
Heng Li 46fa520db9 r494: simpler and better SR gap filling
Still one thing to do: left alignment
2017-10-09 22:02:30 -04:00
Heng Li 1e53610fb4 r493: reduced calling extd2 for ungapped aln
Still need to improve in case of 3I5M3D
2017-10-09 21:13:34 -04:00
Heng Li 9396d9e11b r452: typo in the last commit 2017-10-09 10:05:32 -04:00
Heng Li 198849a716 r491: an ambiguous base costs the same as gap ext 2017-10-09 09:59:42 -04:00
Heng Li 9fea4d16b3 r490: improved short-read extension heuristic
Now we find the best scoring ungapped seeded segment and then extend from it.
There is no gap filling for short reads.
2017-10-08 21:36:34 -04:00
Heng Li f9415628a8 r489: don't use approximate zdrop
it doesn't work well
2017-10-08 19:29:09 -04:00
Heng Li 61e56c941d r488: parameter to control max fragment length 2017-10-07 23:54:32 -04:00
Heng Li f150257a0d r487: demote "map10k"; improved README 2017-10-07 19:19:40 -04:00
Heng Li bf2d4f7aec r486: treat "U" as "T" for RNA reads (#33) 2017-10-07 18:53:25 -04:00
Heng Li c6384ed2c8 r482: increased short-read bandwidth to 100
This has very minor effect on speed.
2017-10-06 10:20:32 -04:00