57 Commits (70428ca3a8a1d643a8fe1d5b651b86ea1fec1fa9)

Author SHA1 Message Date
Heng Li 8fc5f8dc90 r711: assign proper mapq to primary inversions 2018-02-15 14:34:59 -05:00
Heng Li 1372977a37 r708: implemented double Z-drop thresholds (#112)
When aligning long reads, we would prefer to align through low-quality
regions. This requires a large Z-drop threshold. However, to find small
inversions, we need to use a small Z-drop. This commit addresses this
conflict with two Z-drop thresholds. When Z-drop exceeds the smaller
threshold, we perform a local alignment to check if there is a potential
inversion. If there is one, we break the alignment; otherwise we break
the alignment only if Z-drop exceeds the larger threshold.

This commit also fixes a bug that reported wrong coordinates when the
inversion is on the forward strand (#112).
2018-02-15 10:50:49 -05:00
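
The two-threshold rule described in the r708 message can be sketched as a small decision function. This is an illustrative sketch only, not minimap2's actual C code: the function name, the returned labels, the threshold values in the usage note, and the `has_inversion` callback (a stand-in for the local alignment check) are all hypothetical.

```python
def zdrop_action(zdrop, zdrop_small, zdrop_large, has_inversion):
    """Decide what to do with an alignment given the current Z-drop value.

    All names are hypothetical. `has_inversion` is a zero-argument callable
    standing in for the local alignment that looks for a potential inversion.
    """
    # Above the smaller threshold: run the inversion check, break if it hits.
    if zdrop > zdrop_small and has_inversion():
        return "break_for_inversion"
    # No inversion found (or check not triggered): break only when the
    # drop also exceeds the larger threshold.
    if zdrop > zdrop_large:
        return "break"
    # Otherwise keep aligning through the low-quality region.
    return "extend"
```

With assumed thresholds of 200 and 400, a drop of 250 breaks the alignment only when the inversion check succeeds, while a drop of 500 breaks it either way. The check is never run for drops at or below the smaller threshold.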
Heng Li 7ef5490884 r703: added --max-clip-ratio
still testing the option
2018-02-12 13:29:18 -05:00
Heng Li 46d6349af4 r670: added PE support to mappy
and minor code cleanup
2018-01-31 11:33:08 -05:00
Heng Li 98a999fe44 r611: added pseudocount when est divergence 2017-12-08 12:57:57 -05:00
Heng Li 984f7846c0 r601: bugfix - a similar issue to r600
This bug unsets the alignment score of suboptimal alignments.
2017-11-30 11:51:34 -05:00
Heng Li af1d6afba9 r600: bugfix - missing secondary alignments (#71)
This should happen very rarely with typical data, but is more likely with
artifactual data.
2017-11-30 11:34:10 -05:00
Heng Li b24d68ae9f r557: fixed another mapq underestimate
When a chain is split during base-level alignment, its chaining score is
reduced. However, the chaining score of its suboptimal chain remains the same.
This leads to underestimated mapping quality.
2017-11-07 23:20:49 -05:00
Heng Li 65deedfa96 r556: bugfix - underestimate mapq for split aln 2017-11-07 22:37:12 -05:00
Heng Li cd24dc8834 r545: removed option -i, not working well 2017-10-31 22:23:27 -04:00
Heng Li 311fa90030 r543: applied some sr mapq changes to long reads 2017-10-31 15:24:05 -04:00
Heng Li fb8a1b5536 r542: tuning mapQ calculation 2017-10-31 14:25:09 -04:00
Heng Li bd04372873 r524: reverted to bwa-mem end bonus
and reduced the cost of clipping when filtering by identity
2017-10-20 16:57:31 -04:00
Heng Li addb61bcb2 r515: more conservative hit exclusion
When a hit covers a long query subsequence that has not been covered by better
primary hits, this hit is more likely to become a new primary hit.
2017-10-16 13:58:01 -04:00
Heng Li adf6cd7f52 r513: merged pre- and post-cigar blen and mlen
This saves a bit of memory and is cleaner.
2017-10-16 10:55:18 -04:00
Heng Li e6f525edaf r512: option to filter poorly aligned reads 2017-10-16 10:38:22 -04:00
Heng Li ce06188203 r506: fixed a memory leak 2017-10-12 10:12:22 -04:00
Heng Li 13b66aad4d r495: fix improper CIGAR
1. Not left aligned
2. In one case, 50M24D50M becomes 24D100M. The leading D needs to be removed.
3. Avoid identical hits after DP
2017-10-10 11:59:44 -04:00
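
Item 2 of the r495 message (a CIGAR such as 24D100M, whose leading deletion must be removed) can be illustrated with a short helper. This is a minimal sketch under stated assumptions, not the routine minimap2 uses: the function name is hypothetical, and it only handles a deletion that is the very first operation. It also assumes the reference start should advance by the removed length, so it returns that length to the caller.

```python
import re

def strip_leading_deletion(cigar):
    """Drop leading D operations from a CIGAR string (hypothetical helper).

    Returns the cleaned CIGAR and the number of reference bases skipped,
    which the caller would add to the alignment's reference start.
    """
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    ref_shift = 0
    # A deletion at the start consumes reference bases without aligning
    # any query base, so it can be removed by shifting the start instead.
    while ops and ops[0][1] == "D":
        ref_shift += int(ops[0][0])
        ops = ops[1:]
    return "".join(length + op for length, op in ops), ref_shift
```

For the case named in the message, `strip_leading_deletion("24D100M")` gives `("100M", 24)`, while a CIGAR with an internal deletion such as 50M24D50M is returned unchanged with a shift of 0.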
Heng Li 2a1e738a94 r461: randomize repetitive hits 2017-10-04 13:05:18 -04:00
Heng Li 2a554a92e9 r451: changed rep_len mapq heuristic 2017-09-28 14:23:14 -04:00
Heng Li 935a6e6064 r450: differentiate exact repeats via mapq 2017-09-27 23:51:05 -04:00
Heng Li f611edf6f2 r443: don't filter small cm for split seg 2017-09-26 16:17:58 -04:00
Heng Li 55d1e4f638 r440: better chain filtering for PE reads 2017-09-26 11:03:36 -04:00
Heng Li 9943e5fdd0 backup 2017-09-20 14:35:46 -04:00
Heng Li 03d6894517 backup 2017-09-20 11:47:46 -04:00
Heng Li 11081c6c27 r411: refactored kalloc for clarity
The new version is closer to K&R's original implementation.
2017-09-18 19:49:15 -04:00
Heng Li 4d3768bf26 r364: improved the mapq heuristics
* use repetitive seed lengths, not counts
* compute n_sub to higher accuracy
* use bwa-mem mapq heuristic as a backup

For short single-end reads, minimap2's ROC is not as good as bwa-mem's, but is
close.
2017-09-14 12:37:03 -04:00
Heng Li 47e9d76ca1 further mapq tuning 2017-09-14 10:46:14 -04:00
Heng Li 6a82a21dee r361: improved mapq for short reads 2017-09-13 15:32:39 -04:00
Heng Li 19d6ec885e r224: inversion alignment around Z-drop break 2017-07-29 13:09:10 -04:00
Heng Li 254280b8af r216: a bit of cleanup; identical output to r215 2017-07-28 11:54:18 -04:00
Heng Li a01d758af6 r206: mapq penalize short chains further
The old code penalized short chains on a log() scale. This commit adds a
linearly scaled factor: if a chain consists of few minimizers, its quality is poor.
2017-07-26 11:50:04 -04:00
Heng Li e9dc1ce2b6 r205: when computing mapq, consider min_chain_sc
Not doing this was a mistake.
2017-07-26 11:34:14 -04:00
Heng Li 00c6db5073 r203: check more subopt aln if score small 2017-07-25 20:02:44 -04:00
Heng Li 38aa66fa30 r178: fixed integer overflow in mapq calculation 2017-07-16 21:45:39 -04:00
Heng Li b4280d186f r176: removed seedcov_ratio; changed default opt
min_seedcov_ratio is not used
2017-07-12 12:47:46 -04:00
Heng Li 801bc84b01 r169: output more accurate col. 10&11 to PAF
In r168, col.10 is smaller than what it should be. This confuses miniasm.
2017-07-11 14:09:51 -04:00
Heng Li 782449975d r168: fixed a bug in long join: a[] not sorted
Also added length requirement for long join and changed -g in the ava mode
2017-07-09 12:14:20 -04:00
Heng Li 1ac48556ae r167: long join threshold depends on gap
also caught a bug for reverse strand join
2017-07-09 10:38:51 -04:00
Heng Li 38b2830e18 r161: filter bad seeds; changed default -g/-r 2017-07-08 13:31:27 -04:00
Heng Li e07daad7ad r153: sam primary record not set sometimes 2017-07-03 13:18:57 -04:00
Heng Li b625247300 r150: mm_sync_regs() doesn't work with negative id 2017-07-03 11:36:34 -04:00
Heng Li 2e4fd9f1d0 r148: revamped regs handling after cigar 2017-07-03 10:44:26 -04:00
Heng Li 696ebce66e backup; still buggy 2017-07-03 00:52:00 -04:00
Heng Li e06c342659 r146: in filtering, drop children if parent out
This has been causing several segfaults.
2017-07-03 00:28:12 -04:00
Heng Li 632b8638d2 r144: adjust primary aln after cigar 2017-07-02 22:43:02 -04:00
Heng Li 2b45ba7a0b r143: fixed a segfault and incorrect .parent 2017-07-02 19:56:21 -04:00
Heng Li 74d306a596 fixed bug when retaining 2ndary aln; still buggy 2017-07-02 19:08:30 -04:00
Heng Li 426c2975f6 r126: filter by fraction of seed coverage
otherwise we may get too many poor overlap mappings.
2017-06-30 22:15:45 -04:00
Heng Li 3a5486325a r123: fixed a mem leak; more presets 2017-06-30 15:39:05 -04:00