Commit Graph

123 Commits (d135feb1a5b14612a22818a971a8cdc728df9b04)

Author SHA1 Message Date
Heng Li 83c57a9d98 r719: fixed bad memory access 2018-02-23 17:27:41 -05:00
Heng Li a0d62519c1 r710: fixed incorrect inversion coordinate (#112) 2018-02-15 14:23:42 -05:00
Heng Li 1372977a37 r708: implemented double Z-drop thresholds (#112)
When aligning long reads, we would prefer to align through low-quality
regions. This requires a large Z-drop threshold. However, to find small
inversions, we need to use a small Z-drop. This commit addresses this
conflict with two Z-drop thresholds. When Z-drop exceeds the smaller
threshold, we perform a local alignment to check if there is a potential
inversion. If there is one, we break the alignment; otherwise we break
the alignment only if Z-drop exceeds the larger threshold.

This commit also fixes a bug that reported wrong coordinates when the
inversion is on the forward strand (#112).
2018-02-15 10:50:49 -05:00
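The two-threshold rule described in r708 can be sketched as a small decision function. This is only an illustration of the logic in the commit message, not minimap2's C code: the name `zdrop_decision`, the precomputed `drop` value, and the `has_inversion` callback (a stand-in for the local alignment check) are all assumptions made for this example, as are the threshold numbers used in the usage note.

```python
def zdrop_decision(drop, zdrop_small, zdrop_large, has_inversion):
    """Decide whether to break an alignment at the current position.

    drop          -- current Z-drop value (best score so far minus current score)
    zdrop_small   -- smaller threshold; triggers the inversion check
    zdrop_large   -- larger threshold; breaks the alignment unconditionally
    has_inversion -- hypothetical zero-argument callable standing in for the
                     local alignment that looks for a potential inversion
    """
    if drop > zdrop_large:
        # Above the larger threshold: always break.
        return "break"
    if drop > zdrop_small and has_inversion():
        # Between the thresholds: break only if an inversion is found.
        return "break"
    # Otherwise keep aligning through the low-quality region.
    return "continue"
```

With illustrative thresholds of 200 and 400, a drop of 300 breaks the alignment only when the inversion check succeeds, while a drop of 500 breaks it regardless.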
Heng Li c0e0d5d84b r707: bugfix for inversions on rev strand (#112) 2018-02-14 14:09:03 -05:00
Heng Li 7ef5490884 r703: added --max-clip-ratio
still testing the option
2018-02-12 13:29:18 -05:00
Heng Li a8d476c6ad r686: end seed trimming doesn't go over long join 2018-02-06 11:31:32 -05:00
Heng Li 29b4a1786c r685: tune end seed filter again 2018-02-05 11:48:22 -05:00
Heng Li dbf284b2d9 r684: separate end score from min_chain_score 2018-02-05 11:40:38 -05:00
Heng Li 35d3e064bf r677: reduce the chance of missing hits
that are close to end of alignments. It is still possible to create examples
that fail the heuristic.
2018-02-02 10:35:33 -05:00
Heng Li 12a5a5fa3c r669: improved self chain extension (#10)
This has not fully resolved #10, only alleviated the issue.
2018-01-30 20:05:02 -05:00
Heng Li 33f8157961 r655: options to map to one strand of the ref #91 2018-01-16 10:34:30 -05:00
Heng Li f5cfd439ee r651: incorrectly treat introns as deletions
This happened when the last operation during backtracking is an intron.
2018-01-07 19:42:50 -05:00
Heng Li 98a6e52c06 r618: heuristics to avoid tiny terminal exons 2017-12-11 00:57:55 -05:00
Heng Li 824712a4ee r617: removed some unused code 2017-12-10 17:54:50 -05:00
Heng Li 0e42628ef6 r611: document --idx-no-seq; better inv aln 2017-12-08 13:16:18 -05:00
Heng Li 2f463b1db0 r573: prepare to generalize index 2017-11-11 19:54:06 -05:00
Heng Li d7a31e40e6 r569: last commit is buggy 2017-11-09 23:20:41 -05:00
Heng Li dd18cd75de r568: revert - don't take max(dp_max, dp_score) 2017-11-09 23:12:48 -05:00
Heng Li a7b38f6900 r562: fixed a severe bug: wrong query start 2017-11-08 22:31:05 -05:00
Heng Li 98ba8928c6 r558: dp_max no less than dp_score 2017-11-08 10:06:10 -05:00
Heng Li cd24dc8834 r545: removed option -i, not working well 2017-10-31 22:23:27 -04:00
Heng Li 79b0caca95 r537: model the next base to GT/AG
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost to GTr..yAG, but we pay half of the cost
if there is no r or y. This improves the junction accuracy when mapping to
human and mouse and decreases the accuracy when mapping to SIRV. My guess is
that SIRV does not honor this trend. Need to investigate in future.

Also in this commit, --cost-non-gt-ag is aliased to -C. The default is changed
to 9 instead of 5. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis on SIRV.
2017-10-28 00:25:01 -04:00
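One possible reading of the r537 splice-flank model, written as a per-site cost function. This is a sketch under assumptions: the function names are hypothetical, and charging the cost separately at the donor and the acceptor (rather than once per intron) is a guess, since the message does not say how the cost is split. The value 9 is the new default for --cost-non-gt-ag mentioned in the message.

```python
PURINES = set("AG")      # "r" in GTr
PYRIMIDINES = set("CT")  # "y" in yAG


def donor_cost(first3, full_cost=9):
    """Cost at the donor site, given the first three intron bases."""
    if first3[:2] != "GT":
        return full_cost          # non-canonical signal: full cost
    if first3[2] in PURINES:
        return 0                  # GTr: no cost
    return full_cost / 2          # GT without r: half of the cost


def acceptor_cost(last3, full_cost=9):
    """Cost at the acceptor site, given the last three intron bases."""
    if last3[1:] != "AG":
        return full_cost          # non-canonical signal: full cost
    if last3[0] in PYRIMIDINES:
        return 0                  # yAG: no cost
    return full_cost / 2          # AG without y: half of the cost
```

For example, an intron starting with `GTA` and ending with `CAG` costs nothing at either site under this sketch, while one starting with `GTC` pays half of the cost at the donor.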
Heng Li beeb806829 r526: fixed a bug when HPC is in use
It happened when the query HPC minimizer is longer than the reference HPC
minimizer close to the beginning of a contig. We may get a negative coordinate,
which causes an assertion failure.
2017-10-21 19:54:04 -04:00
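The r526 message can be illustrated with a toy model of homopolymer-compressed (HPC) k-mers, in which a k-mer of k compressed bases covers a variable number of raw bases. The sketch below is an assumption about how a negative coordinate could arise, not a description of the actual code path: `hpc_span` and `implied_ref_start` are hypothetical helpers, and clamping at zero is only one way to avoid the problem.

```python
def hpc_span(seq, start, k):
    """Raw bases covered by k homopolymer runs beginning at `start`.

    Returns None if fewer than k runs remain in `seq`.
    """
    i, runs = start, 0
    while i < len(seq) and runs < k:
        j = i
        # Skip to the end of the current homopolymer run.
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        i, runs = j, runs + 1
    return i - start if runs == k else None


def implied_ref_start(ref_end, query_span):
    """Start estimate from the reference end and the query span.

    The unclamped value ref_end - query_span can be negative near the
    beginning of a contig; this sketch clamps it at 0 (an assumption).
    """
    return max(0, ref_end - query_span)
```

With k = 3, the query `AAACCGT` spans 6 raw bases while the reference `ACGT` spans only 3. If the reference k-mer ends at position 3, subtracting the query span gives 3 - 6 = -3, which is the kind of negative coordinate the message describes.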
Heng Li ffd953029f r519: fixed a severe bug that misses long alns 2017-10-17 15:52:36 -04:00
Heng Li adf6cd7f52 r513: merged pre- and post-cigar blen and mlen
This saves a bit of memory and is cleaner.
2017-10-16 10:55:18 -04:00
Heng Li e6f525edaf r512: option to filter poorly aligned reads 2017-10-16 10:38:22 -04:00
Heng Li 9862a75cd3 r505: a bit of code simplification 2017-10-11 21:54:32 -04:00
Heng Li 3073f4a758 r504: better heuristics to reduce excessive ext 2017-10-11 21:42:11 -04:00
Heng Li 9364bc64d7 r501: added end_bonus to extz2 2017-10-11 09:39:41 -04:00
Heng Li 65abdb8f3c r500: temporarily disabled region trunc
because it is causing other problems.
2017-10-11 00:16:04 -04:00
Heng Li 7345621759 r499: end bonus working; DP region needs improvement! 2017-10-11 00:14:25 -04:00
Heng Li ca632f907b r498: fixed a bug when merging like "4I5I" 2017-10-10 21:22:37 -04:00
Heng Li 6c78a980b6 r497: the previous change not working at the ends 2017-10-10 17:32:28 -04:00
Heng Li c217eecdb7 r496: avoid DP extending into another chain
When deciding the region for DP, exclude regions in the adjacent chain
2017-10-10 17:25:12 -04:00
Heng Li 13b66aad4d r495: fix improper CIGAR
1. Not left aligned
2. In one case, 50M24D50M becomes 24D100M. The leading D needs to be removed.
3. Avoid identical hits after DP
2017-10-10 11:59:44 -04:00
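Two of the CIGAR clean-ups mentioned around r495 and r498 (merging adjacent identical operations such as "4I5I", and removing a leading D) can be sketched as follows. This is an illustrative helper, not minimap2's implementation: the names `parse_cigar` and `tidy_cigar` are hypothetical, left alignment is not attempted, and returning the number of removed reference bases so the caller can shift the alignment start is an assumption of this sketch.

```python
import re


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def tidy_cigar(cigar):
    """Merge adjacent identical ops and drop leading deletions.

    Returns (new_cigar, ref_shift), where ref_shift is the number of
    reference bases removed from the front of the alignment.
    """
    merged = []
    for n, op in parse_cigar(cigar):
        if merged and merged[-1][1] == op:
            # Same operation as the previous one: add the lengths.
            merged[-1] = (merged[-1][0] + n, op)
        else:
            merged.append((n, op))

    ref_shift = 0
    while merged and merged[0][1] == "D":
        # A leading deletion only moves the reference start forward.
        ref_shift += merged.pop(0)[0]

    return "".join(f"{n}{op}" for n, op in merged), ref_shift
```

Under this sketch, `tidy_cigar("3M4I5I2M")` gives `("3M9I2M", 0)` and `tidy_cigar("24D100M")` gives `("100M", 24)`.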
Heng Li 46fa520db9 r494: simpler and better SR gap filling
Still one thing to do: left alignment
2017-10-09 22:02:30 -04:00
Heng Li 1e53610fb4 r493: reduced calling extd2 for ungapped aln
Still need to improve in case of 3I5M3D
2017-10-09 21:13:34 -04:00
Heng Li 9fea4d16b3 r490: improved short-read extension heuristic
Now we find the best scoring ungapped seeded segment and then extend from it.
There is no gap filling for short reads.
2017-10-08 21:36:34 -04:00
Heng Li f9415628a8 r489: don't use approximate zdrop
it doesn't work well
2017-10-08 19:29:09 -04:00
Heng Li e0baf1ad54 r479: a bit of code cleanup 2017-10-05 16:15:14 -04:00
Heng Li 3ff6eda3a4 r473: don't count introns into blen 2017-10-05 14:37:21 -04:00
Heng Li 841763ec24 Merge branch 'master' into sr 2017-10-04 11:42:44 -04:00
Heng Li 95eb1dec36 r458: fixed wrong chr for inversion aln (#30) 2017-10-04 11:32:06 -04:00
Heng Li 645db3350e Merge branch 'master' into sr 2017-09-20 11:15:14 -04:00
Heng Li 75e6bbc9f6 r421: removed the MM_F_SPLICE_BOTH mode
In the default splice mode, minimap2 applies two rounds of spliced alignment:
first assuming GT-AG to be the splice signal across all splicing sites and then
assuming CT-AC to be the signal. This is the ideal strategy.

In the MM_F_SPLICE_BOTH mode, minimap2 applies one round of spliced alignment,
assuming GT-AG and CT-AC to be the splice signals AT THE SAME TIME. This will
be faster but less accurate. I don't think anyone would like to run minimap2 in
this mode, so I am removing it for clarity.
2017-09-20 11:11:53 -04:00
Heng Li 7a9b4db874 replaced --approx-ext with --sr
--sr disables Z-drop and may come with other heuristics
2017-09-20 10:51:18 -04:00
Heng Li 11081c6c27 r411: refactored kalloc for clarity
The new version is closer to K&R's original implementation.
2017-09-18 19:49:15 -04:00
Heng Li 0f7455cefa r365: documented the "sr" preset 2017-09-14 12:57:21 -04:00
Heng Li d7f2ac1d4f better parameters for short reads
It turns out the key problem is not the minimizer density. It is the max
occurrence that tends to affect results more, especially sensitivity. There is
still lots of work to do, but for now, it seems a good start.
2017-09-12 16:11:23 -04:00
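The effect of a maximum-occurrence cutoff on seeding can be shown with a toy index. This sketch uses every k-mer rather than minimizers to stay short, and the names `build_index`, `seed_hits` and `max_occ` are hypothetical; it only illustrates why a lower cutoff discards seeds from repetitive k-mers and can therefore cost sensitivity.

```python
from collections import defaultdict


def build_index(ref, k):
    """Map each k-mer in `ref` to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def seed_hits(query, index, k, max_occ):
    """Collect (query_pos, ref_pos) seeds, skipping over-frequent k-mers."""
    hits = []
    for j in range(len(query) - k + 1):
        positions = index.get(query[j:j + k], [])
        # Keep a seed only if its k-mer occurs at most max_occ times.
        if 0 < len(positions) <= max_occ:
            hits.extend((j, p) for p in positions)
    return hits
```

With the reference `ACGTACGTTTGA` and k = 4, the k-mer `ACGT` occurs twice. Querying `ACGTTTGA` yields four seeds when `max_occ` is 1 and six seeds when it is 2.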
Heng Li eccdb3a1ca r315: added getopt from musl 2017-09-01 20:20:34 +08:00