Heng Li
7e6e8ca73f
r792: fixed -Wextra warnings and resolved #184
2018-06-19 15:26:58 -04:00
Aaron Wenger
3d3bcc29a8
Fix CIGAR reallocation with --eqx
Fix the logic that calculates the number of CIGAR entries when
match "M" entries are expanded into "=" and "X". The number
of entries depends not on the number of mismatches but on
the number of transitions between "=" and "X".
2018-06-19 14:37:41 -04:00
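A minimal C sketch of the counting rule described in this commit message follows. It is not minimap2's code. It assumes the per-base match/mismatch status of one "M" run is available as a 0/1 array. The function name `eqx_entry_count` and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of CIGAR entries needed to rewrite one "M" run of length n
 * as "=" and "X" operators. is_mm[i] is nonzero if base i mismatches.
 * Every entry is a maximal run of identical status, so the count is
 * 1 + the number of "="/"X" transitions, whatever the mismatch count. */
size_t eqx_entry_count(const unsigned char *is_mm, size_t n)
{
    size_t i, n_entries = 1;
    if (n == 0) return 0;
    assert(is_mm != NULL);
    for (i = 1; i < n; ++i)
        if ((is_mm[i] != 0) != (is_mm[i - 1] != 0))
            ++n_entries; /* status flips: a new "=" or "X" entry starts */
    return n_entries;
}

/* Two runs of 10M, each with two mismatches:
 *   mismatches at 0,1 -> 2X8=        (2 entries)
 *   mismatches at 2,5 -> 2=1X2=1X4=  (5 entries) */
```

Both example runs contain two mismatches, yet they need 2 and 5 entries. A buffer sized from the mismatch count alone is therefore not a reliable bound.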
Heng Li
154d2caf5b
r784: support the =/X CIGAR operators (#156)
2018-05-30 16:11:22 -04:00
Heng Li
a3afeec0b2
r783: reverted to r781 (#155)
2018-05-30 15:25:34 -04:00
Heng Li
9f4309c376
r777: avoid skipping too many seeds
2018-05-11 10:25:18 -04:00
Heng Li
881b4ca3a2
r774: Merge branch 'hot-fix' into fix-long-gap
2018-05-11 10:02:17 -04:00
Heng Li
e61812ee55
reduced gap len to trigger bad seed filtering
2018-05-01 16:17:21 -04:00
Heng Li
734ac379bb
r770: matching N bases not working properly (#155)
2018-04-30 19:55:23 -04:00
Heng Li
759f8e4ac9
r769: filter out seeds breaking long gaps
2018-04-24 15:37:37 -04:00
Heng Li
83c57a9d98
r719: fixed bad memory access
2018-02-23 17:27:41 -05:00
Heng Li
a0d62519c1
r710: fixed incorrect inversion coordinate (#112)
2018-02-15 14:23:42 -05:00
Heng Li
1372977a37
r708: implemented double Z-drop thresholds (#112)
When aligning long reads, we would prefer to align through low-quality
regions. This requires a large Z-drop threshold. However, to find small
inversions, we need to use a small Z-drop. This commit addresses this
conflict with two Z-drop thresholds. When Z-drop exceeds the smaller
threshold, we perform a local alignment to check if there is a potential
inversion. If there is one, we break the alignment; otherwise, we break
the alignment only if Z-drop exceeds the larger threshold.
This commit also fixes a bug that reported wrong coordinates when the
inversion is on the forward strand (#112).
2018-02-15 10:50:49 -05:00
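A minimal C sketch of the two-threshold decision paraphrased from the r708 message follows. It is not the actual ksw2/minimap2 code. The names `zdrop_inv` and `zdrop` are hypothetical. The `inversion_found` flag stands in for the result of the local-alignment check.

```c
#include <assert.h>

/* Returns 1 if extension should stop, 0 otherwise.
 *   drop            current Z-drop value (score drop along the diagonal)
 *   zdrop_inv       the smaller threshold (hypothetical name)
 *   zdrop           the larger threshold (hypothetical name)
 *   inversion_found result of the local alignment that the caller runs
 *                   only once drop exceeds zdrop_inv (stand-in flag) */
int zdrop_should_break(int drop, int zdrop_inv, int zdrop, int inversion_found)
{
    assert(zdrop_inv <= zdrop);
    if (drop <= zdrop_inv)
        return 0;          /* below both thresholds: keep aligning */
    if (inversion_found)
        return 1;          /* potential inversion: break here */
    return drop > zdrop;   /* no inversion: only the large threshold breaks */
}
```

Take illustrative thresholds of 200 and 400. A drop of 250 breaks the alignment only when the local alignment reports an inversion. A drop of 450 always breaks it.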
Heng Li
c0e0d5d84b
r707: bugfix for inversions on rev strand (#112)
2018-02-14 14:09:03 -05:00
Heng Li
7ef5490884
r703: added --max-clip-ratio
Still testing the option.
2018-02-12 13:29:18 -05:00
Heng Li
a8d476c6ad
r686: end seed trimming doesn't go over long joins
2018-02-06 11:31:32 -05:00
Heng Li
29b4a1786c
r685: tune end seed filter again
2018-02-05 11:48:22 -05:00
Heng Li
dbf284b2d9
r684: separate end score from min_chain_score
2018-02-05 11:40:38 -05:00
Heng Li
35d3e064bf
r677: reduce the chance of missing hits
that are close to the end of alignments. It is still possible to create examples
that fail the heuristic.
2018-02-02 10:35:33 -05:00
Heng Li
12a5a5fa3c
r669: improved self chain extension (#10)
This has not fully resolved #10; it only alleviates the issue.
2018-01-30 20:05:02 -05:00
Heng Li
33f8157961
r655: options to map to one strand of the ref (#91)
2018-01-16 10:34:30 -05:00
Heng Li
f5cfd439ee
r651: fixed introns incorrectly treated as deletions
This happened when the last operation during backtracking was an intron.
2018-01-07 19:42:50 -05:00
Heng Li
98a6e52c06
r618: heuristics to avoid tiny terminal exons
2017-12-11 00:57:55 -05:00
Heng Li
824712a4ee
r617: removed some unused code
2017-12-10 17:54:50 -05:00
Heng Li
0e42628ef6
r611: document --idx-no-seq; better inv aln
2017-12-08 13:16:18 -05:00
Heng Li
2f463b1db0
r573: prepare to generalize index
2017-11-11 19:54:06 -05:00
Heng Li
d7a31e40e6
r569: last commit is buggy
2017-11-09 23:20:41 -05:00
Heng Li
dd18cd75de
r568: revert - don't take max(dp_max, dp_score)
2017-11-09 23:12:48 -05:00
Heng Li
a7b38f6900
r562: fixed a severe bug: wrong query start
2017-11-08 22:31:05 -05:00
Heng Li
98ba8928c6
r558: dp_max no less than dp_score
2017-11-08 10:06:10 -05:00
Heng Li
cd24dc8834
r545: removed option -i, not working well
2017-10-31 22:23:27 -04:00
Heng Li
79b0caca95
r537: model the next base to GT/AG
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost for GTr..yAG, but we pay half of the cost
if there is no r or y. This improves the junction accuracy when mapping to
human and mouse but decreases the accuracy when mapping to SIRV. My guess is
that SIRV does not follow this trend; this needs further investigation.
Also in this commit, --cost-non-gt-ag is aliased to -C, and its default is
changed from 5 to 9. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis on SIRV.
2017-10-28 00:25:01 -04:00
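A minimal C sketch of the flanking-base rule described in r537 follows. It is not minimap2's implementation. The message does not say exactly which cost is halved, so this sketch makes two assumptions. First, the penalty is charged per splice site. Second, it is half of a per-site cost passed in as `site_cost`. The parameter `site_cost` and the function name are hypothetical.

```c
#include <assert.h>

/* r = A/G (purine), y = C/T (pyrimidine); uppercase input assumed. */
static int is_r(char c) { return c == 'A' || c == 'G'; }
static int is_y(char c) { return c == 'C' || c == 'T'; }

/* Flank penalty for a forward-strand GT..AG intron stored in intron[0..len-1].
 * GTr...yAG costs nothing; each end lacking its preferred base costs half
 * of site_cost (hypothetical parameter). */
int splice_flank_cost(const char *intron, int len, int site_cost)
{
    int cost = 0;
    assert(len >= 6);
    assert(intron[0] == 'G' && intron[1] == 'T');
    assert(intron[len - 2] == 'A' && intron[len - 1] == 'G');
    if (!is_r(intron[2]))       cost += site_cost / 2; /* donor: GT not followed by r */
    if (!is_y(intron[len - 3])) cost += site_cost / 2; /* acceptor: AG not preceded by y */
    return cost;
}
```

With `site_cost` set to 8, the costs are as follows. "GTAAAAACAG" (GTr...yAG) costs 0. "GTGAAAAAAG" lacks the y before AG and costs 4. "GTCAAAAGAG" lacks both and costs 8.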
Heng Li
beeb806829
r526: fixed a bug when HPC is in use
It happened when the query HPC minimizer is longer than the reference HPC
minimizer close to the beginning of a contig. We may get a negative coordinate,
which causes an assertion failure.
2017-10-21 19:54:04 -04:00
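A minimal C sketch follows of why homopolymer-compressed (HPC) minimizers can span different lengths on the query and the reference. It compresses a sequence and records how many original bases each compressed base covers. It also shows one defensive clamp for a projected start coordinate. This is not necessarily how r526 fixed the bug. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Homopolymer-compress s[0..n-1] into out[]; run_len[j] records how many
 * original bases the j-th compressed base stands for. Returns the HPC length. */
size_t hpc_compress(const char *s, size_t n, char *out, size_t *run_len)
{
    size_t i, m = 0;
    for (i = 0; i < n; ++i) {
        if (m > 0 && out[m - 1] == s[i]) {
            ++run_len[m - 1];          /* same base: extend the current run */
        } else {
            out[m] = s[i];             /* a new run starts */
            run_len[m++] = 1;
        }
    }
    return m;
}

/* Reference start implied by a hit ending at ref_end (0-based, inclusive)
 * when the query minimizer spans query_span uncompressed bases. Because HPC
 * spans differ between query and reference, the result can fall before the
 * contig start; this hypothetical guard clamps it to 0. */
long ref_start_from_end(long ref_end, long query_span)
{
    long st;
    assert(query_span > 0);
    st = ref_end - query_span + 1;
    return st < 0 ? 0 : st;
}
```

For example, "AAACCGT" compresses to "ACGT" with run lengths 3, 2, 1 and 1. An HPC k-mer covering "AC" therefore spans 5 original bases here, but it may span fewer on the other sequence.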
Heng Li
ffd953029f
r519: fixed a severe bug that misses long alns
2017-10-17 15:52:36 -04:00
Heng Li
adf6cd7f52
r513: merged pre- and post-cigar blen and mlen
This saves a bit of memory and is cleaner.
2017-10-16 10:55:18 -04:00
Heng Li
e6f525edaf
r512: option to filter poorly aligned reads
2017-10-16 10:38:22 -04:00
Heng Li
9862a75cd3
r505: a bit of code simplification
2017-10-11 21:54:32 -04:00
Heng Li
3073f4a758
r504: better heuristics to reduce excessive ext
2017-10-11 21:42:11 -04:00
Heng Li
9364bc64d7
r501: added end_bonus to extz2
2017-10-11 09:39:41 -04:00
Heng Li
65abdb8f3c
r500: temporarily disabled region trunc
It was causing other problems.
2017-10-11 00:16:04 -04:00
Heng Li
7345621759
r499: end bonus working; DP region needs improvement!
2017-10-11 00:14:25 -04:00
Heng Li
ca632f907b
r498: fixed a bug when merging operators like "4I5I"
2017-10-10 21:22:37 -04:00
Heng Li
6c78a980b6
r497: the previous change was not working at the ends
2017-10-10 17:32:28 -04:00
Heng Li
c217eecdb7
r496: avoid DP extending into another chain
When deciding the region for DP, exclude regions in the adjacent chain.
2017-10-10 17:25:12 -04:00
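A minimal C sketch of how a rightward DP window might be capped so that it stops at the adjacent chain, as r496 describes. It is not minimap2's code. It assumes 0-based reference coordinates and a fixed maximum extension length `max_ext`. Both the parameter and the function name are hypothetical.

```c
#include <assert.h>

/* Exclusive right end of the reference window for extending a chain to the
 * right: at most max_ext bases past chain_end, never past the contig end
 * (ref_len) and never into the adjacent chain starting at next_start
 * (pass -1 when there is no chain to the right). */
long dp_right_end(long chain_end, long max_ext, long next_start, long ref_len)
{
    long end = chain_end + max_ext;
    assert(chain_end >= 0 && chain_end <= ref_len && max_ext >= 0);
    if (end > ref_len) end = ref_len;
    if (next_start >= chain_end && end > next_start)
        end = next_start;          /* stop where the adjacent chain begins */
    return end;
}
```

The left-side window would be handled symmetrically, using the end of the preceding chain as the lower bound.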
Heng Li
13b66aad4d
r495: fix improper CIGARs
1. Not left-aligned
2. In one case, 50M24D50M becomes 24D100M. The leading D needs to be removed.
3. Avoid identical hits after DP
2017-10-10 11:59:44 -04:00
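A minimal C sketch of two CIGAR clean-ups mentioned in r495 and r498 follows. It merges adjacent identical operators ("4I5I" becomes "9I") and drops leading and trailing deletions. It is not minimap2's code. The struct `cigar_op_t` and the function `cigar_normalize` are hypothetical. Left alignment is not attempted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { char op; int len; } cigar_op_t; /* hypothetical layout */

/* Normalize a CIGAR in place: drop zero-length operators, merge adjacent
 * identical operators, and remove leading and trailing deletions.
 * *ref_shift receives the number of reference bases removed from the start,
 * which the caller would add to the reference start. Returns the new size. */
size_t cigar_normalize(cigar_op_t *c, size_t n, int *ref_shift)
{
    size_t i, m = 0, start = 0;
    assert(ref_shift != NULL);
    *ref_shift = 0;
    for (i = 0; i < n; ++i) {
        if (c[i].len == 0) continue;
        if (m > 0 && c[m - 1].op == c[i].op)
            c[m - 1].len += c[i].len;      /* e.g. 4I + 5I -> 9I */
        else
            c[m++] = c[i];
    }
    while (start < m && c[start].op == 'D') /* e.g. 24D100M -> 100M */
        *ref_shift += c[start++].len;
    while (m > start && c[m - 1].op == 'D') /* trailing deletions */
        --m;
    for (i = start; i < m; ++i)
        c[i - start] = c[i];
    return m - start;
}
```

Dropping a leading D moves the alignment start on the reference. For that reason, the sketch reports the shift instead of silently discarding it.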
Heng Li
46fa520db9
r494: simpler and better SR gap filling
Still one thing to do: left alignment.
2017-10-09 22:02:30 -04:00
Heng Li
1e53610fb4
r493: reduced calls to extd2 for ungapped alignments
Still needs improvement for cases like 3I5M3D.
2017-10-09 21:13:34 -04:00
Heng Li
9fea4d16b3
r490: improved short-read extension heuristic
Now we find the best scoring ungapped seeded segment and then extend from it.
There is no gap filling for short reads.
2017-10-08 21:36:34 -04:00
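A minimal C sketch of the r490 selection step follows: scoring candidate ungapped seeded segments and picking the best one to extend from. It is not minimap2's code. It assumes a simple scheme of +a per match and -b per mismatch. It also assumes segments are given as a query start, a reference start and a length on one diagonal. The type `useg_t` and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int qs, rs, len; } useg_t; /* q[qs+i] aligned to r[rs+i] */

/* Ungapped score: +a per match, -b per mismatch. */
static int useg_score(const char *q, const char *r, const useg_t *s, int a, int b)
{
    int i, sc = 0;
    for (i = 0; i < s->len; ++i)
        sc += q[s->qs + i] == r[s->rs + i] ? a : -b;
    return sc;
}

/* Index of the highest-scoring segment (first one on ties), or -1 if n == 0. */
int best_ungapped_segment(const char *q, const char *r, const useg_t *segs,
                          size_t n, int a, int b)
{
    size_t i;
    int best = -1, best_sc = 0;
    assert(a > 0 && b >= 0);
    for (i = 0; i < n; ++i) {
        int sc = useg_score(q, r, &segs[i], a, b);
        if (best < 0 || sc > best_sc) {
            best = (int)i;
            best_sc = sc;
        }
    }
    return best;
}
```

The chosen segment would then serve as the anchor for extension in both directions.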
Heng Li
f9415628a8
r489: don't use approximate zdrop
It doesn't work well.
2017-10-08 19:29:09 -04:00
Heng Li
e0baf1ad54
r479: a bit of code cleanup
2017-10-05 16:15:14 -04:00
Heng Li
3ff6eda3a4
r473: don't count introns in blen
2017-10-05 14:37:21 -04:00
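A minimal C sketch of computing an alignment block length (blen) from a CIGAR while skipping introns, as r473 describes. It is not minimap2's code. It assumes blen sums M, =, X, I and D, and that N (intron) and clipping operators S/H are excluded. The type `cigar_op_t` and the function `cigar_blen` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { char op; int len; } cigar_op_t; /* hypothetical layout */

/* Alignment block length: sum over M, =, X, I and D.
 * Introns (N) and clipping (S, H) do not contribute. */
long cigar_blen(const cigar_op_t *c, size_t n)
{
    size_t i;
    long blen = 0;
    assert(c != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        switch (c[i].op) {
        case 'M': case '=': case 'X': case 'I': case 'D':
            blen += c[i].len;
            break;
        default: /* N (intron), S, H: not part of the block */
            break;
        }
    }
    return blen;
}
```

For 50M1000N30M2I3D, this gives 85 instead of the 1085 that counting the intron would produce.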