Heng Li
10c6dd2551
r773: fixed an integer overflow
2018-05-11 10:01:23 -04:00
Heng Li
7ec6721c44
r772: option -Y not working
2018-05-11 10:00:11 -04:00
Heng Li
734ac379bb
r770: matching N bases not working properly ( #155 )
2018-04-30 19:55:23 -04:00
Heng Li
759f8e4ac9
r769: filter out seeds breaking long gaps
2018-04-24 15:37:37 -04:00
Heng Li
aef7b0744c
r768: shortened preset; added dv tag ( #25 )
...
Also added asm20 to command line help (#151 )
2018-04-24 12:48:54 -04:00
Heng Li
372c90ceb5
r764: fixed incorrect inversion mapq ( #148 )
2018-04-10 09:11:49 -04:00
Heng Li
ee4cd089f7
r763: fine control long join flank len ( #128 )
2018-03-29 14:16:58 -04:00
Heng Li
2d7ec75d50
Release minimap2-2.10 (r761)
2018-03-27 11:45:44 -04:00
Heng Li
5ef9580b17
r753: change bandwidth in ava-ont to 2000bp
2018-03-23 10:15:23 -04:00
Heng Li
08bd2123b6
r752: option to copy comments to output ( #136 )
2018-03-23 10:04:33 -04:00
Heng Li
8766d286df
r751: optionally output MD ( #118 )
2018-03-22 14:15:33 -04:00
Heng Li
623b5d9d48
r750: check puts() return ( #132 & #103 )
2018-03-22 11:31:58 -04:00
Heng Li
18659118cd
r749: don't print version etc at low verbose
2018-03-22 11:10:55 -04:00
Heng Li
d1050f4eaf
r748: optionally to use system getopt() ( #134 )
2018-03-19 11:18:26 -04:00
Heng Li
bdc615c1d4
r741: added --min-occ-floor to improve #107
2018-03-12 14:32:27 -04:00
Heng Li
eeb314edd6
Release minimap2-2.9 (r720)
2018-02-24 09:31:09 -05:00
Heng Li
83c57a9d98
r719: fixed bad memory access
2018-02-23 17:27:41 -05:00
Heng Li
24a4808826
r718: retrieve sequence from the index
2018-02-23 10:18:26 -05:00
Heng Li
8fc5f8dc90
r711: assign proper mapq to primary inversions
2018-02-15 14:34:59 -05:00
Heng Li
a0d62519c1
r710: fixed incorrect inversion coordinate ( #112 )
2018-02-15 14:23:42 -05:00
Heng Li
1372977a37
r708: implemented double Z-drop thresholds ( #112 )
...
When aligning long reads, we would prefer to align through low-quality
regions. This requires a large Z-drop threshold. However, to find small
inversions, we need to use a small Z-drop. This commit addresses this
conflict with two Z-drop thresholds. When Z-drop exceeds the smaller
threshold, we perform a local alignment to check if there is a potential
inversion. If there is one, we break the alignment; otherwise we break
the alignment only if Z-drop exceeds the larger threshold.
This commit also fixes a bug that reported wrong coordinates when the
inversion is on the forward strand (#112 ).
2018-02-15 10:50:49 -05:00
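The two-threshold decision described in r708 can be sketched as below. This is a minimal illustration in Python, not minimap2's C code: the function name `should_break`, its parameter names, and the `has_inversion` callback (standing in for the local alignment check) are all hypothetical, and the threshold values in the usage line are made up.

```python
from typing import Callable


def should_break(zdrop: int, zdrop_small: int, zdrop_large: int,
                 has_inversion: Callable[[], bool]) -> bool:
    """Decide whether to break an alignment, given the observed Z-drop.

    has_inversion is a hypothetical stand-in for the local alignment
    that checks for a potential inversion; it is only called when the
    smaller threshold has been exceeded.
    """
    if zdrop <= zdrop_small:
        # Below the smaller threshold: keep aligning through the region.
        return False
    if has_inversion():
        # A potential inversion was found: break the alignment here.
        return True
    # No inversion: break only if the larger threshold is also exceeded.
    return zdrop > zdrop_large


# Made-up thresholds: a drop of 100 breaks only if an inversion is found.
print(should_break(100, 50, 400, lambda: True))
```

The callback keeps the comparatively expensive inversion check lazy, so it runs only for drops that already exceed the smaller threshold.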
Heng Li
c0e0d5d84b
r707: bugfix for inversions on rev strand ( #112 )
2018-02-14 14:09:03 -05:00
Heng Li
b328795051
r706: don't segfault upon wrong FASTA/Q ( #111 )
...
The lack of robustness cost me several hours to identify.
2018-02-13 10:00:22 -05:00
Heng Li
7ef5490884
r703: added --max-clip-ratio
...
still testing the option
2018-02-12 13:29:18 -05:00
Heng Li
a8d476c6ad
r686: end seed trimming don't go over long join
2018-02-06 11:31:32 -05:00
Heng Li
29b4a1786c
r685: tune end seed filter again
2018-02-05 11:48:22 -05:00
Heng Li
dbf284b2d9
r684: separate end score from min_chain_score
2018-02-05 11:40:38 -05:00
Heng Li
35d3e064bf
r677: reduce the chance of missing hits
...
that are close to end of alignments. It is still possible to create examples
that fail the heuristic.
2018-02-02 10:35:33 -05:00
Heng Li
53ce317e59
Release minimap2-2.8 (r672)
2018-02-01 12:50:20 -05:00
Heng Li
da6947cfa3
r671: cleanup command line options
2018-01-31 13:59:52 -05:00
Heng Li
46d6349af4
r670: added PE support to mappy
...
and minor code cleanup
2018-01-31 11:33:08 -05:00
Heng Li
12a5a5fa3c
r669: improved self chain extension ( #10 )
...
This has not fully resolved #10 , only alleviated the issue.
2018-01-30 20:05:02 -05:00
Heng Li
43bfa6199d
r667: warn if one query file has fewer records #92
2018-01-28 17:36:21 -05:00
Heng Li
72b9b0e3b6
r666: report if >=3 query files in SR mode #92
2018-01-28 17:15:57 -05:00
Heng Li
d676a5314b
r664: use --heap-sort for sr by default
2018-01-26 12:25:42 -05:00
Heng Li
123bc1d91d
put option operations in another file
2018-01-26 08:38:37 -05:00
Heng Li
543fa12e68
r659: for C++ compatibility
2018-01-19 10:40:18 -05:00
Heng Li
af1a871270
r658: gives a warning if -N0 is used
2018-01-19 08:33:20 -05:00
Heng Li
2b71181a37
r657: check -p ( #96 )
...
Well, in principle, every option should be checked. Will do when someone raises
issues...
2018-01-19 01:03:38 -05:00
Heng Li
33f8157961
r655: options to map to one strand of the ref #91
2018-01-16 10:34:30 -05:00
Heng Li
eecc06086f
Released minimap2-2.7 (r654)
2018-01-09 13:16:00 -05:00
Heng Li
dfea113f28
r653: the last change may write "N" wrongly
2018-01-08 11:33:53 -05:00
Heng Li
f5cfd439ee
r651: incorrectly treat introns as deletions
...
This happened when the last operation during backtracking is an intron.
2018-01-07 19:42:50 -05:00
Heng Li
dc9e3dcf4a
r639: changed -O/-E validation
2017-12-30 20:39:29 -05:00
Heng Li
cc75c12905
r638: disabled scoring checking
...
I haven't figured out the exact bounds...
2017-12-30 07:50:40 -05:00
Heng Li
e420b17496
r629: API to construct index from strings
2017-12-18 22:29:46 -05:00
Heng Li
ab345e600b
r626: function to check incorrect scoring system
2017-12-13 12:23:43 -05:00
Heng Li
d003a00d71
r625: HPC sketch still has one minor issue
2017-12-13 09:40:42 -05:00
Heng Li
eb819c29e8
Release minimap2-2.6 (r623)
2017-12-12 11:09:59 -05:00
Heng Li
fb630de40a
r622: fixed bug in sdust due to recent refactor
2017-12-11 15:32:28 -05:00
Heng Li
43960a8ca7
r621: --print-qname also shows kalloc status
2017-12-11 12:30:08 -05:00
Heng Li
f6608fe99c
r620: revamped thread-local memory management
...
* Don't preallocate sdust_buf or minimizer list. kalloc should be fast enough -
benchmarks needed to confirm.
* Fixed a memory leak caused by divergence estimate (post v2.5)
* Reset the kalloc buffer after mapping a long query. This reduces peak memory
when large chunks of memory are allocated, at the cost of performance, though.
2017-12-11 12:11:10 -05:00
Heng Li
98a6e52c06
r618: heuristics to avoid tiny terminal exons
2017-12-11 00:57:55 -05:00
Heng Li
824712a4ee
r617: removed some unused code
2017-12-10 17:54:50 -05:00
Heng Li
98a999fe44
r611: added pseudocount when est divergence
2017-12-08 12:57:57 -05:00
Heng Li
fec7bd713f
r610: warning if db sequence is 0-lengthed ( #69 )
2017-12-07 21:05:39 -05:00
Heng Li
2f693e8ca4
r609: bugfix - SDUST masking not working
2017-12-07 11:45:38 -05:00
Heng Li
704ff9f4c6
r607: estimate sequence divergence
...
Currently using the simplest method. There may be a more accurate estimate.
2017-12-06 16:14:39 -05:00
Heng Li
68c63f2d68
r606: fixed a sketch bug for long 256bp k-mer
...
sketch() writes {-1,-1} to the output array.
2017-12-06 16:13:29 -05:00
Heng Li
984f7846c0
r601: bugfix - a similar issue to r600
...
This bug unsets the alignment score of suboptimal alignments.
2017-11-30 11:51:34 -05:00
Heng Li
af1d6afba9
r600: bugfix - missing secondary alignments ( #71 )
...
This should very rarely happen to typical data, but has a higher chance in
artifactual data.
2017-11-30 11:34:10 -05:00
Heng Li
131cfc6938
r574: build index without sequences
2017-11-11 21:38:38 -05:00
Heng Li
2f463b1db0
r573: prepare to generalize index
2017-11-11 19:54:06 -05:00
Heng Li
3b518271ee
Release minimap2-2.5 (r572)
2017-11-11 11:29:28 -05:00
Heng Li
d7a31e40e6
r569: last commit is buggy
2017-11-09 23:20:41 -05:00
Heng Li
dd18cd75de
r568: revert - don't take max(dp_max, dp_score)
2017-11-09 23:12:48 -05:00
Heng Li
99a2709913
r567: minor change to #56
2017-11-09 19:17:45 -05:00
mvdbeek
1cb0bf4bef
Implement -Y for soft clipping of supp. alignments
...
I tried to base this on bwa-mem and it seems to work for sam alignments.
2017-11-09 19:22:36 +01:00
Heng Li
a7b38f6900
r562: fixed a severe bug: wrong query start
2017-11-08 22:31:05 -05:00
Heng Li
e896c9ec05
r559: prefer a chain involving more segments
2017-11-08 13:22:16 -05:00
Heng Li
98ba8928c6
r558: dp_max no less than dp_score
2017-11-08 10:06:10 -05:00
Heng Li
b24d68ae9f
r557: fixed another mapq underestimate
...
When a chain is split during base-level alignment, its chaining score is
reduced. However, the chaining score of its suboptimal chain remains the same.
This leads to underestimated mapping quality.
2017-11-07 23:20:49 -05:00
Heng Li
65deedfa96
r556: bugfix - underestimate mapq for split aln
2017-11-07 22:37:12 -05:00
Heng Li
21a46ba652
Release minimap2-2.4 (r555)
2017-11-06 12:54:02 -05:00
Heng Li
fa5a645ca5
r552: fixed a tiny typo on struct packing
...
The old packing wastes memory, though very little.
2017-11-05 08:27:26 -05:00
Heng Li
a3f0aa1d5b
r550: fixed -L issues with secondary and supp aln
2017-11-04 12:13:38 -04:00
Heng Li
22290db3e4
r546: minor mapQ tuning
2017-11-01 13:20:39 -04:00
Heng Li
cd24dc8834
r545: removed option -i, not working well
2017-10-31 22:23:27 -04:00
Heng Li
b8e758df0f
r544: increased PE mapQ
2017-10-31 16:55:02 -04:00
Heng Li
311fa90030
r543: applied some sr mapq changes to long reads
2017-10-31 15:24:05 -04:00
Heng Li
fb8a1b5536
r542: tuning mapQ calculation
2017-10-31 14:25:09 -04:00
Heng Li
285eb0da05
r540: removed a buggy debugging line
2017-10-29 00:02:41 -04:00
Heng Li
192217a10c
r539: use --splice-flank=yes by default
...
In human/mouse, the GTr..yAG pattern occurs in 91/92% of all GT-AG introns.
Modeling r..y clearly leads to higher accuracy. However, in SIRV, this
percentage is reduced to ~60%. The default "--splice --splice-flank=yes"
leads to lower accuracy. If someone benchmarks minimap2 on SIRV, this would be
bad, but minimap2 is developed for practical applications, not for benchmarks.
I will live with that.
2017-10-28 22:29:55 -04:00
Heng Li
f22a94e868
r538: fixed a long existing bug in HPC k-mer ( #47 )
...
This bug may lead to a wrong minimizer when a HPC k-mer is longer than 256bp.
When there is a seed match involving this wrong HPC k-mer, the correct seed
sequences do not match in fact. This violates the assumption in align.c and
subsequently causes a segfault, which is what #47 has caught. This bug lurked
in the earliest piece of code and affected all released minimap2 versions so
far. It is extremely rare and does not affect the prebuilt GRCh37/38 indices.
2017-10-28 19:21:10 -04:00
Heng Li
79b0caca95
r537: model the next base to GT/AG
...
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost to GTr..yAG, but we pay half of the cost
if there is no r or y. This improves the junction accuracy when mapping to
human and mouse and decreases the accuracy when mapping to SIRV. My guess is
that SIRV does not honor this trend. Need to investigate in future.
Also in this commit, --cost-non-gt-ag is aliased to -C. The default is changed
to 9 instead of 5. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis on SIRV.
2017-10-28 00:25:01 -04:00
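The flank model from r537 can be illustrated with a small sketch. It assumes that donor and acceptor sites are scored independently and that a site lacking GT or AG pays the full penalty. Neither assumption is stated in the commit message. The function names are hypothetical, and the default of 9 is the --cost-non-gt-ag default mentioned above.

```python
def donor_cost(intron_start: str, cost_non_gt_ag: float = 9) -> float:
    """Cost of a donor site given the first three intron bases, e.g. 'GTA'."""
    s = intron_start.upper()
    if s[:2] != "GT":
        # Assumption: a non-GT donor pays the full penalty.
        return cost_non_gt_ag
    # GT followed by R (A or G) is free; otherwise pay half of the cost.
    return 0 if s[2:3] in ("A", "G") else cost_non_gt_ag / 2


def acceptor_cost(intron_end: str, cost_non_gt_ag: float = 9) -> float:
    """Cost of an acceptor site given the last three intron bases, e.g. 'CAG'."""
    s = intron_end.upper()
    if s[-2:] != "AG":
        # Assumption: a non-AG acceptor pays the full penalty.
        return cost_non_gt_ag
    # AG preceded by Y (C or T) is free; otherwise pay half of the cost.
    return 0 if s[-3:-2] in ("C", "T") else cost_non_gt_ag / 2


# GTA...CAG matches the GTr..yAG pattern on both sides.
print(donor_cost("GTA") + acceptor_cost("CAG"))
```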
Heng Li
afc2f2e84b
r536: removed an unnecessary assert()
2017-10-24 21:08:54 -04:00
Heng Li
d4b5dfc297
r533: added --no-pairing
...
to prevent the use of any pairing information for paired-end reads.
2017-10-23 14:09:32 -04:00
Heng Li
306e4541f8
Released minimap2-2.3 (r531)
2017-10-22 23:13:35 -04:00
Heng Li
beeb806829
r526: fixed a bug when HPC is in use
...
It happened when the query HPC minimizer is longer than the reference HPC
minimizer close to the beginning of a contig. We may get a negative coordinate,
which causes an assertion failure.
2017-10-21 19:54:04 -04:00
Heng Li
be7f3c4ffe
r525: fixed a bug in chaining; handle ovlp ends
2017-10-20 21:34:52 -04:00
Heng Li
bd04372873
r524: reverted to bwa-mem end bonus
...
and reduced the cost of clipping when filtering by identity
2017-10-20 16:57:31 -04:00
Heng Li
15ed0712c2
r523: fixed a performance bug in ksw2_ll
...
Won't affect accuracy.
2017-10-20 13:00:10 -04:00
Heng Li
4683da2455
r520: added option -L to write long cigar to CG
2017-10-17 17:32:44 -04:00
Heng Li
ffd953029f
r519: fixed a severe bug that misses long alns
2017-10-17 15:52:36 -04:00
Heng Li
04cf4ebf5e
r518: increased the default -K to 500M
...
This helps multi-thread performance for ultra-long reads.
2017-10-17 13:21:29 -04:00
Heng Li
25ffd72690
r517: replaced --print-2nd with --secondary
2017-10-17 11:41:56 -04:00
Heng Li
aa2d9d4e1b
r516: throw a warning if -N0 is used
2017-10-16 14:55:35 -04:00
Heng Li
addb61bcb2
r515: more conservative hit exclusion
...
When a hit covers a long query subsequence that has not been covered by better
primary hits, this hit is more likely to become a new primary hit.
2017-10-16 13:58:01 -04:00
Heng Li
adf6cd7f52
r513: merged pre- and post-cigar blen and mlen
...
This saves a bit of memory and is cleaner.
2017-10-16 10:55:18 -04:00
Heng Li
e6f525edaf
r512: option to filter poorly aligned reads
2017-10-16 10:38:22 -04:00
Heng Li
858213d513
r511: fixed wrong primary sam record
2017-10-12 23:02:18 -04:00
Heng Li
dea3b60918
r510: fixed an off-by-1 bug for unmapped mate
2017-10-12 17:31:13 -04:00
Heng Li
7c555f9b7e
r508: use two I/O threads for mapping
...
-x sr applies this option by default
2017-10-12 14:56:01 -04:00
Heng Li
2801ed9b4b
r507: -K not working as is intended ( #36 )
2017-10-12 14:16:05 -04:00
Heng Li
ce06188203
r506: fixed a memory leak
2017-10-12 10:12:22 -04:00
Heng Li
9862a75cd3
r505: a bit code simplification
2017-10-11 21:54:32 -04:00
Heng Li
3073f4a758
r504: better heuristics to reduce excessive ext
2017-10-11 21:42:11 -04:00
Heng Li
9364bc64d7
r501: added end_bonus to extz2
2017-10-11 09:39:41 -04:00
Heng Li
65abdb8f3c
r500: temporarily disabled region trunc
...
because it is causing other problems.
2017-10-11 00:16:04 -04:00
Heng Li
7345621759
r499: end bonus working; DP region needs improve!
2017-10-11 00:14:25 -04:00
Heng Li
ca632f907b
r498: fixed a bug when merging like "4I5I"
2017-10-10 21:22:37 -04:00
Heng Li
6c78a980b6
r497: the previous change not working at the ends
2017-10-10 17:32:28 -04:00
Heng Li
c217eecdb7
r496: avoid DP extending into another chain
...
When deciding the region for DP, exclude regions in the adjacent chain
2017-10-10 17:25:12 -04:00
Heng Li
13b66aad4d
r495: fix improper CIGAR
...
1. Not left aligned
2. In one case, 50M24D50M becomes 24D100M. The leading D needs to be removed.
3. Avoid identical hits after DP
2017-10-10 11:59:44 -04:00
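Two of the CIGAR cleanups mentioned in these commits, merging adjacent identical operations (the "4I5I" case in r498) and dropping a leading deletion (item 2 of r495), can be sketched as follows. This is an illustrative Python sketch with hypothetical helper names, not the code in minimap2. It does not attempt left alignment.

```python
import re
from typing import List, Tuple

CigarOp = Tuple[int, str]


def parse_cigar(cigar: str) -> List[CigarOp]:
    """Split a CIGAR string such as '50M24D50M' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def normalize(ops: List[CigarOp]) -> List[CigarOp]:
    """Merge adjacent identical operations, then drop any leading deletion."""
    merged: List[CigarOp] = []
    for n, op in ops:
        if merged and merged[-1][1] == op:
            # Same operation as the previous one: add the lengths.
            merged[-1] = (merged[-1][0] + n, op)
        else:
            merged.append((n, op))
    while merged and merged[0][1] == "D":
        # A leading deletion carries no aligned query base; remove it.
        merged.pop(0)
    return merged


def to_string(ops: List[CigarOp]) -> str:
    """Format (length, op) pairs back into a CIGAR string."""
    return "".join(f"{n}{op}" for n, op in ops)


print(to_string(normalize(parse_cigar("4I5I"))))     # 9I
print(to_string(normalize(parse_cigar("24D100M"))))  # 100M
```

A real aligner that drops a leading deletion would also have to advance the reference start coordinate by the deleted length. The sketch leaves that bookkeeping out.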
Heng Li
46fa520db9
r494: simpler and better SR gap filling
...
Still one thing to do: left alignment
2017-10-09 22:02:30 -04:00
Heng Li
1e53610fb4
r493: reduced calling extd2 for ungapped aln
...
Still need to improve in case of 3I5M3D
2017-10-09 21:13:34 -04:00
Heng Li
9396d9e11b
r452: typo in the last commit
2017-10-09 10:05:32 -04:00
Heng Li
198849a716
r491: an ambiguous base costs the same as gap ext
2017-10-09 09:59:42 -04:00
Heng Li
9fea4d16b3
r490: improved short-read extension heuristic
...
Now we find the best scoring ungapped seeded segment and then extend from it.
There is no gap filling for short reads.
2017-10-08 21:36:34 -04:00
Heng Li
f9415628a8
r489: don't use approximate zdrop
...
it doesn't work well
2017-10-08 19:29:09 -04:00
Heng Li
61e56c941d
r488: parameter to control max fragment length
2017-10-07 23:54:32 -04:00
Heng Li
f150257a0d
r487: demote "map10k"; improved README
2017-10-07 19:19:40 -04:00
Heng Li
bf2d4f7aec
r486: treat "U" as "T" for RNA reads ( #33 )
2017-10-07 18:53:25 -04:00
Heng Li
c6384ed2c8
r482: increased short-read bandwidth to 100
...
This has very minor effect on speed.
2017-10-06 10:20:32 -04:00
Heng Li
e0baf1ad54
r479: a bit code cleanup
2017-10-05 16:15:14 -04:00
Heng Li
f266092699
r478: simplified useless code, a tiny bit
2017-10-05 15:56:00 -04:00
Heng Li
9c5767f9ed
r477: renamed multi_seg to frag_mode
2017-10-05 15:48:17 -04:00
Heng Li
ae2adf04d4
r476: multi-file fragment mode working
2017-10-05 15:39:26 -04:00
Heng Li
b839758335
r475: added --cs=none; updated manpage
2017-10-05 15:27:37 -04:00
Heng Li
f4a5d3a692
r474: replaced -S and --cs-no-equal with --cs
2017-10-05 15:03:03 -04:00
Heng Li
3ff6eda3a4
r473: don't count introns into blen
2017-10-05 14:37:21 -04:00
Heng Li
1a90bc8603
r472: fixed a bug when printing MAPQ/CIGAR
2017-10-05 12:46:11 -04:00
Heng Li
abf2a90363
r471: all SAM features implemented; more tests!
2017-10-05 12:37:30 -04:00
Heng Li
7cc4f6f965
r469: first step towards PE SAM
2017-10-05 10:38:09 -04:00
Heng Li
16e6e589a8
r468: replaced ^ with ~ in cs
2017-10-04 22:17:12 -04:00
Heng Li
9aba11769c
r467: added : (equal length) and ^ (intron) ops
2017-10-04 21:55:37 -04:00
Heng Li
7d50e646dd
r466: detect multi-part index more smartly
...
though it might not work in an extremely rare case: the end of a sequence ends
at X*16384 and it is the last sequence in a batch. This can be resolved by
never letting the kstream_t buffer empty.
2017-10-04 17:32:58 -04:00
Heng Li
1554149158
r465: apply option -x before other options
2017-10-04 13:52:28 -04:00
Heng Li
19c39e704f
r464: fixed a bug in pairing, due to randomization
2017-10-04 13:37:40 -04:00
Heng Li
2581c44a21
r463: optionally disable secondary hits
2017-10-04 13:24:41 -04:00
Heng Li
5babf41a38
r462: SAM primary flag not properly set
2017-10-04 13:11:29 -04:00
Heng Li
2a1e738a94
r461: randomize repetitive hits
2017-10-04 13:05:18 -04:00
Heng Li
cf55c84056
r460: added option --no-long-join
2017-10-04 12:08:44 -04:00
Heng Li
841763ec24
Merge branch 'master' into sr
2017-10-04 11:42:44 -04:00
Heng Li
95eb1dec36
r458: fixed wrong chr for inversion aln ( #30 )
2017-10-04 11:32:06 -04:00
Heng Li
0fd0f2aed1
r457: fixed a bug on parsing -f
2017-09-30 00:00:44 -04:00
Heng Li
ee9b2773a8
r456: min chain score should >k-mer length
...
or chain_dp() wastes time on unnecessarily sorting chains with one k-mer.
2017-09-29 22:33:55 -04:00
Heng Li
340483821e
r455: set max_occ on command line
2017-09-29 22:18:43 -04:00
Heng Li
04fb2c2ec0
r454: rechain with higher max_occ if no good chain
2017-09-29 19:24:32 -04:00
Heng Li
0d4ecd19ee
r453: avoid duplicated strcmp() for ava
2017-09-28 15:52:05 -04:00