Heng Li
872f300955
r781: fixed the buggy heapmerge ( resolves #166 )
2018-05-30 11:55:14 -04:00
Heng Li
08bd2123b6
r752: option to copy comments to output ( #136 )
2018-03-23 10:04:33 -04:00
Heng Li
623b5d9d48
r750: check puts() return ( #132 & #103 )
2018-03-22 11:31:58 -04:00
Heng Li
8fc5f8dc90
r711: assign proper mapq to primary inversions
2018-02-15 14:34:59 -05:00
Heng Li
da6947cfa3
r671: cleanup command line options
2018-01-31 13:59:52 -05:00
Heng Li
46d6349af4
r670: added PE support to mappy
...
and minor code cleanup
2018-01-31 11:33:08 -05:00
Heng Li
12a5a5fa3c
r669: improved self chain extension ( #10 )
...
This has not fully resolved #10 , only alleviated the issue.
2018-01-30 20:05:02 -05:00
Heng Li
dfc78b39d3
refactor the old sorting
2018-01-26 09:37:48 -05:00
Heng Li
7b57c9a619
heap sort working on MT
2018-01-26 09:21:45 -05:00
Heng Li
123bc1d91d
put option operations in another file
2018-01-26 08:38:37 -05:00
Heng Li
dd18307e66
code backup
2018-01-25 21:52:49 -05:00
Heng Li
af1a871270
r658: gives a warning if -N0 is used
2018-01-19 08:33:20 -05:00
Heng Li
2b71181a37
r657: check -p ( #96 )
...
Well, in principle, every option should be checked. Will do when someone raises
issues...
2018-01-19 01:03:38 -05:00
Heng Li
33f8157961
r655: options to map to one strand of the ref #91
2018-01-16 10:34:30 -05:00
Heng Li
dc9e3dcf4a
r639: changed -O/-E validation
2017-12-30 20:39:29 -05:00
Heng Li
cc75c12905
r638: disabled scoring checking
...
I haven't figured out the exact bounds...
2017-12-30 07:50:40 -05:00
Heng Li
ab345e600b
r626: function to check incorrect scoring system
2017-12-13 12:23:43 -05:00
Heng Li
43960a8ca7
r621: --print-qname also shows kalloc status
2017-12-11 12:30:08 -05:00
Heng Li
f6608fe99c
r620: revamped thread-local memory management
...
* Don't preallocate sdust_buf or minimizer list. kalloc should be fast enough -
benchmarks needed to confirm.
* Fixed a memory leak caused by divergence estimate (post v2.5)
* Reset the kalloc buffer after mapping a long query. This reduces peak memory
when large chunks of memory are allocated, at the cost of performance, though.
2017-12-11 12:11:10 -05:00
Heng Li
98a6e52c06
r618: heuristics to avoid tiny terminal exons
2017-12-11 00:57:55 -05:00
Heng Li
824712a4ee
r617: removed some unused code
2017-12-10 17:54:50 -05:00
Heng Li
98a999fe44
r611: added pseudocount when est divergence
2017-12-08 12:57:57 -05:00
Heng Li
2f693e8ca4
r609: bugfix - SDUST masking not working
2017-12-07 11:45:38 -05:00
Heng Li
704ff9f4c6
r607: estimate sequence divergence
...
Currently using the simplest method. There may be a more accurate estimate.
2017-12-06 16:14:39 -05:00
Heng Li
2f463b1db0
r573: prepare to generalize index
2017-11-11 19:54:06 -05:00
Heng Li
cd24dc8834
r545: removed option -i, not working well
2017-10-31 22:23:27 -04:00
Heng Li
fb8a1b5536
r542: tuning mapQ calculation
2017-10-31 14:25:09 -04:00
Heng Li
192217a10c
r539: use --splice-flank=yes by default
...
In human/mouse, the GTr..yAG pattern occurs in 91/92% of all GT-AG introns.
Modeling r..y clearly leads to higher accuracy. However, in SIRV, this
percentage is reduced to ~60%. The default "--splice --splice-flank=yes"
leads to lower accuracy. If someone benchmarks minimap2 on SIRV, this would be
bad, but minimap2 is developed for practical applications, not for benchmarks.
I will live with that.
2017-10-28 22:29:55 -04:00
Heng Li
79b0caca95
r537: model the next base to GT/AG
...
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost to GTr..yAG, but we pay half of the cost
if there is no r or y. This improves the junction accuracy when mapping to
human and mouse and decreases the accuracy when mapping to SIRV. My guess is
that SIRV does not honor this trend. Need to investigate in the future.
Also in this commit, --cost-non-gt-ag is aliased to -C. The default is changed
to 9 instead of 5. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis on SIRV.
2017-10-28 00:25:01 -04:00
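A minimal sketch of the r537 splice-flank penalty described above, written in Python for illustration only. The function names, the per-site split into donor and acceptor penalties, and the integer halving are assumptions, not minimap2's actual code. Only the rule itself (no cost for GTr..yAG, half of the cost when r or y is missing) and the default cost of 9 come from the commit message.

```python
PURINES = set("AG")      # R
PYRIMIDINES = set("CT")  # Y

def donor_penalty(intron_start: str, cost: int = 9) -> int:
    """Penalty for the first three intron bases (hypothetical helper).

    GT followed by R costs nothing, GT without R costs half,
    anything else pays the full non-GT-AG cost.
    """
    s = intron_start.upper()
    if len(s) < 3 or s[:2] != "GT":
        return cost
    # Integer halving is an assumption made for this sketch.
    return 0 if s[2] in PURINES else cost // 2

def acceptor_penalty(intron_end: str, cost: int = 9) -> int:
    """Penalty for the last three intron bases (hypothetical helper).

    Y followed by AG costs nothing, AG without Y costs half,
    anything else pays the full non-GT-AG cost.
    """
    s = intron_end.upper()
    if len(s) < 3 or s[-2:] != "AG":
        return cost
    return 0 if s[-3] in PYRIMIDINES else cost // 2
```

For example, an intron starting with GTA and ending with CAG pays nothing on either side, while GTC...AAG pays the halved cost at both sites.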
Heng Li
d4b5dfc297
r533: added --no-pairing
...
to prevent the use of any pairing information for paired-end reads.
2017-10-23 14:09:32 -04:00
Heng Li
306e4541f8
Released minimap2-2.3 (r531)
2017-10-22 23:13:35 -04:00
Heng Li
bd04372873
r524: reverted to bwa-mem end bonus
...
and reduced the cost of clipping when filtering by identity
2017-10-20 16:57:31 -04:00
Heng Li
04cf4ebf5e
r518: increased the default -K to 500M
...
This helps multi-thread performance for ultra-long reads.
2017-10-17 13:21:29 -04:00
Heng Li
e6f525edaf
r512: option to filter poorly aligned reads
2017-10-16 10:38:22 -04:00
Heng Li
858213d513
r511: fixed wrong primary sam record
2017-10-12 23:02:18 -04:00
Heng Li
7c555f9b7e
r508: use two I/O threads for mapping
...
-x sr applies this option by default
2017-10-12 14:56:01 -04:00
Heng Li
7345621759
r499: end bonus working; DP region needs improvement!
2017-10-11 00:14:25 -04:00
Heng Li
9396d9e11b
r452: typo in the last commit
2017-10-09 10:05:32 -04:00
Heng Li
198849a716
r491: an ambiguous base costs the same as gap ext
2017-10-09 09:59:42 -04:00
Heng Li
9fea4d16b3
r490: improved short-read extension heuristic
...
Now we find the best scoring ungapped seeded segment and then extend from it.
There is no gap filling for short reads.
2017-10-08 21:36:34 -04:00
Heng Li
61e56c941d
r488: parameter to control max fragment length
2017-10-07 23:54:32 -04:00
Heng Li
c6384ed2c8
r482: increased short-read bandwidth to 100
...
This has a very minor effect on speed.
2017-10-06 10:20:32 -04:00
Heng Li
e0baf1ad54
r479: a bit of code cleanup
2017-10-05 16:15:14 -04:00
Heng Li
f266092699
r478: simplified useless code, a tiny bit
2017-10-05 15:56:00 -04:00
Heng Li
9c5767f9ed
r477: renamed multi_seg to frag_mode
2017-10-05 15:48:17 -04:00
Heng Li
ae2adf04d4
r476: multi-file fragment mode working
2017-10-05 15:39:26 -04:00
Heng Li
5ab99eb26e
more accurate SAM flag
2017-10-05 10:59:38 -04:00
Heng Li
7cc4f6f965
r469: first step towards PE SAM
2017-10-05 10:38:09 -04:00
Heng Li
7d50e646dd
r466: detect multi-part index more smartly
...
though it might not work in an extremely rare case: the end of a sequence ends
at X*16384 and it is the last sequence in a batch. This can be resolved by
never letting the kstream_t buffer empty.
2017-10-04 17:32:58 -04:00
Heng Li
1554149158
r465: apply option -x before other options
2017-10-04 13:52:28 -04:00
Heng Li
2581c44a21
r463: optionally disable secondary hits
2017-10-04 13:24:41 -04:00
Heng Li
2a1e738a94
r461: randomize repetitive hits
2017-10-04 13:05:18 -04:00
Heng Li
cf55c84056
r460: added option --no-long-join
2017-10-04 12:08:44 -04:00
Heng Li
ee9b2773a8
r456: min chain score should be > k-mer length
...
or chain_dp() wastes time on unnecessarily sorting chains with one k-mer.
2017-09-29 22:33:55 -04:00
Heng Li
04fb2c2ec0
r454: rechain with higher max_occ if no good chain
2017-09-29 19:24:32 -04:00
Heng Li
0d4ecd19ee
r453: avoid duplicated strcmp() for ava
2017-09-28 15:52:05 -04:00
Heng Li
9541052564
r447: paired-end mapping quality
...
not as good as I would hope...
2017-09-27 15:39:25 -04:00
Heng Li
7e0d70bfd3
r445: pair coordinate adjustment working
...
Next: mapq adjustment, which will be tricky...
2017-09-27 15:38:18 -04:00
Heng Li
a349d85280
r444: changed the way orientation is specified
...
The old model doesn't work with RF or RR orientation. The new model only works
with paired-end reads. For >2 segments, only FF is supported.
2017-09-27 12:33:10 -04:00
Heng Li
f611edf6f2
r443: don't filter small cm for split seg
2017-09-26 16:17:58 -04:00
Heng Li
1b1dd0cd57
r442: default max_gap to 200 in the sr mode
2017-09-26 13:31:01 -04:00
Heng Li
55d1e4f638
r440: better chain filtering for PE reads
2017-09-26 11:03:36 -04:00
Heng Li
8f25cfa36e
r437: fixed uninitialized memory on rep_len
2017-09-25 14:22:45 -04:00
Heng Li
81008dd371
r436: working on short reads
...
The result is mixed - lots of room for tuning
2017-09-25 14:06:29 -04:00
Heng Li
3bb66e1ed3
multi-seg working on toy examples
2017-09-25 13:42:04 -04:00
Heng Li
a742f10164
get multi-seg code ready; probably not working yet
2017-09-24 15:17:17 -04:00
Heng Li
f0951141a1
allow to read multiple files interleaved
2017-09-24 14:33:05 -04:00
Heng Li
19d8eca3a1
moved array shrinking into chain_dp()
2017-09-20 14:58:57 -04:00
Heng Li
9943e5fdd0
backup
2017-09-20 14:35:46 -04:00
Heng Li
5b39a1b34b
Merge branch 'master' into sr
2017-09-20 12:24:08 -04:00
Heng Li
e3b5802b2e
r424: reduce memory for long query seqs
2017-09-20 12:22:13 -04:00
Heng Li
03d6894517
backup
2017-09-20 11:47:46 -04:00
Heng Li
645db3350e
Merge branch 'master' into sr
2017-09-20 11:15:14 -04:00
Heng Li
75e6bbc9f6
r421: removed the MM_F_SPLICE_BOTH mode
...
In the default splice mode, minimap2 applies two rounds of spliced alignment:
first assuming GT-AG to be the splice signal across all splicing sites and then
assuming CT-AC to be the signal. This is the ideal strategy.
In the MM_F_SPLICE_BOTH mode, minimap2 applies one round of spliced alignment,
assuming GT-AG and CT-AC to be the splice signals AT THE SAME TIME. This will
be faster but less accurate. I don't think anyone would like to run minimap2 in
this mode, so I am removing it for clarity.
2017-09-20 11:11:53 -04:00
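The difference between the two strategies in r421 can be illustrated with a toy cost model. This is a sketch under stated assumptions, not minimap2's implementation: the real code reruns the spliced alignment for each round, whereas here each junction is reduced to its (donor, acceptor) dinucleotide pair, and the function names and the cost value of 5 are hypothetical.

```python
FWD = ("GT", "AG")  # signal assumed in the first round
REV = ("CT", "AC")  # signal assumed in the second round

def junction_cost(donor, acceptor, signal, cost=5):
    """Zero if the junction matches the assumed signal, else a flat cost."""
    return 0 if (donor, acceptor) == signal else cost

def two_round_cost(junctions, cost=5):
    """Score all junctions under one signal at a time and keep the better round."""
    return min(
        sum(junction_cost(d, a, sig, cost) for d, a in junctions)
        for sig in (FWD, REV)
    )

def both_mode_cost(junctions, cost=5):
    """Let every junction pick whichever signal fits it best, independently."""
    return sum(
        min(junction_cost(d, a, FWD, cost), junction_cost(d, a, REV, cost))
        for d, a in junctions
    )
```

With junctions `[("GT", "AG"), ("CT", "AC")]`, the two-round model charges one junction because a single read cannot follow both signals consistently, while the combined model charges nothing. In this toy model, that is the kind of inconsistency that would make the one-round mode less accurate.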
Heng Li
7a9b4db874
replaced --approx-ext with --sr
...
--sr disables Z-drop and may come with other heuristics
2017-09-20 10:51:18 -04:00
Heng Li
fd14618e61
no effective changes
2017-09-20 10:11:05 -04:00
Heng Li
56014ba3db
avoid assertion failure given 0-length reads
2017-09-19 22:30:32 -04:00
Heng Li
b99c22840f
r414: avoid assertion failure for 0-length reads
2017-09-19 22:21:27 -04:00
Heng Li
c04420698e
fixed an uninitialized value
2017-09-19 16:21:21 -04:00
Heng Li
fb1bcc0084
early exploration
2017-09-19 16:18:28 -04:00
Heng Li
e2823d4aee
r367: index reader optionally writes index
2017-09-14 21:18:13 -04:00
Heng Li
eb00521d9b
redesigned indexing and option APIs
2017-09-14 17:02:01 -04:00
Heng Li
4d3768bf26
r364: improved the mapq heuristics
...
* use repetitive seed lengths, not counts
* compute n_sub to higher accuracy
* use bwa-mem mapq heuristic as a backup
For short single-end reads, minimap2's ROC is not as good as bwa-mem's, but is
close.
2017-09-14 12:37:03 -04:00
Heng Li
6a82a21dee
r361: improved mapq for short reads
2017-09-13 15:32:39 -04:00
Heng Li
3c91d652dd
r360: allow to set integer max occ
2017-09-13 11:37:00 -04:00
Heng Li
c7c3585531
r347: merged mm_map_frag() into mm_map()
...
mm_map_frag() was separated due to an earlier design that has been rejected.
2017-09-10 15:02:55 -04:00
Heng Li
59c822b722
removed some commented code
...
which *might* return at some time later
2017-09-09 08:38:39 -04:00
Heng Li
f422175e4e
r344: avoid unnecessary refName retrieval
2017-09-08 22:44:14 -04:00
Heng Li
101b8bb97d
r335: report an error if query can't be opened
2017-09-03 11:54:38 -04:00
Heng Li
0fe1a224ab
r309: improved SAM header output
2017-08-25 10:35:58 +08:00
Heng Li
993a2bb521
r301: separate introns from deletions
...
When an intron is adjacent to a deletion, the old code counted both as introns,
which led to an inaccurate exon boundary.
2017-08-18 15:31:15 +08:00
Heng Li
64c1389e1a
Merge branch 'master' into splice
2017-08-17 23:39:27 +08:00
Heng Li
bbb37d95f2
support inserting RG lines
2017-08-17 23:34:09 +08:00
Heng Li
2cde8d257c
r297: bidirectional RNA alignment
2017-08-17 06:02:44 -04:00
Heng Li
d240318741
r287: refined CLI options and manpage
2017-08-12 12:26:04 -04:00
Heng Li
0f4c823b0c
r286: ignore introns when computing max seg score
2017-08-12 10:58:16 -04:00
Heng Li
c59b0781bc
r280: output introns as "N" in the cdna mode
2017-08-09 11:45:02 -04:00
Heng Li
1a7d782131
r273: cdna mapping mode for testing
...
Differences from the typical mapping mode:
* banded alignment disabled
* log gap cost during chaining
* zero long-gap extension during alignment
* up to 100kb (by default) reference gap
* bad seeding not filtered (to tune later)
2017-08-08 11:31:49 -04:00
Heng Li
4c0713ee14
r235: optionally output tag cs in PAF
...
cs encodes the query, the reference sequence and CIGAR.
2017-07-31 12:06:49 -04:00
Heng Li
19d6ec885e
r224: inversion alignment around Z-drop break
2017-07-29 13:09:10 -04:00