In human and mouse, the GTr..yAG pattern occurs in 91% and 92% of all
GT-AG introns, respectively. Modeling r..y clearly leads to higher accuracy.
In SIRV, however, this percentage drops to ~60%, so the default
"--splice --splice-flank=yes" leads to lower accuracy there. If someone
benchmarks minimap2 on SIRV, this would look bad, but minimap2 is developed
for practical applications, not for benchmarks. I will live with that.
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost for GTr..yAG, but we pay half of the cost
when the r or y is absent. This improves the junction accuracy when mapping to
human and mouse and decreases the accuracy when mapping to SIRV. My guess is
that SIRV does not honor this trend. This needs to be investigated in the future.
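A minimal sketch of how such a penalty scheme could look (the function names
and layout here are made up for illustration; the actual scoring lives in the
alignment code):

    /* Illustrative only. p is the non-GT-AG cost set by -C. On the donor
     * side, s points at the first intron base; on the acceptor side, s
     * points one past the last intron base. */
    static int donor_cost(const char *s, int p)
    {
        if (s[0] == 'G' && s[1] == 'T')
            return (s[2] == 'A' || s[2] == 'G')? 0 : p/2; /* GTR: free; GT only: half */
        return p;                                         /* non-GT donor: full cost */
    }
    static int acceptor_cost(const char *s, int p)
    {
        if (s[-2] == 'A' && s[-1] == 'G')
            return (s[-3] == 'C' || s[-3] == 'T')? 0 : p/2; /* YAG: free; AG only: half */
        return p;                                           /* non-AG acceptor: full cost */
    }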
Also in this commit, --cost-non-gt-ag is aliased to -C, and its default is
raised from 5 to 9. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis about SIRV.
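For example (using only the options named above, plus -a for SAM output):

    minimap2 -a --splice --splice-flank=yes -C9 ref.fa query.fa > aln.sam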
though it might not work in an extremely rare case: a sequence ends exactly at
a multiple of 16384 bytes and it is the last sequence in a batch. This can be
resolved by never letting the kstream_t buffer go empty.
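A sketch of the idea behind that fix (this is not the actual kseq.h code; the
struct and field names are illustrative):

    #include <zlib.h>

    typedef struct {
        unsigned char buf[16384];
        int begin, end, is_eof;
        gzFile f;
    } stream_sketch_t;

    /* Refill as soon as the buffer is consumed, so that "buffer empty"
     * is never mistaken for "end of file" when a sequence happens to
     * end exactly at a 16384-byte boundary. */
    static int sketch_peek(stream_sketch_t *ks)
    {
        if (ks->begin >= ks->end) {
            if (ks->is_eof) return -1;
            ks->begin = 0;
            ks->end = gzread(ks->f, ks->buf, 16384);
            if (ks->end < 16384) ks->is_eof = 1;
            if (ks->end <= 0) return -1;
        }
        return ks->buf[ks->begin];
    }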
In the default splice mode, minimap2 applies two rounds of spliced alignment:
first assuming GT-AG to be the splice signal at all splice sites and then
assuming CT-AC to be the signal. This is the ideal strategy.
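In code, the two-round strategy amounts to something like this (spliced_align
and the flags are hypothetical stand-ins for the actual alignment API):

    enum { SPLICE_FOR, SPLICE_REV };  /* hypothetical: GT..AG vs CT..AC */
    extern int spliced_align(const char *ref, const char *query, int flag);

    static int align_two_rounds(const char *ref, const char *query)
    {
        int s_for = spliced_align(ref, query, SPLICE_FOR); /* round 1: GT-AG */
        int s_rev = spliced_align(ref, query, SPLICE_REV); /* round 2: CT-AC */
        return s_for >= s_rev? s_for : s_rev;              /* keep the better score */
    }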
In the MM_F_SPLICE_BOTH mode, minimap2 applies one round of spliced alignment,
assuming GT-AG and CT-AC to be the splice signals AT THE SAME TIME. This will
be faster but less accurate. I don't think anyone would like to run minimap2 in
this mode, so I am removing it for clarity.
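For contrast, a one-round MM_F_SPLICE_BOTH-style check would accept either
signal at every site, roughly like this (illustrative only):

    /* Accepting either donor signal in a single pass is cheaper, but a
     * GT donor may then be paired with an AC acceptor from the other
     * strand, which is one way this mode could lose accuracy. */
    static int donor_cost_both(const char *s, int p)
    {
        int gt = s[0] == 'G' && s[1] == 'T'; /* forward-strand signal */
        int ct = s[0] == 'C' && s[1] == 'T'; /* reverse-strand signal */
        return (gt || ct)? 0 : p;
    }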