Heng Li
e6f525edaf
r512: option to filter poorly aligned reads
2017-10-16 10:38:22 -04:00
Heng Li
858213d513
r511: fixed wrong primary sam record
2017-10-12 23:02:18 -04:00
Heng Li
7c555f9b7e
r508: use two I/O threads for mapping
...
-x sr applies this option by default
2017-10-12 14:56:01 -04:00
Heng Li
7345621759
r499: end bonus working; DP region needs improvement!
2017-10-11 00:14:25 -04:00
Heng Li
9396d9e11b
r452: typo in the last commit
2017-10-09 10:05:32 -04:00
Heng Li
198849a716
r491: an ambiguous base costs the same as gap ext
2017-10-09 09:59:42 -04:00
Heng Li
9fea4d16b3
r490: improved short-read extension heuristic
...
Now we find the best scoring ungapped seeded segment and then extend from it.
There is no gap filling for short reads.
2017-10-08 21:36:34 -04:00
Heng Li
61e56c941d
r488: parameter to control max fragment length
2017-10-07 23:54:32 -04:00
Heng Li
c6384ed2c8
r482: increased short-read bandwidth to 100
...
This has very minor effect on speed.
2017-10-06 10:20:32 -04:00
Heng Li
e0baf1ad54
r479: a bit of code cleanup
2017-10-05 16:15:14 -04:00
Heng Li
f266092699
r478: simplified useless code, a tiny bit
2017-10-05 15:56:00 -04:00
Heng Li
9c5767f9ed
r477: renamed multi_seg to frag_mode
2017-10-05 15:48:17 -04:00
Heng Li
ae2adf04d4
r476: multi-file fragment mode working
2017-10-05 15:39:26 -04:00
Heng Li
5ab99eb26e
more accurate SAM flag
2017-10-05 10:59:38 -04:00
Heng Li
7cc4f6f965
r469: first step towards PE SAM
2017-10-05 10:38:09 -04:00
Heng Li
7d50e646dd
r466: detect multi-part index more smartly
...
though it might not work in an extremely rare case: the end of a sequence ends
at X*16384 and it is the last sequence in a batch. This can be resolved by
never letting the kstream_t buffer empty.
2017-10-04 17:32:58 -04:00
Heng Li
1554149158
r465: apply option -x before other options
2017-10-04 13:52:28 -04:00
Heng Li
2581c44a21
r463: optionally disable secondary hits
2017-10-04 13:24:41 -04:00
Heng Li
2a1e738a94
r461: randomize repetitive hits
2017-10-04 13:05:18 -04:00
Heng Li
cf55c84056
r460: added option --no-long-join
2017-10-04 12:08:44 -04:00
Heng Li
ee9b2773a8
r456: min chain score should be >k-mer length
...
or chain_dp() wastes time on unnecessarily sorting chains with one k-mer.
2017-09-29 22:33:55 -04:00
Heng Li
04fb2c2ec0
r454: rechain with higher max_occ if no good chain
2017-09-29 19:24:32 -04:00
Heng Li
0d4ecd19ee
r453: avoid duplicated strcmp() for ava
2017-09-28 15:52:05 -04:00
Heng Li
9541052564
r447: paired-end mapping quality
...
not as good as I would hope...
2017-09-27 15:39:25 -04:00
Heng Li
7e0d70bfd3
r445: pair coordinate adjustment working
...
Next: mapq adjustment, which will be tricky...
2017-09-27 15:38:18 -04:00
Heng Li
a349d85280
r444: changed the way orientation is specified
...
The old model doesn't work with RF or RR orientation. The new model only works
with paired-end reads. For >2 segments, only FF is supported.
2017-09-27 12:33:10 -04:00
Heng Li
f611edf6f2
r443: don't filter small cm for split seg
2017-09-26 16:17:58 -04:00
Heng Li
1b1dd0cd57
r442: default max_gap to 200 in the sr mode
2017-09-26 13:31:01 -04:00
Heng Li
55d1e4f638
r440: better chain filtering for PE reads
2017-09-26 11:03:36 -04:00
Heng Li
8f25cfa36e
r437: fixed uninitialized memory on rep_len
2017-09-25 14:22:45 -04:00
Heng Li
81008dd371
r436: working on short reads
...
The result is mixed - lots of room for tuning
2017-09-25 14:06:29 -04:00
Heng Li
3bb66e1ed3
multi-seg working on toy examples
2017-09-25 13:42:04 -04:00
Heng Li
a742f10164
get multi-seg code ready; probably not working yet
2017-09-24 15:17:17 -04:00
Heng Li
f0951141a1
allow to read multiple files interleaved
2017-09-24 14:33:05 -04:00
Heng Li
19d8eca3a1
moved array shrinking into chain_dp()
2017-09-20 14:58:57 -04:00
Heng Li
9943e5fdd0
backup
2017-09-20 14:35:46 -04:00
Heng Li
5b39a1b34b
Merge branch 'master' into sr
2017-09-20 12:24:08 -04:00
Heng Li
e3b5802b2e
r424: reduce memory for long query seqs
2017-09-20 12:22:13 -04:00
Heng Li
03d6894517
backup
2017-09-20 11:47:46 -04:00
Heng Li
645db3350e
Merge branch 'master' into sr
2017-09-20 11:15:14 -04:00
Heng Li
75e6bbc9f6
r421: removed the MM_F_SPLICE_BOTH mode
...
In the default splice mode, minimap2 applies two rounds of spliced alignment:
first assuming GT-AG to be the splice signal across all splicing sites and then
assuming CT-AC to be the signal. This is the ideal strategy.
In the MM_F_SPLICE_BOTH mode, minimap2 applies one round of spliced alignment,
assuming GT-AG and CT-AC to be the splice signals AT THE SAME TIME. This will
be faster but less accurate. I don't think anyone would like to run minimap2 in
this mode, so I am removing it for clarity.
2017-09-20 11:11:53 -04:00
Heng Li
7a9b4db874
replaced --approx-ext with --sr
...
--sr disables Z-drop and may come with other heuristics
2017-09-20 10:51:18 -04:00
Heng Li
fd14618e61
no effective changes
2017-09-20 10:11:05 -04:00
Heng Li
56014ba3db
avoid assertion failure given 0-length reads
2017-09-19 22:30:32 -04:00
Heng Li
b99c22840f
r414: avoid assertion failure for 0-length reads
2017-09-19 22:21:27 -04:00
Heng Li
c04420698e
fixed an uninitialized value
2017-09-19 16:21:21 -04:00
Heng Li
fb1bcc0084
early exploration
2017-09-19 16:18:28 -04:00
Heng Li
e2823d4aee
r367: index reader optionally writes index
2017-09-14 21:18:13 -04:00
Heng Li
eb00521d9b
redesigned indexing and option APIs
2017-09-14 17:02:01 -04:00
Heng Li
4d3768bf26
r364: improved the mapq heuristics
...
* use repetitive seed lengths, not counts
* compute n_sub to higher accuracy
* use bwa-mem mapq heuristic as a backup
For short single-end reads, minimap2's ROC is not as good as bwa-mem's, but is
close.
2017-09-14 12:37:03 -04:00
Heng Li
6a82a21dee
r361: improved mapq for short reads
2017-09-13 15:32:39 -04:00
Heng Li
3c91d652dd
r360: allow to set integer max occ
2017-09-13 11:37:00 -04:00
Heng Li
c7c3585531
r347: merged mm_map_frag() into mm_map()
...
mm_map_frag() was separated due to an earlier design that has been rejected.
2017-09-10 15:02:55 -04:00
Heng Li
59c822b722
removed some commented code
...
which *might* return at some time later
2017-09-09 08:38:39 -04:00
Heng Li
f422175e4e
r344: avoid unnecessary refName retrieval
2017-09-08 22:44:14 -04:00
Heng Li
101b8bb97d
r335: report an error if query can't be opened
2017-09-03 11:54:38 -04:00
Heng Li
0fe1a224ab
r309: improved SAM header output
2017-08-25 10:35:58 +08:00
Heng Li
993a2bb521
r301: separate introns from deletions
...
When an intron is adjacent to a deletion, the old code counted both as introns,
which led to an inaccurate exon boundary.
2017-08-18 15:31:15 +08:00
Heng Li
64c1389e1a
Merge branch 'master' into splice
2017-08-17 23:39:27 +08:00
Heng Li
bbb37d95f2
support inserting RG lines
2017-08-17 23:34:09 +08:00
Heng Li
2cde8d257c
r297: bidirectional RNA alignment
2017-08-17 06:02:44 -04:00
Heng Li
d240318741
r287: refined CLI options and manpage
2017-08-12 12:26:04 -04:00
Heng Li
0f4c823b0c
r286: ignore introns when computing max seg score
2017-08-12 10:58:16 -04:00
Heng Li
c59b0781bc
r280: output introns as "N" in the cdna mode
2017-08-09 11:45:02 -04:00
Heng Li
1a7d782131
r273: cdna mapping mode for testing
...
Differences from the typical mapping mode:
* banded alignment disabled
* log gap cost during chaining
* zero long-gap extension during alignment
* up to 100kb (by default) reference gap
* bad seeding not filtered (to tune later)
2017-08-08 11:31:49 -04:00
Heng Li
4c0713ee14
r235: optionally output tag cs in PAF
...
cs encodes the query, the reference sequence and CIGAR.
2017-07-31 12:06:49 -04:00
Heng Li
19d6ec885e
r224: inversion alignment around Z-drop break
2017-07-29 13:09:10 -04:00
Heng Li
2179e9e24b
r221: output SA in the SAM output
2017-07-28 23:08:39 -04:00
Heng Li
254280b8af
r216: a bit of cleanup; identical output to r215
2017-07-28 11:54:18 -04:00
Heng Li
b927838495
r212: better heuristic to fix wrong seeding
...
but not good enough. Will explore more.
2017-07-27 11:24:51 -04:00
Heng Li
e9dc1ce2b6
r205: when computing mapq, consider min_chain_sc
...
Not doing this was a mistake.
2017-07-26 11:34:14 -04:00
Heng Li
00c6db5073
r203: check more subopt aln if score small
2017-07-25 20:02:44 -04:00
Heng Li
71c988f6ab
r188: renamed bseq* to mm_bseq*
...
to avoid naming collisions between minimap2 and bwa/fermi-lite/etc
2017-07-19 09:26:46 -04:00
Heng Li
71e2a97a4c
r180: changed -x asm5 settings
2017-07-18 00:00:36 -04:00
Heng Li
b4280d186f
r176: removed seedcov_ratio; changed default opt
...
min_seedcov_ratio is not used
2017-07-12 12:47:46 -04:00
Heng Li
52caf79395
r175: halved max-chain-skip in the ava mode
2017-07-12 10:42:19 -04:00
Heng Li
eeeb2ffb68
r174: make max-chain-skip work
...
The max-chain-skip heuristic did not work due to a bug. Without this
heuristic, chaining is too slow for long-read overlap.
2017-07-12 10:08:06 -04:00
Heng Li
33451aba45
r173: changed the debugging output format
2017-07-11 15:23:28 -04:00
Heng Li
826c8ba892
r170: added a debugging flag
...
something wrong with chaining
2017-07-11 14:47:35 -04:00
Heng Li
1ac48556ae
r167: long join threshold depends on gap
...
also caught a bug for reverse strand join
2017-07-09 10:38:51 -04:00
Heng Li
42846ce65d
r163: reduced long join score requirement
...
because the chaining score is generally smaller with the last few commits.
2017-07-08 15:51:52 -04:00
Heng Li
38b2830e18
r161: filter bad seeds; changed default -g/-r
2017-07-08 13:31:27 -04:00
Heng Li
cc554aee43
r159: use two-piece gap penalty
2017-07-08 10:26:00 -04:00
Heng Li
9823317e8f
r158: optionally ignore base quality
2017-07-05 18:23:50 -04:00
Heng Li
e07daad7ad
r153: sam primary record not set sometimes
2017-07-03 13:18:57 -04:00
Heng Li
b625247300
r150: mm_sync_regs() doesn't work with negative id
2017-07-03 11:36:34 -04:00
Heng Li
53c4bf5e4f
r149: introduced debugging flags on CLI
2017-07-03 11:02:32 -04:00
Heng Li
2e4fd9f1d0
r148: revamped regs handling after cigar
2017-07-03 10:44:26 -04:00
Heng Li
51cfb60520
r145: changed default -p from 2 to 0.8
...
For long reads, secondary alignments can be very informative.
2017-07-02 22:51:45 -04:00
Heng Li
74d306a596
fixed bug when retaining 2ndary aln; still buggy
2017-07-02 19:08:30 -04:00
Heng Li
41efd03d7a
r129: fixed memory leak caused by qualities
2017-06-30 23:48:00 -04:00
Heng Li
426c2975f6
r126: filter by fraction of seed coverage
...
otherwise we may get too many poor overlap mappings.
2017-06-30 22:15:45 -04:00
Heng Li
646a746cdc
r122: filter contained aln after DP extension
2017-06-30 15:23:30 -04:00
Heng Li
fce87ce7bd
r121: output QUAL and unmapped to SAM
2017-06-30 14:40:54 -04:00
Heng Li
d11049eb32
r120: use max-scoring seg to control output
...
much better now
2017-06-30 14:21:44 -04:00
Heng Li
1a903486b9
r118: bugfix - regs unsorted before filtering
2017-06-30 12:52:28 -04:00
Heng Li
03267e8fa7
r113: fixed a sam header bug
2017-06-29 22:43:06 -04:00
Heng Li
3825feeeac
r111: changed the default z-drop to 200
2017-06-29 21:37:56 -04:00
Heng Li
08cbb09fcc
r109: changed the default scoring
2017-06-29 20:21:57 -04:00
Heng Li
4cd456b9ba
r108: refactoring, move reg1 routines to hit.c
2017-06-29 19:44:11 -04:00