Heng Li
2f463b1db0
r573: prepare to generalize index
2017-11-11 19:54:06 -05:00
mvdbeek
1cb0bf4bef
Implement -Y for soft clipping of supp. alignments
...
I tried to base this on bwa-mem and it seems to work for sam alignments.
2017-11-09 19:22:36 +01:00
Heng Li
b24d68ae9f
r557: fixed another mapq underestimate
...
When a chain is split during base-level alignment, its chaining score is
reduced. However, the chaining score of its suboptimal chain remains the same.
This leads to underestimated mapping quality.
2017-11-07 23:20:49 -05:00
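A toy illustration of the r557 note above, not minimap2's actual formula: assume mapping quality grows with the gap between the best chaining score and the suboptimal one. The function `toy_mapq` and its scaling are hypothetical; the sketch only shows why a reduced best score paired with a stale suboptimal score yields a lower estimate.

```python
def toy_mapq(best, subopt, cap=60):
    """Hypothetical mapq: scales with the relative gap between two scores."""
    if best <= 0 or subopt >= best:
        return 0
    return cap * (best - subopt) // best

# Before the chain is split: best chain scores 100, suboptimal chain 60.
before_split = toy_mapq(100, 60)

# After the split the best score drops to 70, but the suboptimal score
# is still the old 60, so the gap looks smaller than it really is.
after_split_stale = toy_mapq(70, 60)

print(before_split, after_split_stale)
```

How r557 itself corrects the suboptimal score is not stated in the commit message, so the sketch stops at showing the underestimate.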
Heng Li
fa5a645ca5
r552: fixed a tiny typo on struct packing
...
The old packing wastes memory, though very little.
2017-11-05 08:27:26 -05:00
Heng Li
cd24dc8834
r545: removed option -i, not working well
2017-10-31 22:23:27 -04:00
Heng Li
79b0caca95
r537: model the next base to GT/AG
...
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost to GTr..yAG, but we pay half of the cost
if there is no r or y. This improves the junction accuracy when mapping to
human and mouse and decreases the accuracy when mapping to SIRV. My guess is
that SIRV does not honor this trend. This needs to be investigated in the future.
Also in this commit, --cost-non-gt-ag is aliased to -C. The default is changed
to 9 instead of 5. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis on SIRV.
2017-10-28 00:25:01 -04:00
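The r537 model above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in minimap2: the function names are hypothetical, only the forward GT..AG signal is handled, the donor and acceptor sites are assumed to be charged separately, and "half of the cost" is assumed to round down. The default of 9 is the new --cost-non-gt-ag value mentioned in the message.

```python
def donor_cost(intron_start, full_cost=9):
    """Penalty at the donor site, given the first three intron bases."""
    s = intron_start.upper()
    if s[:2] != "GT":
        return full_cost                 # non-canonical: full cost
    # GT followed by a purine (R = A or G) is free; otherwise half cost.
    return 0 if s[2:3] in ("A", "G") else full_cost // 2


def acceptor_cost(intron_end, full_cost=9):
    """Penalty at the acceptor site, given the last three intron bases."""
    s = intron_end.upper()
    if s[-2:] != "AG":
        return full_cost                 # non-canonical: full cost
    # AG preceded by a pyrimidine (Y = C or T) is free; otherwise half cost.
    return 0 if s[-3:-2] in ("C", "T") else full_cost // 2


for flank in ("GTA", "GTC", "ATA"):
    print(flank, donor_cost(flank))
for flank in ("CAG", "AAG", "CAC"):
    print(flank, acceptor_cost(flank))
```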
Heng Li
d4b5dfc297
r533: added --no-pairing
...
to prevent the use of any pairing information for paired-end reads.
2017-10-23 14:09:32 -04:00
Heng Li
306e4541f8
Released minimap2-2.3 (r531)
2017-10-22 23:13:35 -04:00
Heng Li
4683da2455
r520: added option -L to write long cigar to CG
2017-10-17 17:32:44 -04:00
Heng Li
adf6cd7f52
r513: merged pre- and post-cigar blen and mlen
...
This saves a bit of memory and is cleaner.
2017-10-16 10:55:18 -04:00
Heng Li
e6f525edaf
r512: option to filter poorly aligned reads
2017-10-16 10:38:22 -04:00
Heng Li
7c555f9b7e
r508: use two I/O threads for mapping
...
-x sr applies this option by default
2017-10-12 14:56:01 -04:00
Heng Li
7345621759
r499: end bonus working; DP region needs improvement!
2017-10-11 00:14:25 -04:00
Heng Li
61e56c941d
r488: parameter to control max fragment length
2017-10-07 23:54:32 -04:00
Heng Li
9c5767f9ed
r477: renamed multi_seg to frag_mode
2017-10-05 15:48:17 -04:00
Heng Li
ae2adf04d4
r476: multi-file fragment mode working
2017-10-05 15:39:26 -04:00
Heng Li
f4a5d3a692
r474: replaced -S and --cs-no-equal with --cs
2017-10-05 15:03:03 -04:00
Heng Li
5ab99eb26e
more accurate SAM flag
2017-10-05 10:59:38 -04:00
Heng Li
9aba11769c
r467: added : (equal length) and ^ (intron) ops
2017-10-04 21:55:37 -04:00
Heng Li
7d50e646dd
r466: detect multi-part index more smartly
...
though it might not work in an extremely rare case: the end of a sequence ends
at X*16384 and it is the last sequence in a batch. This can be resolved by
never letting the kstream_t buffer empty.
2017-10-04 17:32:58 -04:00
Heng Li
2581c44a21
r463: optionally disable secondary hits
2017-10-04 13:24:41 -04:00
Heng Li
2a1e738a94
r461: randomize repetitive hits
2017-10-04 13:05:18 -04:00
Heng Li
cf55c84056
r460: added option --no-long-join
2017-10-04 12:08:44 -04:00
Heng Li
04fb2c2ec0
r454: rechain with higher max_occ if no good chain
2017-09-29 19:24:32 -04:00
Heng Li
7e0d70bfd3
r445: pair coordinate adjustment working
...
Next: mapq adjustment, which will be tricky...
2017-09-27 15:38:18 -04:00
Heng Li
a349d85280
r444: changed the way orientation is specified
...
The old model doesn't work with RF or RR orientation. The new model only works
with paired-end reads. For >2 segments, only FF is supported.
2017-09-27 12:33:10 -04:00
Heng Li
f611edf6f2
r443: don't filter small cm for split seg
2017-09-26 16:17:58 -04:00
Heng Li
3bb66e1ed3
multi-seg working on toy examples
2017-09-25 13:42:04 -04:00
Heng Li
f0951141a1
allow to read multiple files interleaved
2017-09-24 14:33:05 -04:00
Heng Li
645db3350e
Merge branch 'master' into sr
2017-09-20 11:15:14 -04:00
Heng Li
75e6bbc9f6
r421: removed the MM_F_SPLICE_BOTH mode
...
In the default splice mode, minimap2 applies two rounds of spliced alignment:
first assuming GT-AG to be the splice signal across all splicing sites and then
assuming CT-AC to be the signal. This is the ideal strategy.
In the MM_F_SPLICE_BOTH mode, minimap2 applies one round of spliced alignment,
assuming GT-AG and CT-AC to be the splice signals AT THE SAME TIME. This will
be faster but less accurate. I don't think anyone would like to run minimap2 in
this mode, so I am removing it for clarity.
2017-09-20 11:11:53 -04:00
Heng Li
7a9b4db874
replaced --approx-ext with --sr
...
--sr disables Z-drop and may come with other heuristics
2017-09-20 10:51:18 -04:00
Heng Li
fb1bcc0084
early exploration
2017-09-19 16:18:28 -04:00
Heng Li
75ff7ceec5
r368: API documentation
2017-09-14 22:23:04 -04:00
Heng Li
e2823d4aee
r367: index reader optionally writes index
2017-09-14 21:18:13 -04:00
Heng Li
eb00521d9b
redesigned indexing and option APIs
2017-09-14 17:02:01 -04:00
Heng Li
0f7455cefa
r365: documented the "sr" preset
2017-09-14 12:57:21 -04:00
Heng Li
3c91d652dd
r360: allow to set integer max occ
2017-09-13 11:37:00 -04:00
Heng Li
d7f2ac1d4f
better parameters for short reads
...
It turns out the key problem is not the minimizer density. It is the max
occurrence that tends to affect results more, especially sensitivity. There is
still lots of work to do, but for now, it seems a good start.
2017-09-12 16:11:23 -04:00
Heng Li
0fe1a224ab
r309: improved SAM header output
2017-08-25 10:35:58 +08:00
Heng Li
2cde8d257c
r297: bidirectional RNA alignment
2017-08-17 06:02:44 -04:00
Heng Li
b5f5929bf9
r296: expose splicing related options to CLI
2017-08-13 21:37:51 -04:00
Heng Li
43506edbc5
backup: preliminary boundary alignment
2017-08-12 23:10:14 -04:00
Heng Li
d240318741
r287: refined CLI options and manpage
2017-08-12 12:26:04 -04:00
Heng Li
1a7d782131
r273: cdna mapping mode for testing
...
Differences from the typical mapping mode:
* banded alignment disabled
* log gap cost during chaining
* zero long-gap extension during alignment
* up to 100kb (by default) reference gap
* bad seeding not filtered (to tune later)
2017-08-08 11:31:49 -04:00
Heng Li
4c0713ee14
r235: optionally output tag cs in PAF
...
cs encodes the query, the reference sequence and CIGAR.
2017-07-31 12:06:49 -04:00
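To illustrate how a single cs string can carry the query, the reference and the CIGAR (r235 above), here is a small decoder. It assumes the long-form cs grammar documented in later minimap2 releases (`=` for identical bases, `*` for a substitution written reference base then query base, `+` for an insertion, `-` for a deletion); the format at r235 may have differed, and `decode_cs` is a hypothetical helper, not part of minimap2.

```python
import re

# One token per operation; other operators are deliberately rejected.
_TOKEN = re.compile(r"=[A-Za-z]+|\*[a-z]{2}|\+[a-z]+|-[a-z]+")


def decode_cs(cs):
    """Return (reference, query, cigar) for the aligned region."""
    ref, qry, cigar = [], [], []

    def push(op, n):
        # Merge adjacent operations of the same type, e.g. match + mismatch.
        if cigar and cigar[-1][0] == op:
            cigar[-1][1] += n
        else:
            cigar.append([op, n])

    pos = 0
    for m in _TOKEN.finditer(cs):
        if m.start() != pos:
            raise ValueError("unsupported cs operation at offset %d" % pos)
        pos = m.end()
        kind, body = m.group()[0], m.group()[1:].upper()
        if kind == "=":                  # identical bases
            ref.append(body); qry.append(body); push("M", len(body))
        elif kind == "*":                # substitution: ref base, query base
            ref.append(body[0]); qry.append(body[1]); push("M", 1)
        elif kind == "+":                # insertion to the reference
            qry.append(body); push("I", len(body))
        else:                            # deletion from the reference
            ref.append(body); push("D", len(body))
    if pos != len(cs):
        raise ValueError("unsupported cs operation at offset %d" % pos)
    return "".join(ref), "".join(qry), "".join("%d%s" % (n, op) for op, n in cigar)


print(decode_cs("=ACGT*ag+tt=CC-a=G"))
```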
Heng Li
19d6ec885e
r224: inversion alignment around Z-drop break
2017-07-29 13:09:10 -04:00
Heng Li
f81f37fef1
r197: allocate index seq names from kalloc
...
to reduce malloc() overhead.
2017-07-24 19:36:05 -04:00
Heng Li
5c4d040b13
r191: warning if CLI index opt diff from prebuilt
...
Also added index testing API (moved from main.c to index.c)
2017-07-19 10:25:11 -04:00
Heng Li
71c988f6ab
r188: renamed bseq* to mm_bseq*
...
to avoid naming collisions between minimap2 and bwa/fermi-lite/etc
2017-07-19 09:26:46 -04:00
Heng Li
b4280d186f
r176: removed seedcov_ratio; changed default opt
...
min_seedcov_ratio is not used
2017-07-12 12:47:46 -04:00
Heng Li
801bc84b01
r169: output more accurate col. 10&11 to PAF
...
In r168, col.10 is smaller than what it should be. This confuses miniasm.
2017-07-11 14:09:51 -04:00
Heng Li
cc554aee43
r159: use two-piece gap penalty
2017-07-08 10:26:00 -04:00
Heng Li
9823317e8f
r158: optionally ignore base quality
2017-07-05 18:23:50 -04:00
Heng Li
53c4bf5e4f
r149: introduced debugging flags on CLI
2017-07-03 11:02:32 -04:00
Heng Li
632b8638d2
r144: adjust primary aln after cigar
2017-07-02 22:43:02 -04:00
Heng Li
74d306a596
fixed bug when retaining 2ndary aln; still buggy
2017-07-02 19:08:30 -04:00
Heng Li
426c2975f6
r126: filter by fraction of seed coverage
...
otherwise we may get too many poor overlap mappings.
2017-06-30 22:15:45 -04:00
Heng Li
d11049eb32
r120: use max-scoring seg to control output
...
much better now
2017-06-30 14:21:44 -04:00
Heng Li
52b4d8e2c9
r115: set primary tag; still buggy
2017-06-29 23:48:35 -04:00
Heng Li
11167f511b
r112: output z-drop
2017-06-29 22:08:46 -04:00
Heng Li
c8d122bcdb
backup
2017-06-29 11:11:15 -04:00
Heng Li
bcd9b1c621
r93: fixed various small issues
2017-06-28 10:35:21 -04:00
Heng Li
fa80177e58
r89: added minimal number of minimizer counts
2017-06-27 18:43:15 -04:00
Heng Li
640b1a1727
command-line option to control CIGAR output
2017-06-26 11:41:09 -04:00
Heng Li
b1077ff14c
sam output
2017-06-25 22:05:20 -04:00
Heng Li
aa5881e7bb
backup
2017-06-24 22:51:31 -04:00
Heng Li
35b84f88c6
backup
2017-06-23 22:42:15 -04:00
Heng Li
4fea3d778a
backup
2017-06-23 18:57:00 -04:00
Heng Li
6c8368c24c
get the left-extension sequence correctly
2017-06-23 18:25:47 -04:00
Heng Li
990f7b0b71
backup
2017-06-23 15:13:53 -04:00
Heng Li
4ae0b46972
min_ksw_len
2017-06-23 14:38:28 -04:00
Heng Li
9cd313eae1
sequence retrieval working
2017-06-23 14:11:56 -04:00
Heng Li
326d91deb0
backup
2017-06-23 14:06:00 -04:00
Heng Li
44cdd18de0
start to work on alignment
2017-06-23 13:44:45 -04:00
Heng Li
b04e4b9215
r36: bring back primary; don't output all mappings
2017-06-08 15:28:19 -04:00
Heng Li
19e43571c1
r34: removed a bit of unused code
2017-06-07 14:35:57 -04:00
Heng Li
8ad5cfde42
output PAF
2017-06-07 14:18:32 -04:00
Heng Li
6d4348db44
dp chaining mostly works, but fails sometimes
...
which means there are bugs that need to be fixed
2017-06-06 14:19:50 -04:00
Heng Li
1a9fc04cf0
backup
2017-06-06 10:16:33 -04:00
Heng Li
acc7382a30
backup
2017-06-04 16:09:45 -04:00
Heng Li
7b7fabef4d
added idx_stat
2017-04-26 22:52:28 +08:00
Heng Li
de367a340c
compilable again
2017-04-26 19:36:46 +08:00
Heng Li
56723ad580
moved `sum_len` out of the index
...
as it can be inferred.
2017-04-19 11:06:24 -04:00
Heng Li
f5cdd3f72f
is_hpc is a property of the index
2017-04-07 15:42:33 -04:00
Heng Li
b3bc4911ba
index can be compiled; not tested yet
2017-04-07 15:30:30 -04:00
Heng Li
01baa847a1
Homopolymer-compressed k-mer sketch
2017-04-06 15:37:34 -04:00