Commit Graph

103 Commits (08bd2123b66353e502f1d51fa9bd4d2a96fb5321)

Author SHA1 Message Date
Heng Li 08bd2123b6 r752: option to copy comments to output (#136) 2018-03-23 10:04:33 -04:00
Heng Li 8766d286df r751: optionally output MD (#118) 2018-03-22 14:15:33 -04:00
Heng Li bdc615c1d4 r741: added --min-occ-floor to improve #107 2018-03-12 14:32:27 -04:00
Heng Li 24a4808826 r718: retrieve sequence from the index 2018-02-23 10:18:26 -05:00
Heng Li 1372977a37 r708: implemented double Z-drop thresholds (#112)
When aligning long reads, we would prefer to align through low-quality
regions. This requires a large Z-drop threshold. However, to find small
inversions, we need to use a small Z-drop. This commit address this
conflict with two Z-drop thresholds. When Z-drop exceeds the smaller
threshold, we perform a local alignment to check if there is a potential
inversion. If there is one, we break the alignment; otherwise we break
the alignment only if Z-drop excess the larger threshold.

This commit also fixes a bug that reported wrong coordinates when the
inversion is on the forward strand (#112).
2018-02-15 10:50:49 -05:00
Heng Li 7ef5490884 r703: added --max-clip-ratio
still testing the option
2018-02-12 13:29:18 -05:00
Heng Li 29b4a1786c r685: tune end seed filter again 2018-02-05 11:48:22 -05:00
Heng Li dbf284b2d9 r684: separate end score from min_chain_score 2018-02-05 11:40:38 -05:00
Heng Li da6947cfa3 r671: cleanup command line options 2018-01-31 13:59:52 -05:00
Heng Li 46d6349af4 r670: added PE support to mappy
and minor code cleanup
2018-01-31 11:33:08 -05:00
Heng Li 123bc1d91d put option operations in another file 2018-01-26 08:38:37 -05:00
Heng Li 33f8157961 r655: options to map to one strand of the ref #91 2018-01-16 10:34:30 -05:00
Heng Li e420b17496 r629: API to construct index from strings 2017-12-18 22:29:46 -05:00
Heng Li ab345e600b r626: function to check incorrect scoring system 2017-12-13 12:23:43 -05:00
Heng Li 98a6e52c06 r618: heuristics to avoid tiny terminal exons 2017-12-11 00:57:55 -05:00
Heng Li 704ff9f4c6 r607: estimate sequence divergence
Currently using the simplest method. There may be a more accurate estimate.
2017-12-06 16:14:39 -05:00
Heng Li 2f463b1db0 r573: prepare to generalize index 2017-11-11 19:54:06 -05:00
mvdbeek 1cb0bf4bef Implement -Y for soft clipping of supp. alignments
I tried to base this on bwa-mem and it seems to work for sam alignments.
2017-11-09 19:22:36 +01:00
Heng Li b24d68ae9f r557: fixed another mapq underestimate
When a chain is split during base-level alignment, its chaining score is
reduced. However, the chaining score of its suboptimal chain remains the same.
This leads to underestimated mapping quality.
2017-11-07 23:20:49 -05:00
Heng Li fa5a645ca5 r552: fixed a tiny typo on struct packing
The old packing wastes memory, thought very small.
2017-11-05 08:27:26 -05:00
Heng Li cd24dc8834 r545: removed option -i, not working well 2017-10-31 22:23:27 -04:00
Heng Li 79b0caca95 r537: model the next base to GT/AG
[PMID:18688272] shows that the base following GT tends to be A or G (i.e. R) in
both human and yeast, and that the base preceeding AG tends to be C or T (i.e.
Y). In the new model, we pay no cost to GTr..yAG, but we pay half of the cost
if there is no r or y. This improves the junction accuracy when mapping to
human and mouse and decreases the accuacy when mapping to SIRV. My guess is
that SIRV does not honor this trend. Need to investigate in future.

Also in this commit, --cost-non-gt-ag is aliased to -C. The default is changed
to 9 instead of 5. I also added --splice-flank to enable the above model. This
may become the default once I confirm my hypothesis on SIRV.
2017-10-28 00:25:01 -04:00
Heng Li d4b5dfc297 r533: added --no-pairing
to prevent the use of any pairing information for paired-end reads.
2017-10-23 14:09:32 -04:00
Heng Li 306e4541f8 Released minimap2-2.3 (r531) 2017-10-22 23:13:35 -04:00
Heng Li 4683da2455 r520: added option -L to write long cigar to CG 2017-10-17 17:32:44 -04:00
Heng Li adf6cd7f52 r513: merged pre- and post-cigar blen and mlen
This saves a bit memory and is cleaner.
2017-10-16 10:55:18 -04:00
Heng Li e6f525edaf r512: option to filter poorly aligned reads 2017-10-16 10:38:22 -04:00
Heng Li 7c555f9b7e r508: use two I/O threads for mapping
-x sr applies this option by default
2017-10-12 14:56:01 -04:00
Heng Li 7345621759 r499: end bonus working; DP region needs improve! 2017-10-11 00:14:25 -04:00
Heng Li 61e56c941d r488: parameter to control max fragment length 2017-10-07 23:54:32 -04:00
Heng Li 9c5767f9ed r477: renamed multi_seg to frag_mode 2017-10-05 15:48:17 -04:00
Heng Li ae2adf04d4 r476: multi-file fragment mode working 2017-10-05 15:39:26 -04:00
Heng Li f4a5d3a692 r474: replaced -S and --cs-no-equal with --cs 2017-10-05 15:03:03 -04:00
Heng Li 5ab99eb26e more accurate SAM flag 2017-10-05 10:59:38 -04:00
Heng Li 9aba11769c r467: added : (equal length) and ^ (intron) ops 2017-10-04 21:55:37 -04:00
Heng Li 7d50e646dd r466: detect multi-part index more smartly
though it might not work in an extremely rare case: the end of a sequence ends
at X*16384 and it is the last sequence in a batch. This can be resolved by
never letting the kstream_t buffer empty.
2017-10-04 17:32:58 -04:00
Heng Li 2581c44a21 r463: optionally disable secondary hits 2017-10-04 13:24:41 -04:00
Heng Li 2a1e738a94 r461: randomize repetitive hits 2017-10-04 13:05:18 -04:00
Heng Li cf55c84056 r460: added option --no-long-join 2017-10-04 12:08:44 -04:00
Heng Li 04fb2c2ec0 r454: rechain with higher max_occ if no good chain 2017-09-29 19:24:32 -04:00
Heng Li 7e0d70bfd3 r445: pair coordinate adjustment working
Next: mapq adjustment, which will be tricky...
2017-09-27 15:38:18 -04:00
Heng Li a349d85280 r444: changed the way orientation is specified
The old model doesn't work with RF or RR orientation. The new model only works
with paired-end reads. For >2 segments, only FF is supported.
2017-09-27 12:33:10 -04:00
Heng Li f611edf6f2 r443: don't filter small cm for split seg 2017-09-26 16:17:58 -04:00
Heng Li 3bb66e1ed3 multi-seg working on toy examples 2017-09-25 13:42:04 -04:00
Heng Li f0951141a1 allow to read multiple files interleaved 2017-09-24 14:33:05 -04:00
Heng Li 645db3350e Merge branch 'master' into sr 2017-09-20 11:15:14 -04:00
Heng Li 75e6bbc9f6 r421: removed the MM_F_SPLICE_BOTH mode
In the default splice mode, minimap2 applies two rounds of spliced alignment:
first assuming GT-AG to be the splice signal across all splicing sites and then
assuming CT-AC to be the signal. This is the idea strategy.

In the MM_F_SPLICE_BOTH mode, minimap2 applies one round of spliced alignment,
assuming GT-AG and CT-AC to be the splice signals AT THE SAME TIME. This will
be faster but less accurate. I don't think anyone would like to run minimap2 in
this mode, so I am removing it for clarity.
2017-09-20 11:11:53 -04:00
Heng Li 7a9b4db874 replaced --approx-ext with --sr
--sr disables Z-drop and may come with other heurstics
2017-09-20 10:51:18 -04:00
Heng Li fb1bcc0084 early exploration 2017-09-19 16:18:28 -04:00
Heng Li 75ff7ceec5 r368: API documentation 2017-09-14 22:23:04 -04:00