.TH minimap2 1 "20 June 2018" "minimap2-2.11 (r797)" "Bioinformatics tools"
.SH NAME
.PP
minimap2 - mapping and alignment between collections of DNA sequences
.SH SYNOPSIS
* Indexing the target sequences (optional):
.RS 4
minimap2
.RB [ -x
.IR preset ]
.B -d
.I target.mmi
.I target.fa
.br
minimap2
.RB [ -H ]
.RB [ -k
.IR kmer ]
.RB [ -w
.IR miniWinSize ]
.RB [ -I
.IR batchSize ]
.B -d
.I target.mmi
.I target.fa
.RE

* Long-read alignment with CIGAR:
.RS 4
minimap2
.B -a
.RB [ -x
.IR preset ]
.I target.mmi
.I query.fa
>
.I output.sam
.br
minimap2
.B -c
.RB [ -H ]
.RB [ -k
.IR kmer ]
.RB [ -w
.IR miniWinSize ]
.RB [ ... ]
.I target.fa
.I query.fa
>
.I output.paf
.RE

* Long-read overlap without CIGAR:
.RS 4
minimap2
.B -x
ava-ont
.RB [ -t
.IR nThreads ]
.I target.fa
.I query.fa
>
.I output.paf
.RE
.SH DESCRIPTION
.PP
Minimap2 is a fast sequence mapping and alignment program that can find
overlaps between long noisy reads, or map long reads or their assemblies to a
reference genome, optionally with detailed alignment (i.e. CIGAR). At present,
it works efficiently with query sequences from a few kilobases to ~100
megabases in length at an error rate of ~15%. Minimap2 outputs in the PAF or the
SAM format.
.SH OPTIONS
.SS Indexing options
.TP 10
.BI -k \ INT
Minimizer k-mer length [15]
.TP
.BI -w \ INT
Minimizer window size [2/3 of k-mer length]. A minimizer is the smallest k-mer
in a window of w consecutive k-mers.
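.IP
As a rough illustration of this definition, the Python sketch below picks the
smallest k-mer in each window of w consecutive k-mers. It assumes plain
lexicographic order on the forward strand only; minimap2 itself may order
k-mers differently (for example by hashing), and the function name and the toy
input are hypothetical.
.IP
.nf
```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs of window minimizers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Smallest k-mer in the window; ties go to the leftmost position.
        # Lexicographic order is an assumption made for this sketch.
        best = min(window, key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return sorted(picked)


if __name__ == "__main__":
    for pos, kmer in minimizers("ACGTTGCAAC", k=3, w=4):
        print(pos, kmer)
```
.fi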
.TP
.B -H
Use homopolymer-compressed (HPC) minimizers. An HPC sequence is constructed by
contracting homopolymer runs to a single base. An HPC minimizer is a minimizer
on the HPC sequence.
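.IP
The contraction step can be sketched in a few lines of Python. This is only an
illustration of the definition above, not minimap2's own code; the function
name and the example sequence are hypothetical.
.IP
.nf
```python
from itertools import groupby


def hpc(seq):
    """Contract each run of identical bases to a single base."""
    return "".join(base for base, _run in groupby(seq))


if __name__ == "__main__":
    print(hpc("GATTTACCA"))  # prints GATACA
```
.fi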
.TP
.BI -I \ NUM
Load at most
.I NUM
target bases into RAM for indexing [4G]. If there are more than
.I NUM
bases in
.IR target.fa ,
minimap2 needs to read
.I query.fa
multiple times to map it against each batch of target sequences.
.I NUM
may end with a k/K/m/M/g/G suffix. NB: mapping quality is incorrect given a
multi-part index.
.TP
.B --idx-no-seq
Don't store target sequences in the index. It saves disk space and memory but
the index generated with this option will not work with
.B -a
or
.BR -c .
When base-level alignment is not requested, this option is automatically applied.
.TP
.BI -d \ FILE
Save the minimizer index of
.I target.fa
to
.I FILE
[no dump]. Minimap2 indexing is fast. It can index the human genome in a couple
of minutes. If even shorter startup time is desired, use this option to save
the index. Indexing options are fixed in the index file. When an index file is
provided as the target sequences, options
.BR -H ,
.BR -k ,
.BR -w ,
and
.B -I
will be effectively overridden by the options stored in the index file.
.SS Mapping options
.TP 10
.BI -f \ FLOAT | INT1 [, INT2 ]
If a fraction, ignore the top
.I FLOAT
fraction of most frequent minimizers [0.0002]. If an integer,
ignore minimizers occurring more than
.I INT1
times.
.I INT2
is only effective in the
.B --sr
or
.B -xsr
mode, which sets the threshold for a second round of seeding.
.TP
.BI --min-occ-floor \ INT
Force minimap2 to always use k-mers occurring
.I INT
times or less [0]. In effect, the max occurrence threshold is set to
the
.RI max{ INT ,
.BR -f }.
.TP
.BI -g \ INT
Stop chain elongation if there are no minimizers within
.IR INT -bp
[10000].
.TP
.BI -r \ INT
Bandwidth used in chaining and DP-based alignment [500]. This option
approximately controls the maximum gap size.
.TP
.BI -n \ INT
Discard chains consisting of
.RI < INT
number of minimizers [3]
.TP
.BI -m \ INT
Discard chains with chaining score
.RI < INT
[40]. Chaining score equals the approximate number of matching bases minus a
concave gap penalty. It is computed with dynamic programming.
.TP
.B -D
If query sequence name/length are identical to the target name/length, ignore
diagonal anchors. This option also reduces DP-based extension along the
diagonal.
.TP
.B -P
Retain all chains and don't attempt to set primary chains. Options
.B -p
and
.B -N
have no effect when this option is in use.
.TP
.BR --dual = yes | no
If
.BR no ,
skip query-target pairs wherein the query name is lexicographically greater
than the target name [yes]
.TP
.B -X
Equivalent to
.RB ' -DP
.BR --dual = no
.BR --no-long-join '.
Primarily used for all-vs-all read overlapping.
.TP
.BI -p \ FLOAT
Minimal secondary-to-primary score ratio to output secondary mappings [0.8].
Between two chains overlapping over half of the shorter chain (controlled by
.BR --mask-level ),
the chain with a lower score is secondary to the chain with a higher score.
If the ratio of the scores is below
.IR FLOAT ,
the secondary chain will not be output or extended with DP alignment later.
This option has no effect when
.B -X
is applied.
.TP
.BI -N \ INT
Output at most
.I INT
secondary alignments [5]. This option has no effect when
.B -X
is applied.
.TP
.BI -G \ NUM
Maximum gap on the reference (effective with
.BR -xsplice / --splice ).
This option also changes the chaining and alignment band width to
.IR NUM .
Increasing this option slows down spliced alignment. [200k]
.TP
.BI -F \ NUM
Maximum fragment length (aka insert size; effective with
.BR -xsr / --frag = yes )
[800]
.TP
.BI -M \ FLOAT
Mark as secondary a chain that overlaps with a better chain by
.I FLOAT
or more of the shorter chain [0.5]
.TP
.BI --max-chain-skip \ INT
A heuristic that stops chaining early [50]. Minimap2 uses dynamic programming
for chaining. The time complexity is quadratic in the number of seeds. This
option makes minimap2 exit the inner loop if it repeatedly sees seeds already
on chains. Set
.I INT
to a large number to switch off this heuristic.
.TP
.B --no-long-join
Disable the long gap patching heuristic. When this option is applied, the
maximum alignment gap is mostly controlled by
.BR -r .
.TP
.BI --lj-min-ratio \ FLOAT
Fraction of query sequence length required to bridge a long gap [0.5]. A
smaller value helps to recover longer gaps, at the cost of more false gaps.
.TP
.B --splice
Enable the splice alignment mode.
.TP
.B --sr
Enable short-read alignment heuristics. In the short-read mode, minimap2
applies a second round of chaining with a higher minimizer occurrence threshold
if no good chain is found. In addition, minimap2 attempts to patch gaps between
seeds with ungapped alignment.
.TP
.BR --frag = no | yes
Whether to enable the fragment mode [no]
.TP
.B --for-only
Only map to the forward strand of the reference sequences. For paired-end
reads in the forward-reverse orientation, the first read is mapped to the forward
strand of the reference and the second read to the reverse strand.
.TP
.B --rev-only
Only map to the reverse complement strand of the reference sequences.
.TP
.BR --heap-sort = no | yes
If yes, sort anchors with heap merge, instead of radix sort. Heap merge is
faster for short reads, but slower for long reads. [no]
.SS Alignment options
.TP 10
.BI -A \ INT
Matching score [2]
.TP
.BI -B \ INT
Mismatching penalty [4]
.TP
.BI -O \ INT1[,INT2]
Gap open penalty [4,24]. If
.I INT2
is not specified, it is set to
.IR INT1 .
.TP
.BI -E \ INT1[,INT2]
Gap extension penalty [2,1]. A gap of length
.I k
costs
.RI min{ O1 + k * E1 , O2 + k * E2 }.
In the splice mode, the second gap penalties are not used.
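.IP
The two-piece gap cost can be checked with a small Python sketch. It simply
evaluates the formula above with the documented defaults for
.B -O
and
.BR -E ;
the function and parameter names are hypothetical and the splice-mode case is
not modelled.
.IP
.nf
```python
def gap_cost(k, o1=4, e1=2, o2=24, e2=1):
    """Cost of a length-k gap: min{O1 + k*E1, O2 + k*E2}."""
    return min(o1 + k * e1, o2 + k * e2)


if __name__ == "__main__":
    for k in (1, 10, 20, 21, 100):
        print(k, gap_cost(k))
```
.fi
.IP
With the default values the first pair of penalties gives the lower cost for
gaps shorter than 20 bases, the two pairs tie at 20 bases, and the second pair
is cheaper for longer gaps.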
.TP
.BI -C \ INT
Cost for a non-canonical GT-AG splicing (effective with
.BR --splice )
[0]
.TP
.BI -z \ INT1[,INT2]
Truncate an alignment if the running alignment score drops too quickly along
the diagonal of the DP matrix (diagonal X-drop, or Z-drop) [400,200]. If the
drop of score is above
.IR INT2 ,
minimap2 will reverse complement the query in the related region and align
again to test small inversions. Minimap2 truncates alignment if there is an
inversion or the drop of score is greater than
.IR INT1 .
Decrease
.I INT2
to find small inversions at the cost of performance and false positives.
Increase
.I INT1
to improve the contiguity of alignment at the cost of poor alignment in the
middle.
.TP
.BI -s \ INT
Minimal peak DP alignment score to output [40]. The peak score is computed from
the final CIGAR. It is the score of the max scoring segment in the alignment
and may be different from the total alignment score.
.TP
.BI -u \ CHAR
How to find canonical splicing sites GT-AG -
.BR f :
transcript strand;
.BR b :
both strands;
.BR n :
no attempt to match GT-AG [n]
.TP
.BI --end-bonus \ INT
Score bonus when alignment extends to the end of the query sequence [0].
.TP
.BI --score-N \ INT
Score of a mismatch involving ambiguous bases [1].
.TP
.BR --splice-flank = yes | no
Assume the next base to a
.B GT
donor site tends to be A/G (91% in human and 92% in mouse) and the preceding
base to an
.B AG
acceptor tends to be C/T [no].
This trend is evolutionarily conserved, all the way to S. cerevisiae
(PMID:18688272). Specifying this option generally leads to higher junction
accuracy by several percent, so it is applied by default with
.BR --splice .
However, the SIRV control does not honor this trend
(only ~60%), so this option reduces accuracy on SIRV data. If you are
benchmarking minimap2 on SIRV data, please add
.B --splice-flank=no
to the command line.
.TP
.BI --end-seed-pen \ INT
Drop a terminal anchor if
.IR s <log( g )+ INT ,
where
.I s
is the local alignment score around the anchor and
.I g
the length of the terminal gap in the chain. This option is only effective
with
.BR --splice .
It helps to avoid tiny terminal exons. [6]
.SS Input/output options
.TP 10
.B -a
Generate CIGAR and output alignments in the SAM format. Minimap2 outputs in PAF
by default.
.TP
.B -Q
Ignore base quality in the input file.
.TP
.B -L
Write CIGAR with >65535 operators at the CG tag. Older tools are unable to
convert alignments with >65535 CIGAR ops to BAM. This option makes minimap2 SAM
compatible with older tools. Newer tools recognize this tag and reconstruct
the real CIGAR in memory.
.TP
.BI -R \ STR
SAM read group line in a format like
.B @RG\\\\tID:foo\\\\tSM:bar
[].
.TP
.B -y
Copy input FASTA/Q comments to output.
.TP
.B -c
Generate CIGAR. In PAF, the CIGAR is written to the `cg' custom tag.
.TP
.BI --cs[= STR ]
Output the
.B cs
tag.
.I STR
can be either
.I short
or
.IR long .
If no
.I STR
is given,
.I short
is assumed. [none]
.TP
.B --MD
Output the MD tag (see the SAM spec).
.TP
.B --eqx
Output =/X CIGAR operators for sequence match/mismatch.
.TP
.B -Y
In SAM output, use soft clipping for supplementary alignments.
.TP
.BI --seed \ INT
Integer seed for randomizing equally best hits. Minimap2 hashes
.I INT
and read name when choosing between equally best hits. [11]
.TP
.BI -t \ INT
Number of threads [3]. Minimap2 uses at most three threads when indexing target
sequences, and uses up to
.IR INT +1
threads when mapping (the extra thread is for I/O, which is frequently idle and
takes little CPU time).
.TP
.B -2
Use two I/O threads during mapping. By default, minimap2 uses one I/O thread.
When I/O is slow (e.g. piping to gzip, or reading from a slow pipe), the I/O
thread may become the bottleneck. Apply this option to use one thread for input
and another thread for output, at the cost of increased peak RAM.
.TP
.BI -K \ NUM
Number of bases loaded into memory to process in a mini-batch [500M].
As with option
.BR -I ,
a K/M/G/k/m/g suffix is accepted. A large
.I NUM
helps load balancing in the multi-threading mode, at the cost of increased
memory.
.TP
.BR --secondary = yes | no
Whether to output secondary alignments [yes]
.TP
.B --version
Print version number to stdout
.SS Preset options
.TP 10
.BI -x \ STR
Preset []. This option applies multiple options at the same time. It should be
applied before other options because options applied later will overwrite the
values set by
.BR -x .
Available
.I STR
are:
.RS
.TP 8
.B map-pb
PacBio/Oxford Nanopore read to reference mapping
.RB ( -Hk19 )
.TP
.B map-ont
Slightly more sensitive for Oxford Nanopore to reference mapping
.RB ( -k15 ).
For PacBio reads, HPC minimizers consistently lead to faster performance and
more sensitive results in comparison to normal minimizers. For Oxford Nanopore
data, normal minimizers are better, though not by much. The effectiveness of HPC
is determined by the sequencing error mode.
.TP
.B asm5
Long assembly to reference mapping
.RB ( -k19
.B -w19 -A1 -B19 -O39,81 -E3,1 -s200 -z200
.BR --min-occ-floor=100 ).
Typically, the alignment will not extend to regions with 5% or higher sequence
divergence. Only use this preset if the average divergence is far below 5%.
.TP
.B asm10
Long assembly to reference mapping
.RB ( -k19
.B -w19 -A1 -B9 -O16,41 -E2,1 -s200 -z200
.BR --min-occ-floor=100 ).
Up to 10% sequence divergence.
.TP
.B asm20
Long assembly to reference mapping
.RB ( -k19
.B -w10 -A1 -B6 -O6,26 -E2,1 -s200 -z200
.BR --min-occ-floor=100 ).
Up to 20% sequence divergence.
.TP
.B ava-pb
PacBio all-vs-all overlap mapping
.RB ( -Hk19
.B -Xw5 -m100 -g10000 --max-chain-skip
.BR 25 ).
.TP
.B ava-ont
Oxford Nanopore all-vs-all overlap mapping
.RB ( -k15
.B -Xw5 -m100 -g10000 -r2000 --max-chain-skip
.BR 25 ).
Similarly, the major difference from
.B ava-pb
is that this preset is not using HPC minimizers.
.TP
.B splice
Long-read spliced alignment
.RB ( -k15
.B -w5 --splice -g2000 -G200k -A1 -B2 -O2,32 -E1,0 -C9 -z200 -ub
.BR --splice-flank=yes ).
In the splice mode, 1) long deletions are taken as introns and represented as
the
.RB ` N '
CIGAR operator; 2) long insertions are disabled; 3) deletion and insertion gap
costs are different during chaining; 4) the computation of the
.RB ` ms '
tag ignores introns to demote hits to pseudogenes.
.TP
.B sr
Short single-end reads without splicing
.RB ( -k21
.B -w11 --sr --frag=yes -A2 -B8 -O12,32 -E2,1 -r50 -p.5 -N20 -f1000,5000 -n2 -m20
.B -s40 -g200 -2K50m --heap-sort=yes
.BR --secondary=no ).
.RE
.SS Miscellaneous options
.TP 10
.B --no-kalloc
Use the libc default allocator instead of the kalloc thread-local allocator.
This debugging option is mostly used with Valgrind to detect invalid memory
accesses. Minimap2 runs slower with this option, especially in the
multi-threading mode.
.TP
.B --print-qname
Print query names to stderr, mostly to see which query is crashing minimap2.
.TP
.B --print-seeds
Print seed positions to stderr, for debugging only.
.SH OUTPUT FORMAT
.PP
Minimap2 outputs mapping positions in the Pairwise mApping Format (PAF) by
default. PAF is a TAB-delimited text format with each line consisting of at
least 12 fields as described in the following table:
.TS
center box;
cb | cb | cb
r | c | l .
Col	Type	Description
_
1	string	Query sequence name
2	int	Query sequence length
3	int	Query start coordinate (0-based)
4	int	Query end coordinate (0-based)
5	char	`+' if query/target on the same strand; `-' if opposite
6	string	Target sequence name
7	int	Target sequence length
8	int	Target start coordinate on the original strand
9	int	Target end coordinate on the original strand
10	int	Number of matching bases in the mapping
11	int	Number of bases, including gaps, in the mapping
12	int	Mapping quality (0-255 with 255 for missing)
.TE

.PP
When alignment is available, column 11 gives the total number of sequence
matches, mismatches and gaps in the alignment; column 10 divided by column 11
gives the BLAST-like alignment identity. When alignment is unavailable,
these two columns are approximate. PAF may optionally have additional fields in
the SAM-like typed key-value format. Minimap2 may output the following tags:
.TS
center box;
cb | cb | cb
r | c | l .
Tag	Type	Description
_
tp	A	Type of aln: P/primary, S/secondary and I,i/inversion
cm	i	Number of minimizers on the chain
s1	i	Chaining score
s2	i	Chaining score of the best secondary chain
NM	i	Total number of mismatches and gaps in the alignment
MD	Z	To generate the ref sequence in the alignment
AS	i	DP alignment score
ms	i	DP score of the max scoring segment in the alignment
nn	i	Number of ambiguous bases in the alignment
ts	A	Transcript strand (splice mode only)
cg	Z	CIGAR string (only in PAF)
cs	Z	Difference string
dv	f	Approximate per-base sequence divergence
.TE
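.PP
A minimal Python sketch of reading one PAF line, assuming the 12 fixed columns
and the typed key-value tags described above. The field names, the helper
function and the example record are hypothetical and made up for illustration;
they are not produced by minimap2.
.PP
.nf
```python
TAB = chr(9)  # PAF fields are TAB-delimited

# Hypothetical names for the 12 fixed columns, in order.
PAF_COLUMNS = (
    "qname", "qlen", "qstart", "qend", "strand",
    "tname", "tlen", "tstart", "tend", "nmatch", "alen", "mapq",
)
INT_COLUMNS = {"qlen", "qstart", "qend", "tlen", "tstart", "tend",
               "nmatch", "alen", "mapq"}
CASTS = {"i": int, "f": float}  # other tag types are kept as strings


def parse_paf_line(line):
    """Split one PAF line into the 12 fixed columns plus a dict of tags."""
    fields = line.rstrip().split(TAB)
    record = dict(zip(PAF_COLUMNS, fields[:12]))
    for name in INT_COLUMNS:
        record[name] = int(record[name])
    tags = {}
    for item in fields[12:]:
        key, typ, value = item.split(":", 2)
        tags[key] = CASTS.get(typ, str)(value)
    record["tags"] = tags
    # Column 10 divided by column 11: BLAST-like identity.
    record["identity"] = record["nmatch"] / record["alen"]
    return record


if __name__ == "__main__":
    # Made-up record, for illustration only.
    example = TAB.join([
        "read1", "1000", "10", "990", "+", "chr1", "50000", "2000", "2985",
        "900", "1000", "60", "tp:A:P", "cm:i:85", "dv:f:0.0512",
    ])
    rec = parse_paf_line(example)
    print(rec["tname"], rec["identity"], rec["tags"])
```
.fi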
.PP
The
.B cs
tag encodes difference sequences in the short form or the entire query
.I AND
reference sequences in the long form. It consists of a series of operations:
.TS
center box;
cb | cb |cb
r | l | l .
Op	Regex	Description
_
=	[ACGTN]+	Identical sequence (long form)
:	[0-9]+	Identical sequence length
*	[acgtn][acgtn]	Substitution: ref to query
+	[acgtn]+	Insertion to the reference
-	[acgtn]+	Deletion from the reference
~	[acgtn]{2}[0-9]+[acgtn]{2}	Intron length and splice signal
.TE
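.PP
The operations in the table can be tokenized with one regular expression per
row. The Python sketch below does this under the assumption that a cs string
is a plain concatenation of such operations; the function name and the example
string are hypothetical.
.PP
.nf
```python
import re

# One alternative per operation in the table; [*] and [+] match the
# literal operator characters.
CS_OP = re.compile(
    "(=)([ACGTN]+)"
    "|(:)([0-9]+)"
    "|([*])([acgtn][acgtn])"
    "|([+])([acgtn]+)"
    "|(-)([acgtn]+)"
    "|(~)([acgtn]{2}[0-9]+[acgtn]{2})"
)


def parse_cs(cs):
    """Split a cs string into (op, value) pairs; reject unparsable input."""
    ops, pos = [], 0
    while pos < len(cs):
        m = CS_OP.match(cs, pos)
        if m is None:
            raise ValueError("bad cs string at offset %d" % pos)
        # Exactly one alternative matched: keep its operator and value.
        op, value = [g for g in m.groups() if g is not None]
        ops.append((op, value))
        pos = m.end()
    return ops


if __name__ == "__main__":
    # Made-up short-form string, for illustration only.
    print(parse_cs(":6-ata:10+gtc:4*at:3"))
```
.fi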
.SH LIMITATIONS
.TP 2
*
Minimap2 may produce suboptimal alignments through long low-complexity regions
where seed positions may be suboptimal. This should not be a big concern
because even the optimal alignment may be wrong in such regions.
.TP
*
Minimap2 requires SSE2 or NEON instructions to compile. It is possible to add
non-SSE2/NEON support, but it would make minimap2 several times slower.
.SH SEE ALSO
.PP
miniasm(1), minimap(1), bwa(1).