332 lines
8.8 KiB
Groff
332 lines
8.8 KiB
Groff
.TH minimap2 1 "12 July 2017" "minimap2-2.0-r174-pre" "Bioinformatics tools"
|
|
.SH NAME
|
|
.PP
|
|
minimap2 - mapping and alignment between collections of DNA sequences
|
|
.SH SYNOPSIS
|
|
* Indexing the target sequences (optional):
|
|
.RS 4
|
|
minimap2
|
|
.RB [ -x
|
|
.IR preset ]
|
|
.B -d
|
|
.I target.mmi
|
|
.I target.fa
|
|
.br
|
|
minimap2
|
|
.RB [ -H ]
|
|
.RB [ -k
|
|
.IR kmer ]
|
|
.RB [ -w
|
|
.IR miniWinSize ]
|
|
.RB [ -I
|
|
.IR batchSize ]
|
|
.B -d
|
|
.I target.mmi
|
|
.I target.fa
|
|
.RE
|
|
|
|
* Long-read alignment with CIGAR:
|
|
.RS 4
|
|
minimap2
|
|
.B -a
|
|
.RB [ -x
|
|
.IR preset ]
|
|
.I target.mmi
|
|
.I query.fa
|
|
>
|
|
.I output.sam
|
|
.br
|
|
minimap2
|
|
.B -c
|
|
.RB [ -H ]
|
|
.RB [ -k
|
|
.IR kmer ]
|
|
.RB [ -w
|
|
.IR miniWinSize ]
|
|
.RB [ ... ]
|
|
.I target.fa
|
|
.I query.fa
|
|
>
|
|
.I output.paf
|
|
.RE
|
|
|
|
* Long-read overlap without CIGAR:
|
|
.RS 4
|
|
minimap2
|
|
.B -x
|
|
ava-ont
|
|
.RB [ -t
|
|
.IR nThreads ]
|
|
.I target.fa
|
|
.I query.fa
|
|
>
|
|
.I output.paf
|
|
.RE
|
|
.SH DESCRIPTION
|
|
.PP
|
|
Minimap2 is a fast sequence mapping and alignment program that can find
|
|
overlaps between long noisy reads, or map long reads or their assemblies to a
|
|
reference genome optionally with detailed alignment (i.e. CIGAR). At present,
|
|
it works efficiently with query sequences from a few kilobases to ~100
|
|
megabases in length at a error rate ~15%. Minimap2 outputs in the PAF or the
|
|
SAM format.
|
|
.SH OPTIONS
|
|
.SS Indexing options
|
|
.TP 10
|
|
.BI -k \ INT
|
|
Minimizer k-mer length [17]
|
|
.TP
|
|
.BI -w \ INT
|
|
Minimizer window size [2/3 of k-mer length]. A minimizer is the smallest k-mer
|
|
in a window of w consecutive k-mers.
|
|
.TP
|
|
.B -H
|
|
Use homopolymer-compressed (HPC) minimizers. An HPC sequence is constructed by
|
|
contracting homopolymer runs to a single base. An HPC minimizer is a minimizer
|
|
on the HPC sequence.
|
|
.TP
|
|
.BI -I \ NUM
|
|
Load at most
|
|
.I NUM
|
|
target bases into RAM for indexing [4G]. If there are more than
|
|
.I NUM
|
|
bases in
|
|
.IR target.fa ,
|
|
minimap2 needs to read
|
|
.I query.fa
|
|
multiple times to map it against each batch of target sequences.
|
|
.I NUM
|
|
may be ending with k/K/m/M/g/G. NB: mapping quality is incorrect given a
|
|
multi-part index.
|
|
.TP
|
|
.BI -d \ FILE
|
|
Save the minimizer index of
|
|
.I target.fa
|
|
to
|
|
.I FILE
|
|
[no dump]. Minimap2 indexing is fast. It can index the human genome in a couple
|
|
of minutes. If even shorter startup time is desired, use this option to save
|
|
the index. Indexing options are fixed in the index file. When an index file is
|
|
provided as the target sequences, options
|
|
.BR -H ,
|
|
.BR -k ,
|
|
.BR -w ,
|
|
.B -I
|
|
will be effectively overridden by the options stored in the index file.
|
|
.SS Mapping options
|
|
.TP 10
|
|
.BI -f \ FLOAT
|
|
Ignore top
|
|
.I FLOAT
|
|
fraction of most frequent minimizers [0.0002]
|
|
.TP
|
|
.BI -g \ INT
|
|
Stop chain enlongation if there are no minimizers in
|
|
.IR INT -bp
|
|
[10000].
|
|
.TP
|
|
.BI -r \ INT
|
|
Bandwidth used in chaining and DP-based alignment [1000]. This option
|
|
approximately controls the maximum gap size.
|
|
.TP
|
|
.BI -n \ INT
|
|
Discard chains consisting of
|
|
.RI < INT
|
|
number of minimizers [3]
|
|
.TP
|
|
.BI -m \ INT
|
|
Discard chains with chaining score
|
|
.RI < INT
|
|
[40]. Chaining score equals the approximate number of matching bases (exact if
|
|
not using
|
|
.BR -H )
|
|
minus base-2 logarithm gap penalty. It is computed with dynamic programming.
|
|
.TP
|
|
.B -X
|
|
Perform all-vs-all mapping. In this mode, if the query sequence name is
|
|
lexicographically larger than the target sequence name, the hits between them
|
|
will be suppressed; if the query sequence name is the same as the target name,
|
|
diagonal minimizer hits will also be suppressed.
|
|
.TP
|
|
.BI -p \ FLOAT
|
|
Minimal secondary-to-primary score ratio to output secondary mappings [0.8].
|
|
Between two chains overlaping over half of the shorter chain (controled by
|
|
.BR --mask-level ),
|
|
the chain with a lower score is secondary to the chain with a higher score.
|
|
If the ratio of the scores is below
|
|
.IR FLOAT ,
|
|
the secondary chain will not be outputted or extended with DP alignment later.
|
|
.TP
|
|
.BI -N \ INT
|
|
Output at most
|
|
.I INT
|
|
secondary alignments [5]. This option has no effect when
|
|
.B -X
|
|
is applied.
|
|
.TP
|
|
.BI -D \ FLOAT
|
|
Discard a chain if the fraction of matching bases over the length of
|
|
query/target sequences in the chain is
|
|
.RI < FLOAT
|
|
[0].
|
|
.TP
|
|
.BI -x \ STR
|
|
Preset []. This option applies multiple options at the same time. It should be
|
|
applied before other options because options applied later will overwrite the
|
|
values set by
|
|
.BR -x .
|
|
Available
|
|
.I STR
|
|
are:
|
|
.RS
|
|
.TP 8
|
|
.B ava-pb
|
|
PacBio all-vs-all overlap mapping (-Hk19 -w5 -Xp0 -m100 -D.05 -g10000)
|
|
.TP 8
|
|
.B ava-ont
|
|
Oxford Nanopore all-vs-all overlap mapping (-k15 -w5 -Xp0 -m100 -D.05 -g10000)
|
|
.TP
|
|
.B map10k
|
|
PacBio/Oxford Nanopore read to reference mapping (-Hk19)
|
|
.TP
|
|
.B asm1m
|
|
Long assembly to reference mapping (-k19 -w19)
|
|
.RE
|
|
.TP
|
|
.BI --max-chain-skip \ INT
|
|
A heuristics that stops chaining early [50]. Minimap2 uses dynamic programming
|
|
for chaining. The time complexity is quadratic in the number of seeds. This
|
|
option makes minimap2 exits the inner loop if it repeatedly sees seeds already
|
|
on chains. Set
|
|
.I INT
|
|
to a large number to switch off this heurstics.
|
|
.SS Alignment options
|
|
.TP 10
|
|
.BI -A \ INT
|
|
Matching score [2]
|
|
.TP
|
|
.BI -B \ INT
|
|
Mismatching penalty [4]
|
|
.TP
|
|
.BI -O \ INT1[,INT2]
|
|
Gap open penalty [4,24]. If
|
|
.I INT2
|
|
is not specified, it is set to
|
|
.IR INT1 .
|
|
.TP
|
|
.BI -E \ INT1[,INT2]
|
|
Gap extension penalty [2,1]. A gap of length
|
|
.I k
|
|
costs
|
|
.RI min{ O1 + k * E1 , O2 + k * E2 }.
|
|
.TP
|
|
.BI -z \ INT
|
|
Break an alignment if the running score drops too quickly along the diagonal of
|
|
the DP matrix (diagonal X-drop, or Z-drop) [400]. Increasing the value improves
|
|
the contiguity of the alignment at the cost of poor alignment in the middle
|
|
(e.g. caused by a long inversion).
|
|
.TP
|
|
.BI -s \ INT
|
|
Minimal peak DP alignment score to output [40]. The peak score is computed from
|
|
the final CIGAR. It is the score of the max scoring segment in the alignment
|
|
and may be different from the total alignment score.
|
|
.SS Input/output options
|
|
.TP 10
|
|
.B -Q
|
|
Ignore base quality in the input file.
|
|
.TP
|
|
.B -a
|
|
Generate CIGAR and output alignments in the SAM format. Minimap2 outputs in PAF
|
|
by default.
|
|
.TP
|
|
.B -c
|
|
Generate CIGAR. In PAF, the CIGAR is written to the `cg' custom tag.
|
|
.TP
|
|
.BI -t \ INT
|
|
Number of threads [3]. Minimap2 uses at most three threads when indexing target
|
|
sequences, and uses up to
|
|
.IR INT +1
|
|
threads when mapping (the extra thread is for I/O, which is frequently idle and
|
|
takes little CPU time).
|
|
.TP
|
|
.B -V
|
|
Print version number to stdout
|
|
.TP
|
|
.BI --mb-size \ STR
|
|
Number of bases loaded into memory to process in a mini-batch [200M].
|
|
Similar to option
|
|
.BR -I ,
|
|
K/M/G/k/m/g suffix is accepted. This option affects both indexing and mapping.
|
|
.SS Miscellaneous options
|
|
.TP 10
|
|
.B --no-kalloc
|
|
Use the libc default allocator instead of the kalloc thread-local allocator.
|
|
This debugging option is mostly used with Valgrind to detect invalid memory
|
|
accesses. Minimap2 runs slower with this option, especially in the
|
|
multi-threading mode.
|
|
.TP
|
|
.B --print-qname
|
|
Print query names to stderr, mostly to see which query is crashing minimap2.
|
|
.TP
|
|
.B --print-seed
|
|
Print seed positions to stderr, for debugging only.
|
|
.SH OUTPUT FORMAT
|
|
.PP
|
|
Minimap2 outputs mapping positions in the Pairwise mApping Format (PAF) by
|
|
default. PAF is a TAB-delimited text format with each line consisting of at
|
|
least 12 fields as are described in the following table:
|
|
.TS
|
|
center box;
|
|
cb | cb | cb
|
|
r | c | l .
|
|
Col Type Description
|
|
_
|
|
1 string Query sequence name
|
|
2 int Query sequence length
|
|
3 int Query start coordinate (0-based)
|
|
4 int Query end coordinate (0-based)
|
|
5 char `+' if query/target on the same strand; `-' if opposite
|
|
6 string Target sequence name
|
|
7 int Target sequence length
|
|
8 int Target start coordinate on the original strand
|
|
9 int Target end coordinate on the original strand
|
|
10 int Number of matching bases in the mapping
|
|
11 int Number bases, including gaps, in the mapping
|
|
12 int Mapping quality (0-255 with 255 for missing)
|
|
.TE
|
|
|
|
.PP
|
|
When alignment is available, column 11 gives the total number of sequence
|
|
matches, mismatches and gaps in the alignment; column 10 divided by column 11
|
|
gives the BLAST-like alignment identity. When alignment is unavailable,
|
|
these two columns are approximate. PAF may optionally have additional fields in
|
|
the SAM-like typed key-value format. Minimap2 may output the following tags:
|
|
.TS
|
|
center box;
|
|
cb | cb | cb
|
|
r | c | l .
|
|
Tag Type Description
|
|
_
|
|
cm i Number of minimizers on the chain
|
|
s1 i Chaining score
|
|
s2 i Chaining score of the best secondary chain
|
|
NM i Total number of mismatches and gaps in the alignment
|
|
AS i DP alignment score
|
|
ms i DP score of the max scoring segment in the alignment
|
|
nn i Number of ambiguous bases in the alignment
|
|
cg Z CIGAR string (only in PAF)
|
|
.TE
|
|
|
|
.SH LIMITATIONS
|
|
.PP
|
|
At the alignment phase, minimap2 performs global alignments between minimizer
|
|
hits. If the positions of these minimizer hits are incorrect, the final alignment
|
|
may be suboptimal or broken due to the Z-drop heuristics. In addition, in the
|
|
event of a long insertion/deletion, minimap2 may split the long event into
|
|
a few smaller events. We will address these issues in future.
|
|
.PP
|
|
Minimap2 does not work well with Illumina short reads as of now.
|
|
.SH SEE ALSO
|
|
.PP
|
|
miniasm(1), minimap(1), bwa(1).
|