.TH minimap2 1 "6 July 2021" "minimap2-2.21 (r1071)" "Bioinformatics tools"
.SH NAME
.PP
minimap2 - mapping and alignment between collections of DNA sequences
.SH SYNOPSIS
* Indexing the target sequences (optional):
.RS 4
minimap2
.RB [ -x
.IR preset ]
.B -d
.I target.mmi
.I target.fa
.br
minimap2
.RB [ -H ]
.RB [ -k
.IR kmer ]
.RB [ -w
.IR miniWinSize ]
.RB [ -I
.IR batchSize ]
.B -d
.I target.mmi
.I target.fa
.RE

* Long-read alignment with CIGAR:
.RS 4
minimap2
.B -a
.RB [ -x
.IR preset ]
.I target.mmi
.I query.fa
>
.I output.sam
.br
minimap2
.B -c
.RB [ -H ]
.RB [ -k
.IR kmer ]
.RB [ -w
.IR miniWinSize ]
.RB [ ... ]
.I target.fa
.I query.fa
>
.I output.paf
.RE

* Long-read overlap without CIGAR:
.RS 4
minimap2
.B -x
ava-ont
.RB [ -t
.IR nThreads ]
.I target.fa
.I query.fa
>
.I output.paf
.RE
.SH DESCRIPTION
.PP
Minimap2 is a fast sequence mapping and alignment program that can find
overlaps between long noisy reads, or map long reads or their assemblies to a
reference genome optionally with detailed alignment (i.e. CIGAR). At present,
it works efficiently with query sequences from a few kilobases to ~100
megabases in length at an error rate of ~15%. Minimap2 outputs in the PAF or the
SAM format.
.SH OPTIONS
.SS Indexing options
.TP 10
.BI -k \ INT
Minimizer k-mer length [15]
.TP
.BI -w \ INT
Minimizer window size [2/3 of k-mer length]. A minimizer is the smallest k-mer
in a window of w consecutive k-mers.
.TP
.B -H
Use homopolymer-compressed (HPC) minimizers. An HPC sequence is constructed by
contracting homopolymer runs to a single base. An HPC minimizer is a minimizer
on the HPC sequence.
.TP
.BI -I \ NUM
Load at most
.I NUM
target bases into RAM for indexing [4G]. If there are more than
.I NUM
bases in
.IR target.fa ,
minimap2 needs to read
.I query.fa
multiple times to map it against each batch of target sequences.
.I NUM
may end with k/K/m/M/g/G. NB: mapping quality is incorrect given a
multi-part index.
.TP
.B --idx-no-seq
Don't store target sequences in the index. It saves disk space and memory but
the index generated with this option will not work with
.B -a
or
.BR -c .
When base-level alignment is not requested, this option is automatically applied.
.TP
.BI -d \ FILE
Save the minimizer index of
.I target.fa
to
.I FILE
[no dump]. Minimap2 indexing is fast. It can index the human genome in a couple
of minutes. If even shorter startup time is desired, use this option to save
the index. Indexing options are fixed in the index file. When an index file is
provided as the target sequences, options
.BR -H ,
.BR -k ,
.B -w
and
.B -I
will be effectively overridden by the options stored in the index file.
.TP
.BI --alt \ FILE
List of ALT contigs [null]
.TP
.BI --alt-drop \ FLOAT
Drop ALT hits by
.I FLOAT
fraction when ranking and computing mapping quality [0.15]
.SS Mapping options
.TP 10
.BI -f \ FLOAT | INT1 [, INT2 ]
If fraction, ignore top
.I FLOAT
fraction of most frequent minimizers [0.0002]. If integer,
ignore minimizers occurring more than
.I INT1
times.
.I INT2
is only effective in the
.B --sr
or
.B -xsr
mode, where it sets the threshold for a second round of seeding.
.TP
.BI -U \ INT1 [, INT2 ]
Lower and upper bounds of k-mer occurrences [10,1000000]. The final k-mer occurrence threshold is
.RI max{ INT1 ,\ min{ INT2 ,
.BR -f }}.
This option prevents excessively small or large
.B -f
estimated from the input reference. It deprecates
.B --min-occ-floor
in earlier versions of minimap2.
.TP
.BI -e \ INT
Sample a high-frequency minimizer every
.I INT
basepairs [500].
.TP
.BI -g \ NUM
Stop chain elongation if there are no minimizers within
.IR NUM -bp
[10k].
.TP
.BI -r \ NUM1 [, NUM2 ]
Bandwidth for chaining and base alignment [500,20k].
.I NUM1
is used for initial chaining and alignment extension;
.I NUM2
for RMQ-based re-chaining and closing gaps in alignments.
.TP
.BI -n \ INT
Discard chains consisting of
.RI < INT
number of minimizers [3]
.TP
.BI -m \ INT
Discard chains with chaining score
.RI < INT
[40]. Chaining score equals the approximate number of matching bases minus a
concave gap penalty. It is computed with dynamic programming.
.TP
.B -D
If query sequence name/length are identical to the target name/length, ignore
diagonal anchors. This option also reduces DP-based extension along the
diagonal.
.TP
.B -P
Retain all chains and don't attempt to set primary chains. Options
.B -p
and
.B -N
have no effect when this option is in use.
.TP
.BR --dual = yes | no
If
.BR no ,
skip query-target pairs wherein the query name is lexicographically greater
than the target name [yes]
.TP
.B -X
Equivalent to
.RB ' -DP
.BR --dual = no
.BR --no-long-join '.
Primarily used for all-vs-all read overlapping.
.TP
.BI -p \ FLOAT
Minimal secondary-to-primary score ratio to output secondary mappings [0.8].
Between two chains overlapping over half of the shorter chain (controlled by
.BR -M ),
the chain with a lower score is secondary to the chain with a higher score.
If the ratio of the scores is below
.IR FLOAT ,
the secondary chain will not be output or extended with DP alignment later.
This option has no effect when
.B -X
is applied.
.TP
.BI -N \ INT
Output at most
.I INT
secondary alignments [5]. This option has no effect when
.B -X
is applied.
.TP
.BI -G \ NUM
Maximum gap on the reference (effective with
.BR -xsplice / --splice ).
This option also changes the chaining and alignment band width to
.IR NUM .
Increasing this option slows down spliced alignment. [200k]
.TP
.BI -F \ NUM
Maximum fragment length (aka insert size; effective with
.BR -xsr / --frag = yes )
[800]
.TP
.BI -M \ FLOAT
Mark as secondary a chain that overlaps with a better chain by
.I FLOAT
or more of the shorter chain [0.5]
.TP
.BR --rmq = no | yes
Use the minigraph chaining algorithm [no]. The minigraph algorithm is better
for aligning contigs through long INDELs.
.TP
.B --hard-mask-level
Honor option
.B -M
and disable the heuristic that saves unmapped subsequences. This option also disables
.BR --mask-len .
.TP
.BI --mask-len \ NUM
Keep an alignment if dropping it leaves an unaligned region on query longer than
.I NUM
[inf]. Effective without
.BR --hard-mask-level .
.TP
.BI --max-chain-skip \ INT
A heuristic that stops chaining early [25]. Minimap2 uses dynamic programming
for chaining. The time complexity is quadratic in the number of seeds. This
option makes minimap2 exit the inner loop if it repeatedly sees seeds already
on chains. Set
.I INT
to a large number to switch off this heuristic.
.TP
.BI --max-chain-iter \ INT
Check up to
.I INT
partial chains during chaining [5000]. This is a heuristic to avoid quadratic
time complexity in the worst case.
.TP
.BI --chain-gap-scale \ FLOAT
Scale of gap cost during chaining [1.0]
.TP
.B --no-long-join
Disable the long gap patching heuristic. When this option is applied, the
maximum alignment gap is mostly controlled by
.BR -r .
.TP
.B --splice
Enable the splice alignment mode.
.TP
.B --sr
Enable short-read alignment heuristics. In the short-read mode, minimap2
applies a second round of chaining with a higher minimizer occurrence threshold
if no good chain is found. In addition, minimap2 attempts to patch gaps between
seeds with ungapped alignment.
.TP
.BI --split-prefix \ STR
Prefix to create temporary files. Typically used for a multi-part index.
.TP
.BR --frag = no | yes
Whether to enable the fragment mode [no]
.TP
.B --for-only
Only map to the forward strand of the reference sequences. For paired-end
reads in the forward-reverse orientation, the first read is mapped to the forward
strand of the reference and the second read to the reverse strand.
.TP
.B --rev-only
Only map to the reverse complement strand of the reference sequences.
.TP
.BR --heap-sort = no | yes
If yes, sort anchors with heap merge, instead of radix sort. Heap merge is
faster for short reads, but slower for long reads. [no]
.TP
.B --no-pairing
Treat two reads in a pair as independent reads. The mate related fields in SAM
are still properly populated.
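.PP
A worked illustration of how the
.B -U
bounds clamp the occurrence threshold, following the formula
.RI max{ INT1 ,\ min{ INT2 ,
.BR -f }}
given above. The
.BR -f -derived
thresholds used here are hypothetical values chosen for the example, not
measurements from any real index:
.RS 4
.nf
-U10,1000000, -f threshold 5:     max{10, min{1000000, 5}}   = 10
-U10,1000000, -f threshold 250:   max{10, min{1000000, 250}} = 250
-U50,500,     -f threshold 2000:  max{50, min{500, 2000}}    = 500
.fi
.RE
.PP
In the first case the lower bound takes effect, in the second the estimate is
used unchanged, and in the third the upper bound takes effect.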
.SS Alignment options
.TP 10
.BI -A \ INT
Matching score [2]
.TP
.BI -B \ INT
Mismatching penalty [4]
.TP
.BI -O \ INT1[,INT2]
Gap open penalty [4,24]. If
.I INT2
is not specified, it is set to
.IR INT1 .
.TP
.BI -E \ INT1[,INT2]
Gap extension penalty [2,1]. A gap of length
.I k
costs
.RI min{ O1 + k * E1 , O2 + k * E2 }.
In the splice mode, the second gap penalties are not used.
.TP
.BI -C \ INT
Cost for a non-canonical GT-AG splicing (effective with
.BR --splice )
[0]
.TP
.BI -z \ INT1[,INT2]
Truncate an alignment if the running alignment score drops too quickly along
the diagonal of the DP matrix (diagonal X-drop, or Z-drop) [400,200]. If the
drop of score is above
.IR INT2 ,
minimap2 will reverse complement the query in the related region and align
again to test small inversions. Minimap2 truncates alignment if there is an
inversion or the drop of score is greater than
.IR INT1 .
Decrease
.I INT2
to find small inversions at the cost of performance and false positives.
Increase
.I INT1
to improve the contiguity of alignment at the cost of poor alignment in the
middle.
.TP
.BI -s \ INT
Minimal peak DP alignment score to output [40]. The peak score is computed from
the final CIGAR. It is the score of the max scoring segment in the alignment
and may be different from the total alignment score.
.TP
.BI -u \ CHAR
How to find canonical splicing sites GT-AG -
.BR f :
transcript strand;
.BR b :
both strands;
.BR n :
no attempt to match GT-AG [n]
.TP
.BI --end-bonus \ INT
Score bonus when alignment extends to the end of the query sequence [0].
.TP
.BI --score-N \ INT
Score of a mismatch involving ambiguous bases [1].
.TP
.BR --splice-flank = yes | no
Assume the next base to a
.B GT
donor site tends to be A/G (91% in human and 92% in mouse) and the preceding
base to a
.B AG
acceptor tends to be C/T [no].
This trend is evolutionarily conserved, all the way to S. cerevisiae
(PMID:18688272). Specifying this option generally leads to higher junction
accuracy by several percent, so it is applied by default with
.BR --splice .
However, the SIRV control does not honor this trend
(only ~60%), so this option reduces accuracy on such data. If you are
benchmarking minimap2 on SIRV data, please add
.B --splice-flank=no
to the command line.
.TP
.BI --junc-bed \ FILE
Gene annotations in the BED12 format (aka 12-column BED), or intron positions
in 5-column BED. With this option, minimap2 prefers splicing in annotations.
A BED12 file can be converted from GTF/GFF3 with `paftools.js gff2bed anno.gtf'
[].
.TP
.BI --junc-bonus \ INT
Score bonus for a splice donor or acceptor found in annotation (effective with
.BR --junc-bed )
[9].
.TP
.BI --end-seed-pen \ INT
Drop a terminal anchor if
.IR s <log( g )+ INT ,
where
.I s
is the local alignment score around the anchor and
.I g
the length of the terminal gap in the chain. This option is only effective
with
.BR --splice .
It helps to avoid tiny terminal exons. [6]
.TP
.B --no-end-flt
Don't filter seeds towards the ends of chains before performing base-level
alignment.
.TP
.BI --cap-sw-mem \ NUM
Skip alignment if the DP matrix size is above
.IR NUM .
Set 0 to disable [100m].
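.PP
A worked illustration of the two-piece gap cost defined under
.B -O
and
.BR -E ,
assuming the default values listed above
.RB ( -O4,24
.BR -E2,1 ;
presets may set different values), so that a gap of length
.I k
costs
.RI min{4+2* k ,\ 24+ k }:
.RS 4
.nf
k = 1:   min{4+2*1,  24+1}  = min{6, 25}   = 6
k = 10:  min{4+2*10, 24+10} = min{24, 34}  = 24
k = 20:  min{4+2*20, 24+20} = min{44, 44}  = 44
k = 50:  min{4+2*50, 24+50} = min{104, 74} = 74
.fi
.RE
.PP
Under these assumed values the first pair (O1, E1) determines the cost of gaps
shorter than 20 bases and the second pair (O2, E2) determines the cost of
longer gaps, which keeps long gaps cheaper than the first pair alone would.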
.SS Input/output options
.TP 10
.B -a
Generate CIGAR and output alignments in the SAM format. Minimap2 outputs in PAF
by default.
.TP
.BI -o \ FILE
Output alignments to
.I FILE
[stdout].
.TP
.B -Q
Ignore base quality in the input file.
.TP
.B -L
Write CIGAR with >65535 operators at the CG tag. Older tools are unable to
convert alignments with >65535 CIGAR ops to BAM. This option makes minimap2 SAM
compatible with older tools. Newer tools recognize this tag and reconstruct
the real CIGAR in memory.
.TP
.BI -R \ STR
SAM read group line in a format like
.B @RG\\\\tID:foo\\\\tSM:bar
[].
.TP
.B -y
Copy input FASTA/Q comments to output.
.TP
.B -c
Generate CIGAR. In PAF, the CIGAR is written to the `cg' custom tag.
.TP
.BI --cs[= STR ]
Output the
.B cs
tag.
.I STR
can be either
.I short
or
.IR long .
If no
.I STR
is given,
.I short
is assumed. [none]
.TP
.B --MD
Output the MD tag (see the SAM spec).
.TP
.B --eqx
Output =/X CIGAR operators for sequence match/mismatch.
.TP
.B -Y
In SAM output, use soft clipping for supplementary alignments.
.TP
.BI --seed \ INT
Integer seed for randomizing equally best hits. Minimap2 hashes
.I INT
and the read name when choosing between equally best hits. [11]
.TP
.BI -t \ INT
Number of threads [3]. Minimap2 uses at most three threads when indexing target
sequences, and uses up to
.IR INT +1
threads when mapping (the extra thread is for I/O, which is frequently idle and
takes little CPU time).
.TP
.B -2
Use two I/O threads during mapping. By default, minimap2 uses one I/O thread.
When I/O is slow (e.g. piping to gzip, or reading from a slow pipe), the I/O
thread may become the bottleneck. Apply this option to use one thread for input
and another thread for output, at the cost of increased peak RAM.
.TP
.BI -K \ NUM
Number of bases loaded into memory to process in a mini-batch [500M].
Similar to option
.BR -I ,
a K/M/G/k/m/g suffix is accepted. A large
.I NUM
helps load balancing in the multi-threading mode, at the cost of increased
memory.
.TP
.BR --secondary = yes | no
Whether to output secondary alignments [yes]
.TP
.BI --max-qlen \ NUM
Filter out query sequences longer than
.IR NUM .
.TP
.B --paf-no-hit
In PAF, output unmapped queries; the strand and the reference name fields are
set to `*'. Warning: some paftools.js commands may not work with such output
for the moment.
.TP
.B --sam-hit-only
In SAM, don't output unmapped reads.
.TP
.B --version
Print version number to stdout
.SS Preset options
.TP 10
.BI -x \ STR
Preset []. This option applies multiple options at the same time. It should be
applied before other options because options applied later will overwrite the
values set by
.BR -x .
Available
.I STR
are:
.RS
.TP 10
.B map-ont
Align noisy long reads of ~10% error rate to a reference genome. This is the
default mode.
.TP
.B map-hifi
Align PacBio high-fidelity (HiFi) reads to a reference genome
.RB ( -k19
.B -w19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1
.BR -s200 ).
.TP
.B map-pb
Align older PacBio continuous long (CLR) reads to a reference genome
.RB ( -Hk19 ).
.TP
.B asm5
Long assembly to reference mapping
.RB ( -k19
.B -w19 -U50,500 --rmq -r100k -g10k -A1 -B19 -O39,81 -E3,1 -s200 -z200
.BR -N50 ).
Typically, the alignment will not extend to regions with 5% or higher sequence
divergence. Only use this preset if the average divergence is far below 5%.
.TP
.B asm10
Long assembly to reference mapping
.RB ( -k19
.B -w19 -U50,500 --rmq -r100k -g10k -A1 -B9 -O16,41 -E2,1 -s200 -z200
.BR -N50 ).
Up to 10% sequence divergence.
.TP
.B asm20
Long assembly to reference mapping
.RB ( -k19
.B -w10 -U50,500 --rmq -r100k -g10k -A1 -B4 -O6,26 -E2,1 -s200 -z200
.BR -N50 ).
Up to 20% sequence divergence.
.TP
.B splice
Long-read spliced alignment
.RB ( -k15
.B -w5 --splice -g2k -G200k -A1 -B2 -O2,32 -E1,0 -b0 -C9 -z200 -ub --junc-bonus=9 --cap-sw-mem=0
.BR --splice-flank=yes ).
In the splice mode, 1) long deletions are taken as introns and represented as
the
.RB ` N '
CIGAR operator; 2) long insertions are disabled; 3) deletion and insertion gap
costs are different during chaining; 4) the computation of the
.RB ` ms '
tag ignores introns to demote hits to pseudogenes.
.TP
.B splice:hq
Long-read splice alignment for PacBio CCS reads
.RB ( -xsplice
.B -C5 -O6,24
.BR -B4 ).
.TP
.B sr
Short single-end reads without splicing
.RB ( -k21
.B -w11 --sr --frag=yes -A2 -B8 -O12,32 -E2,1 -b0 -r100 -p.5 -N20 -f1000,5000 -n2 -m20
.B -s40 -g100 -2K50m --heap-sort=yes
.BR --secondary=no ).
.TP
.B ava-pb
PacBio CLR all-vs-all overlap mapping
.RB ( -Hk19
.B -Xw5 -e0
.BR -m100 ).
.TP
.B ava-ont
Oxford Nanopore all-vs-all overlap mapping
.RB ( -k15
.B -Xw5 -e0 -m100
.BR -r2k ).
.RE
.SS Miscellaneous options
.TP 10
.B --no-kalloc
Use the libc default allocator instead of the kalloc thread-local allocator.
This debugging option is mostly used with Valgrind to detect invalid memory
accesses. Minimap2 runs slower with this option, especially in the
multi-threading mode.
.TP
.B --print-qname
Print query names to stderr, mostly to see which query is crashing minimap2.
.TP
.B --print-seeds
Print seed positions to stderr, for debugging only.
.SH OUTPUT FORMAT
.PP
Minimap2 outputs mapping positions in the Pairwise mApping Format (PAF) by
default. PAF is a TAB-delimited text format with each line consisting of at
least 12 fields as described in the following table:
.TS
center box;
cb | cb | cb
r | c | l .
Col	Type	Description
_
1	string	Query sequence name
2	int	Query sequence length
3	int	Query start coordinate (0-based)
4	int	Query end coordinate (0-based)
5	char	`+' if query/target on the same strand; `-' if opposite
6	string	Target sequence name
7	int	Target sequence length
8	int	Target start coordinate on the original strand
9	int	Target end coordinate on the original strand
10	int	Number of matching bases in the mapping
11	int	Number of bases, including gaps, in the mapping
12	int	Mapping quality (0-255 with 255 for missing)
.TE
.PP
When alignment is available, column 11 gives the total number of sequence
matches, mismatches and gaps in the alignment; column 10 divided by column 11
gives the BLAST-like alignment identity. When alignment is unavailable,
these two columns are approximate. PAF may optionally have additional fields in
the SAM-like typed key-value format. Minimap2 may output the following tags:
.TS
center box;
cb | cb | cb
r | c | l .
Tag	Type	Description
_
tp	A	Type of aln: P/primary, S/secondary and I,i/inversion
cm	i	Number of minimizers on the chain
s1	i	Chaining score
s2	i	Chaining score of the best secondary chain
NM	i	Total number of mismatches and gaps in the alignment
MD	Z	To generate the ref sequence in the alignment
AS	i	DP alignment score
SA	Z	List of other supplementary alignments
ms	i	DP score of the max scoring segment in the alignment
nn	i	Number of ambiguous bases in the alignment
ts	A	Transcript strand (splice mode only)
cg	Z	CIGAR string (only in PAF)
cs	Z	Difference string
dv	f	Approximate per-base sequence divergence
de	f	Gap-compressed per-base sequence divergence
rl	i	Length of query regions harboring repetitive seeds
.TE
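.PP
As an illustration of the 12 mandatory columns, consider the following
hypothetical PAF record. The sequence names and all numbers are invented for
this example, and the fields are shown separated by spaces for readability,
whereas real output is TAB-delimited:
.RS 4
.nf
read1 1000 0 1000 + chr1 50000 2000 3000 950 1000 60
.fi
.RE
.PP
In this record column 10 is 950 and column 11 is 1000, so the BLAST-like
alignment identity is 950/1000 = 0.95.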
.PP
The
.B cs
tag encodes difference sequences in the short form or the entire query
.I AND
reference sequences in the long form. It consists of a series of operations:
.TS
center box;
cb | cb | cb
r | l | l .
Op	Regex	Description
_
\&=	[ACGTN]+	Identical sequence (long form)
:	[0-9]+	Identical sequence length
*	[acgtn][acgtn]	Substitution: ref to query
+	[acgtn]+	Insertion to the reference
-	[acgtn]+	Deletion from the reference
~	[acgtn]{2}[0-9]+[acgtn]{2}	Intron length and splice signal
.TE
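.PP
As an illustration of the operations in the table above, the following
hand-constructed alignment (not output from an actual minimap2 run) contains a
3-base deletion from the reference, a 3-base insertion to the reference and
one substitution from reference a to query t:
.RS 4
.nf
Ref:    CGATCGATAAATAGAGTAG---GAATAGCA
Query:  CGATCG---AATAGAGTAGGTCGAATtGCA
.fi
.RE
.PP
Applying the table to this alignment from left to right gives:
.RS 4
.nf
short form:  :6-ata:10+gtc:4*at:3
long form:   =CGATCG-ata=AATAGAGTAG+gtc=GAAT*at=GCA
.fi
.RE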
.SH LIMITATIONS
.TP 2
*
Minimap2 may produce suboptimal alignments through long low-complexity regions
where seed positions may be suboptimal. This should not be a big concern
because even the optimal alignment may be wrong in such regions.
.TP
*
Minimap2 requires SSE2 or NEON instructions to compile. It is possible to add
non-SSE2/NEON support, but it would make minimap2 several times slower.
.SH SEE ALSO
.PP
miniasm(1), minimap(1), bwa(1).