Commit 28ab4d1f72 (parent 6c9390b54a): backup

@@ -20,19 +20,19 @@
\begin{document}
\firstpage{1}

\title[Long DNA sequence alignment with minimap2]{Minimap2: fast pairwise alignment for long DNA sequences}
\author[Li]{Heng Li}
\address{Broad Institute, 415 Main Street, Cambridge, MA 02142, USA}

\maketitle

\begin{abstract}
\section{Summary:} Minimap2 is a general-purpose mapper for aligning long noisy DNA
sequences against a large reference database. It targets query sequences of
1kb--100Mb in length with per-base divergence typically below 25\%. Minimap2 is
$\sim$30 times faster than many mainstream long-read aligners and achieves
higher accuracy on simulated data. It also employs a concave gap cost and rescues
inversions to improve alignment around potential structural variations.

\section{Availability and implementation:}
\href{https://github.com/lh3/minimap2}{https://github.com/lh3/minimap2}

@@ -50,21 +50,18 @@ They are usually five times as slow as mainstream short-read
aligners~\citep{Langmead:2012fk,Li:2013aa}. We speculated that there could be
substantial room for speedup, reasoning that 10kb-long sequences should be
easier to map than 100bp reads because we can more effectively skip repetitive
regions, which are often the bottleneck of short-read alignment. We confirmed
this speculation by achieving approximate mapping 50 times faster than
BWA-MEM~\citep{Li:2016aa}. \citet{Suzuki:2016} extended our work with a fast
and novel algorithm for generating detailed alignments, which in turn inspired
us to develop minimap2 for higher accuracy and more practical functionality.

\begin{methods}
\section{Methods}

Minimap2 is the successor of minimap~\citep{Li:2016aa}. It uses similar
indexing and seeding algorithms, but implements a more accurate chaining
algorithm and adds the ability to produce detailed alignments.

\subsection{Chaining}

@@ -107,6 +104,15 @@ find its predecessor and mark each visited $i$ as `used'. This process stops at
$P(j)=0$ or at a `used' $j$. In this way, we find all chains such that no anchor
is used in more than one chain.

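The backtracking step can be illustrated with a minimal Python sketch. It is not minimap2's C implementation, and it rests on assumptions not stated in this excerpt: the chaining scores $f(i)$ and predecessors $P(i)$ are already computed and stored in 1-based lists (index 0 is a placeholder), $P(i)=0$ means that anchor $i$ has no predecessor, and backtracking starts from unused anchors in decreasing order of $f(i)$. The function name \texttt{backtrack\_chains} is hypothetical.

```python
from typing import List


def backtrack_chains(f: List[float], P: List[int]) -> List[List[int]]:
    """Extract chains by backtracking over predecessors (illustrative sketch).

    f[i] is the chaining score of anchor i and P[i] its best predecessor,
    both 1-based; index 0 is a placeholder and P[i] == 0 means "no predecessor".
    Assumption: backtracking starts from unused anchors in decreasing order of f.
    """
    n = len(f) - 1
    used = [False] * (n + 1)
    chains: List[List[int]] = []
    for i in sorted(range(1, n + 1), key=lambda k: -f[k]):
        if used[i]:
            continue  # already part of a better chain
        chain: List[int] = []
        j = i
        # Follow predecessors, marking each visited anchor as used; stop when
        # P(j) = 0 (j becomes 0) or when reaching an anchor that is already used.
        while j != 0 and not used[j]:
            used[j] = True
            chain.append(j)
            j = P[j]
        chains.append(chain[::-1])  # list anchors from first to last
    return chains


if __name__ == "__main__":
    f = [0, 1, 2, 3, 1, 5]  # f[0] is a placeholder
    P = [0, 0, 1, 2, 0, 2]  # P[0] is a placeholder
    print(backtrack_chains(f, P))  # [[1, 2, 5], [3], [4]]
```

Because an anchor is marked as used the first time it is visited, each anchor ends up in exactly one chain; a chain whose backtracking reaches an anchor already claimed by a better chain is simply cut at that point.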
\subsubsection{Identifying primary chains}
Primary chains are chains that do not substantially overlap with each other on
the query sequence. Minimap2 uses a greedy algorithm to identify them. Let $Q$
be the set of primary chains, which is initially empty. Chains are visited from
the best to the worst according to their chaining scores. If, on the query, a
chain overlaps with a chain in $Q$ by 50\% (by default) or more of the length of
the shorter of the two chains, it is marked as secondary to that chain in $Q$;
otherwise, it is added to $Q$.

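The greedy procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not minimap2's implementation: each chain is reduced to a hypothetical tuple (score, query start, query end) with a half-open query interval, the 50\% default is exposed as a hypothetical parameter \texttt{overlap\_frac}, and when a chain overlaps several chains in $Q$ it is marked secondary to the first (highest-scoring) one. The names \texttt{find\_primary\_chains} and \texttt{query\_overlap} are likewise hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical chain summary: (chaining score, query start, query end),
# with the query interval half-open.
Chain = Tuple[int, int, int]


def query_overlap(a: Chain, b: Chain) -> int:
    """Number of query bases covered by both chains."""
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def find_primary_chains(
    chains: List[Chain], overlap_frac: float = 0.5
) -> Tuple[List[int], List[Optional[int]]]:
    """Greedy primary/secondary classification (illustrative sketch).

    Returns (primary, parent): primary holds the indices of chains in Q in
    the order they were added; parent[i] is the index of the primary chain
    that chain i is secondary to, or None if chain i is primary.
    """
    # Visit chains from the best to the worst chaining score.
    order = sorted(range(len(chains)), key=lambda i: -chains[i][0])
    primary: List[int] = []  # the set Q
    parent: List[Optional[int]] = [None] * len(chains)
    for i in order:
        c = chains[i]
        for p in primary:
            q = chains[p]
            shorter = min(c[2] - c[1], q[2] - q[1])
            # Assumption: attach to the first (highest-scoring) qualifying chain in Q.
            if shorter > 0 and query_overlap(c, q) >= overlap_frac * shorter:
                parent[i] = p
                break
        else:
            # No chain in Q overlaps enough: chain i becomes primary.
            primary.append(i)
    return primary, parent


if __name__ == "__main__":
    chains = [(100, 0, 1000), (80, 500, 1500), (60, 1200, 2000), (90, 2000, 3000)]
    print(find_primary_chains(chains))  # ([0, 3, 2], [None, 0, None, None])
```

In this example the second chain covers half of the best chain on the query and becomes secondary to it; the third chain overlaps only that secondary chain, which is not in $Q$, so it stays primary.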
\subsection{Alignment}

Minimap2 performs global alignment between adjacent anchors in a chain. It