From 2191ac58ad6cb147e5b67652525bc5e2ec84bf7a Mon Sep 17 00:00:00 2001
From: Heng Li
Date: Sun, 5 Nov 2017 12:27:52 -0500
Subject: [PATCH] two discussion paragraphs; need one more

---
 tex/minimap2.tex    | 31 ++++++++++++++++++++++++-------
 tex/mm2.approx.eval | 42 ++++++++++++------------------------------
 2 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/tex/minimap2.tex b/tex/minimap2.tex
index 9ad06ab..aa6483a 100644
--- a/tex/minimap2.tex
+++ b/tex/minimap2.tex
@@ -558,16 +558,33 @@ of small variant calling.
 \subsection{Other applications}
 
 Minimap2 retains minimap's functionality to find overlaps between long reads
-and to search against huge multi-species databases such as \emph{nt} from NCBI.
-Minimap2 can also align similar genomes or different assemblies of the same
-species. It took 7 wall-clock minutes over 8 CPU cores to align a human SMRT
-assembly (AC:GCA\_001297185.1) to GRCh38, over 20 times as fast as
+and to search against large multi-species databases such as \emph{nt} from
+NCBI. Minimap2 can also align similar genomes or different assemblies of the
+same species. It took 7 wall-clock minutes over 8 CPU cores to align a human
+SMRT assembly (AC:GCA\_001297185.1) to GRCh38, over 20 times as fast as
 MUMmer4~\citep{Kurtz:2004zr}.
 
-\section{Conclusion}
+\section{Discussions}
 
-Minimap2 is a fast, accurate and versatile aligner for long nucleotide
-sequences.
+Minimap2 is a versatile mapper and pairwise aligner for nucleotide sequences.
+It works with short reads, assembly contigs and long noisy genomic and RNA-seq
+reads. It can be used as a read mapper, long-read overlapper or a full-genome
+aligner. Minimap2 is also accurate and efficient, often outperforming other
+domain-specific alignment tools in terms of both speed and accuracy.
+
+The capability of minimap2 comes from a fast base-level alignment algorithm and
+an accurate chaining algorithm. When aligning long query sequences, base-level
+alignment is often the performance bottleneck. The Suzuki-Kasahara algorithm
+greatly alleviates the bottleneck and enables DP-based splice alignment
+involving $>$100kb introns, which was impractically slow ten years ago. The
+minimap2 chaining algorithm is fast and highly accurate by itself. In fact,
+chaining alone is more accurate than all the other long-read mappers in
+Fig.~\ref{fig:eval}a (data not shown). This accuracy helps to reduce downstream
+base-level alignment of candidate chains, which is still times slower than
+chaining even with the Suzuki-Kasahara improvement. In addition, taking a
+general form, minimap2 chaining can be adapted to non-typical data types such
+as spliced reads and multiple reads per fragment. This gives us the opportunity to
+extend the same base algorithm to a variety of use cases.
 
 \section*{Acknowledgements}
 We owe a debt of gratitude to H. Suzuki and M. Kasahara for releasing their
diff --git a/tex/mm2.approx.eval b/tex/mm2.approx.eval
index 60b9c9f..801be8b 100644
--- a/tex/mm2.approx.eval
+++ b/tex/mm2.approx.eval
@@ -1,30 +1,12 @@
-Q 60 32066 0 0.000000000
-Q 40 32 1 0.000031155
-Q 38 19 1 0.000062272
-Q 36 11 1 0.000093376
-Q 35 32 1 0.000124378
-Q 33 15 1 0.000155400
-Q 32 58 1 0.000186145
-Q 27 11 1 0.000217095
-Q 26 80 1 0.000247494
-Q 21 19 2 0.000309186
-Q 20 16 1 0.000339936
-Q 19 19 1 0.000370622
-Q 18 22 2 0.000432099
-Q 17 37 5 0.000585751
-Q 15 24 2 0.000646930
-Q 14 18 3 0.000738939
-Q 13 30 6 0.000922821
-Q 12 18 1 0.000953054
-Q 11 29 2 0.001013638
-Q 10 30 1 0.001043393
-Q 9 20 5 0.001196099
-Q 8 25 8 0.001440348
-Q 7 28 6 0.001622830
-Q 6 35 12 0.001988132
-Q 5 34 12 0.002352725
-Q 4 29 8 0.002594865
-Q 3 36 14 0.003018937
-Q 2 46 15 0.003471482
-Q 1 69 36 0.004558162
-Q 0 167 94 0.007377173
+Q 60 32084 0 0.000000000 32084
+Q 24 318 2 0.000061725 32402
+Q 11 98 2 0.000123077 32500
+Q 8 37 2 0.000184405 32537
+Q 7 37 3 0.000276294 32574
+Q 6 40 3 0.000367940 32614
+Q 5 34 2 0.000428816 32648
+Q 4 37 5 0.000581306 32685
+Q 3 28 6 0.000764222 32713
+Q 2 38 6 0.000946536 32751
+Q 1 50 21 0.001585318 32801
+Q 0 286 150 0.006105117 33087