diff --git a/tex/minimap2.bib b/tex/minimap2.bib
index 4fed63c..948cb4b 100644
--- a/tex/minimap2.bib
+++ b/tex/minimap2.bib
@@ -329,3 +329,11 @@
  Title = {Fast and accurate long-read alignment with {Burrows-Wheeler} transform},
  Volume = {26},
  Year = {2010}}
+
+@article{Marcais:2018aa,
+ Author = {Mar{\c c}ais, Guillaume and others},
+ Journal = {PLoS Comput Biol},
+ Pages = {e1005944},
+ Title = {{MUMmer4}: A fast and versatile genome alignment system},
+ Volume = {14},
+ Year = {2018}}
diff --git a/tex/minimap2.tex b/tex/minimap2.tex
index 25f4f2a..51297fc 100644
--- a/tex/minimap2.tex
+++ b/tex/minimap2.tex
@@ -479,7 +479,10 @@ commercial and academic uses. Minimap2 uses the same base algorithm for all
 applications, but it has to apply different sets of parameters depending on
 input data types. Similar to BWA-MEM, minimap2 introduces `presets' that modify
 multiple parameters with a simple invokation. Detailed settings
-and command-line options can be found in the minimap2 manpage.
+and command-line options can be found in the minimap2 manpage. In addition to
+the applications described in the following sections, minimap2 also retains
+minimap's functionality to find overlaps between long reads and to search
+against large multi-species databases such as \emph{nt} from NCBI.
 
 \subsection{Aligning long genomic reads}\label{sec:long-genomic}
 
@@ -512,7 +515,7 @@ BLASR~(v1.MC.rc64; \citealp{Chaisson:2012aa}),
 BWA-MEM~(v0.7.15; \citealp{Li:2013aa}),
 GraphMap~(v0.5.2; \citealp{Sovic:2016aa}),
 Kart~(v2.2.5; \citealp{Lin:2017aa}),
-minialign~(v0.5.3; \citealp{Suzuki:2016}) and
+minialign~(v0.5.3; \href{https://github.com/ocxtal/minialign}{https://github.com/ocxtal/minialign}) and
 NGMLR~(v0.2.5; \citealp{Sedlazeck169557}). We excluded
 rHAT~\citep{Liu:2016ab} and LAMSA~\citep{Liu:2017aa} because they either
 crashed or produced malformatted output. In this evaluation, minimap2 has
@@ -649,14 +652,24 @@ million bases (FPPM; 3.0 vs 3.9), lower 2--50bp INDEL FNR (7.3\% vs 7.5\%) and
 similar INDEL FPPM (both 1.0). Minimap2 is broadly similar to BWA-MEM in the
 context of small variant calling.
 
-\subsection{Other applications}
+\subsection{Aligning long-read assemblies}
 
-Minimap2 retains minimap's functionality to find overlaps between long reads
-and to search against large multi-species databases such as \emph{nt} from
-NCBI. Minimap2 also aligns similar genomes or different assemblies of the
-same species. It can map a human SMRT assembly (AC:GCA\_001297185.1) to
-GRCh38 in 7 minutes using 8 CPU cores. QUAST~\citep{Gurevich:2013aa} v5.0 will
-use minimap2 to evaluate the quality of assemblies against a reference genome.
+Minimap2 can align a human SMRT assembly (AC:GCA\_001297185.1) against
+GRCh38 in 7 minutes using 8 CPU cores, over 20 times faster than
+MUMmer4~\citep{Marcais:2018aa}. With the paftools.js script from the minimap2
+package, we called 2.67 million single-base substitutions out of 2.78Gbp
+genomic regions. The transition-to-transversion ratio (ts/tv) is 2.01. In
+comparison, using MUMmer4's dnadiff pipeline, we called 2.86 million
+substitutions in 2.83Gbp at ts/tv=1.87. Given that ts/tv averaged across the
+human genome is about 2 but ts/tv averaged over random errors is 0.5, the
+minimap2 callset arguably has higher accuracy.
+
+The sample being assembled is a female. Minimap2 still called 201 substitutions
+on the Y chromosome. These substitutions all come from one contig aligned at
+96.8\% sequence identity. The contig could be a diverged segmental duplication
+absent from GRCh38. In contrast, on the Y chromosome, MUMmer4 called 9070
+substitutions across 73 SMRT contigs. The accuracy of the MUMmer4 pipeline is
+probably lower than our minimap2-based pipeline.
 
 \section{Discussions}
 