extend the section on genome alignment

Heng Li 2018-02-09 13:59:00 -05:00
parent 42dab6319b
commit a58b05a61b
2 changed files with 30 additions and 9 deletions

@@ -329,3 +329,11 @@
Title = {Fast and accurate long-read alignment with {Burrows-Wheeler} transform},
Volume = {26},
Year = {2010}}
@article{Marcais:2018aa,
Author = {Mar{\c c}ais, Guillaume and others},
Journal = {PLoS Comput Biol},
Pages = {e1005944},
Title = {{MUMmer4}: A fast and versatile genome alignment system},
Volume = {14},
Year = {2018}}

@@ -479,7 +479,10 @@ commercial and academic uses. Minimap2 uses the same base algorithm for all
applications, but it has to apply different sets of parameters depending on
input data types. Similar to BWA-MEM, minimap2 introduces `presets' that
modify multiple parameters with a simple invocation. Detailed settings
and command-line options can be found in the minimap2 manpage.
and command-line options can be found in the minimap2 manpage. In addition to
the applications described in the following sections, minimap2 also retains
minimap's functionality to find overlaps between long reads and to search
against large multi-species databases such as \emph{nt} from NCBI.
\subsection{Aligning long genomic reads}\label{sec:long-genomic}
@@ -512,7 +515,7 @@ BLASR~(v1.MC.rc64; \citealp{Chaisson:2012aa}),
BWA-MEM~(v0.7.15; \citealp{Li:2013aa}),
GraphMap~(v0.5.2; \citealp{Sovic:2016aa}),
Kart~(v2.2.5; \citealp{Lin:2017aa}),
minialign~(v0.5.3; \citealp{Suzuki:2016}) and
minialign~(v0.5.3; \href{https://github.com/ocxtal/minialign}{https://github.com/ocxtal/minialign}) and
NGMLR~(v0.2.5; \citealp{Sedlazeck169557}). We excluded rHAT~\citep{Liu:2016ab}
and LAMSA~\citep{Liu:2017aa} because they either
crashed or produced malformatted output. In this evaluation, minimap2 has
@@ -649,14 +652,24 @@ million bases (FPPM; 3.0 vs 3.9), lower 2--50bp INDEL FNR (7.3\% vs 7.5\%) and
similar INDEL FPPM (both 1.0). Minimap2 is broadly similar to BWA-MEM in the
context of small variant calling.
\subsection{Other applications}
\subsection{Aligning long-read assemblies}
Minimap2 retains minimap's functionality to find overlaps between long reads
and to search against large multi-species databases such as \emph{nt} from
NCBI. Minimap2 also aligns similar genomes or different assemblies of the
same species. It can map a human SMRT assembly (AC:GCA\_001297185.1) to
GRCh38 in 7 minutes using 8 CPU cores. QUAST~\citep{Gurevich:2013aa} v5.0 will
use minimap2 to evaluate the quality of assemblies against a reference genome.
Minimap2 can align a human SMRT assembly (AC:GCA\_001297185.1) against
GRCh38 in 7 minutes using 8 CPU cores, over 20 times faster than
MUMmer4~\citep{Marcais:2018aa}. With the paftools.js script from the minimap2
package, we called 2.67 million single-base substitutions out of 2.78Gbp
genomic regions. The transition-to-transversion ratio (ts/tv) is 2.01. In
comparison, using MUMmer4's dnadiff pipeline, we called 2.86 million
substitutions in 2.83Gbp at ts/tv=1.87. Given that ts/tv averaged across the
human genome is about 2 but ts/tv averaged over random errors is 0.5, the
minimap2 callset arguably has higher accuracy.
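As a rough illustration of this argument (a back-of-the-envelope sketch under
stated assumptions, not a result of the evaluation above), suppose that true
substitutions have ts/tv exactly 2 and erroneous calls have ts/tv exactly 0.5,
so that transitions make up $2/3$ of true calls and $1/3$ of errors. Here $e$
(the erroneous fraction of a callset) and $r$ (its observed ts/tv) are symbols
introduced only for this sketch. Equating the observed transition fraction to
the mixture of the two classes gives
\[
\frac{r}{1+r} = \frac{2}{3}(1-e) + \frac{1}{3}\,e
\quad\Longrightarrow\quad
e = \frac{2-r}{1+r}.
\]
For $r=1.87$ this yields $e\approx0.045$, whereas for $r=2.01$ it yields a
value marginally below zero, which is indistinguishable from $e=0$ at this
precision. Because the genome-wide ts/tv is only approximately 2, these
figures are indicative at best.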
The assembled sample is female, yet minimap2 still called 201 substitutions
on the Y chromosome. These substitutions all come from one contig aligned at
96.8\% sequence identity. The contig could be a diverged segmental duplication
absent from GRCh38. In contrast, on the Y chromosome, MUMmer4 called 9070
substitutions across 73 SMRT contigs. The accuracy of the MUMmer4 pipeline is
probably lower than our minimap2-based pipeline.
\section{Discussions}