updated cookbook

This commit is contained in:
Heng Li 2018-03-26 14:37:02 -04:00
parent 0bf97a367b
commit 1776311a9b
2 changed files with 82 additions and 21 deletions

View File

@ -137,7 +137,7 @@ Nanopore reads.
#### <a name="map-long-splice"></a>Map long mRNA/cDNA reads
```sh
minimap2 -ax splice -uf ref.fa iso-seq.fq > aln.sam # PacBio Iso-seq/traditional cDNA
minimap2 -ax splice -uf -C5 ref.fa iso-seq.fq > aln.sam # PacBio Iso-seq/traditional cDNA
minimap2 -ax splice ref.fa nanopore-cdna.fa > aln.sam # Nanopore 2D cDNA-seq
minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq > aln.sam # Nanopore Direct RNA-seq
minimap2 -ax splice --splice-flank=no SIRV.fa SIRV-seq.fa # mapping against SIRV control

View File

@ -1,29 +1,41 @@
## Table of Contents
- [Installation](#install)
- [Introduction & Installation](#intro)
- [Mapping Genomic Reads](#map-reads)
* [Mapping long reads](#map-pb)
* [Mapping Illumina paired-end reads](#map-sr)
* [Evaluating mapping accuracy with simulated reads (for developers)](#mapeval)
- [Mapping Long RNA-seq Reads](#map-rna)
* [Mapping Nanopore 2D cDNA reads](#map-ont-cdna-2d)
* [Mapping Nanopore direct-RNA reads](#map-direct-rna)
* [Mapping PacBio Iso-seq reads](#map-iso-seq)
- [Full-Genome Alignment](#genome-aln)
* [Intra-species assembly alignment](#asm-to-ref)
* [Cross-species full-genome alignment](#x-species)
* [Eyeballing alignment](#view-aln)
* [Calling variants from assembly-to-reference alignment](#asm-var)
* [Lift Over](#liftover)
* [Constructing self-homology map](#hom-map)
* [Lift Over (for developers)](#liftover)
- [Read Overlap](#read-overlap)
* [Long-read overlap](#long-read-overlap)
* [Evaluating overlap sensitivity (for developers)](#ov-eval)
## <a name="install"></a>Installation
## <a name="intro"></a>Introduction & Installation
This cookbook walks you through a variety of applications of minimap2 and its
companion script `paftools.js`. All data here are freely available from the
minimap2 release page at version tag [v2.10][v2.10]. Some examples only work
with v2.10 or later.
To acquire the data used in this cookbook and to install minimap2 and paftools,
please follow the command lines below:
```sh
# install minimap2 executables
curl -L https://github.com/lh3/minimap2/releases/download/v2.9/minimap2-2.9_x64-linux.tar.bz2 | tar jxf -
cp minimap2-2.9_x64-linux/{minimap2,k8,paftools.js} . # copy executables
export PATH="$PATH:"`pwd` # put the current directory on PATH
curl -L https://github.com/lh3/minimap2/releases/download/v2.10/minimap2-2.10_x64-linux.tar.bz2 | tar jxf -
cp minimap2-2.10_x64-linux/{minimap2,k8,paftools.js} . # copy executables
export PATH="$PATH:"`pwd` # put the current directory on PATH
# download example datasets
curl -L https://github.com/lh3/minimap2/releases/download/v2.0/cookbook-data.tgz | tar zxf -
curl -L https://github.com/lh3/minimap2/releases/download/v2.10/cookbook-data.tgz | tar zxf -
```
## <a name="map-reads"></a>Mapping Genomic Reads
@ -84,6 +96,56 @@ paftools.js mason2fq tmp.sam | seqtk seq -2 > ecoli_mason_2.fq
## <a name="map-rna"></a>Mapping Long RNA-seq Reads
### <a name="map-ont-cdna-2d"></a>Mapping Nanopore 2D cDNA reads
```sh
minimap2 -ax splice SIRV_E2.fa SIRV_ont-cdna.fa > aln.sam
```
You can compare the alignment to the true annotations with:
```sh
paftools.js junceval SIRV_E2C.gtf aln.sam
```
It gives the percentage of introns found in the annotation. For SIRV data, it
is possible to achieve higher junction accuracy with
```sh
minimap2 -ax splice --splice-flank=no SIRV_E2.fa SIRV_ont-cdna.fa | paftools.js junceval SIRV_E2C.gtf
```
This is because minimap2 models one additional evolutionarily conserved base
around a canonical junction, but SIRV doesn't honor this signal. Option
`--splice-flank=no` asks minimap2 no to model this additional base.
In the output a tag `ts:A:+` indicates that the read strand is the same as the
transcript strand; `ts:A:-` indicates the read strand is opposite to the
transcript strand. This tag is inferred from the GT-AG signal and is thus only
available to spliced reads.
### <a name="map-direct-rna"></a>Mapping Nanopore direct-RNA reads
```sh
minimap2 -ax splice -k14 -uf SIRV_E2.fa SIRV_ont-drna.fa > aln.sam
```
Direct-RNA reads are noisier, so we use a shorter k-mer for improved
sensitivity. Here, option `-uf` forces minimap2 to map reads to the forward
transcript strand only because direct-RNA reads are stranded. Again, applying
`--splice-flank=no` helps junction accuracy for SIRV data.
### <a name="map-iso-seq"></a>Mapping PacBio Iso-seq reads
```sh
minimap2 -ax splice -uf -C5 SIRV_E2.fa SIRV_iso-seq.fq > aln.sam
```
Option `-C5` reduces the penalty on non-canonical splicing sites. It helps
to align such sites correctly for data with low error rate such as Iso-seq
reads and traditional cDNAs. On this example, minimap2 makes one junction
error. Applying `--splice-flank=no` fixes this alignment error.
Note that the command line above is optimized for the final Iso-seq reads.
PacBio's Iso-seq pipeline produces intermediate sequences at varying quality.
For example, some intermediate reads are not stranded. For these reads, option
`-uf` will lead to more errors. Please revise the minimap2 command line
accordingly.
## <a name="genome-aln"></a>Full-Genome Alignment
### <a name="asm-to-ref"></a>Intra-species assembly alignment
@ -124,7 +186,17 @@ Without option `-f`, `paftools.js call` outputs in a custom format. In this
format, lines starting with `R` give the regions covered by one contig only.
This information is not available in the VCF output.
### <a name="liftover"></a>Lift over
### <a name="hom-map"></a>Constructing self-homology map
```sh
minimap2 -DP -k19 -w19 -m200 ecoli_ref.fa ecoli_ref.fa > out.paf
```
Option `-D` asks minimap2 to ignore anchors from perfect self match and `-P`
outputs all chains. For large nomes, we don't recommend to perform base-level
alignment (with `-c`, `-a` or `--cs`) when `-P` is applied. This is because
base-alignment is slow and occasionally gives wrong alignments close to the
diagonal of a dotter plot. For E. coli, though, base-alignment is still fast.
### <a name="liftover"></a>Lift over (for developers)
```sh
minimap2 -cx asm5 --cs ecoli_ref.fa ecoli_canu.fa > ecoli_canu.paf
echo -e 'tig00000001\t200000\t300000' | paftools.js liftover ecoli_canu.paf -
@ -165,18 +237,7 @@ with `-x ava-pb` (99% vs 93% with `-x ava-ont`).
## <a name="map-rna"></a>Mapping Long RNA-seq Reads
(Unfinished section. Data to come later...)
### <a name="map-direct-rna"></a>Mapping Nanopore direct-RNA reads
```sh
minimap2 -ax splice -k14 -uf ref.fa reads.fa > aln.sam
```
[pbsim]: https://github.com/pfaucon/PBSIM-PacBio-Simulator
[mason2]: https://github.com/seqan/seqan/tree/master/apps/mason2
[paf]: https://github.com/lh3/miniasm/blob/master/PAF.md
[v2.10]: https://github.com/lh3/minimap2/releases/tag/v2.10