updated cookbook
This commit is contained in:
parent
0bf97a367b
commit
1776311a9b
|
|
@ -137,7 +137,7 @@ Nanopore reads.
|
|||
#### <a name="map-long-splice"></a>Map long mRNA/cDNA reads
|
||||
|
||||
```sh
|
||||
minimap2 -ax splice -uf ref.fa iso-seq.fq > aln.sam # PacBio Iso-seq/traditional cDNA
|
||||
minimap2 -ax splice -uf -C5 ref.fa iso-seq.fq > aln.sam # PacBio Iso-seq/traditional cDNA
|
||||
minimap2 -ax splice ref.fa nanopore-cdna.fa > aln.sam # Nanopore 2D cDNA-seq
|
||||
minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq > aln.sam # Nanopore Direct RNA-seq
|
||||
minimap2 -ax splice --splice-flank=no SIRV.fa SIRV-seq.fa # mapping against SIRV control
|
||||
|
|
|
|||
101
cookbook.md
101
cookbook.md
|
|
@ -1,29 +1,41 @@
|
|||
## Table of Contents
|
||||
|
||||
- [Installation](#install)
|
||||
- [Introduction & Installation](#intro)
|
||||
- [Mapping Genomic Reads](#map-reads)
|
||||
* [Mapping long reads](#map-pb)
|
||||
* [Mapping Illumina paired-end reads](#map-sr)
|
||||
* [Evaluating mapping accuracy with simulated reads (for developers)](#mapeval)
|
||||
- [Mapping Long RNA-seq Reads](#map-rna)
|
||||
* [Mapping Nanopore 2D cDNA reads](#map-ont-cdna-2d)
|
||||
* [Mapping Nanopore direct-RNA reads](#map-direct-rna)
|
||||
* [Mapping PacBio Iso-seq reads](#map-iso-seq)
|
||||
- [Full-Genome Alignment](#genome-aln)
|
||||
* [Intra-species assembly alignment](#asm-to-ref)
|
||||
* [Cross-species full-genome alignment](#x-species)
|
||||
* [Eyeballing alignment](#view-aln)
|
||||
* [Calling variants from assembly-to-reference alignment](#asm-var)
|
||||
* [Lift Over](#liftover)
|
||||
* [Constructing self-homology map](#hom-map)
|
||||
* [Lift Over (for developers)](#liftover)
|
||||
- [Read Overlap](#read-overlap)
|
||||
* [Long-read overlap](#long-read-overlap)
|
||||
* [Evaluating overlap sensitivity (for developers)](#ov-eval)
|
||||
|
||||
## <a name="install"></a>Installation
|
||||
## <a name="intro"></a>Introduction & Installation
|
||||
|
||||
This cookbook walks you through a variety of applications of minimap2 and its
|
||||
companion script `paftools.js`. All data here are freely available from the
|
||||
minimap2 release page at version tag [v2.10][v2.10]. Some examples only work
|
||||
with v2.10 or later.
|
||||
|
||||
To acquire the data used in this cookbook and to install minimap2 and paftools,
|
||||
please follow the command lines below:
|
||||
```sh
|
||||
# install minimap2 executables
|
||||
curl -L https://github.com/lh3/minimap2/releases/download/v2.9/minimap2-2.9_x64-linux.tar.bz2 | tar jxf -
|
||||
cp minimap2-2.9_x64-linux/{minimap2,k8,paftools.js} . # copy executables
|
||||
export PATH="$PATH:"`pwd` # put the current directory on PATH
|
||||
curl -L https://github.com/lh3/minimap2/releases/download/v2.10/minimap2-2.10_x64-linux.tar.bz2 | tar jxf -
|
||||
cp minimap2-2.10_x64-linux/{minimap2,k8,paftools.js} . # copy executables
|
||||
export PATH="$PATH:"`pwd` # put the current directory on PATH
|
||||
# download example datasets
|
||||
curl -L https://github.com/lh3/minimap2/releases/download/v2.0/cookbook-data.tgz | tar zxf -
|
||||
curl -L https://github.com/lh3/minimap2/releases/download/v2.10/cookbook-data.tgz | tar zxf -
|
||||
```
|
||||
|
||||
## <a name="map-reads"></a>Mapping Genomic Reads
|
||||
|
|
@ -84,6 +96,56 @@ paftools.js mason2fq tmp.sam | seqtk seq -2 > ecoli_mason_2.fq
|
|||
|
||||
|
||||
|
||||
## <a name="map-rna"></a>Mapping Long RNA-seq Reads
|
||||
|
||||
### <a name="map-ont-cdna-2d"></a>Mapping Nanopore 2D cDNA reads
|
||||
```sh
|
||||
minimap2 -ax splice SIRV_E2.fa SIRV_ont-cdna.fa > aln.sam
|
||||
```
|
||||
You can compare the alignment to the true annotations with:
|
||||
```sh
|
||||
paftools.js junceval SIRV_E2C.gtf aln.sam
|
||||
```
|
||||
It gives the percentage of introns found in the annotation. For SIRV data, it
|
||||
is possible to achieve higher junction accuracy with
|
||||
```sh
|
||||
minimap2 -ax splice --splice-flank=no SIRV_E2.fa SIRV_ont-cdna.fa | paftools.js junceval SIRV_E2C.gtf
|
||||
```
|
||||
This is because minimap2 models one additional evolutionarily conserved base
|
||||
around a canonical junction, but SIRV doesn't honor this signal. Option
|
||||
`--splice-flank=no` asks minimap2 no to model this additional base.
|
||||
|
||||
In the output a tag `ts:A:+` indicates that the read strand is the same as the
|
||||
transcript strand; `ts:A:-` indicates the read strand is opposite to the
|
||||
transcript strand. This tag is inferred from the GT-AG signal and is thus only
|
||||
available to spliced reads.
|
||||
|
||||
### <a name="map-direct-rna"></a>Mapping Nanopore direct-RNA reads
|
||||
```sh
|
||||
minimap2 -ax splice -k14 -uf SIRV_E2.fa SIRV_ont-drna.fa > aln.sam
|
||||
```
|
||||
Direct-RNA reads are noisier, so we use a shorter k-mer for improved
|
||||
sensitivity. Here, option `-uf` forces minimap2 to map reads to the forward
|
||||
transcript strand only because direct-RNA reads are stranded. Again, applying
|
||||
`--splice-flank=no` helps junction accuracy for SIRV data.
|
||||
|
||||
### <a name="map-iso-seq"></a>Mapping PacBio Iso-seq reads
|
||||
```sh
|
||||
minimap2 -ax splice -uf -C5 SIRV_E2.fa SIRV_iso-seq.fq > aln.sam
|
||||
```
|
||||
Option `-C5` reduces the penalty on non-canonical splicing sites. It helps
|
||||
to align such sites correctly for data with low error rate such as Iso-seq
|
||||
reads and traditional cDNAs. On this example, minimap2 makes one junction
|
||||
error. Applying `--splice-flank=no` fixes this alignment error.
|
||||
|
||||
Note that the command line above is optimized for the final Iso-seq reads.
|
||||
PacBio's Iso-seq pipeline produces intermediate sequences at varying quality.
|
||||
For example, some intermediate reads are not stranded. For these reads, option
|
||||
`-uf` will lead to more errors. Please revise the minimap2 command line
|
||||
accordingly.
|
||||
|
||||
|
||||
|
||||
## <a name="genome-aln"></a>Full-Genome Alignment
|
||||
|
||||
### <a name="asm-to-ref"></a>Intra-species assembly alignment
|
||||
|
|
@ -124,7 +186,17 @@ Without option `-f`, `paftools.js call` outputs in a custom format. In this
|
|||
format, lines starting with `R` give the regions covered by one contig only.
|
||||
This information is not available in the VCF output.
|
||||
|
||||
### <a name="liftover"></a>Lift over
|
||||
### <a name="hom-map"></a>Constructing self-homology map
|
||||
```sh
|
||||
minimap2 -DP -k19 -w19 -m200 ecoli_ref.fa ecoli_ref.fa > out.paf
|
||||
```
|
||||
Option `-D` asks minimap2 to ignore anchors from perfect self match and `-P`
|
||||
outputs all chains. For large nomes, we don't recommend to perform base-level
|
||||
alignment (with `-c`, `-a` or `--cs`) when `-P` is applied. This is because
|
||||
base-alignment is slow and occasionally gives wrong alignments close to the
|
||||
diagonal of a dotter plot. For E. coli, though, base-alignment is still fast.
|
||||
|
||||
### <a name="liftover"></a>Lift over (for developers)
|
||||
```sh
|
||||
minimap2 -cx asm5 --cs ecoli_ref.fa ecoli_canu.fa > ecoli_canu.paf
|
||||
echo -e 'tig00000001\t200000\t300000' | paftools.js liftover ecoli_canu.paf -
|
||||
|
|
@ -165,18 +237,7 @@ with `-x ava-pb` (99% vs 93% with `-x ava-ont`).
|
|||
|
||||
|
||||
|
||||
## <a name="map-rna"></a>Mapping Long RNA-seq Reads
|
||||
|
||||
(Unfinished section. Data to come later...)
|
||||
|
||||
### <a name="map-direct-rna"></a>Mapping Nanopore direct-RNA reads
|
||||
|
||||
```sh
|
||||
minimap2 -ax splice -k14 -uf ref.fa reads.fa > aln.sam
|
||||
```
|
||||
|
||||
|
||||
|
||||
[pbsim]: https://github.com/pfaucon/PBSIM-PacBio-Simulator
|
||||
[mason2]: https://github.com/seqan/seqan/tree/master/apps/mason2
|
||||
[paf]: https://github.com/lh3/miniasm/blob/master/PAF.md
|
||||
[v2.10]: https://github.com/lh3/minimap2/releases/tag/v2.10
|
||||
|
|
|
|||
Loading…
Reference in New Issue