2018-02-03 02:57:08 +08:00
|
|
|
## <a name="started"></a>Getting Started
|
2018-02-03 10:45:46 +08:00
|
|
|
|
2018-02-03 02:57:08 +08:00
|
|
|
```sh
|
2018-02-13 11:06:05 +08:00
|
|
|
# install minimap2
|
|
|
|
|
git clone https://github.com/lh3/minimap2
|
|
|
|
|
cd minimap2 && make
|
|
|
|
|
# install the k8 javascript shell
|
2018-02-03 02:57:08 +08:00
|
|
|
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
|
2018-02-13 11:06:05 +08:00
|
|
|
cp k8-0.2.4/k8-`uname -s` k8 # or copy it to a directory on your $PATH
|
|
|
|
|
# export PATH="$PATH:`pwd`:`pwd`/misc" # run this if k8, minimap2 or paftools.js not on your $PATH
|
|
|
|
|
minimap2 --cs test/MT-human.fa test/MT-orang.fa | paftools.js view - # view alignment
|
|
|
|
|
minimap2 -c test/MT-human.fa test/MT-orang.fa | paftools.js stat - # basic alignment statistics
|
|
|
|
|
minimap2 -c --cs test/MT-human.fa test/MT-orang.fa \
|
|
|
|
|
| sort -k6,6 -k8,8n | paftools.js call -L15000 - # calling variants from asm-to-ref alignment
|
|
|
|
|
minimap2 -c test/MT-human.fa test/MT-orang.fa \
|
|
|
|
|
| paftools.js liftover -l10000 - <(echo -e "MT_orang\t2000\t5000") # liftOver
|
|
|
|
|
# no test data for the following examples
|
|
|
|
|
paftools.js junceval -e anno.gtf splice.sam > out.txt # compare splice junctions to annotations
|
|
|
|
|
paftools.js splice2bed anno.gtf > anno.bed # convert GTF/GFF3 to BED12
|
2018-02-03 02:57:08 +08:00
|
|
|
```
|
2017-08-05 02:46:31 +08:00
|
|
|
|
2018-02-03 03:07:17 +08:00
|
|
|
## Table of Contents
|
|
|
|
|
|
|
|
|
|
- [Getting Started](#started)
|
|
|
|
|
- [Introduction](#intro)
|
2018-02-03 03:43:58 +08:00
|
|
|
- [Evaluation](#eval)
|
2018-02-03 03:18:18 +08:00
|
|
|
- [Evaluating mapping accuracy with simulated reads](#mapeval)
|
|
|
|
|
- [Evaluating read overlap sensitivity](#oveval)
|
2018-02-03 10:45:46 +08:00
|
|
|
- [Calling Variants from Assemblies](#asmvar)
|
2018-02-03 03:07:17 +08:00
|
|
|
|
|
|
|
|
## <a name="intro"></a>Introduction
|
2017-08-05 02:46:31 +08:00
|
|
|
|
2018-02-13 11:06:05 +08:00
|
|
|
paftools.js is a script that processes alignments in the [PAF format][paf],
|
|
|
|
|
such as converting between formats, evaluating mapping accuracy, lifting over
|
|
|
|
|
BED files based on alignment, and calling variants from assembly-to-assembly
|
|
|
|
|
alignment. This script *requires* the [k8 Javascript shell][k8] to run. On
|
|
|
|
|
Linux or Mac, you can download the precompiled k8 binary with:
|
2018-02-03 10:45:46 +08:00
|
|
|
|
2018-02-03 02:57:08 +08:00
|
|
|
```sh
|
|
|
|
|
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
|
2018-02-03 10:45:46 +08:00
|
|
|
cp k8-0.2.4/k8-`uname -s` $HOME/bin/k8 # assuming $HOME/bin in your $PATH
|
2018-02-03 02:57:08 +08:00
|
|
|
```
|
2018-02-13 11:27:02 +08:00
|
|
|
|
2018-02-03 10:45:46 +08:00
|
|
|
It is highly recommended to copy the executable `k8` to a directory on your
|
2018-02-13 11:06:05 +08:00
|
|
|
`$PATH` such as `/usr/bin/env` can find it. Like python scripts, once you
|
2018-02-13 11:27:02 +08:00
|
|
|
install `k8`, you can launch paftools.js in one of the two ways:
|
2018-02-03 03:43:58 +08:00
|
|
|
|
2018-02-03 10:45:46 +08:00
|
|
|
```sh
|
2018-02-13 11:27:02 +08:00
|
|
|
path/to/paftools.js # only if k8 is on your $PATH
|
|
|
|
|
k8 path/to/paftools.js
|
2018-02-03 10:45:46 +08:00
|
|
|
```
|
2017-08-05 02:46:31 +08:00
|
|
|
|
2018-02-13 11:27:02 +08:00
|
|
|
In a nutshell, paftools.js has the following commands:
|
2017-08-05 02:46:31 +08:00
|
|
|
|
2018-02-13 11:27:02 +08:00
|
|
|
```
|
|
|
|
|
Usage: paftools.js <command> [arguments]
|
|
|
|
|
Commands:
|
|
|
|
|
view convert PAF to BLAST-like (for eyeballing) or MAF
|
|
|
|
|
splice2bed convert spliced alignment in PAF/SAM to BED12
|
|
|
|
|
sam2paf convert SAM to PAF
|
|
|
|
|
delta2paf convert MUMmer's delta to PAF
|
|
|
|
|
gff2bed convert GTF/GFF3 to BED12
|
|
|
|
|
|
|
|
|
|
stat collect basic mapping information in PAF/SAM
|
|
|
|
|
liftover simplistic liftOver
|
|
|
|
|
call call variants from asm-to-ref alignment with the cs tag
|
|
|
|
|
bedcov compute the number of bases covered
|
|
|
|
|
|
|
|
|
|
mapeval evaluate mapping accuracy using mason2/PBSIM-simulated FASTQ
|
|
|
|
|
mason2fq convert mason2-simulated SAM to FASTQ
|
|
|
|
|
pbsim2fq convert PBSIM-simulated MAF to FASTQ
|
|
|
|
|
junceval evaluate splice junction consistency with known annotations
|
|
|
|
|
ov-eval evaluate read overlap sensitivity using read-to-ref mapping
|
2018-02-03 10:45:46 +08:00
|
|
|
```
|
2017-08-05 02:46:31 +08:00
|
|
|
|
2018-02-13 11:06:05 +08:00
|
|
|
paftools.js seamlessly reads both plain text files and gzip'd text files.
|
2018-02-03 03:07:17 +08:00
|
|
|
|
2018-02-03 03:43:58 +08:00
|
|
|
## <a name="eval"></a>Evaluation
|
|
|
|
|
|
2018-02-03 03:18:18 +08:00
|
|
|
### <a name="mapeval"></a>Evaluating mapping accuracy with simulated reads
|
2018-02-03 02:57:08 +08:00
|
|
|
|
2018-02-13 11:06:05 +08:00
|
|
|
The **pbsim2fq** command of paftools.js converts the MAF output of [pbsim][pbsim]
|
2018-02-03 02:57:08 +08:00
|
|
|
to FASTQ and encodes the true mapping position in the read name in a format like
|
2018-02-13 11:06:05 +08:00
|
|
|
`S1_33!chr1!225258409!225267761!-`. Similarly, the **mason2fq** command
|
|
|
|
|
converts [mason2][mason2] simulated SAM to FASTQ.
|
|
|
|
|
|
|
|
|
|
Command **mapeval** evaluates mapped SAM/PAF. Here is example output:
|
2018-02-03 02:57:08 +08:00
|
|
|
|
|
|
|
|
```
|
|
|
|
|
Q 60 32478 0 0.000000000 32478
|
|
|
|
|
Q 22 16 1 0.000030775 32494
|
|
|
|
|
Q 21 43 1 0.000061468 32537
|
|
|
|
|
Q 19 73 1 0.000091996 32610
|
|
|
|
|
Q 14 66 1 0.000122414 32676
|
|
|
|
|
Q 10 27 3 0.000214048 32703
|
|
|
|
|
Q 8 14 1 0.000244521 32717
|
|
|
|
|
Q 7 13 2 0.000305530 32730
|
|
|
|
|
Q 6 46 1 0.000335611 32776
|
|
|
|
|
Q 3 10 1 0.000366010 32786
|
|
|
|
|
Q 2 20 2 0.000426751 32806
|
|
|
|
|
Q 1 248 94 0.003267381 33054
|
|
|
|
|
Q 0 31 17 0.003778147 33085
|
|
|
|
|
U 3
|
|
|
|
|
```
|
2018-02-13 11:06:05 +08:00
|
|
|
|
2018-02-03 02:57:08 +08:00
|
|
|
where each Q-line gives the quality threshold, the number of reads mapped with
|
|
|
|
|
mapping quality equal to or greater than the threshold, number of wrong
|
|
|
|
|
mappings, accumulative mapping error rate and the accumulative number of
|
2018-02-13 11:06:05 +08:00
|
|
|
mapped reads. The U-line, if present, gives the number of unmapped reads if
|
|
|
|
|
they are present in the SAM file.
|
2018-02-03 02:57:08 +08:00
|
|
|
|
2018-02-03 10:45:46 +08:00
|
|
|
Suppose the reported mapping coordinate overlap with the true coordinate like
|
|
|
|
|
the following:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
truth: --------------------
|
|
|
|
|
mapper: ----------------------
|
|
|
|
|
|<- l1 ->|<-- o -->|<-- l2 -->|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Let `r=o/(l1+o+l2)`. The reported mapping is considered correct if `r>0.1` by
|
|
|
|
|
default.
|
|
|
|
|
|
2018-02-03 03:18:18 +08:00
|
|
|
### <a name="oveval"></a>Evaluating read overlap sensitivity
|
|
|
|
|
|
2018-02-13 11:27:02 +08:00
|
|
|
Command **ov-eval** takes *sorted* read-to-reference alignment and read
|
|
|
|
|
overlaps in PAF as input, and evaluates the sensitivity. For example:
|
2018-02-03 10:45:46 +08:00
|
|
|
|
|
|
|
|
```sh
|
|
|
|
|
minimap2 -cx map-pb ref.fa reads.fq.gz | sort -k6,6 -k8,8n > reads-to-ref.paf
|
|
|
|
|
minimap2 -x ava-pb reads.fq.gz reads.fq.gz > ovlp.paf
|
|
|
|
|
k8 ov-eval.js reads-to-ref.paf ovlp.paf
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
## <a name="asmvar"></a>Calling Variants from Haploid Assemblies
|
|
|
|
|
|
2018-02-13 11:06:05 +08:00
|
|
|
The **call** command of paftools.js calls variants from coordinate-sorted
|
2018-02-03 10:45:46 +08:00
|
|
|
assembly-to-reference alignment. It calls variants from the [cs tag][cs] and
|
|
|
|
|
identifies confident/callable regions as those covered by exactly one contig.
|
|
|
|
|
Here are example command lines:
|
|
|
|
|
|
|
|
|
|
```sh
|
|
|
|
|
minimap2 -cx asm5 -t8 --cs ref.fa asm.fa > asm.paf # keeping this file is recommended; --cs required!
|
|
|
|
|
sort -k6,6 -k8,8n asm.paf > asm.srt.paf # sort by reference start coordinate
|
2018-02-09 23:47:12 +08:00
|
|
|
k8 paftools.js call asm.srt.paf > asm.var.txt
|
2018-02-03 10:45:46 +08:00
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Here is sample output:
|
|
|
|
|
|
|
|
|
|
```
|
2018-02-13 11:06:05 +08:00
|
|
|
V chr1 2276040 2276041 1 60 c g LJII01000171.1 1217409 1217410 +
|
|
|
|
|
V chr1 2280409 2280410 1 60 a g LJII01000171.1 1221778 1221779 +
|
|
|
|
|
V chr1 2280504 2280505 1 60 a g LJII01000171.1 1221873 1221874 +
|
|
|
|
|
R chr1 2325140 2436340
|
|
|
|
|
V chr1 2325287 2325287 1 60 - ct LJII01000171.1 1272894 1272896 +
|
|
|
|
|
V chr1 2325642 2325644 1 60 tt - LJII01000171.1 1273251 1273251 +
|
|
|
|
|
V chr1 2326051 2326052 1 60 c t LJII01000171.1 1273658 1273659 +
|
|
|
|
|
V chr1 2326287 2326288 1 60 c t LJII01000171.1 1273894 1273895 +
|
2018-02-03 10:45:46 +08:00
|
|
|
```
|
|
|
|
|
|
2018-02-13 11:06:05 +08:00
|
|
|
where a line starting with `R` gives regions covered by one query contig, and a
|
|
|
|
|
V-line encodes a variant in the following format: chr, start, end, query depth,
|
|
|
|
|
mapping quality, REF allele, ALT allele, query name, query start, end and the
|
|
|
|
|
query orientation. Generally, you should only look at variants where column 5
|
|
|
|
|
is one.
|
2018-02-03 10:45:46 +08:00
|
|
|
|
2018-02-13 11:06:05 +08:00
|
|
|
By default, when calling variants, "paftools.js call" ignores alignments 50kb
|
|
|
|
|
or shorter; when deriving callable regions, it ignores alignments 10kb or
|
|
|
|
|
shorter. It uses two thresholds to avoid edge effects. These defaults are
|
|
|
|
|
designed for long-read assemblies. For short reads, both should be reduced.
|
2018-02-03 10:45:46 +08:00
|
|
|
|
|
|
|
|
|
2018-02-03 03:18:18 +08:00
|
|
|
|
2018-02-13 11:06:05 +08:00
|
|
|
[paf]: https://github.com/lh3/miniasm/blob/master/PAF.md
|
2018-02-03 02:57:08 +08:00
|
|
|
[cs]: https://github.com/lh3/minimap2#cs
|
2017-08-05 02:46:31 +08:00
|
|
|
[k8]: https://github.com/attractivechaos/k8
|
|
|
|
|
[maf]: https://genome.ucsc.edu/FAQ/FAQformat#format5
|
|
|
|
|
[pbsim]: https://github.com/pfaucon/PBSIM-PacBio-Simulator
|
2018-02-03 02:57:08 +08:00
|
|
|
[mason2]: https://github.com/seqan/seqan/tree/master/apps/mason2
|