rework misc/README; progressing but unfinished
This commit is contained in:
parent
7ef5490884
commit
874b8c4795
118
misc/README.md
118
misc/README.md
|
|
@ -1,24 +1,28 @@
|
|||
## <a name="started"></a>Getting Started
|
||||
|
||||
```sh
|
||||
# install minimap2
|
||||
git clone https://github.com/lh3/minimap2
|
||||
cd minimap2 && make
|
||||
# install the k8 javascript shell
|
||||
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
|
||||
cp k8-0.2.4/k8-`uname -s` $HOME/bin/k8 # assuming $HOME/bin in your $PATH
|
||||
gff2bed.js anno.gtf | less -S # convert GTF/GFF3 to BED12 (if k8 installed to $PATH)
|
||||
k8 gff2bed.js anno.gtf | less -S # convert GTF/GFF3 to BED12 (if k8 not installed)
|
||||
sam2paf.js aln.sam.gz | less -S # convert SAM to PAF
|
||||
minimap2 --cs test/MT-*.fa | paf2aln.js - | less # pretty print base alignment
|
||||
minimap2 -cx splice ref.fa rna-seq.fq | splice2bed.js - # convert splice aln to BED12
|
||||
cp k8-0.2.4/k8-`uname -s` k8 # or copy it to a directory on your $PATH
|
||||
# export PATH="$PATH:`pwd`:`pwd`/misc" # run this if k8, minimap2 or paftools.js not on your $PATH
|
||||
minimap2 --cs test/MT-human.fa test/MT-orang.fa | paftools.js view - # view alignment
|
||||
minimap2 -c test/MT-human.fa test/MT-orang.fa | paftools.js stat - # basic alignment statistics
|
||||
minimap2 -c --cs test/MT-human.fa test/MT-orang.fa \
|
||||
| sort -k6,6 -k8,8n | paftools.js call -L15000 - # calling variants from asm-to-ref alignment
|
||||
minimap2 -c test/MT-human.fa test/MT-orang.fa \
|
||||
| paftools.js liftover -l10000 - <(echo -e "MT_orang\t2000\t5000") # liftOver
|
||||
# no test data for the following examples
|
||||
paftools.js junceval -e anno.gtf splice.sam > out.txt # compare splice junctions to annotations
|
||||
paftools.js splice2bed anno.gtf > anno.bed # convert GTF/GFF3 to BED12
|
||||
```
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Getting Started](#started)
|
||||
- [Introduction](#intro)
|
||||
- [Format Conversion](#conv)
|
||||
- [Convert PAF to MAF or BLAST-like format](#paf2aln)
|
||||
- [Convert SAM to PAF](#sam2paf)
|
||||
- [Convert GTF/GFF3 to BED12](#gff2bed)
|
||||
- [Convert spliced alignment to BED12](#splice2bed)
|
||||
- [Evaluation](#eval)
|
||||
- [Evaluating mapping accuracy with simulated reads](#mapeval)
|
||||
- [Evaluating read overlap sensitivity](#oveval)
|
||||
|
|
@ -26,58 +30,43 @@ minimap2 -cx splice ref.fa rna-seq.fq | splice2bed.js - # convert splice aln t
|
|||
|
||||
## <a name="intro"></a>Introduction
|
||||
|
||||
This directory contains auxiliary scripts for format conversion, mapping
|
||||
accuracy evaluation and miscellaneous purposes. These scripts *require*
|
||||
the [k8 Javascript shell][k8] to run. On Linux or Mac, you can download
|
||||
the precompiled k8 binary with:
|
||||
paftools.js is a script that processes alignments in the [PAF format][paf],
|
||||
such as converting between formats, evaluating mapping accuracy, lifting over
|
||||
BED files based on alignment, and calling variants from assembly-to-assembly
|
||||
alignment. This script *requires* the [k8 Javascript shell][k8] to run. On
|
||||
Linux or Mac, you can download the precompiled k8 binary with:
|
||||
|
||||
```sh
|
||||
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
|
||||
cp k8-0.2.4/k8-`uname -s` $HOME/bin/k8 # assuming $HOME/bin in your $PATH
|
||||
```
|
||||
|
||||
It is highly recommended to copy the executable `k8` to a directory on your
|
||||
`$PATH` such as `/usr/bin/env` can find it. Like python or perl scripts, once
|
||||
you install `k8`, you can launch these k8 scripts either with
|
||||
`$PATH` such as `/usr/bin/env` can find it. Like python scripts, once you
|
||||
install `k8`, you can launch these k8 scripts either with
|
||||
|
||||
```sh
|
||||
path/to/gff2bed.js anno.gtf.gz
|
||||
path/to/paftools.js
|
||||
```
|
||||
|
||||
or with
|
||||
|
||||
```sh
|
||||
k8 path/to/gff2bed.js anno.gtf
|
||||
k8 path/to/paftools.js
|
||||
```
|
||||
|
||||
All k8 scripts seamlessly work with both plain text files and gzip'd text files.
|
||||
|
||||
## <a name="conv"></a>Format Conversion
|
||||
|
||||
* <a name="paf2aln"></a>Script [paf2aln.js](paf2aln.js) converts PAF with the
|
||||
[cs tag][cs] to [MAF][maf] or BLAST-like output. It only works with minimap2
|
||||
output generated using the `--cs` tag.
|
||||
|
||||
* <a name="sam2paf"></a>Script [sam2paf.js](sam2paf.js) converts alignments in
|
||||
the SAM format to PAF.
|
||||
|
||||
* <a name="gff2bed"></a>Script [gff2bed.js](gff2bed.js) converts GFF format to
|
||||
12-column BED format. It seamlessly works with both GTF and GFF3.
|
||||
|
||||
* <a name="splice2bed"></a>Script [splice2bed.js](splice2bed.js) converts
|
||||
spliced alignment in SAM or PAF to 12-column BED format.
|
||||
paftools.js seamlessly reads both plain text files and gzip'd text files.
|
||||
|
||||
## <a name="eval"></a>Evaluation
|
||||
|
||||
### <a name="mapeval"></a>Evaluating mapping accuracy with simulated reads
|
||||
|
||||
Script [sim-pbsim.js](sim-pbsim.js) converts the MAF output of [pbsim][pbsim]
|
||||
The **pbsim2fq** command of paftools.js converts the MAF output of [pbsim][pbsim]
|
||||
to FASTQ and encodes the true mapping position in the read name in a format like
|
||||
`S1_33!chr1!225258409!225267761!-`. Similarly, script
|
||||
[sim-mason2.js](sim-mason2.js) converts [mason2][mason2] simulated SAM to
|
||||
FASTQ.
|
||||
`S1_33!chr1!225258409!225267761!-`. Similarly, the **mason2fq** command
|
||||
converts [mason2][mason2] simulated SAM to FASTQ.
|
||||
|
||||
Command **mapeval** evaluates mapped SAM/PAF. Here is example output:
|
||||
|
||||
Script [sim-eval.js](sim-eval.js) evaluates mapped SAM/PAF. Here is example output:
|
||||
```
|
||||
Q 60 32478 0 0.000000000 32478
|
||||
Q 22 16 1 0.000030775 32494
|
||||
|
|
@ -94,11 +83,12 @@ Q 1 248 94 0.003267381 33054
|
|||
Q 0 31 17 0.003778147 33085
|
||||
U 3
|
||||
```
|
||||
|
||||
where each Q-line gives the quality threshold, the number of reads mapped with
|
||||
mapping quality equal to or greater than the threshold, number of wrong
|
||||
mappings, accumulative mapping error rate and the accumulative number of
|
||||
mapped reads. The U-line gives the number of unmapped reads if they are present
|
||||
in the SAM file.
|
||||
mapped reads. The U-line, if present, gives the number of unmapped reads if
|
||||
they are present in the SAM file.
|
||||
|
||||
Suppose the reported mapping coordinate overlap with the true coordinate like
|
||||
the following:
|
||||
|
|
@ -125,7 +115,7 @@ k8 ov-eval.js reads-to-ref.paf ovlp.paf
|
|||
|
||||
## <a name="asmvar"></a>Calling Variants from Haploid Assemblies
|
||||
|
||||
Command `paftools.js call` calls variants from coordinate-sorted
|
||||
The **call** command of paftools.js calls variants from coordinate-sorted
|
||||
assembly-to-reference alignment. It calls variants from the [cs tag][cs] and
|
||||
identifies confident/callable regions as those covered by exactly one contig.
|
||||
Here are example command lines:
|
||||
|
|
@ -139,34 +129,30 @@ k8 paftools.js call asm.srt.paf > asm.var.txt
|
|||
Here is sample output:
|
||||
|
||||
```
|
||||
V chr1 3181702 3181703 1 60 c t
|
||||
V chr1 3181730 3181768 1 60 gtcttacacacggagtcttacacacggtcttacacaca -
|
||||
R chr1 3181796 3260557
|
||||
V chr1 3181818 3181822 1 60 tgcg -
|
||||
V chr1 3181831 3181832 1 60 a g
|
||||
V chr1 3181832 3181833 1 60 t c
|
||||
V chr1 3181833 3181834 1 60 t g
|
||||
V chr1 3181874 3181874 1 60 - ca
|
||||
V chr1 3181879 3181880 1 60 g a
|
||||
V chr1 3181886 3181887 1 60 c g
|
||||
V chr1 3181911 3181911 1 60 - agtcttacacatgcagtcttacacat
|
||||
V chr1 3181924 3181925 1 60 t c
|
||||
V chr1 3182079 3182080 1 60 g a
|
||||
V chr1 3182150 3182151 1 60 t c
|
||||
V chr1 3182336 3182337 1 60 t c
|
||||
V chr1 2276040 2276041 1 60 c g LJII01000171.1 1217409 1217410 +
|
||||
V chr1 2280409 2280410 1 60 a g LJII01000171.1 1221778 1221779 +
|
||||
V chr1 2280504 2280505 1 60 a g LJII01000171.1 1221873 1221874 +
|
||||
R chr1 2325140 2436340
|
||||
V chr1 2325287 2325287 1 60 - ct LJII01000171.1 1272894 1272896 +
|
||||
V chr1 2325642 2325644 1 60 tt - LJII01000171.1 1273251 1273251 +
|
||||
V chr1 2326051 2326052 1 60 c t LJII01000171.1 1273658 1273659 +
|
||||
V chr1 2326287 2326288 1 60 c t LJII01000171.1 1273894 1273895 +
|
||||
```
|
||||
|
||||
where a line starting with `R` gives regions covered by one contig, and a
|
||||
V-line encodes a variant in the following format: chr, start, end, contig
|
||||
depth, mapping quality, REF allele and ALT allele.
|
||||
where a line starting with `R` gives regions covered by one query contig, and a
|
||||
V-line encodes a variant in the following format: chr, start, end, query depth,
|
||||
mapping quality, REF allele, ALT allele, query name, query start, end and the
|
||||
query orientation. Generally, you should only look at variants where column 5
|
||||
is one.
|
||||
|
||||
By default, when calling variants, this script ignores alignments 50kb or
|
||||
shorter; when deriving callable regions, it ignores alignments 10kb or shorter.
|
||||
It uses two thresholds to avoid edge effects. These defaults are designed for
|
||||
long-read assemblies. For short reads, both should be reduced.
|
||||
By default, when calling variants, "paftools.js call" ignores alignments 50kb
|
||||
or shorter; when deriving callable regions, it ignores alignments 10kb or
|
||||
shorter. It uses two thresholds to avoid edge effects. These defaults are
|
||||
designed for long-read assemblies. For short reads, both should be reduced.
|
||||
|
||||
|
||||
|
||||
[paf]: https://github.com/lh3/miniasm/blob/master/PAF.md
|
||||
[cs]: https://github.com/lh3/minimap2#cs
|
||||
[k8]: https://github.com/attractivechaos/k8
|
||||
[maf]: https://genome.ucsc.edu/FAQ/FAQformat#format5
|
||||
|
|
|
|||
|
|
@ -175,7 +175,7 @@ function paf_liftover(args)
|
|||
{
|
||||
if (fn == null) return null;
|
||||
if (typeof to_merge == 'undefined') to_merge = true;
|
||||
var file = new File(fn);
|
||||
var file = fn == '-'? new File() : new File(fn);
|
||||
var buf = new Bytes();
|
||||
var bed = {};
|
||||
while (file.readline(buf) >= 0) {
|
||||
|
|
@ -214,7 +214,7 @@ function paf_liftover(args)
|
|||
}
|
||||
var bed = read_bed(args[getopt.ind+1], to_merge);
|
||||
|
||||
var file = new File(args[getopt.ind]);
|
||||
var file = args[getopt.ind] == '-'? new File() : new File(args[getopt.ind]);
|
||||
var buf = new Bytes();
|
||||
while (file.readline(buf) >= 0) {
|
||||
var t = buf.toString().split("\t");
|
||||
|
|
@ -485,7 +485,7 @@ function paf_stat(args)
|
|||
}
|
||||
|
||||
var buf = new Bytes();
|
||||
var file = new File(args[getopt.ind]);
|
||||
var file = args[getopt.ind] == '-'? new File() : new File(args[getopt.ind]);
|
||||
var re = /(\d+)([MIDSHNX=])/g;
|
||||
|
||||
var lineno = 0, n_pri = 0, n_2nd = 0, n_seq = 0, n_cigar_64k = 0, l_tot = 0, l_cov = 0;
|
||||
|
|
|
|||
Loading…
Reference in New Issue