General doc improvements

Heng Li 2018-02-02 21:45:46 -05:00
parent 756379bf83
commit 3df5015668
2 changed files with 107 additions and 32 deletions

@ -1,10 +1,12 @@
## <a name="started"></a>Getting Started ## <a name="started"></a>Getting Started
```sh ```sh
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf - curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
cp k8-0.2.4/k8-`uname -s` k8 # or better copy to a directory on PATH cp k8-0.2.4/k8-`uname -s` $HOME/bin/k8 # assuming $HOME/bin in your $PATH
minimap2 --cs test/MT-*.fa | paf2aln.js - | less # pretty print base alignment gff2bed.js anno.gtf | less -S # convert GTF/GFF3 to BED12 (if k8 installed to $PATH)
sam2paf.js aln.sam.gz | less -S # convert SAM to PAF k8 gff2bed.js anno.gtf | less -S # convert GTF/GFF3 to BED12 (if k8 not installed)
gff2bed.js anno.gtf | less -S # convert GTF/GFF3 to BED12 sam2paf.js aln.sam.gz | less -S # convert SAM to PAF
minimap2 --cs test/MT-*.fa | paf2aln.js - | less # pretty print base alignment
minimap2 -cx splice ref.fa rna-seq.fq | splice2bed.js - # convert splice aln to BED12 minimap2 -cx splice ref.fa rna-seq.fq | splice2bed.js - # convert splice aln to BED12
``` ```
@ -12,15 +14,15 @@ minimap2 -cx splice ref.fa rna-seq.fq | splice2bed.js - # convert splice aln t
- [Getting Started](#started) - [Getting Started](#started)
- [Introduction](#intro) - [Introduction](#intro)
- [Calling Variants from Assemblies](#asmvar)
- [Format Conversion](#conv) - [Format Conversion](#conv)
- [Convert PAF to other formats](#paf2aln) - [Convert PAF to MAF or BLAST-like format](#paf2aln)
- [Convert SAM to PAF](#sam2paf) - [Convert SAM to PAF](#sam2paf)
- [Convert GTF/GFF3 to BED12 format](#gff2bed) - [Convert GTF/GFF3 to BED12](#gff2bed)
- [Convert spliced alignment to BED12](#splice2bed) - [Convert spliced alignment to BED12](#splice2bed)
- [Evaluation](#eval) - [Evaluation](#eval)
- [Evaluating mapping accuracy with simulated reads](#mapeval) - [Evaluating mapping accuracy with simulated reads](#mapeval)
- [Evaluating read overlap sensitivity](#oveval) - [Evaluating read overlap sensitivity](#oveval)
- [Calling Variants from Assemblies](#asmvar)
## <a name="intro"></a>Introduction ## <a name="intro"></a>Introduction
@ -28,40 +30,42 @@ This directory contains auxiliary scripts for format conversion, mapping
accuracy evaluation and miscellaneous purposes. These scripts *require* accuracy evaluation and miscellaneous purposes. These scripts *require*
the [k8 JavaScript shell][k8] to run. On Linux or Mac, you can download the [k8 JavaScript shell][k8] to run. On Linux or Mac, you can download
the precompiled k8 binary with: the precompiled k8 binary with:
```sh ```sh
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf - curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
cp k8-0.2.4/k8-`uname -s` k8 cp k8-0.2.4/k8-`uname -s` $HOME/bin/k8 # assuming $HOME/bin in your $PATH
``` ```
It is highly recommended to copy the executable `k8` to a directory on your It is highly recommended to copy the executable `k8` to a directory on your
`PATH` such that `/usr/bin/env` can find it. `$PATH` such that `/usr/bin/env` can find it. Like Python or Perl scripts, once
you install `k8`, you can launch these k8 scripts either with
## <a name="asmvar"></a>Calling Variants from Assemblies ```sh
path/to/gff2bed.js anno.gtf.gz
```
Script [paf2diff.js](paf2diff.js) calls variants from coordinate-sorted or with
assembly-to-reference alignment having the [cs tag][cs] (requiring the `--cs`
minimap2 option). ```sh
k8 path/to/gff2bed.js anno.gtf
```
All k8 scripts seamlessly work with both plain text files and gzip'd text files.
## <a name="conv"></a>Format Conversion ## <a name="conv"></a>Format Conversion
### <a name="paf2aln"></a>Convert PAF to other formats * <a name="paf2aln"></a>Script [paf2aln.js](paf2aln.js) converts PAF with the
[cs tag][cs] to [MAF][maf] or BLAST-like output. It only works with minimap2
output generated using the `--cs` option.
Script [paf2aln.js](paf2aln.js) converts PAF with the [cs tag][cs] to * <a name="sam2paf"></a>Script [sam2paf.js](sam2paf.js) converts alignments in
[MAF][maf] or BLAST-like output. It only works with minimap2 output generated the SAM format to PAF.
using the `--cs` option.
### <a name="sam2paf"></a>Convert SAM to PAF * <a name="gff2bed"></a>Script [gff2bed.js](gff2bed.js) converts GFF format to
12-column BED format. It seamlessly works with both GTF and GFF3.
Script [sam2paf.js](sam2paf.js) converts alignments in the SAM format to PAF. * <a name="splice2bed"></a>Script [splice2bed.js](splice2bed.js) converts
spliced alignment in SAM or PAF to 12-column BED format.
### <a name="gff2bed"></a>Convert GTF/GFF3 to BED12 format
Script [gff2bed.js](gff2bed.js) converts GFF format to 12-column BED format. It
seamlessly works with both GTF and GFF3.
### <a name="splice2bed"></a>Convert spliced alignment to BED12
Script [splice2bed.js](splice2bed.js) converts spliced alignment in SAM or PAF
to 12-column BED format.
## <a name="eval"></a>Evaluation ## <a name="eval"></a>Evaluation
@ -96,10 +100,72 @@ mappings, accumulative mapping error rate and the accumulative number of
mapped reads. The U-line gives the number of unmapped reads if they are present mapped reads. The U-line gives the number of unmapped reads if they are present
in the SAM file. in the SAM file.
Suppose the reported mapping coordinate overlaps with the true coordinate as in
the following:
```
truth: --------------------
mapper: ----------------------
|<- l1 ->|<-- o -->|<-- l2 -->|
```
Let `r=o/(l1+o+l2)`. The reported mapping is considered correct if `r>0.1` by
default.
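The ratio `r` can be sketched in plain JavaScript as follows. This is a minimal illustration, not code taken from sim-eval.js: the function names `overlapRatio` and `isCorrect` are hypothetical, intervals are assumed to be half-open `[start, end)`, and both intervals are assumed to lie on the same reference sequence. The denominator `l1+o+l2` is computed as the length of the union of the two intervals, matching the `overlap_length/union_length` wording of the `-r` option.

```javascript
// Hypothetical helpers for illustration only; not part of sim-eval.js.
// Intervals are half-open [start, end) on the same reference sequence.
function overlapRatio(trueStart, trueEnd, mapStart, mapEnd) {
    // o: length of the shared part of the two intervals
    var o = Math.min(trueEnd, mapEnd) - Math.max(trueStart, mapStart);
    if (o <= 0) return 0; // the intervals do not overlap at all
    // l1 + o + l2: length of the union of the two overlapping intervals
    var u = Math.max(trueEnd, mapEnd) - Math.min(trueStart, mapStart);
    return o / u;
}

// A mapping counts as correct if r exceeds the threshold (0.1 by default).
function isCorrect(trueStart, trueEnd, mapStart, mapEnd, minRatio) {
    if (minRatio === undefined) minRatio = 0.1;
    return overlapRatio(trueStart, trueEnd, mapStart, mapEnd) > minRatio;
}
```

For instance, a true interval `[100, 200)` and a reported interval `[150, 250)` share 50 bases out of a union of 150, so `r` is one third and the mapping would be counted as correct under the default threshold.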
### <a name="oveval"></a>Evaluating read overlap sensitivity ### <a name="oveval"></a>Evaluating read overlap sensitivity
Script [ov-eval.js](ov-eval.js) takes read-to-reference alignment in PAF and Script [ov-eval.js](ov-eval.js) takes sorted read-to-reference alignment and
read overlaps in PAF and evaluates the sensitivity. read overlaps in PAF as input, and evaluates the sensitivity. For example:
```sh
minimap2 -cx map-pb ref.fa reads.fq.gz | sort -k6,6 -k8,8n > reads-to-ref.paf
minimap2 -x ava-pb reads.fq.gz reads.fq.gz > ovlp.paf
k8 ov-eval.js reads-to-ref.paf ovlp.paf
```
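The `sort -k6,6 -k8,8n` step above orders PAF records by target name (column 6) and target start (column 8). As a rough sanity check, the sketch below tests whether a list of PAF lines is grouped by target name with non-decreasing start coordinates inside each group. The function name `isRefSorted` is hypothetical and the code is not part of ov-eval.js; it deliberately does not check the order of target names themselves, since that depends on the locale used by `sort`.

```javascript
// Hypothetical helper for illustration only; not part of ov-eval.js.
// Returns true if PAF lines are grouped by target name (column 6) and
// sorted by target start (column 8) within each group.
function isRefSorted(lines) {
    var seen = {}, lastName = null, lastStart = -1;
    for (var i = 0; i < lines.length; ++i) {
        var t = lines[i].split("\t");
        if (t.length < 12) continue; // PAF has 12 mandatory columns; skip anything else
        var name = t[5], start = parseInt(t[7], 10);
        if (name !== lastName) {
            // a target name reappearing after another name means the input is not grouped
            if (Object.prototype.hasOwnProperty.call(seen, name)) return false;
            seen[name] = true;
            lastName = name;
        } else if (start < lastStart) {
            return false; // start coordinate went backwards within one target
        }
        lastStart = start;
    }
    return true;
}
```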
## <a name="asmvar"></a>Calling Variants from Haploid Assemblies
Script [paf2diff.js](paf2diff.js) calls variants from coordinate-sorted
assembly-to-reference alignment. It calls variants from the [cs tag][cs] and
identifies confident/callable regions as those covered by exactly one contig.
Here are example command lines:
```sh
minimap2 -cx asm5 -t8 --cs ref.fa asm.fa > asm.paf # keeping this file is recommended; --cs required!
sort -k6,6 -k8,8n asm.paf > asm.srt.paf # sort by reference start coordinate
k8 paf2diff.js asm.srt.paf > asm.var.txt
```
Here is sample output:
```
V chr1 3181702 3181703 1 60 c t
V chr1 3181730 3181768 1 60 gtcttacacacggagtcttacacacggtcttacacaca -
R chr1 3181796 3260557
V chr1 3181818 3181822 1 60 tgcg -
V chr1 3181831 3181832 1 60 a g
V chr1 3181832 3181833 1 60 t c
V chr1 3181833 3181834 1 60 t g
V chr1 3181874 3181874 1 60 - ca
V chr1 3181879 3181880 1 60 g a
V chr1 3181886 3181887 1 60 c g
V chr1 3181911 3181911 1 60 - agtcttacacatgcagtcttacacat
V chr1 3181924 3181925 1 60 t c
V chr1 3182079 3182080 1 60 g a
V chr1 3182150 3182151 1 60 t c
V chr1 3182336 3182337 1 60 t c
```
where a line starting with `R` gives regions covered by one contig, and a
V-line encodes a variant in the following format: chr, start, end, contig
depth, mapping quality, REF allele and ALT allele.
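For downstream processing, the two line types can be split into fields as sketched below. This is an illustrative parser, not part of paf2diff.js: the function name `parseDiffLine`, the returned property names and the `kind` labels are hypothetical, and the fields are assumed to be separated by whitespace. Judging from the sample output, `-` marks an empty allele, so an empty REF is treated as an insertion and an empty ALT as a deletion.

```javascript
// Hypothetical parser for illustration only; not part of paf2diff.js.
// Property names and 'kind' labels are assumptions made for this sketch.
function parseDiffLine(line) {
    var t = line.trim().split(/\s+/);
    if (t[0] === "R" && t.length >= 4) {
        // R-line: a region covered by one contig
        return { type: "R", chr: t[1], start: parseInt(t[2], 10), end: parseInt(t[3], 10) };
    }
    if (t[0] === "V" && t.length >= 8) {
        // V-line: chr, start, end, contig depth, mapping quality, REF allele, ALT allele
        var ref = t[6], alt = t[7], kind;
        if (ref === "-") kind = "insertion";     // nothing on the reference side
        else if (alt === "-") kind = "deletion"; // nothing on the contig side
        else kind = "substitution";
        return {
            type: "V", chr: t[1],
            start: parseInt(t[2], 10), end: parseInt(t[3], 10),
            depth: parseInt(t[4], 10), mapq: parseInt(t[5], 10),
            ref: ref, alt: alt, kind: kind
        };
    }
    return null; // unrecognized line
}
```

Applied to the sample above, the first V-line would be reported as a substitution of `c` by `t`, and the line with REF `-` and ALT `ca` as an insertion whose start equals its end.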
By default, when calling variants, this script ignores alignments 50kb or
shorter; when deriving callable regions, it ignores alignments 10kb or shorter.
It uses two thresholds to avoid edge effects. These defaults are designed for
long-read assemblies. For short reads, both should be reduced.
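The idea of "regions covered by exactly one contig" combined with a minimum alignment length can be illustrated with a generic sweep over interval endpoints. The sketch below is not the implementation in paf2diff.js, whose details may differ: the function name `depthOneRegions` is hypothetical, alignments are assumed to be half-open `[start, end)` intervals on a single reference sequence, and the only value taken from the text is the 10kb default used in the example call.

```javascript
// Hypothetical sketch for illustration only; paf2diff.js may work differently.
// alns: array of {start, end} on ONE reference sequence, half-open intervals.
// minLen: alignments of this length or shorter are ignored.
function depthOneRegions(alns, minLen) {
    var ev = [];
    for (var i = 0; i < alns.length; ++i) {
        var a = alns[i];
        if (a.end - a.start <= minLen) continue; // too short to be trusted
        ev.push([a.start, 1]);  // depth goes up at the start
        ev.push([a.end, -1]);   // and down at the end
    }
    // Sort by position; at equal positions process ends (-1) before starts (+1)
    ev.sort(function (x, y) { return x[0] - y[0] || x[1] - y[1]; });
    var regs = [], depth = 0, last = 0;
    for (var j = 0; j < ev.length; ++j) {
        var pos = ev[j][0];
        // the stretch [last, pos) had constant depth; keep it if that depth is 1
        if (depth === 1 && pos > last) regs.push([last, pos]);
        depth += ev[j][1];
        last = pos;
    }
    return regs;
}

// Example call with the 10kb default mentioned above
var exampleRegions = depthOneRegions(
    [{ start: 0, end: 20000 }, { start: 15000, end: 40000 }, { start: 100, end: 200 }],
    10000
);
```

In this example the 100bp alignment is dropped, the stretch between 15000 and 20000 is covered twice, and the remaining depth-one regions are `[0, 15000)` and `[20000, 40000)`. Abutting regions are reported separately in this sketch; merging them would be a small extension.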
[cs]: https://github.com/lh3/minimap2#cs [cs]: https://github.com/lh3/minimap2#cs
[k8]: https://github.com/attractivechaos/k8 [k8]: https://github.com/attractivechaos/k8

@ -46,7 +46,16 @@ while ((c = getopt(arguments, "Q:r:m:c")) != null) {
else if (c == 'c') cap_short_mapq = true; else if (c == 'c') cap_short_mapq = true;
} }
var file = arguments.length == getopt.ind || arguments[getopt.ind] == '-'? new File() : new File(arguments[getopt.ind]); if (arguments.length == getopt.ind) {
warn("Usage: k8 sim-eval.js [options] <in.paf>|<in.sam>");
warn("Options:");
warn(" -r FLOAT mapping correct if overlap_length/union_length>FLOAT [" + ovlp_ratio + "]");
warn(" -Q INT print wrong mappings with mapQ>INT [don't print]");
warn(" -m INT 0: eval the longest aln only; 1: first aln only; 2: all primary aln [0]");
exit(1);
}
var file = arguments[getopt.ind] == '-'? new File() : new File(arguments[getopt.ind]);
var buf = new Bytes(); var buf = new Bytes();
var tot = [], err = []; var tot = [], err = [];