General doc improvements
This commit is contained in:
parent
756379bf83
commit
3df5015668
128
misc/README.md
128
misc/README.md
|
|
@ -1,10 +1,12 @@
|
||||||
## <a name="started"></a>Getting Started
|
## <a name="started"></a>Getting Started
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
|
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
|
||||||
cp k8-0.2.4/k8-`uname -s` k8 # or better copy to a directory on PATH
|
cp k8-0.2.4/k8-`uname -s` $HOME/bin/k8 # assuming $HOME/bin in your $PATH
|
||||||
minimap2 --cs test/MT-*.fa | paf2aln.js - | less # pretty print base alignment
|
gff2bed.js anno.gtf | less -S # convert GTF/GFF3 to BED12 (if k8 installed to $PATH)
|
||||||
sam2paf.js aln.sam.gz | less -S # convert SAM to PAF
|
k8 gff2bed.js anno.gtf | less -S # convert GTF/GFF3 to BED12 (if k8 not installed)
|
||||||
gff2bed.js anno.gtf | less -S # convert GTF/GFF3 to BED12
|
sam2paf.js aln.sam.gz | less -S # convert SAM to PAF
|
||||||
|
minimap2 --cs test/MT-*.fa | paf2aln.js - | less # pretty print base alignment
|
||||||
minimap2 -cx splice ref.fa rna-seq.fq | splice2bed.js - # convert splice aln to BED12
|
minimap2 -cx splice ref.fa rna-seq.fq | splice2bed.js - # convert splice aln to BED12
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
@ -12,15 +14,15 @@ minimap2 -cx splice ref.fa rna-seq.fq | splice2bed.js - # convert splice aln t
|
||||||
|
|
||||||
- [Getting Started](#started)
|
- [Getting Started](#started)
|
||||||
- [Introduction](#intro)
|
- [Introduction](#intro)
|
||||||
- [Calling Variants from Assemblies](#asmvar)
|
|
||||||
- [Format Conversion](#conv)
|
- [Format Conversion](#conv)
|
||||||
- [Convert PAF to other formats](#paf2aln)
|
- [Convert PAF to MAF or BLAST-like format](#paf2aln)
|
||||||
- [Convert SAM to PAF](#sam2paf)
|
- [Convert SAM to PAF](#sam2paf)
|
||||||
- [Convert GTF/GFF3 to BED12 format](#gff2bed)
|
- [Convert GTF/GFF3 to BED12](#gff2bed)
|
||||||
- [Convert spliced alignment to BED12](#splice2bed)
|
- [Convert spliced alignment to BED12](#splice2bed)
|
||||||
- [Evaluation](#eval)
|
- [Evaluation](#eval)
|
||||||
- [Evaluating mapping accuracy with simulated reads](#mapeval)
|
- [Evaluating mapping accuracy with simulated reads](#mapeval)
|
||||||
- [Evaluating read overlap sensitivity](#oveval)
|
- [Evaluating read overlap sensitivity](#oveval)
|
||||||
|
- [Calling Variants from Assemblies](#asmvar)
|
||||||
|
|
||||||
## <a name="intro"></a>Introduction
|
## <a name="intro"></a>Introduction
|
||||||
|
|
||||||
|
|
@ -28,40 +30,42 @@ This directory contains auxiliary scripts for format conversion, mapping
|
||||||
accuracy evaluation and miscellaneous purposes. These scripts *require*
|
accuracy evaluation and miscellaneous purposes. These scripts *require*
|
||||||
the [k8 Javascript shell][k8] to run. On Linux or Mac, you can download
|
the [k8 Javascript shell][k8] to run. On Linux or Mac, you can download
|
||||||
the precompiled k8 binary with:
|
the precompiled k8 binary with:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
|
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
|
||||||
cp k8-0.2.4/k8-`uname -s` k8
|
cp k8-0.2.4/k8-`uname -s` $HOME/bin/k8 # assuming $HOME/bin in your $PATH
|
||||||
```
|
```
|
||||||
|
|
||||||
It is highly recommended to copy the executable `k8` to a directory on your
|
It is highly recommended to copy the executable `k8` to a directory on your
|
||||||
`PATH` such as `/usr/bin/env` can find it.
|
`$PATH` such as `/usr/bin/env` can find it. Like python or perl scripts, once
|
||||||
|
you install `k8`, you can launch these k8 scripts either with
|
||||||
|
|
||||||
## <a name="asmvar"></a>Calling Variants from Assemblies
|
```sh
|
||||||
|
path/to/gff2bed.js anno.gtf.gz
|
||||||
|
```
|
||||||
|
|
||||||
Script [paf2diff.js](paf2diff.js) calls variants from coordinate-sorted
|
or with
|
||||||
assembly-to-reference alignment having the [cs tag][cs] (requiring the `--cs`
|
|
||||||
minimap2 option).
|
```sh
|
||||||
|
k8 path/to/gff2bed.js anno.gtf
|
||||||
|
```
|
||||||
|
|
||||||
|
All k8 scripts seamlessly work with both plain text files and gzip'd text files.
|
||||||
|
|
||||||
## <a name="conv"></a>Format Conversion
|
## <a name="conv"></a>Format Conversion
|
||||||
|
|
||||||
### <a name="paf2aln"></a>Convert PAF to other formats
|
* <a name="paf2aln"></a>Script [paf2aln.js](paf2aln.js) converts PAF with the
|
||||||
|
[cs tag][cs] to [MAF][maf] or BLAST-like output. It only works with minimap2
|
||||||
|
output generated using the `--cs` tag.
|
||||||
|
|
||||||
Script [paf2aln.js](paf2aln.js) converts PAF with the [cs tag][cs] to
|
* <a name="sam2paf"></a>Script [sam2paf.js](sam2paf.js) converts alignments in
|
||||||
[MAF][maf] or BLAST-like output. It only works with minimap2 output generated
|
the SAM format to PAF.
|
||||||
using the `--cs` tag.
|
|
||||||
|
|
||||||
### <a name="sam2paf"></a>Convert SAM to PAF
|
* <a name="gff2bed"></a>Script [gff2bed.js](gff2bed.js) converts GFF format to
|
||||||
|
12-column BED format. It seamlessly works with both GTF and GFF3.
|
||||||
|
|
||||||
Script [sam2paf.js](sam2paf.js) converts alignments in the SAM format to PAF.
|
* <a name="splice2bed"></a>Script [splice2bed.js](splice2bed.js) converts
|
||||||
|
spliced alignment in SAM or PAF to 12-column BED format.
|
||||||
### <a name="gff2bed"></a>Convert GTF/GFF3 to BED12 format
|
|
||||||
|
|
||||||
Script [gff2bed.js](gff2bed.js) converts GFF format to 12-column BED format. It
|
|
||||||
seamlessly works with both GTF and GFF3.
|
|
||||||
|
|
||||||
### <a name="splice2bed"></a>Convert spliced alignment to BED12
|
|
||||||
|
|
||||||
Script [splice2bed.js](splice2bed.js) converts spliced alignment in SAM or PAF
|
|
||||||
to 12-column BED format.
|
|
||||||
|
|
||||||
## <a name="eval"></a>Evaluation
|
## <a name="eval"></a>Evaluation
|
||||||
|
|
||||||
|
|
@ -96,10 +100,72 @@ mappings, accumulative mapping error rate and the accumulative number of
|
||||||
mapped reads. The U-line gives the number of unmapped reads if they are present
|
mapped reads. The U-line gives the number of unmapped reads if they are present
|
||||||
in the SAM file.
|
in the SAM file.
|
||||||
|
|
||||||
|
Suppose the reported mapping coordinate overlap with the true coordinate like
|
||||||
|
the following:
|
||||||
|
|
||||||
|
```
|
||||||
|
truth: --------------------
|
||||||
|
mapper: ----------------------
|
||||||
|
|<- l1 ->|<-- o -->|<-- l2 -->|
|
||||||
|
```
|
||||||
|
|
||||||
|
Let `r=o/(l1+o+l2)`. The reported mapping is considered correct if `r>0.1` by
|
||||||
|
default.
|
||||||
|
|
||||||
### <a name="oveval"></a>Evaluating read overlap sensitivity
|
### <a name="oveval"></a>Evaluating read overlap sensitivity
|
||||||
|
|
||||||
Script [ov-eval.js](ov-eval.js) takes read-to-reference alignment in PAF and
|
Script [ov-eval.js](ov-eval.js) takes sorted read-to-reference alignment and
|
||||||
read overlaps in PAF and evaluates the sensitivity.
|
read overlaps in PAF as input, and evaluates the sensitivity. For example:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
minimap2 -cx map-pb ref.fa reads.fq.gz | sort -k6,6 -k8,8n > reads-to-ref.paf
|
||||||
|
minimap2 -x ava-pb reads.fq.gz reads.fq.gz > ovlp.paf
|
||||||
|
k8 ov-eval.js reads-to-ref.paf ovlp.paf
|
||||||
|
```
|
||||||
|
|
||||||
|
## <a name="asmvar"></a>Calling Variants from Haploid Assemblies
|
||||||
|
|
||||||
|
Script [paf2diff.js](paf2diff.js) calls variants from coordinate-sorted
|
||||||
|
assembly-to-reference alignment. It calls variants from the [cs tag][cs] and
|
||||||
|
identifies confident/callable regions as those covered by exactly one contig.
|
||||||
|
Here are example command lines:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
minimap2 -cx asm5 -t8 --cs ref.fa asm.fa > asm.paf # keeping this file is recommended; --cs required!
|
||||||
|
sort -k6,6 -k8,8n asm.paf > asm.srt.paf # sort by reference start coordinate
|
||||||
|
k8 paf2diff.js asm.srt.paf > asm.var.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
Here is sample output:
|
||||||
|
|
||||||
|
```
|
||||||
|
V chr1 3181702 3181703 1 60 c t
|
||||||
|
V chr1 3181730 3181768 1 60 gtcttacacacggagtcttacacacggtcttacacaca -
|
||||||
|
R chr1 3181796 3260557
|
||||||
|
V chr1 3181818 3181822 1 60 tgcg -
|
||||||
|
V chr1 3181831 3181832 1 60 a g
|
||||||
|
V chr1 3181832 3181833 1 60 t c
|
||||||
|
V chr1 3181833 3181834 1 60 t g
|
||||||
|
V chr1 3181874 3181874 1 60 - ca
|
||||||
|
V chr1 3181879 3181880 1 60 g a
|
||||||
|
V chr1 3181886 3181887 1 60 c g
|
||||||
|
V chr1 3181911 3181911 1 60 - agtcttacacatgcagtcttacacat
|
||||||
|
V chr1 3181924 3181925 1 60 t c
|
||||||
|
V chr1 3182079 3182080 1 60 g a
|
||||||
|
V chr1 3182150 3182151 1 60 t c
|
||||||
|
V chr1 3182336 3182337 1 60 t c
|
||||||
|
```
|
||||||
|
|
||||||
|
where a line starting with `R` gives regions covered by one contig, and a
|
||||||
|
V-line encodes a variant in the following format: chr, start, end, contig
|
||||||
|
depth, mapping quality, REF allele and ALT allele.
|
||||||
|
|
||||||
|
By default, when calling variants, this script ignores alignments 50kb or
|
||||||
|
shorter; when deriving callable regions, it ignores alignments 10kb or shorter.
|
||||||
|
It uses two thresholds to avoid edge effects. These defaults are designed for
|
||||||
|
long-read assemblies. For short reads, both should be reduced.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
[cs]: https://github.com/lh3/minimap2#cs
|
[cs]: https://github.com/lh3/minimap2#cs
|
||||||
[k8]: https://github.com/attractivechaos/k8
|
[k8]: https://github.com/attractivechaos/k8
|
||||||
|
|
|
||||||
|
|
@ -46,7 +46,16 @@ while ((c = getopt(arguments, "Q:r:m:c")) != null) {
|
||||||
else if (c == 'c') cap_short_mapq = true;
|
else if (c == 'c') cap_short_mapq = true;
|
||||||
}
|
}
|
||||||
|
|
||||||
var file = arguments.length == getopt.ind || arguments[getopt.ind] == '-'? new File() : new File(arguments[getopt.ind]);
|
if (arguments.length == getopt.ind) {
|
||||||
|
warn("Usage: k8 sim-eval.js [options] <in.paf>|<in.sam>");
|
||||||
|
warn("Options:");
|
||||||
|
warn(" -r FLOAT mapping correct if overlap_length/union_length>FLOAT [" + ovlp_ratio + "]");
|
||||||
|
warn(" -Q INT print wrong mappings with mapQ>INT [don't print]");
|
||||||
|
warn(" -m INT 0: eval the longest aln only; 1: first aln only; 2: all primary aln [0]");
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
var file = arguments[getopt.ind] == '-'? new File() : new File(arguments[getopt.ind]);
|
||||||
var buf = new Bytes();
|
var buf = new Bytes();
|
||||||
|
|
||||||
var tot = [], err = [];
|
var tot = [], err = [];
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue