Split README-alt.md into two documentations
parent 9ade171f7a
commit 1993cb634d
@@ -1,7 +1,7 @@
-## Getting Started
+## For the Impatient
 
 ```sh
-# Download the bwa-0.7.11 binary package
+# Download the bwa-0.7.11 binary package (download link may change)
 wget -O- http://sourceforge.net/projects/bio-bwa/files/bwakit-0.7.11_x64-linux.tar.bz2/download \
 | gzip -dc | tar xf -
 # Generate the GRCh38+ALT+decoy+HLA and create the BWA index
@@ -11,22 +11,9 @@ bwa.kit/bwa index hs38d6.fa # create BWA index
 bwa.kit/run-bwamem -o out hs38d6.fa read1.fq read2.fq | sh # skip "|sh" to show command lines
 ```
 
-This will generate the following files:
-
-* `out.aln.bam`: unsorted alignments with ALT-aware mapping quality. In this
-file, one read may be placed on multiple overlapping ALT contigs at the same
-time even if the read is mapped better to some contigs than others. This makes
-it possible to analyze each contig independent of others.
-
-* `out.hla.top`: best genotypes for HLA-A, -B, -C, -DQA1, -DQB1 and -DRB1 genes.
-
-* `out.hla.all`: other possible genotypes on the six HLA genes.
-
-* `out.log.*`: bwa-mem, samblaster and HLA typing log files.
-
-Note that `run-bwamem` only prints command lines but doesn't execute them. It
-is advised to have a look at the command lines before passing them to `sh` for
-actual execution.
+This generates `out.aln.bam` as the final alignment, `out.hla.top` for best HLA
+genotypes on each gene and `out.hla.all` for other possible HLA genotypes.
+Please check out [bwa/bwakit/README.md][kithelp] for details.
 
 ## Background
 
@@ -57,7 +44,7 @@ postprocessing. The `bwa.kit/run-bwamem` script performs the two steps when ALT
 contigs are present. The following picture shows an example about how BWA-MEM
 infers mapping quality and reports alignment after step 2:
 
-[image: link not preserved]
+[image: link not preserved]
 
 #### Step 1: BWA-MEM mapping
 
@@ -189,3 +176,4 @@ can even get rid of ALT contigs for good.
 [hla2]: http://nar.oxfordjournals.org/content/41/14/e142.full.pdf+html
 [hla3]: http://www.biomedcentral.com/1471-2164/15/325
 [hla4]: http://genomemedicine.com/content/4/12/95
+[kithelp]: https://github.com/lh3/bwa/tree/master/bwakit

@@ -0,0 +1,50 @@
+Bwakit is a self-consistent installation-free package of scripts and precompiled
+binaries which provide an end-to-end solution to read mapping. In addition to
+the basic mapping functionality implemented in bwa, bwakit is able to generate
+proper human reference genome and to take advantage of ALT contigs, if present,
+to improve read mapping and to perform HLA typing for high-coverage human data.
+It can remap name- or coordinate-sorted BAM with read group and barcode
+information retained. Bwakit also *optionally* trims adapters (via
+[trimadap][ta]), marks duplicates (via [samblaster][sb]) and sorts the final
+alignment (via [samtools][smtl]).
+
+Bwakit has two entry scripts: `run-gen-ref` which downloads and generates human
+reference genomes, and `run-bwamem` which prints mapping command lines on the
+standard output that can be piped to `sh` to execute. The two scripts will call
+other programs or use data in `bwa.kit`. The following shows an example about
+how to use bwakit:
+
+```sh
+# Download the bwa-0.7.11 binary package (download link may change)
+wget -O- http://sourceforge.net/projects/bio-bwa/files/bwakit/bwakit-0.7.11_x64-linux.tar.bz2/download \
+| gzip -dc | tar xf -
+# Generate the GRCh38+ALT+decoy+HLA and create the BWA index
+bwa.kit/run-gen-ref hs38d6 # download GRCh38 and write hs38d6.fa
+bwa.kit/bwa index hs38d6.fa # create BWA index
+# mapping
+bwa.kit/run-bwamem -o out hs38d6.fa read1.fq read2.fq | sh
+```
+
+The last mapping command line will generate the following files:
+
+* `out.aln.bam`: unsorted alignments with ALT-aware mapping quality. In this
+file, one read may be placed on multiple overlapping ALT contigs at the same
+time even if the read is mapped better to some contigs than others. This makes
+it possible to analyze each contig independent of others.
+
+* `out.hla.top`: best genotypes for HLA-A, -B, -C, -DQA1, -DQB1 and -DRB1 genes.
+
+* `out.hla.all`: other possible genotypes on the six HLA genes.
+
+* `out.log.*`: bwa-mem, samblaster and HLA typing log files.
+
+Bwakit can be [downloaded here][res]. It is only available to x86_64-linux. The
+scripts in the package are available in the [bwa/bwakit][kit] directory.
+Packaging is done manually for now.
+
+
+[res]: https://sourceforge.net/projects/bio-bwa/files/
+[sb]: https://github.com/GregoryFaust/samblaster
+[ta]: https://github.com/lh3/seqtk/blob/master/trimadap.c
+[smtl]: http://www.htslib.org
+[kit]: https://github.com/lh3/bwa/tree/master/bwakit
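
The quick-start commands in this change download an archive whose name ends in `.tar.bz2` but pipe it through `gzip -dc`, and they send the output of `run-bwamem` straight to `sh`. The sketch below shows one way to handle both steps more defensively. It assumes that the archive really is bzip2-compressed, as its name suggests, and that the commands printed by `run-bwamem` are worth reviewing before they run. The helper names `pick_decompressor` and `save_commands`, and the file name `out.cmd.sh`, are hypothetical and are not part of bwakit.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; neither function ships with bwakit.

# Print a stream decompressor command matching the archive's file name.
pick_decompressor() {
    case "$1" in
        *.tar.bz2|*.tbz2) echo "bzip2 -dc" ;;
        *.tar.gz|*.tgz)   echo "gzip -dc" ;;
        *.tar.xz|*.txz)   echo "xz -dc" ;;
        *) echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

# Run a command that prints shell commands (such as run-bwamem) and save its
# output to a file for review instead of piping it straight into sh.
save_commands() {
    cmd_file=$1
    shift
    "$@" > "$cmd_file" || return 1
    echo "commands saved to $cmd_file; review them, then run: sh $cmd_file"
}

# Example usage (assumes the archive was already downloaded to the current
# directory and that hs38d6.fa has been generated and indexed):
#   archive=bwakit-0.7.11_x64-linux.tar.bz2
#   $(pick_decompressor "$archive") < "$archive" | tar xf -
#   save_commands out.cmd.sh bwa.kit/run-bwamem -o out hs38d6.fa read1.fq read2.fq
#   sh out.cmd.sh
```

Choosing the decompressor from the file name keeps the pipeline form of the original example, and writing the generated commands to a file leaves a record of exactly what was executed.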