From 1993cb634de19e156bd34bb559ec7272c083bba5 Mon Sep 17 00:00:00 2001
From: Heng Li
Date: Wed, 19 Nov 2014 12:29:35 -0500
Subject: [PATCH] Split README-alt.md into two documentations

---
 README-alt.md    | 26 +++++++------------------
 bwakit/README.md | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 19 deletions(-)
 create mode 100644 bwakit/README.md

diff --git a/README-alt.md b/README-alt.md
index 7d3a3c7..3460cdd 100644
--- a/README-alt.md
+++ b/README-alt.md
@@ -1,7 +1,7 @@
-## Getting Started
+## For the Impatient
 
 ```sh
-# Download the bwa-0.7.11 binary package
+# Download the bwa-0.7.11 binary package (download link may change)
 wget -O- http://sourceforge.net/projects/bio-bwa/files/bwakit-0.7.11_x64-linux.tar.bz2/download \
   | gzip -dc | tar xf -
 # Generate the GRCh38+ALT+decoy+HLA and create the BWA index
@@ -11,22 +11,9 @@ bwa.kit/bwa index hs38d6.fa # create BWA index
 bwa.kit/run-bwamem -o out hs38d6.fa read1.fq read2.fq | sh # skip "|sh" to show command lines
 ```
 
-This will generate the following files:
-
-* `out.aln.bam`: unsorted alignments with ALT-aware mapping quality. In this
-  file, one read may be placed on multiple overlapping ALT contigs at the same
-  time even if the read is mapped better to some contigs than others. This makes
-  it possible to analyze each contig independent of others.
-
-* `out.hla.top`: best genotypes for HLA-A, -B, -C, -DQA1, -DQB1 and -DRB1 genes.
-
-* `out.hla.all`: other possible genotypes on the six HLA genes.
-
-* `out.log.*`: bwa-mem, samblaster and HLA typing log files.
-
-Note that `run-bwamem` only prints command lines but doesn't execute them. It
-is advised to have a look at the command lines before passing them to `sh` for
-actual execution.
+This generates `out.aln.bam` as the final alignment, `out.hla.top` for best HLA
+genotypes on each gene and `out.hla.all` for other possible HLA genotypes.
+Please check out [bwa/bwakit/README.md][kithelp] for details.
 
 ## Background
 
@@ -57,7 +44,7 @@ postprocessing. The `bwa.kit/run-bwamem` script performs the two steps when ALT
 contigs are present. The following picture shows an example about how BWA-MEM
 infers mapping quality and reports alignment after step 2:
 
-![](https://raw.githubusercontent.com/lh3/bwa/dev/extras/alt-demo.png)
+![](http://lh3lh3.users.sourceforge.net/images/alt-demo.png)
 
 #### Step 1: BWA-MEM mapping
 
@@ -189,3 +176,4 @@ can even get rid of ALT contigs for good.
 [hla2]: http://nar.oxfordjournals.org/content/41/14/e142.full.pdf+html
 [hla3]: http://www.biomedcentral.com/1471-2164/15/325
 [hla4]: http://genomemedicine.com/content/4/12/95
+[kithelp]: https://github.com/lh3/bwa/tree/master/bwakit
diff --git a/bwakit/README.md b/bwakit/README.md
new file mode 100644
index 0000000..4430ffb
--- /dev/null
+++ b/bwakit/README.md
@@ -0,0 +1,50 @@
+Bwakit is a self-contained, installation-free package of scripts and precompiled
+binaries which provides an end-to-end solution to read mapping. In addition to
+the basic mapping functionality implemented in bwa, bwakit is able to generate
+a proper human reference genome and to take advantage of ALT contigs, if present,
+to improve read mapping and to perform HLA typing for high-coverage human data.
+It can remap name- or coordinate-sorted BAM with read group and barcode
+information retained. Bwakit also *optionally* trims adapters (via
+[trimadap][ta]), marks duplicates (via [samblaster][sb]) and sorts the final
+alignment (via [samtools][smtl]).
+
+Bwakit has two entry scripts: `run-gen-ref` which downloads and generates human
+reference genomes, and `run-bwamem` which prints mapping command lines on the
+standard output that can be piped to `sh` to execute. The two scripts will call
+other programs or use data in `bwa.kit`. The following example shows how to use
+bwakit:
+
+```sh
+# Download the bwa-0.7.11 binary package (download link may change)
+wget -O- http://sourceforge.net/projects/bio-bwa/files/bwakit/bwakit-0.7.11_x64-linux.tar.bz2/download \
+  | bzip2 -dc | tar xf -
+# Generate the GRCh38+ALT+decoy+HLA and create the BWA index
+bwa.kit/run-gen-ref hs38d6 # download GRCh38 and write hs38d6.fa
+bwa.kit/bwa index hs38d6.fa # create BWA index
+# mapping
+bwa.kit/run-bwamem -o out hs38d6.fa read1.fq read2.fq | sh
+```
+
+The last mapping command line will generate the following files:
+
+* `out.aln.bam`: unsorted alignments with ALT-aware mapping quality. In this
+  file, one read may be placed on multiple overlapping ALT contigs at the same
+  time even if the read is mapped better to some contigs than others. This makes
+  it possible to analyze each contig independently of the others.
+
+* `out.hla.top`: best genotypes for HLA-A, -B, -C, -DQA1, -DQB1 and -DRB1 genes.
+
+* `out.hla.all`: other possible genotypes on the six HLA genes.
+
+* `out.log.*`: bwa-mem, samblaster and HLA typing log files.
+
+Bwakit can be [downloaded here][res]. It is only available for x86_64-linux. The
+scripts in the package are available in the [bwa/bwakit][kit] directory.
+Packaging is done manually for now.
+
+
+[res]: https://sourceforge.net/projects/bio-bwa/files/
+[sb]: https://github.com/GregoryFaust/samblaster
+[ta]: https://github.com/lh3/seqtk/blob/master/trimadap.c
+[smtl]: http://www.htslib.org
+[kit]: https://github.com/lh3/bwa/tree/master/bwakit
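Because `run-bwamem` only prints the mapping command lines on standard output, they can be saved and inspected before execution instead of being piped straight to `sh`. The following is a minimal sketch, assuming a POSIX shell; `save_commands` and the file name `out.cmd.sh` are hypothetical and not part of bwakit.

```shell
# Hypothetical helper (not part of bwakit): write the command lines printed by
# a generator such as run-bwamem to a file so they can be reviewed first.
save_commands() {
    script=$1
    shift
    # Run the generator; its stdout (the command lines) goes to the file.
    "$@" > "$script" || return 1
    echo "commands written to $script; review them, then run: sh $script" >&2
}

# Usage, assuming the file names from the README example:
#   save_commands out.cmd.sh bwa.kit/run-bwamem -o out hs38d6.fa read1.fq read2.fq
#   less out.cmd.sh   # inspect the generated commands
#   sh out.cmd.sh     # execute them once satisfied
```

Keeping the generated commands in a file also leaves a record of exactly what was run for a given sample.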