fast-bwa/bwakit
Heng Li 84024101fa minor fix to a link 2014-11-19 12:31:32 -05:00

README.md

Bwakit is a self-contained, installation-free package of scripts and precompiled binaries that provides an end-to-end solution to read mapping. In addition to the basic mapping functionality implemented in bwa, bwakit can generate a proper human reference genome and, if ALT contigs are present, take advantage of them to improve read mapping and to perform HLA typing for high-coverage human data. It can remap name- or coordinate-sorted BAM files while retaining read group and barcode information. Bwakit also optionally trims adapters (via trimadap), marks duplicates (via samblaster) and sorts the final alignment (via samtools).

Bwakit has two entry scripts: run-gen-ref, which downloads and generates human reference genomes, and run-bwamem, which prints mapping command lines to standard output so that they can be piped to sh for execution. The two scripts call other programs or use data in bwa.kit. The following example shows how to use bwakit:

# Download the bwakit-0.7.11 binary package (download link may change)
wget -O- http://sourceforge.net/projects/bio-bwa/files/bwakit/bwakit-0.7.11_x64-linux.tar.bz2/download \
  | bzip2 -dc | tar xf -
# Generate the GRCh38+ALT+decoy+HLA reference and create the BWA index
bwa.kit/run-gen-ref hs38d6   # download GRCh38 and write hs38d6.fa
bwa.kit/bwa index hs38d6.fa  # create BWA index
# Map the reads (run-bwamem prints the commands; sh executes them)
bwa.kit/run-bwamem -o out hs38d6.fa read1.fq read2.fq | sh
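
Because run-bwamem only prints command lines, they can be saved and reviewed before anything is executed. The following is a minimal sketch of that pattern; `save_commands` and the script name `out.sh` are hypothetical and not part of bwakit.

```shell
#!/bin/sh
# Hypothetical helper (not part of bwakit): read generated command lines
# from standard input, save them to a script file, and echo them for review.
save_commands() {
    script=$1
    cat > "$script" || return 1   # capture the command lines from stdin
    cat "$script"                 # show what would be executed
}

# Usage, with the file names from the example above:
#   bwa.kit/run-bwamem -o out hs38d6.fa read1.fq read2.fq | save_commands out.sh
#   sh out.sh    # run the saved commands once they look right
```

Keeping the generated script alongside the results also records exactly which commands produced them.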

The run-bwamem command in the example above generates the following files:

  • out.aln.bam: unsorted alignments with ALT-aware mapping quality. In this file, one read may be placed on multiple overlapping ALT contigs at the same time, even if the read maps better to some contigs than to others. This makes it possible to analyze each contig independently of the others.

  • out.hla.top: best genotypes for HLA-A, -B, -C, -DQA1, -DQB1 and -DRB1 genes.

  • out.hla.all: other possible genotypes for the six HLA genes.

  • out.log.*: bwa-mem, samblaster and HLA typing log files.
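
After a run, it can be useful to confirm that these files were actually written. The sketch below assumes the file names listed above and an output prefix such as `out`; `check_outputs` is a hypothetical helper, not a bwakit script.

```shell
#!/bin/sh
# Hypothetical helper (not part of bwakit): report which of the expected
# output files exist for a given run-bwamem output prefix.
# Returns 0 if everything is present, 1 otherwise.
check_outputs() {
    prefix=$1
    missing=0
    for f in "$prefix.aln.bam" "$prefix.hla.top" "$prefix.hla.all"; do
        if [ -e "$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    # <prefix>.log.* is a family of files; require at least one match.
    # If the glob matches nothing, $1 stays the literal pattern and -e fails.
    set -- "$prefix".log.*
    if [ -e "$1" ]; then
        echo "ok       $prefix.log.*"
    else
        echo "missing  $prefix.log.*"
        missing=1
    fi
    return "$missing"
}

# Usage:
#   check_outputs out || echo "some output files are missing" >&2
```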

Bwakit can be downloaded from SourceForge (see the wget command in the example above). It is available only for x86_64 Linux. The scripts in the package are also available in the bwa/bwakit directory. Packaging is currently done manually.
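
Since the precompiled binaries target a single platform, a quick check before downloading can save a failed run. This is a minimal sketch, assuming `uname -s` and `uname -m` report `Linux` and `x86_64` on a supported machine; `is_supported_platform` is a hypothetical helper, not part of bwakit.

```shell
#!/bin/sh
# Hypothetical helper (not part of bwakit): succeed only on x86_64 Linux.
# The optional arguments override the detected OS and architecture.
is_supported_platform() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]
}

# Usage:
#   is_supported_platform || echo "bwakit binaries are built for x86_64 Linux only" >&2
```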