# BWA-FastAlign: Faster and Cheaper Sequence Alignment on Commercial CPUs
**BWA-FastAlign** is a high-performance, cost-efficient software package for mapping low-divergent sequences against a large reference genome, such as the human genome.
It is designed as a drop-in replacement for the de facto standard **BWA-MEM**, offering **2.27×–3.28× throughput speedup** and **2.54×–5.65× cost reduction** on standard CPU servers, while guaranteeing **100% identical output** (SAM/BAM) to BWA-MEM.
## 🚀 Key Features
* **High Throughput:** Achieves ~2.85× average speedup over BWA-MEM by optimizing both the seeding and extension phases.
* **Cost Efficient:** Delivers 2.54×–5.65× cost reduction compared to state-of-the-art CPU and GPU baselines (including BWA-MEM2 and BWA-GPU).
* **Identical Output:** Guarantees 100% output compatibility with BWA-MEM. You can swap it into your existing pipelines without changing downstream analysis results.
* **Low Memory Footprint:** Uses a novel Multi-stage Seeding strategy (Hybrid Index) that improves search performance without the massive memory overhead seen in hash-based or learned-index aligners (e.g., ERT-BWA-MEM2).
* **Optimized for Modern CPUs:** Features an Intra-query Parallel algorithm for the seed-extension phase, utilizing AVX2 instructions to eliminate computation bubbles caused by varying read lengths.
## 🔧 Technical Innovations
BWA-FastAlign revitalizes the traditional alignment pipeline with two core algorithmic contributions:
1. **Multi-Stage Seeding (Hybrid Index)**
   * Combines **Kmer-Index**, **FMT-Index** (Enhanced FM-Index with prefetching), and **Direct-Index**.
   * Dynamically switches strategies based on seed length and match density.
   * Achieves an **18.92× improvement in memory efficiency** (bases processed per GB per second).
2. **Intra-Query Parallel Seed-Extension**
   * Unlike BWA-MEM2 (which uses inter-query parallelism and suffers from load imbalance), BWA-FastAlign parallelizes the Smith-Waterman alignment *within* a single query.
   * Includes **Dynamic Pruning** to skip zero-alignment scores.
   * Implements a **Sliding Window** mechanism to reduce costly memory gather operations.
   * Achieves **3.45× higher SIMD utilization**, performing consistently well on both WGS (Whole Genome Sequencing) and WES (Whole Exome Sequencing) data.
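
The memory-efficiency metric quoted for Multi-Stage Seeding (bases processed per GB per second) can be reproduced from three measurements of your own run. The helper below is only an illustrative sketch: the function name `memory_efficiency` and the sample numbers are hypothetical, and how the reported figures were actually measured (for example, peak versus average memory) is not stated here.

```bash
# memory_efficiency BASES MEM_GB SECONDS
# Hypothetical helper (not part of BWA-FastAlign): prints the number of
# bases processed per GB of memory per second, to two decimal places.
memory_efficiency() {
    awk -v bases="$1" -v mem_gb="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", bases / (mem_gb * secs) }'
}

# Made-up example: 3,000,000,000 bases, 16 GB of memory, 600 seconds.
memory_efficiency 3000000000 16 600   # prints 312500.00
```

Comparing this value between two aligners on the same input gives a ratio of the kind reported above.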
## 📥 Installation
### Prerequisites
* Linux operating system (tested on Ubuntu 22.04).
* GCC compiler (version 11.4 or higher recommended).
* CPU supporting **AVX2** instructions (most modern Intel/AMD CPUs).
* zlib development files.
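
On Linux x86 systems, one way to confirm the AVX2 prerequisite before building is to look for the `avx2` flag in `/proc/cpuinfo`. The following is a minimal sketch; the helper name `has_cpu_flag` is hypothetical and is not shipped with BWA-FastAlign.

```bash
# has_cpu_flag FLAG [CPUINFO_FILE]
# Hypothetical helper: succeeds if FLAG appears as a whole word on the
# first "flags" line of the cpuinfo file (default: /proc/cpuinfo).
has_cpu_flag() {
    local flag="$1"
    local cpuinfo="${2:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" | grep -qw -- "$flag"
}

if has_cpu_flag avx2; then
    echo "AVX2 detected"
else
    echo "AVX2 not detected; BWA-FastAlign requires an AVX2-capable CPU" >&2
fi
```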
### Compilation
```bash
git clone https://github.com/your-username/BWA-FastAlign.git
cd BWA-FastAlign
make
```
## 📖 Usage
BWA-FastAlign follows the same command-line interface as BWA-MEM.
0. **Download Datasets.** Download the E. coli reference genome and paired-end sequencing reads.
```bash
# Download reference genome
wget http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/GCA_000005845.2_ASM584v2/GCA_000005845.2_ASM584v2_genomic.fna.gz
gzip -d GCA_000005845.2_ASM584v2_genomic.fna.gz
mv GCA_000005845.2_ASM584v2_genomic.fna ref.fa
# Download sequencing reads
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_1.fastq.gz -O reads_1.fq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_2.fastq.gz -O reads_2.fq.gz
```
1. **Index the Reference.** Before alignment, you must index your reference genome.
```bash
# This will generate the hybrid index files
./fastalign index ref.fa
```
2. **Align Reads (Mem).** Map single-end or paired-end reads to the reference.
```bash
# Single-end alignment
./fastalign mem ref.fa reads_1.fq.gz > aln.sam
# Paired-end alignment
./fastalign mem ref.fa reads_1.fq.gz reads_2.fq.gz > aln.sam
# Using multiple threads (Recommended: 32-128 threads for high throughput)
./fastalign mem -t 64 ref.fa reads_1.fq.gz reads_2.fq.gz > aln.sam
```
3. **Options.** BWA-FastAlign supports the standard BWA-MEM options. Run `./fastalign mem` without arguments to see the full list.
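
To check the identical-output guarantee on your own data, you can compare a BWA-MEM SAM file against a BWA-FastAlign SAM file for the same input. The sketch below assumes that the only lines allowed to differ are the `@PG` header lines, which record the program name and command line; the helper name `sam_records_identical` is hypothetical.

```bash
# sam_records_identical A.sam B.sam
# Hypothetical helper: succeeds if the two SAM files are byte-identical
# once @PG header lines (program name and command line) are dropped.
sam_records_identical() {
    local a b status
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    grep -v '^@PG' "$1" > "$a"
    grep -v '^@PG' "$2" > "$b"
    if cmp -s "$a" "$b"; then status=0; else status=1; fi
    rm -f "$a" "$b"
    return "$status"
}

# Example usage (assumes `bwa` is installed and that each tool has built
# its own index for ref.fa):
#   bwa mem ref.fa reads_1.fq.gz > bwa.sam
#   ./fastalign mem ref.fa reads_1.fq.gz > fastalign.sam
#   sam_records_identical bwa.sam fastalign.sam && echo "identical"
```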
## 📜 Citation
If you find BWA-FastAlign useful in your research, please cite our paper:
```bibtex
@inproceedings{fastalign2026,
title={Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs},
author={Zhonghai Zhang and Yewen Li and Ke Meng and Chunming Zhang and Guangming Tan},
booktitle={Proceedings of the 31st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '26)},
year={2026}
}
```