BWA-FastAlign: Faster and Cheaper Sequence Alignment on Commercial CPUs
BWA-FastAlign is a high-performance, cost-efficient software package for mapping low-divergent sequences against a large reference genome, such as the human genome.
It is designed as a drop-in replacement for the de facto standard BWA-MEM, offering 2.27× to 3.28× higher throughput and a 2.54× to 5.65× cost reduction on standard CPU servers, while guaranteeing output (SAM/BAM) that is 100% identical to BWA-MEM's.
🚀 Key Features
- High Throughput: Achieves ~2.85× average speedup over BWA-MEM by optimizing both the seeding and extension phases.
- Cost Efficient: Delivers 2.54× to 5.65× cost reduction compared to state-of-the-art CPU and GPU baselines (including BWA-MEM2 and BWA-GPU).
- Identical Output: Guarantees 100% output compatibility with BWA-MEM. You can swap it into your existing pipelines without changing downstream analysis results.
- Low Memory Footprint: Uses a novel Multi-stage Seeding strategy (Hybrid Index) that improves search performance without the massive memory overhead seen in hash-based or learned-index aligners (e.g., ERT-BWA-MEM2).
- Optimized for Modern CPUs: Features an Intra-query Parallel algorithm for the seed-extension phase, utilizing AVX2 instructions to eliminate computation bubbles caused by varying read lengths.
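The identical-output claim can be spot-checked on your own data. The sketch below is a hypothetical helper (`sam_equal` is not part of BWA-FastAlign) that compares two SAM files while ignoring `@PG` header lines, since those record the program name and command line and are therefore expected to differ between tools. It assumes both files were produced from the same reference and reads with the same options.

```shell
# Hypothetical helper: succeed if two SAM files are identical once the
# @PG header lines (program name / command line) are removed.
sam_equal() {
    # Hash the filtered streams so large files are never held in memory.
    a=$(grep -v '^@PG' "$1" | sha256sum)
    b=$(grep -v '^@PG' "$2" | sha256sum)
    [ "$a" = "$b" ]
}

# Example use, assuming bwa.sam and fast.sam already exist:
#   sam_equal bwa.sam fast.sam && echo "identical" || echo "different"
```

Hashing keeps the check cheap for whole-genome outputs; when the files differ, running `diff` on the filtered streams shows where.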
🔧 Technical Innovations
BWA-FastAlign revitalizes the traditional alignment pipeline with two core algorithmic contributions:
- Multi-Stage Seeding (Hybrid Index)
  - Combines Kmer-Index, FMT-Index (Enhanced FM-Index with prefetching), and Direct-Index.
  - Dynamically switches strategies based on seed length and match density.
  - Achieves an 18.92× improvement in memory efficiency (bases processed per GB per second).
- Intra-Query Parallel Seed-Extension
  - Unlike BWA-MEM2 (which uses inter-query parallelism and suffers from load imbalance), BWA-FastAlign parallelizes the Smith-Waterman alignment within a single query.
  - Includes Dynamic Pruning to skip zero-alignment scores.
  - Implements a Sliding Window mechanism to reduce costly memory gather operations.
  - Achieves 3.45× higher SIMD utilization, performing consistently well on both WGS (Whole Genome Sequencing) and WES (Whole Exome Sequencing) data.
📥 Installation
Prerequisites
- Linux operating system (tested on Ubuntu 22.04).
- GCC compiler (version 11.4 or higher recommended).
- CPU supporting AVX2 instructions (most modern Intel/AMD CPUs).
- zlib development files.
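Before building, it may help to confirm the CPU and compiler requirements. The following is a minimal sketch with two hypothetical helpers (`has_avx2` and `version_ge` are illustrative names, not part of the project). It assumes a Linux system where CPU flags are listed in `/proc/cpuinfo` and GNU `sort -V` is available.

```shell
# Hypothetical helper: succeed if the given cpuinfo file lists the avx2 flag.
# Defaults to /proc/cpuinfo; a path can be passed for testing.
has_avx2() {
    grep -qw avx2 "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU sort -V (version sort): the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example use on the build machine:
#   has_avx2 || echo "CPU does not report AVX2"
#   version_ge "$(gcc -dumpfullversion -dumpversion)" 11.4 \
#       || echo "GCC is older than the recommended 11.4"
```

The `-dumpfullversion -dumpversion` pair is used so that both newer and older GCC releases print a complete version string.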
Compilation
git clone https://github.com/your-username/BWA-FastAlign.git
cd BWA-FastAlign
make
📖 Usage
BWA-FastAlign follows the same command-line interface as BWA-MEM.
- Download Datasets. This example uses the E. coli reference genome and a set of paired-end sequencing reads.
# Download reference genome
wget http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/GCA_000005845.2_ASM584v2/GCA_000005845.2_ASM584v2_genomic.fna.gz
gzip -d GCA_000005845.2_ASM584v2_genomic.fna.gz
mv GCA_000005845.2_ASM584v2_genomic.fna ref.fa
# Download sequencing reads
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_1.fastq.gz -O reads_1.fq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_2.fastq.gz -O reads_2.fq.gz
- Index the Reference. Before alignment, you must index your reference genome.
# This will generate the hybrid index files
./fastalign index ref.fa
- Align Reads (Mem). Map single-end or paired-end reads to the reference.
# Single-end alignment
./fastalign mem ref.fa reads_1.fq.gz > aln.sam
# Paired-end alignment
./fastalign mem ref.fa reads_1.fq.gz reads_2.fq.gz > aln.sam
# Using multiple threads (Recommended: 32-128 threads for high throughput)
./fastalign mem -t 64 ref.fa reads_1.fq.gz reads_2.fq.gz > aln.sam
- Options. BWA-FastAlign supports the standard BWA-MEM options. Run ./fastalign mem without arguments to see the full list.
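On machines with fewer cores than the recommended 32-128 threads, the value passed to `-t` can be derived from the hardware. This is a small sketch, assuming that using every available core is desirable and that 128 is a sensible cap (taken from the recommendation above); `pick_threads` is a hypothetical helper, not a BWA-FastAlign command.

```shell
# Hypothetical helper: print a thread count for `mem -t`.
# Uses all available cores, capped at 128 (an assumed upper bound).
pick_threads() {
    ncpu=${1:-$(nproc)}   # optional override, mainly useful for testing
    max=128
    if [ "$ncpu" -gt "$max" ]; then
        echo "$max"
    else
        echo "$ncpu"
    fi
}

# Example use:
#   ./fastalign mem -t "$(pick_threads)" ref.fa reads_1.fq.gz reads_2.fq.gz > aln.sam
```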
📜 Citation
If you find BWA-FastAlign useful in your research, please cite our paper:
@inproceedings{fastalign2026,
title={Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs},
author={Zhonghai Zhang and Yewen Li and Ke Meng and Chunming Zhang and Guangming Tan},
booktitle={Proceedings of the 31st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '26)},
year={2026}
}