**BWA-FastAlign** is a high-performance, cost-efficient software package for mapping low-divergent sequences against a large reference genome, such as the human genome.
It is designed as a drop-in replacement for the de facto standard **BWA-MEM**, offering a **2.27× to 3.28× throughput speedup** and a **2.54× to 5.65× cost reduction** on standard CPU servers, while guaranteeing **100% identical output** (SAM/BAM) to BWA-MEM.
* **High Throughput:** Achieves ~2.85× average speedup over BWA-MEM by optimizing both the seeding and extension phases.
* **Cost Efficient:** Delivers a 2.54× to 5.65× cost reduction compared to state-of-the-art CPU and GPU baselines (including BWA-MEM2 and BWA-GPU).
* **Identical Output:** Guarantees 100% output compatibility with BWA-MEM. You can swap it into your existing pipelines without changing downstream analysis results.
* **Low Memory Footprint:** Uses a novel Multi-stage Seeding strategy (Hybrid Index) that improves search performance without the massive memory overhead seen in hash-based or learned-index aligners (e.g., ERT-BWA-MEM2).
* **Optimized for Modern CPUs:** Features an Intra-query Parallel algorithm for the seed-extension phase, utilizing AVX2 instructions to eliminate computation bubbles caused by varying read lengths.
  * Unlike BWA-MEM2 (which uses inter-query parallelism and suffers from load imbalance), BWA-FastAlign parallelizes the Smith-Waterman alignment *within* a single query.
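
To illustrate what parallelizing Smith-Waterman *within* a single query can mean, here is a minimal Python sketch. It is not BWA-FastAlign's AVX2 kernel. It assumes a simplified linear gap penalty (BWA-MEM uses affine gaps), and the function names and scoring defaults are hypothetical, chosen only for illustration. The sketch relies on one property: every cell on an anti-diagonal of the scoring matrix depends only on the two previous anti-diagonals, so all cells on one anti-diagonal can be computed independently, which is the kind of work a SIMD lane could take on.

```python
def sw_score_rowwise(query, target, match=1, mismatch=-4, gap=-6):
    """Reference: classic row-by-row Smith-Waterman score (linear gaps)."""
    m, n = len(query), len(target)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def sw_score_antidiagonal(query, target, match=1, mismatch=-4, gap=-6):
    """Same recurrence, traversed one anti-diagonal (i + j = d) at a time."""
    m, n = len(query), len(target)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for d in range(2, m + n + 1):
        lo, hi = max(1, d - n), min(m, d - 1)
        # Each value below reads only cells with i + j equal to d - 1 or d - 2,
        # so the entries of `diag` are mutually independent. A vectorized
        # kernel could evaluate them in parallel; here it is a plain loop.
        diag = []
        for i in range(lo, hi + 1):
            j = d - i
            s = match if query[i - 1] == target[j - 1] else mismatch
            diag.append(max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap))
        # Write back only after the whole anti-diagonal has been computed.
        for i, value in zip(range(lo, hi + 1), diag):
            H[i][d - i] = value
        best = max(best, max(diag, default=0))
    return best


if __name__ == "__main__":
    q, t = "ACGTTGCA", "TTACGTAGCATT"
    # Both traversal orders evaluate the same recurrence, so the scores agree.
    assert sw_score_rowwise(q, t) == sw_score_antidiagonal(q, t)
```

Because the work per anti-diagonal comes from a single read, this style of parallelism does not need to batch reads of different lengths together, which is where padding and idle lanes arise in inter-query schemes.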