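A high-performance, low-memory sequence alignment program that combines the FMT-Index with pattern-based SW optimization and produces results identical to bwa-mem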


zzh e701805337 Fix a potential bug in read seq handling		2026-01-25 23:13:49 +08:00
.vscode	Fixed several bugs, e.g. gen_sam needs a global id and dedup-sort must be done one at a time; one issue remains: some skip results change after dedup-sort	2026-01-16 22:16:16 +08:00
bwakit	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
.gitignore	Fixed several bugs, e.g. gen_sam needs a global id and dedup-sort must be done one at a time; one issue remains: some skip results change after dedup-sort	2026-01-16 22:16:16 +08:00
COPYING	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
ChangeLog	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
LICENSE	Initial commit	2025-01-14 14:28:47 +08:00
Makefile	Fix a potential bug in read seq handling	2026-01-25 23:13:49 +08:00
NEWS.md	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
QSufSort.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
QSufSort.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
README-alt.md	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
README.md	Add test datasets in README	2026-01-07 13:57:08 +08:00
bamlite.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bamlite.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bntseq.c	Roughly completed the mate SW computation; there are probably still bugs	2026-01-12 02:03:29 +08:00
bntseq.h	Roughly completed the mate SW computation; there are probably still bugs	2026-01-12 02:03:29 +08:00
bwa.1	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
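bwa.c	Fix a potential bug in read seq handling	2026-01-25 23:13:49 +08:00
bwa.h	Fix a potential bug in read seq handling	2026-01-25 23:13:49 +08:00
bwamem.c	Fix a potential bug in read seq handling	2026-01-25 23:13:49 +08:00
bwamem.h	Finished the matesw code, but there are still bugs; results are inconsistent	2026-01-13 23:37:06 +08:00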
bwamem_extra.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwamem_pair.c	Refactor the matesw computation; half of the computation is done	2026-01-11 12:55:11 +08:00
bwape.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwase.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwase.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwaseqio.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwashm.c	Refactor the matesw computation; half of the computation is done	2026-01-11 12:55:11 +08:00
bwt.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwt.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwt_gen.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwt_lite.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwt_lite.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtaln.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtaln.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtgap.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtgap.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtindex.c	Refactor the matesw computation; half of the computation is done	2026-01-11 12:55:11 +08:00
bwtsw2.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtsw2_aux.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtsw2_chain.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtsw2_core.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtsw2_main.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
bwtsw2_pair.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
code_of_conduct.md	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
debug.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
debug.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
example.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
fastmap.c	Fix a potential bug in read seq handling	2026-01-25 23:13:49 +08:00
fmt_idx.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
fmt_idx.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
h.txt	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
is.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
kbtree.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
khash.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
kopen.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
kseq.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
ksort.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
kstring.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
kstring.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
ksw.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
ksw.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
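ksw_align_avx.h	Finished the matesw code, but there are still bugs; results are inconsistent	2026-01-13 23:37:06 +08:00
ksw_align_avx2.c	Refactor the matesw computation; half of the computation is done	2026-01-11 12:55:11 +08:00
ksw_align_avx512.c	Finished the matesw code, but there are still bugs; results are inconsistent	2026-01-13 23:37:06 +08:00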
ksw_extend2_avx2.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
ksw_extend2_avx2_u8.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
kthread.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
kvec.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
main.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
malloc_wrap.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
malloc_wrap.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
mate_sw.c	Cleaned up the code; handle u8 and i16 cleanly	2026-01-18 01:59:42 +08:00
mate_sw.h	Cleaned up the code; handle u8 and i16 cleanly	2026-01-18 01:59:42 +08:00
maxk.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
neon_sse.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
paired_sam.c	Fix a potential bug in read seq handling	2026-01-25 23:13:49 +08:00
paired_sam.h	Fix a potential bug in read seq handling	2026-01-25 23:13:49 +08:00
pemerge.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
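profiling.c	Fixed a bug where xtra was assigned incorrectly; in the second stage the value should be assigned to xtras2	2026-01-15 23:04:02 +08:00
profiling.h	Fixed a bug where xtra was assigned incorrectly; in the second stage the value should be assigned to xtras2	2026-01-15 23:04:02 +08:00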
qualfa2fq.pl	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
rle.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
rle.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
rope.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
rope.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
run.sh	Fix a potential bug in read seq handling	2026-01-25 23:13:49 +08:00
scalar_sse.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
utils.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
utils.h	Finished the matesw code, but there are still bugs; results are inconsistent	2026-01-13 23:37:06 +08:00
xa2multi.pl	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
yarn.c	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00
yarn.h	Fast NGS sequence alignment tool based on bwa-mem.	2025-01-14 14:34:09 +08:00

README.md

BWA-FastAlign: Faster and Cheaper Sequence Alignment on Commercial CPUs

BWA-FastAlign is a high-performance, cost-efficient software package for mapping low-divergent sequences against a large reference genome, such as the human genome.

It is designed as a drop-in replacement for the de facto standard BWA-MEM, offering a 2.27× to 3.28× throughput speedup and a 2.54× to 5.65× cost reduction on standard CPU servers, while guaranteeing output (SAM/BAM) that is 100% identical to BWA-MEM's.
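The identical-output claim can be checked on your own data by comparing the alignment records of the two SAM files while skipping header lines, since @PG lines legitimately record different program names and command lines. The C sketch below is a hypothetical helper, not part of BWA-FastAlign. It assumes both files list records in the same order and that every line fits in the 64 KiB buffer. BAM files would first need converting to SAM, e.g. with `samtools view -h`.

```c
#include <stdio.h>
#include <string.h>

/* Reads the next non-header line (header lines start with '@') into buf.
 * Returns 0 at end of file. Assumes each line fits in bufsz bytes. */
static int next_record(FILE *fp, char *buf, int bufsz) {
    while (fgets(buf, bufsz, fp))
        if (buf[0] != '@') return 1;
    return 0;
}

/* Returns 1 if both SAM streams hold the same alignment records in the same
 * order, ignoring header lines (@HD, @SQ, @PG, ...); returns 0 otherwise. */
static int sam_records_equal(FILE *a, FILE *b) {
    char la[65536], lb[65536];
    for (;;) {
        int ha = next_record(a, la, (int)sizeof la);
        int hb = next_record(b, lb, (int)sizeof lb);
        if (!ha || !hb) return ha == hb;   /* equal only if both ended together */
        if (strcmp(la, lb) != 0) return 0; /* first differing record */
    }
}
```

In practice you would `fopen` the BWA-MEM and BWA-FastAlign outputs and report a mismatch whenever `sam_records_equal` returns 0. Comparing whole lines is deliberately strict, because any difference in flags, positions, CIGAR strings or optional tags counts as a divergence.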

🚀 Key Features

High Throughput: Achieves ~2.85× average speedup over BWA-MEM by optimizing both the seeding and extension phases.
Cost Efficient: Delivers a 2.54× to 5.65× cost reduction compared to state-of-the-art CPU and GPU baselines (including BWA-MEM2 and BWA-GPU).
Identical Output: Guarantees 100% output compatibility with BWA-MEM. You can swap it into your existing pipelines without changing downstream analysis results.
Low Memory Footprint: Uses a novel Multi-Stage Seeding strategy (Hybrid Index) that improves search performance without the massive memory overhead of hash-based or learned-index aligners (e.g., ERT-BWA-MEM2).
Optimized for Modern CPUs: Features an Intra-Query Parallel algorithm for the seed-extension phase that uses AVX2 instructions to eliminate the computation bubbles caused by varying read lengths.
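The computation bubbles mentioned above can be quantified with a simple model. The sketch below assumes inter-query batching, where each SIMD lane holds a different read and a batch runs until its longest read finishes. It also assumes that each read costs one lane-step per base. Real Smith-Waterman cost also depends on the reference window and band width, and the function name is hypothetical.

```c
#include <stddef.h>

/* Fraction of SIMD lane-steps doing useful work when reads are packed
 * `lanes` at a time (inter-query batching), under the assumed cost model of
 * one lane-step per read base. A batch runs until its longest read is done,
 * so shorter reads, and empty lanes in a final partial batch, sit idle. */
static double inter_query_utilization(const int *len, size_t n, size_t lanes) {
    long long useful = 0, issued = 0;
    for (size_t b = 0; b < n; b += lanes) {
        int longest = 0;
        for (size_t k = b; k < b + lanes && k < n; ++k) {
            useful += len[k];
            if (len[k] > longest) longest = len[k];
        }
        issued += (long long)lanes * longest; /* every lane waits for the longest */
    }
    return issued ? (double)useful / (double)issued : 1.0;
}
```

For example, with 4 lanes and reads of 150, 150, 150 and 40 bases, 490 of 600 lane-steps do useful work (about 82%). Intra-query parallelism, as described above, aims to avoid this loss by spreading the work of a single read across all lanes.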

🔧 Technical Innovations

BWA-FastAlign revitalizes the traditional alignment pipeline with two core algorithmic contributions:

Multi-Stage Seeding (Hybrid Index)
- Combines Kmer-Index, FMT-Index (Enhanced FM-Index with prefetching), and Direct-Index.
- Dynamically switches strategies based on seed length and match density.
- Achieves an 18.92× improvement in memory efficiency (bases processed per GB per second).
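To see how a k-mer table and an FM-index can work together, consider a lookup table that maps the last K bases of a pattern directly to their suffix-array interval. Ordinary backward search then continues from that interval for the remaining bases. The C sketch below is a toy built under stated assumptions. It uses a standard FM-index rather than the FMT-Index, a naive suffix sort, and a hypothetical K = 2. It does not model prefetching, Direct-Index, or the switching heuristics.

```c
#include <stdlib.h>
#include <string.h>

/* Toy hybrid lookup with hypothetical names; not FastAlign's index layout. */
#define K 2                    /* k-mer length of the lookup table */
#define NKMER (1 << (2 * K))   /* 4^K entries, 2 bits per base */

typedef struct {
    int n;                     /* text length, including the final '$' */
    int C[4];                  /* C[c]: number of text symbols smaller than c */
    int (*occ)[4];             /* occ[i][c]: occurrences of c in bwt[0..i) */
    int kmer_lo[NKMER], kmer_hi[NKMER]; /* SA interval [lo, hi) per k-mer */
} toy_fm;

static const char *g_text;     /* read by the qsort comparator (not thread-safe) */

static int cmp_suffix(const void *a, const void *b) {
    return strcmp(g_text + *(const int *)a, g_text + *(const int *)b);
}

static int base_code(char ch) {
    switch (ch) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Backward search of pat[0..m), narrowing [*lo, *hi) from the right end.
 * Returns 0 on a non-ACGT symbol; *lo == *hi means no match. */
static int fm_extend(const toy_fm *fm, const char *pat, int m, int *lo, int *hi) {
    for (int i = m - 1; i >= 0 && *lo < *hi; --i) {
        int c = base_code(pat[i]);
        if (c < 0) return 0;
        *lo = fm->C[c] + fm->occ[*lo][c];
        *hi = fm->C[c] + fm->occ[*hi][c];
    }
    return 1;
}

/* Builds the index for A/C/G/T text ending in one '$' (toy inputs only). */
static toy_fm *fm_build(const char *text) {
    toy_fm *fm = calloc(1, sizeof *fm);
    int n = (int)strlen(text);
    int *sa = malloc((size_t)n * sizeof *sa);
    fm->n = n;
    fm->occ = calloc((size_t)n + 1, sizeof *fm->occ);
    for (int i = 0; i < n; ++i) sa[i] = i;
    g_text = text;
    qsort(sa, (size_t)n, sizeof *sa, cmp_suffix); /* '$' sorts before 'A' */
    for (int i = 0; i < n; ++i) {
        char b = sa[i] > 0 ? text[sa[i] - 1] : '$'; /* BWT symbol of row i */
        int c = base_code(b);
        memcpy(fm->occ[i + 1], fm->occ[i], sizeof fm->occ[i]);
        if (c >= 0) fm->occ[i + 1][c]++;
    }
    fm->C[0] = 1; /* only '$' is smaller than 'A' */
    for (int c = 1; c < 4; ++c) fm->C[c] = fm->C[c - 1] + fm->occ[n][c - 1];
    for (int id = 0; id < NKMER; ++id) { /* precompute every k-mer's interval */
        char kmer[K];
        int lo = 0, hi = n;
        for (int j = 0; j < K; ++j) kmer[j] = "ACGT"[(id >> (2 * (K - 1 - j))) & 3];
        fm_extend(fm, kmer, K, &lo, &hi);
        fm->kmer_lo[id] = lo;
        fm->kmer_hi[id] = hi;
    }
    free(sa);
    return fm;
}

/* Counts occurrences of pat: one table lookup for the last K bases,
 * then backward search over the remaining prefix. */
static int fm_count(const toy_fm *fm, const char *pat) {
    int m = (int)strlen(pat), lo = 0, hi = fm->n;
    if (m >= K) {
        int id = 0;
        for (int j = m - K; j < m; ++j) {
            int c = base_code(pat[j]);
            if (c < 0) return 0;
            id = (id << 2) | c;
        }
        lo = fm->kmer_lo[id];
        hi = fm->kmer_hi[id];
        m -= K;
    }
    return fm_extend(fm, pat, m, &lo, &hi) ? hi - lo : 0;
}

static void fm_free(toy_fm *fm) { free(fm->occ); free(fm); }
```

The table replaces the first K rank lookups of every search with a single array access. This is the general trade a k-mer index makes: extra memory (4^K intervals) in exchange for fewer FM-index steps.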
Intra-Query Parallel Seed-Extension
- Unlike BWA-MEM2 (which uses inter-query parallelism and suffers from load imbalance), BWA-FastAlign parallelizes the Smith-Waterman alignment within a single query.
- Includes Dynamic Pruning to skip zero-alignment scores.
- Implements a Sliding Window mechanism to reduce costly memory gather operations.
- Achieves 3.45× higher SIMD utilization, performing consistently well on both WGS (Whole Genome Sequencing) and WES (Whole Exome Sequencing) data.
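Intra-query parallelism relies on the fact that parts of a single Smith-Waterman matrix can be computed independently. One classic way to expose this is to fill the matrix by anti-diagonals, although BWA-FastAlign's kernel may partition the work differently. The scalar C sketch below illustrates that ordering with an assumed linear-gap scoring scheme. It is not the project's AVX2 code and does not model Dynamic Pruning or the Sliding Window.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed toy scoring: match +1, mismatch -4, linear gap -2 per base.
 * (BWA-MEM's extension uses affine gaps; linear gaps keep this short.) */
enum { MATCH = 1, MISMATCH = -4, GAP = -2 };

static int max2(int a, int b) { return a > b ? a : b; }

/* Local-alignment recurrence for cell (i, j) of an (m+1) x (n+1) matrix H,
 * stored row-major with row width w = n + 1. */
static int sw_cell(const int *H, int w, const char *q, const char *t, int i, int j) {
    int diag = H[(i - 1) * w + (j - 1)] + (q[i - 1] == t[j - 1] ? MATCH : MISMATCH);
    int up = H[(i - 1) * w + j] + GAP;
    int left = H[i * w + (j - 1)] + GAP;
    return max2(0, max2(diag, max2(up, left)));
}

/* Best local score, filling H one anti-diagonal (i + j == d) at a time.
 * Each cell on diagonal d reads only diagonals d-1 and d-2, so the inner
 * loop's iterations are independent: SIMD lanes could compute them together. */
static int sw_antidiagonal(const char *q, const char *t) {
    int m = (int)strlen(q), n = (int)strlen(t), w = n + 1, best = 0;
    int *H = calloc((size_t)(m + 1) * (size_t)w, sizeof *H); /* row/col 0 stay 0 */
    for (int d = 2; d <= m + n; ++d) {
        int ilo = d - n > 1 ? d - n : 1;   /* keeps j = d - i <= n */
        int ihi = d - 1 < m ? d - 1 : m;   /* keeps j >= 1 and i <= m */
        for (int i = ilo; i <= ihi; ++i) { /* independent iterations */
            int j = d - i;
            H[i * w + j] = sw_cell(H, w, q, t, i, j);
            best = max2(best, H[i * w + j]);
        }
    }
    free(H);
    return best;
}

/* Conventional row-by-row order, kept only to cross-check the result. */
static int sw_rowmajor(const char *q, const char *t) {
    int m = (int)strlen(q), n = (int)strlen(t), w = n + 1, best = 0;
    int *H = calloc((size_t)(m + 1) * (size_t)w, sizeof *H);
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= n; ++j) {
            H[i * w + j] = sw_cell(H, w, q, t, i, j);
            best = max2(best, H[i * w + j]);
        }
    free(H);
    return best;
}
```

Both traversal orders produce the same score, because each one computes a cell only after its upper, left, and diagonal neighbours are final. Only the anti-diagonal order exposes the independent cells that a single query can spread across vector lanes.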

📥 Installation

Prerequisites

Linux operating system (tested on Ubuntu 22.04).
GCC compiler (version 11.4 or higher recommended).
CPU supporting AVX2 instructions (most modern Intel/AMD CPUs).
zlib development files.
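A program can check the AVX2 requirement at run time before it dispatches to vectorized code. The sketch below uses the GCC/Clang built-ins `__builtin_cpu_init` and `__builtin_cpu_supports`. The wrapper name `cpu_has_avx2` is hypothetical and not part of BWA-FastAlign.

```c
/* Returns 1 if the CPU running this program reports AVX2 support, else 0.
 * On non-x86 targets, or compilers without these built-ins, it returns 0. */
static int cpu_has_avx2(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* strictly required only before main(); harmless here */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}
```

Call it early in `main` and exit with a clear error message when it returns 0. On Linux, `grep -o -m1 avx2 /proc/cpuinfo` gives the same answer from the shell.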

Compilation

git clone https://github.com/your-username/BWA-FastAlign.git
cd BWA-FastAlign
make

📖 Usage

BWA-FastAlign follows the same command-line interface as BWA-MEM.

Download Datasets. The examples below use the E. coli reference genome and a paired-end set of sequencing reads.

# Download reference genome
wget http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/GCA_000005845.2_ASM584v2/GCA_000005845.2_ASM584v2_genomic.fna.gz
gzip -d GCA_000005845.2_ASM584v2_genomic.fna.gz
mv GCA_000005845.2_ASM584v2_genomic.fna ref.fa

# Download sequencing reads
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_1.fastq.gz -O reads_1.fq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR258/003/SRR2584863/SRR2584863_2.fastq.gz -O reads_2.fq.gz

Index the Reference. Before alignment, you must index your reference genome.

# This will generate the hybrid index files
./fastalign index ref.fa

Align Reads (Mem). Map single-end or paired-end reads to the reference.

# Single-end alignment
./fastalign mem ref.fa reads_1.fq.gz > aln.sam

# Paired-end alignment
./fastalign mem ref.fa reads_1.fq.gz reads_2.fq.gz > aln.sam

# Using multiple threads (Recommended: 32-128 threads for high throughput)
./fastalign mem -t 64 ref.fa reads_1.fq.gz reads_2.fq.gz > aln.sam

Options. BWA-FastAlign supports the standard BWA-MEM options. Run ./fastalign mem with no arguments to print the full list.

📜 Citation

If you find BWA-FastAlign useful in your research, please cite our paper:

@inproceedings{fastalign2026,
  title={Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs},
  author={Zhonghai Zhang and Yewen Li and Ke Meng and Chunming Zhang and Guangming Tan},
  booktitle={Proceedings of the 31st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '26)},
  year={2026}
}
