From 643832af098fbf8e4466dc034f34a0cb16e18465 Mon Sep 17 00:00:00 2001
From: Yewen Li
Date: Thu, 11 Dec 2025 18:41:20 +0800
Subject: [PATCH] Revise README to enhance documentation and clarity

Expanded README with detailed features, installation instructions, and usage examples for FastAlign.
---
 README.md | 82 +++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 74 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index e2dbe88..e9bac15 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,77 @@
-# FastAlign
-Fast alignment tool based on bwa-mem
+# FastAlign: Faster and Cheaper Sequence Alignment on Commercial CPUs

-# 1. Compile the source code
-make -j 8
+**FastAlign** is a high-performance, cost-efficient software package for mapping low-divergent sequences against a large reference genome, such as the human genome.

-# 2. Build the FMT-Index
-./fastalign index reference.fasta
+It is designed as a drop-in replacement for the de facto standard **BWA-MEM**, offering **2.27× ∼ 3.28× throughput speedup** and **2.54× ∼ 5.65× cost reductions** on standard CPU servers, while guaranteeing **100% identical output** (SAM/BAM) to BWA-MEM.

-# 3. Run sequence alignment with FastAlign
-./fastalign mem -t 64 -2 -M -R @RG\\tID:normal\\tSM:normal\\tPL:illumina\\tLB:normal\\tPG:fastbwa reference.fasta r1.fq.gz r2.fq.gz
+## 🚀 Key Features
+
+* **High Throughput:** Achieves ~2.85× average speedup over BWA-MEM by optimizing both the seeding and extension phases.
+* **Cost Efficient:** Delivers 2.54× ∼ 5.65× cost reduction compared to state-of-the-art CPU and GPU baselines (including BWA-MEM2 and BWA-GPU).
+* **Identical Output:** Guarantees 100% output compatibility with BWA-MEM. You can swap it into your existing pipelines without changing downstream analysis results.
+* **Low Memory Footprint:** Uses a novel Multi-stage Seeding strategy (Hybrid Index) that improves search performance without the massive memory overhead seen in hash-based or learned-index aligners (e.g., ERT-BWA-MEM2).
+* **Optimized for Modern CPUs:** Features an Intra-query Parallel algorithm for the seed-extension phase, utilizing AVX2 instructions to eliminate computation bubbles caused by varying read lengths.
+
+## 🔧 Technical Innovations
+
+FastAlign revitalizes the traditional alignment pipeline with two core algorithmic contributions:
+
+1. **Multi-Stage Seeding (Hybrid Index)**
+    * Combines **Kmer-Index**, **FMT-Index** (Enhanced FM-Index with prefetching), and **Direct-Index**.
+    * Dynamically switches strategies based on seed length and match density.
+    * Achieves an **18.92× improvement in memory efficiency** (bases processed per GB per second).
+
+2. **Intra-Query Parallel Seed-Extension**
+    * Unlike BWA-MEM2 (which uses inter-query parallelism and suffers from load imbalance), FastAlign parallelizes the Smith-Waterman alignment *within* a single query.
+    * Includes **Dynamic Pruning** to skip zero-alignment scores.
+    * Implements a **Sliding Window** mechanism to reduce costly memory gather operations.
+    * Achieves **3.45× higher SIMD utilization**, performing consistently well on both WGS (Whole Genome Sequencing) and WES (Whole Exome Sequencing) data.
+
+## 📥 Installation
+
+### Prerequisites
+* Linux operating system (tested on Ubuntu 22.04).
+* GCC compiler (version 11.4 or higher recommended).
+* CPU supporting **AVX2** instructions (most modern Intel/AMD CPUs).
+* zlib development files.
+
+### Compilation
+```bash
+git clone https://github.com/your-username/FastAlign.git
+cd FastAlign
+make
+```
+
+## 📖 Usage
+FastAlign follows the same command-line interface as BWA-MEM.
+
+1. **Index the Reference.** Before alignment, you must index your reference genome (e.g., human_g1k_v37.fasta).
+```bash
+# This will generate the Hybrid Index files
+./fastalign index ref.fa
+```
+
+2. **Align Reads (Mem).** Map single-end or paired-end reads to the reference.
+```bash
+# Single-end alignment
+./fastalign mem ref.fa reads.fq > aln.sam
+
+# Paired-end alignment
+./fastalign mem ref.fa read1.fq read2.fq > aln.sam
+
+# Using multiple threads (Recommended: 32-128 threads for high throughput)
+./fastalign mem -t 64 ref.fa read1.fq read2.fq > aln.sam
+```
+
+3. **Options.** FastAlign supports the standard BWA-MEM options. Run `./fastalign mem` to see the full list.
+
+## 📜 Citation
+If you find FastAlign useful in your research, please cite our paper:
+```bibtex
+@inproceedings{fastalign2026,
+  title={Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs},
+  author={Zhonghai Zhang and Yewen Li and Ke Meng and Chunming Zhang and Guangming Tan},
+  booktitle={Proceedings of the 31st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '26)},
+  year={2026}
+}
+```
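
One way to check the README's identical-output claim before merging is to align the same reads with both tools and compare the resulting SAM files. The helper below is a minimal sketch and not part of FastAlign: the function name `sam_records_match` is hypothetical, and it assumes the only expected difference between the two files is the `@PG` header line, which records the program name and command line.

```bash
# Hypothetical helper (not shipped with FastAlign): succeeds with exit
# status 0 when two SAM files are identical once @PG header lines are
# dropped, returns 1 when they differ, and 2 when an input is unreadable.
sam_records_match() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    srm_a=$(mktemp) || return 2
    srm_b=$(mktemp) || return 2
    # grep -v exits non-zero when it prints nothing, so ignore its status.
    grep -v '^@PG' "$1" > "$srm_a" || true
    grep -v '^@PG' "$2" > "$srm_b" || true
    if cmp -s "$srm_a" "$srm_b"; then
        srm_status=0
    else
        srm_status=1
    fi
    rm -f "$srm_a" "$srm_b"
    return "$srm_status"
}
```

For example, after producing `bwa.sam` with `bwa mem ref.fa reads.fq > bwa.sam` and `fastalign.sam` with `./fastalign mem ref.fa reads.fq > fastalign.sam`, the call `sam_records_match bwa.sam fastalign.sam` should succeed if the two outputs agree on that data set. Both runs should use the same options, and whether FastAlign emits a `@PG` line at all is an assumption here.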