updated README
This commit is contained in:
parent
599e840779
commit
607e11d43d
36
README
36
README
|
|
@ -1,36 +0,0 @@
|
||||||
Released packages can be downloaded from SourceForge.net:
|
|
||||||
|
|
||||||
http://sourceforge.net/projects/bio-bwa/files/
|
|
||||||
|
|
||||||
Introduction and FAQ are available at:
|
|
||||||
|
|
||||||
http://bio-bwa.sourceforge.net
|
|
||||||
|
|
||||||
Manual page at:
|
|
||||||
|
|
||||||
http://bio-bwa.sourceforge.net/bwa.shtml
|
|
||||||
|
|
||||||
Mailing list:
|
|
||||||
|
|
||||||
bio-bwa-help@lists.sourceforge.net
|
|
||||||
|
|
||||||
To sign up:
|
|
||||||
|
|
||||||
http://sourceforge.net/mail/?group_id=276243
|
|
||||||
|
|
||||||
Publications (Open Access):
|
|
||||||
|
|
||||||
http://www.ncbi.nlm.nih.gov/pubmed/20080505
|
|
||||||
http://www.ncbi.nlm.nih.gov/pubmed/19451168
|
|
||||||
|
|
||||||
Incomplete list of citations (via HubMed.org):
|
|
||||||
|
|
||||||
http://www.hubmed.org/references.cgi?uids=20080505
|
|
||||||
http://www.hubmed.org/references.cgi?uids=19451168
|
|
||||||
|
|
||||||
Related projects:
|
|
||||||
|
|
||||||
http://pbwa.sourceforge.net/
|
|
||||||
http://www.many-core.group.cam.ac.uk/projects/lam.shtml
|
|
||||||
http://biodoop-seal.sourceforge.net/
|
|
||||||
http://gitorious.org/bwa-cuda
|
|
||||||
|
|
@ -0,0 +1,73 @@
|
||||||
|
###Getting started
|
||||||
|
|
||||||
|
git clone https://github.com/lh3/bwa.git
|
||||||
|
cd bwa; make
|
||||||
|
./bwa index ref.fa
|
||||||
|
./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz
|
||||||
|
./bwa mem ref.fa read1.fq read2.fq | gzip -3 > aln-pe.sam.gz
|
||||||
|
|
||||||
|
###Introduction
|
||||||
|
|
||||||
|
BWA is a software package for mapping low-divergent sequences against a large
|
||||||
|
reference genome, such as the human genome. It consists of three algorithms:
|
||||||
|
BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina
|
||||||
|
sequence reads up to 100bp, while the rest two for longer sequences ranged from
|
||||||
|
70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as the support of
|
||||||
|
long reads and chimeric alignment, but BWA-MEM, which is the latest, is
|
||||||
|
generally recommended for high-quality queries as it is faster and more
|
||||||
|
accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp
|
||||||
|
Illumina reads.
|
||||||
|
|
||||||
|
For all the algorithms, BWA first needs to construct the FM-index for the
|
||||||
|
reference genome (the **index** command). Alignment algorithms are invoked with
|
||||||
|
different sub-commands: **aln**/**samse**/**sampe** for BWA-backtrack,
|
||||||
|
**bwasw** for BWA-SW and **mem** for the BWA-MEM algorithm.
|
||||||
|
|
||||||
|
###Availability
|
||||||
|
|
||||||
|
BWA is released under [GPLv3][1]. The latest souce code is [freely
|
||||||
|
available][2] at github. Released packages can [be downloaded ][3] at
|
||||||
|
SourceForge. After you acquire the source code, simply use `make` to compile
|
||||||
|
and copy the single executable `bwa` to the destination you want.
|
||||||
|
|
||||||
|
###Seeking helps
|
||||||
|
|
||||||
|
The detailed usage is described in the man page available together with the
|
||||||
|
source code. You can use `man ./bwa.1` to view the man page in a terminal. The
|
||||||
|
[HTML version][4] of the man page can be found at the [BWA website][5]. If you
|
||||||
|
have questions about BWA, you may [sign up the mailing list][6] and then send
|
||||||
|
the questions to [bio-bwa-help@sourceforge.net][7]. You may also ask questions
|
||||||
|
in forums such as [BioStar][8] and [SEQanswers][9].
|
||||||
|
|
||||||
|
###Citing BWA
|
||||||
|
|
||||||
|
* Li H. and Durbin R. (2009) Fast and accurate short read alignment with
|
||||||
|
Burrows-Wheeler transform. *Bioinformatics*, **25**, 1754-1760. [PMID:
|
||||||
|
[19451168][10]]. (if you use the BWA-backtrack algorithm)
|
||||||
|
|
||||||
|
* Li H. and Durbin R. (2010) Fast and accurate long-read alignment with
|
||||||
|
Burrows-Wheeler transform. *Bioinformatics*, **26**, 589-595. [PMID:
|
||||||
|
[20080505][11]]. (if you use the BWA-SW algorithm)
|
||||||
|
|
||||||
|
* Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs
|
||||||
|
with BWA-MEM. [arXiv:1303.3997v1][12] [q-bio.GN]. (if you use the BWA-MEM
|
||||||
|
algorithm or the **fastmap** command)
|
||||||
|
|
||||||
|
Please note that the last reference is a preprint hosted at [arXiv.org][13]. I
|
||||||
|
do not have plan to submit it to a peer-reviewed journal in the near future.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
[1]: http://en.wikipedia.org/wiki/GNU_General_Public_License
|
||||||
|
[2]: https://github.com/lh3/bwa
|
||||||
|
[3]: http://sourceforge.net/projects/bio-bwa/files/
|
||||||
|
[4]: http://bio-bwa.sourceforge.net/bwa.shtml
|
||||||
|
[5]: http://bio-bwa.sourceforge.net/
|
||||||
|
[6]: https://lists.sourceforge.net/lists/listinfo/bio-bwa-help
|
||||||
|
[7]: mailto:bio-bwa-help@sourceforge.net
|
||||||
|
[8]: http://biostars.org
|
||||||
|
[9]: http://seqanswers.com/
|
||||||
|
[10]: http://www.ncbi.nlm.nih.gov/pubmed/19451168
|
||||||
|
[11]: http://www.ncbi.nlm.nih.gov/pubmed/20080505
|
||||||
|
[12]: http://arxiv.org/abs/1303.3997
|
||||||
|
[13]: http://arxiv.org/
|
||||||
11
bwa.1
11
bwa.1
|
|
@ -1,4 +1,4 @@
|
||||||
.TH bwa 1 "23 April 2013" "bwa-0.7.4" "Bioinformatics tools"
|
.TH bwa 1 "24 May 2013" "bwa-0.7.5" "Bioinformatics tools"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
.PP
|
.PP
|
||||||
bwa - Burrows-Wheeler Alignment Tool
|
bwa - Burrows-Wheeler Alignment Tool
|
||||||
|
|
@ -718,12 +718,13 @@ If you use the BWA-SW algorithm, please cite:
|
||||||
Li H. and Durbin R. (2010) Fast and accurate long-read alignment with
|
Li H. and Durbin R. (2010) Fast and accurate long-read alignment with
|
||||||
Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]
|
Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]
|
||||||
.PP
|
.PP
|
||||||
If you use the fastmap component of BWA, please cite:
|
If you use BWA-MEM or the fastmap component of BWA, please cite:
|
||||||
.PP
|
.PP
|
||||||
Li H. (2012) Exploring single-sample SNP and INDEL calling with whole-genome de
|
Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with
|
||||||
novo assembly. Bioinformatics, 28, 1838-1844. [PMID: 22569178]
|
BWA-MEM. arXiv:1303.3997v1 [q-bio.GN].
|
||||||
.PP
|
.PP
|
||||||
The BWA-MEM algorithm has not been published yet.
|
It is likely that the BWA-MEM manuscript will not appear in a peer-reviewed
|
||||||
|
journal.
|
||||||
|
|
||||||
.SH HISTORY
|
.SH HISTORY
|
||||||
BWA is largely influenced by BWT-SW. It uses source codes from BWT-SW
|
BWA is largely influenced by BWT-SW. It uses source codes from BWT-SW
|
||||||
|
|
|
||||||
21
bwase.c
21
bwase.c
|
|
@ -167,20 +167,29 @@ void bwa_cal_pac_pos(const bntseq_t *bns, const char *prefix, int n_seqs, bwa_se
|
||||||
|
|
||||||
#define SW_BW 50
|
#define SW_BW 50
|
||||||
|
|
||||||
bwa_cigar_t *bwa_refine_gapped_core(bwtint_t l_pac, const ubyte_t *pacseq, int len, ubyte_t *seq, int ref_shift, bwtint_t rb, int *n_cigar)
|
bwa_cigar_t *bwa_refine_gapped_core(bwtint_t l_pac, const ubyte_t *pacseq, int len, ubyte_t *seq, int ref_shift, bwtint_t *_rb, int *n_cigar)
|
||||||
{
|
{
|
||||||
bwa_cigar_t *cigar = 0;
|
bwa_cigar_t *cigar = 0;
|
||||||
uint32_t *cigar32 = 0;
|
uint32_t *cigar32 = 0;
|
||||||
ubyte_t *rseq;
|
ubyte_t *rseq;
|
||||||
int64_t k, re, rlen;
|
int64_t k, rb, re, rlen;
|
||||||
int8_t mat[25];
|
int8_t mat[25];
|
||||||
|
|
||||||
bwa_fill_scmat(1, 3, mat);
|
bwa_fill_scmat(1, 3, mat);
|
||||||
re = rb + len + ref_shift;
|
rb = *_rb; re = rb + len + ref_shift;
|
||||||
assert(re <= l_pac);
|
assert(re <= l_pac);
|
||||||
rseq = bns_get_seq(l_pac, pacseq, rb, re, &rlen);
|
rseq = bns_get_seq(l_pac, pacseq, rb, re, &rlen);
|
||||||
assert(re - rb == rlen);
|
assert(re - rb == rlen);
|
||||||
ksw_global(len, seq, rlen, rseq, 5, mat, 5, 1, SW_BW, n_cigar, &cigar32); // right extension
|
ksw_global(len, seq, rlen, rseq, 5, mat, 5, 1, SW_BW, n_cigar, &cigar32);
|
||||||
|
assert(*n_cigar > 0);
|
||||||
|
if ((cigar32[*n_cigar - 1]&0xf) == 1) cigar32[*n_cigar - 1] = (cigar32[*n_cigar - 1]>>4<<4) | 4; // change endding ins to soft clipping
|
||||||
|
if ((cigar32[0]&0xf) == 1) cigar32[0] = (cigar32[0]>>4<<4) | 4; // change beginning ins to soft clipping
|
||||||
|
if ((cigar32[*n_cigar - 1]&0xf) == 2) --*n_cigar; // delete endding del
|
||||||
|
if ((cigar32[0]&0xf) == 2) { // delete beginning del
|
||||||
|
*_rb += cigar32[0]>>4;
|
||||||
|
--*n_cigar;
|
||||||
|
memmove(cigar32, cigar32+1, (*n_cigar) * 4);
|
||||||
|
}
|
||||||
cigar = (bwa_cigar_t*)cigar32;
|
cigar = (bwa_cigar_t*)cigar32;
|
||||||
for (k = 0; k < *n_cigar; ++k)
|
for (k = 0; k < *n_cigar; ++k)
|
||||||
cigar[k] = __cigar_create((cigar32[k]&0xf), (cigar32[k]>>4));
|
cigar[k] = __cigar_create((cigar32[k]&0xf), (cigar32[k]>>4));
|
||||||
|
|
@ -292,14 +301,14 @@ void bwa_refine_gapped(const bntseq_t *bns, int n_seqs, bwa_seq_t *seqs, ubyte_t
|
||||||
bwt_multi1_t *q = s->multi + j;
|
bwt_multi1_t *q = s->multi + j;
|
||||||
int n_cigar;
|
int n_cigar;
|
||||||
if (q->gap) { // gapped alignment
|
if (q->gap) { // gapped alignment
|
||||||
q->cigar = bwa_refine_gapped_core(bns->l_pac, pacseq, s->len, q->strand? s->rseq : s->seq, q->ref_shift, q->pos, &n_cigar);
|
q->cigar = bwa_refine_gapped_core(bns->l_pac, pacseq, s->len, q->strand? s->rseq : s->seq, q->ref_shift, &q->pos, &n_cigar);
|
||||||
q->n_cigar = n_cigar;
|
q->n_cigar = n_cigar;
|
||||||
if (q->cigar) s->multi[k++] = *q;
|
if (q->cigar) s->multi[k++] = *q;
|
||||||
} else s->multi[k++] = *q;
|
} else s->multi[k++] = *q;
|
||||||
}
|
}
|
||||||
s->n_multi = k; // this squeezes out gapped alignments which failed the CIGAR generation
|
s->n_multi = k; // this squeezes out gapped alignments which failed the CIGAR generation
|
||||||
if (s->type == BWA_TYPE_NO_MATCH || s->type == BWA_TYPE_MATESW || s->n_gapo == 0) continue;
|
if (s->type == BWA_TYPE_NO_MATCH || s->type == BWA_TYPE_MATESW || s->n_gapo == 0) continue;
|
||||||
s->cigar = bwa_refine_gapped_core(bns->l_pac, pacseq, s->len, s->strand? s->rseq : s->seq, s->ref_shift, s->pos, &s->n_cigar);
|
s->cigar = bwa_refine_gapped_core(bns->l_pac, pacseq, s->len, s->strand? s->rseq : s->seq, s->ref_shift, &s->pos, &s->n_cigar);
|
||||||
if (s->cigar == 0) s->type = BWA_TYPE_NO_MATCH;
|
if (s->cigar == 0) s->type = BWA_TYPE_NO_MATCH;
|
||||||
}
|
}
|
||||||
// generate MD tag
|
// generate MD tag
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue