updated documentations

(prepare for release)
Heng Li 2014-12-21 16:30:29 -05:00
parent c05a721f28
commit ed95769a33
3 changed files with 58 additions and 14 deletions

20
NEWS.md

@@ -11,25 +11,27 @@ For general uses, the single BWA binary still works like the old way.
Another major addition to BWA-MEM is HLA typing, which is made possible with the
new ALT mapping strategy. Necessary data and programs are included in the
binary release. The wrapper script also performs HLA typing when HLA genes are
included in the reference genome as additional ALT contigs.
binary release. The wrapper script also optionally performs HLA typing when HLA
genes are included in the reference genome as additional ALT contigs.
Other notable changes to BWA-MEM:
* Added option `-b` to `bwa index`. This option tunes the batch size used in
the construction of BWT. It is advised to use large `-b` for huge reference
sequences such as the *nt* database.
sequences such as the BLAST *nt* database.
* Optimized for PacBio data. This includes a change to the scoring based on a
mini-study done by Aaron Quinlan and a heuristic speedup. Further speedup is
* Optimized for PacBio data. This includes a change to scoring based on a
study done by Aaron Quinlan and a heuristic speedup. Further speedup is
possible, but needs more careful investigation.
* Dropped PacBio read-to-read alignment for now. BWA-MEM is only good at
finding the best hit, not all hits. Option `-x pbread` is still available,
but hidden on the command line.
* Dropped PacBio read-to-read alignment for now. BWA-MEM is good for finding
the best hit, but is not very sensitive to suboptimal hits. Option `-x pbread`
is still available, but hidden on the command line. This may be removed in
future releases.
* Added a new pre-setting for Oxford Nanopore 2D reads. LAST is still a little
more sensitive on bacterial data, but bwa-mem is times faster on human data.
more sensitive on older bacterial data, but bwa-mem is as good on more
recent data and is times faster for mapping against mammalian genomes.
* Added LAST-like seeding. This improves the accuracy for longer reads.

50
bwa.1

@@ -1,4 +1,4 @@
.TH bwa 1 "18 November 2014" "bwa-0.7.11-r999" "Bioinformatics tools"
.TH bwa 1 "21 December 2014" "bwa-0.7.11-r1032" "Bioinformatics tools"
.SH NAME
.PP
bwa - Burrows-Wheeler Alignment Tool
@@ -75,7 +75,7 @@ appropriate algorithm will be chosen automatically.
.TP
.B mem
.B bwa mem
.RB [ -aCHMpP ]
.RB [ -aCHjMpP ]
.RB [ -t
.IR nThreads ]
.RB [ -k
@@ -88,6 +88,12 @@ appropriate algorithm will be chosen automatically.
.IR seedSplitRatio ]
.RB [ -c
.IR maxOcc ]
.RB [ -D
.IR chainShadow ]
.RB [ -m
.IR maxMateSW ]
.RB [ -W
.IR minSeedMatch ]
.RB [ -A
.IR matchScore ]
.RB [ -B
@@ -102,6 +108,8 @@ appropriate algorithm will be chosen automatically.
.IR unpairPen ]
.RB [ -R
.IR RGline ]
.RB [ -H
.IR HDlines ]
.RB [ -v
.IR verboseLevel ]
.I db.prefix
@@ -193,9 +201,28 @@ Discard a MEM if it has more than
.I INT
occurrences in the genome. This is an insensitive parameter. [500]
.TP
.BI -D \ FLOAT
Drop chains shorter than
.I FLOAT
fraction of the longest overlapping chain [0.5]
.TP
.BI -m \ INT
Perform at most
.I INT
rounds of mate-SW [50]
.TP
.BI -W \ INT
Drop a chain if the number of bases in seeds is smaller than
.IR INT .
This option is primarily used for longer contigs/reads. When positive, it also
affects seed filtering. [0]
.TP
.B -P
In the paired-end mode, perform SW to rescue missing hits only but do not try to find
hits that fit a proper pair.
.TP
.B SCORING OPTIONS:
.TP
.BI -A \ INT
Matching score. [1]
@@ -244,15 +271,30 @@ and will be converted to a TAB in the output SAM. The read group ID will be
attached to every read in the output. An example is '@RG\\tID:foo\\tSM:bar'.
[null]
.TP
.BI -H \ ARG
If ARG starts with @, it is interpreted as a string and gets inserted into the
output SAM header; otherwise, ARG is interpreted as a file with all lines
starting with @ in the file inserted into the SAM header. [null]
.TP
.BI -T \ INT
Don't output alignment with score lower than
.IR INT .
This option affects output and occasionally SAM flag 2. [30]
.TP
.BI -h \ INT
.BI -j
Treat ALT contigs as part of the primary assembly (i.e. ignore the
.I db.prefix.alt
file).
.TP
.BI -h \ INT[,INT2]
If a query has not more than
.I INT
hits with score higher than 80% of the best hit, output them all in the XA tag [5]
hits with score higher than 80% of the best hit, output them all in the XA tag.
If
.I INT2
is specified, BWA-MEM outputs up to
.I INT2
hits if the list contains a hit to an ALT contig. [5,200]
.TP
.B -a
Output all found alignments for single-end or unpaired paired-end reads. These

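Note on the `-h INT[,INT2]` text above: the rule can be read as a simple threshold check. The following is a minimal sketch in C of that reading, not the code bwa actually uses. The type `hit_t`, the function `xa_should_report`, and the choice to pass the best score separately from the list of alternative hits are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one alternative hit; not bwa's real data structure. */
typedef struct {
    int score;   /* alignment score of this hit */
    int is_alt;  /* non-zero if the hit lies on an ALT contig */
} hit_t;

/*
 * Return 1 if the alternative hits should be listed in the XA tag, 0 otherwise.
 * Assumed reading of the man page: count the hits whose score is higher than
 * 80% of best_score; list them if the count does not exceed max_hits (INT),
 * or max_alt_hits (INT2) when at least one counted hit is on an ALT contig.
 */
int xa_should_report(const hit_t *hits, size_t n, int best_score,
                     int max_hits, int max_alt_hits)
{
    size_t i;
    int n_close = 0, has_alt = 0;

    assert(hits != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        /* score > 0.8 * best_score, written with integers only */
        if (hits[i].score * 5 > best_score * 4) {
            ++n_close;
            if (hits[i].is_alt) has_alt = 1;
        }
    }
    if (n_close == 0) return 0;  /* nothing to list */
    return n_close <= (has_alt ? max_alt_hits : max_hits);
}
```

Under this reading and the documented default `5,200`, six close hits would be suppressed unless one of them is on an ALT contig, in which case the larger limit applies.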

@@ -268,7 +268,7 @@ int main_mem(int argc, char *argv[])
fprintf(stderr, " -p smart pairing (ignoring in2.fq)\n");
fprintf(stderr, " -R STR read group header line such as '@RG\\tID:foo\\tSM:bar' [null]\n");
fprintf(stderr, " -H STR/FILE insert STR to header if it starts with @; or insert lines in FILE [null]\n");
fprintf(stderr, " -j ignore ALT contigs\n");
fprintf(stderr, " -j treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)\n");
fprintf(stderr, "\n");
fprintf(stderr, " -v INT verbose level: 1=error, 2=warning, 3=message, 4+=debugging [%d]\n", bwa_verbose);
fprintf(stderr, " -T INT minimum score to output [%d]\n", opt->T);
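Note on the `-H STR/FILE` usage line above: an argument starting with `@` is taken as literal header text, and anything else is treated as a file from which only the `@` lines are kept. The sketch below illustrates that dispatch in C. It is not the implementation in this commit; `header_from_arg` and `append_line` are hypothetical names, and the sketch ignores escape handling (such as `\t` conversion) and assumes each header line fits in the read buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append n bytes of src plus '\n' to the buffer *dst. */
static int append_line(char **dst, size_t *len, const char *src, size_t n)
{
    char *p = realloc(*dst, *len + n + 2);
    if (p == NULL) return -1;
    memcpy(p + *len, src, n);
    p[*len + n] = '\n';
    p[*len + n + 1] = '\0';
    *dst = p;
    *len += n + 1;
    return 0;
}

/*
 * Build extra SAM header text from a -H style argument.
 * Returns a malloc'ed string (caller frees), or NULL on error or when
 * no header line was found. Assumes lines shorter than the 4096-byte buffer.
 */
char *header_from_arg(const char *arg)
{
    char *hdr = NULL, buf[4096];
    size_t len = 0;
    FILE *fp;

    assert(arg != NULL);
    if (arg[0] == '@') {  /* literal header string */
        if (append_line(&hdr, &len, arg, strlen(arg)) != 0) return NULL;
        return hdr;
    }
    fp = fopen(arg, "r");  /* otherwise treat the argument as a file name */
    if (fp == NULL) return NULL;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strcspn(buf, "\r\n");  /* length without the line ending */
        if (buf[0] != '@') continue;      /* keep header lines only */
        if (append_line(&hdr, &len, buf, n) != 0) {
            free(hdr);
            fclose(fp);
            return NULL;
        }
    }
    fclose(fp);
    return hdr;
}
```

Returning one newline-terminated string keeps the caller simple: it can write the result after the header lines it already emits, whichever form the argument took.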