diff --git a/NEWS.md b/NEWS.md
index 4a3fbd3..4be1dfc 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -11,25 +11,27 @@ For general uses, the single BWA binary still works like the old way.
 
 Another major addition to BWA-MEM is HLA typing, which made possible with the
 new ALT mapping strategy. Necessary data and programs are included in the
-binary release. The wrapper script also performs HLA typing when HLA genes are
-included in the reference genome as additional ALT contigs.
+binary release. The wrapper script also optionally performs HLA typing when HLA
+genes are included in the reference genome as additional ALT contigs.
 
 Other notable changes to BWA-MEM:
 
  * Added option `-b` to `bwa index`. This option tunes the batch size used in
    the construction of BWT. It is advised to use large `-b` for huge reference
-   sequences such as the *nt* database.
+   sequences such as the BLAST *nt* database.
 
- * Optimized for PacBio data. This includes a change to the scoring based on a
-   mini-study done by Aaron Quinlan and a heuristic speedup. Further speedup is
+ * Optimized for PacBio data. This includes a change to scoring based on a
+   study done by Aaron Quinlan and a heuristic speedup. Further speedup is
    possible, but needs more careful investigation.
 
- * Dropped PacBio read-to-read alignment for now. BWA-MEM is only good at
-   finding the best hit, not all hits. Option `-x pbread` is still available,
-   but hidden on the command line.
+ * Dropped PacBio read-to-read alignment for now. BWA-MEM is good for finding
+   the best hit, but is not very sensitive to suboptimal hits. Option `-x pbread`
+   is still available, but hidden on the command line. This may be removed in
+   future releases.
 
  * Added a new pre-setting for Oxford Nanopore 2D reads. LAST is still a little
-   more sensitive on bacterial data, but bwa-mem is times faster on human data.
+   more sensitive on older bacterial data, but bwa-mem is as good on more
+   recent data and is times faster for mapping against mammalian genomes.
 
  * Added LAST-like seeding. This improves the accuracy for longer reads.
 
diff --git a/bwa.1 b/bwa.1
index 0d556be..6e95c0b 100644
--- a/bwa.1
+++ b/bwa.1
@@ -1,4 +1,4 @@
-.TH bwa 1 "18 November 2014" "bwa-0.7.11-r999" "Bioinformatics tools"
+.TH bwa 1 "21 December 2014" "bwa-0.7.11-r1032" "Bioinformatics tools"
 .SH NAME
 .PP
 bwa - Burrows-Wheeler Alignment Tool
@@ -75,7 +75,7 @@ appropriate algorithm will be chosen automatically.
 .TP
 .B mem
 .B bwa mem
-.RB [ -aCHMpP ]
+.RB [ -aCHjMpP ]
 .RB [ -t
 .IR nThreads ]
 .RB [ -k
@@ -88,6 +88,12 @@ appropriate algorithm will be chosen automatically.
 .IR seedSplitRatio ]
 .RB [ -c
 .IR maxOcc ]
+.RB [ -D
+.IR chainShadow ]
+.RB [ -m
+.IR maxMateSW ]
+.RB [ -W
+.IR minSeedMatch ]
 .RB [ -A
 .IR matchScore ]
 .RB [ -B
@@ -102,6 +108,8 @@ appropriate algorithm will be chosen automatically.
 .IR unpairPen ]
 .RB [ -R
 .IR RGline ]
+.RB [ -H
+.IR HDlines ]
 .RB [ -v
 .IR verboseLevel ]
 .I db.prefix
@@ -193,9 +201,28 @@ Discard a MEM if it has more than
 .I INT
 occurence in the genome. This is an insensitive parameter. [500]
 .TP
+.BI -D \ INT
+Drop chains shorter than
+.I FLOAT
+fraction of the longest overlapping chain [0.5]
+.TP
+.BI -m \ INT
+Perform at most
+.I INT
+rounds of mate-SW [50]
+.TP
+.BI -W \ INT
+Drop a chain if the number of bases in seeds is smaller than
+.IR INT .
+This option is primarily used for longer contigs/reads. When positive, it also
+affects seed filtering. [0]
+.TP
 .B -P
 In the paired-end mode, perform SW to rescue missing hits only but do not try to find
 hits that fit a proper pair.
+
+.TP
+.B SCORING OPTIONS:
 .TP
 .BI -A \ INT
 Matching score. [1]
@@ -244,15 +271,30 @@ and will be converted to a TAB in the output SAM. The read group ID will be
 attached to every read in the output. An example is '@RG\\tID:foo\\tSM:bar'.
 [null]
 .TP
+.BI -H \ ARG
+If ARG starts with @, it is interpreted as a string and gets inserted into the
+output SAM header; otherwise, ARG is interpreted as a file with all lines
+starting with @ in the file inserted into the SAM header. [null]
+.TP
 .BI -T \ INT
 Don't output alignment with score lower than
 .IR INT .
 This option affects output and occasionally SAM flag 2. [30]
 .TP
-.BI -h \ INT
+.BI -j
+Treat ALT contigs as part of the primary assembly (i.e. ignore the
+.I db.prefix.alt
+file).
+.TP
+.BI -h \ INT[,INT2]
 If a query has not more than
 .I INT
-hits with score higher than 80% of the best hit, output them all in the XA tag [5]
+hits with score higher than 80% of the best hit, output them all in the XA tag.
+If
+.I INT2
+is specified, BWA-MEM outputs up to
+.I INT2
+hits if the list contains a hit to an ALT contig. [5,200]
 .TP
 .B -a
 Output all found alignments for single-end or unpaired paired-end reads. These
diff --git a/fastmap.c b/fastmap.c
index 79ebd39..3cca4de 100644
--- a/fastmap.c
+++ b/fastmap.c
@@ -268,7 +268,7 @@ int main_mem(int argc, char *argv[])
 fprintf(stderr, " -p smart pairing (ignoring in2.fq)\n");
 fprintf(stderr, " -R STR read group header line such as '@RG\\tID:foo\\tSM:bar' [null]\n");
 fprintf(stderr, " -H STR/FILE insert STR to header if it starts with @; or insert lines in FILE [null]\n");
-fprintf(stderr, " -j ignore ALT contigs\n");
+fprintf(stderr, " -j treat ALT contigs as part of the primary assembly (i.e. ignore .alt file)\n");
 fprintf(stderr, "\n");
 fprintf(stderr, " -v INT verbose level: 1=error, 2=warning, 3=message, 4+=debugging [%d]\n", bwa_verbose);
 fprintf(stderr, " -T INT minimum score to output [%d]\n", opt->T);