Heng Li
a458442b24
r845: updated NEWS
...
I will use the new version for a while and then release it.
2014-09-16 14:38:41 -04:00
Heng Li
92bc6849a3
r844: added intra-species contig mapping mode
2014-09-16 10:53:07 -04:00
Heng Li
90518f11e3
r843: presetting for ONT 2d reads
...
Somewhat working for 1d reads, but not very well
2014-09-16 10:38:15 -04:00
Heng Li
4b6eeb34c8
r830: optionally fixed chunk size
2014-09-15 23:42:24 -04:00
Heng Li
624687b072
r829: killed a harmless gcc warning
2014-09-15 23:33:22 -04:00
Heng Li
5d26ab0ee3
r828: changed the default scoring for pacbio
2014-09-15 23:22:05 -04:00
Heng Li
b07587f806
r827: an alt hit as good as a pri hit as supp
2014-09-15 16:07:51 -04:00
Heng Li
bd85af08ab
r826: improved alt mapping for PE
2014-09-15 12:13:04 -04:00
Heng Li
aee53f1334
r824: ALT mapping seems working
2014-09-15 00:29:05 -04:00
Heng Li
015ab3f6c3
r823: towards ALT support
2014-09-14 16:41:14 -04:00
Heng Li
8d2b93156b
r821: more relax on containing seeds
2014-09-12 10:35:49 -04:00
Heng Li
f4aedddee6
r819: bugfix - added too many sub-SMEMs
2014-09-08 11:32:48 -04:00
Heng Li
35ac99b4f7
r815: optionally output ref fasta header
...
Also fixed a bug in reading .ann files
2014-08-29 10:51:23 -04:00
Heng Li
1e611b235c
r810: add err_puts()
...
puts() adds '\n', but fputs() does not.
2014-08-26 11:07:24 -04:00
Heng Li
b5cba257c1
r809: new strategy for the -a mode
2014-08-25 11:59:27 -04:00
Heng Li
bf7d1d46ca
r808: a minor bug with the new index -b
2014-08-25 10:36:24 -04:00
Heng Li
1bba5ef20e
r807: allow to change block size in bwt_gen
...
For a very large reference genome, the default is too small.
2014-08-25 10:31:54 -04:00
Heng Li
705aa53894
Released 0.7.10
2014-07-13 22:57:27 -04:00
Heng Li
7fd6a11569
r788: segfault when the last ref is "weird"
...
mem_patch_reg() did not check if two hits are on the same strand, which may
lead to an alignment bridging the forward-backward boundary.
2014-07-10 10:53:56 -04:00
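A minimal sketch of the strand check that the r788 entry above says was missing. It assumes a coordinate layout in which positions in [0, l_pac) lie on the forward strand and positions in [l_pac, 2*l_pac) on the reverse strand; the function names are hypothetical and this is Python for illustration, not the C code of mem_patch_reg().

```python
def is_reverse(pos, l_pac):
    """Assumed layout: [0, l_pac) is forward, [l_pac, 2*l_pac) is reverse."""
    return pos >= l_pac


def same_strand(rb1, rb2, l_pac):
    """True if two hit start positions fall on the same strand.

    Patching two hits into one alignment only makes sense when this holds;
    otherwise the merged alignment would bridge the forward-backward boundary.
    """
    return is_reverse(rb1, l_pac) == is_reverse(rb2, l_pac)
```

With l_pac = 100, hits starting at 90 and 110 sit on opposite sides of the boundary, so a patching step guarded by same_strand() would leave them separate.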
Heng Li
cffff4338f
r787: use mem_seed_sw() also for non-PacBio reads
...
In the previous version, mem_seed_sw() is only used for PacBio reads to filter
bad seeds. For non-PacBio long queries, bwa-mem uses mem_chain2aln_short() for
a similar purpose. However, it turns out that mem_chain2aln_short() is not
effective given long near-tandem repeats. Bwa-mem still wastes a lot of time
on futile ref substring retrieval and extensions.
In this commit, mem_chain2aln_short() has been removed. mem_seed_sw() is used
if the query sequence is long enough (~700bp). For shorter reads, the results
should be almost identical to the previous version.
2014-07-10 10:30:22 -04:00
Heng Li
3efc33160c
0.7.9a-r786: fixed a segfault in a rare case
...
More likely to happen given a circular genome
2014-05-19 16:47:25 -04:00
Heng Li
031d3d83ce
Wrong release number: 0.7.8 => 0.7.9
2014-05-19 09:49:26 -04:00
Heng Li
be74dbc00c
Release bwa-0.7.9-r783
2014-05-19 09:09:11 -04:00
Heng Li
e4752b321b
Release bwa-0.7.9-r782
2014-05-19 09:08:07 -04:00
Heng Li
f00cc94e1d
r779: fixed a memory leak in SE
2014-05-16 00:06:34 -04:00
Heng Li
a5ad0cff7f
r778: reduced the number of alloc() calls a bit
2014-05-15 23:23:04 -04:00
Heng Li
8d2986ece2
r770: fixed a compiling warning
2014-05-14 14:44:03 -04:00
Heng Li
061c63f36a
r766: removed useless code
2014-05-13 13:09:29 -04:00
Heng Li
0168f39eeb
r765: fixed a declaration error
...
Reported by Andreas Tille from Debian
2014-05-13 12:54:23 -04:00
Heng Li
08517ac09b
r764: changed -c in "-x pacbio" to 500
2014-05-13 12:53:24 -04:00
Heng Li
39a6cd5bb0
r762: cleanup for the new release; unfinished
...
It will take time to make the documentation ready.
2014-05-11 15:15:44 -04:00
Heng Li
cfe6996173
r760: removed commented code
...
It is slow and is not very effective. And I hate useless code.
2014-05-09 14:59:07 -04:00
Heng Li
43b498a37e
r759: bugfix - frac_rep not working
...
Also added commented code for a 3rd round seeding. Not used.
2014-05-09 14:56:59 -04:00
Heng Li
c9b33502f3
r758: fixed a typo
...
mostly negligible in practice
2014-05-07 15:07:29 -04:00
Heng Li
6ac8dd5840
r754: added command msg for -h
2014-05-06 16:15:14 -04:00
Heng Li
ce3c198245
r749: max_hits tunable on CMD; default to 5
2014-05-04 10:17:03 -04:00
Heng Li
f21d6498bc
r748: reduced the default -m to 50
2014-05-02 16:49:19 -04:00
Heng Li
e8f28cb529
r747: fixed a minor issue in the last (mis)commit
2014-05-02 16:17:50 -04:00
Heng Li
6db761e269
r746: tuned heuristic for GRCh38
...
Reduced -c to 500 by default. To compensate, we choose up to 1000 positions
if a seed has 500 or more occurrences. In addition, a read with a large portion
covered by such seeds will have a lower mapping quality.
2014-05-02 16:06:27 -04:00
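The r746 entry above does not say how the "up to 1000 positions" are chosen. The sketch below assumes evenly spaced sampling over the occurrence list; that choice, the function name and the parameter name are all hypothetical, and the code is Python for illustration rather than bwa's C.

```python
def sample_occurrences(n_occ, max_sampled=1000):
    """Return indices of the occurrences to keep for one seed.

    If the seed has at most max_sampled occurrences, keep them all.
    Otherwise pick max_sampled indices spread evenly over [0, n_occ).
    Even spacing is an assumption, not something the commit states.
    """
    if n_occ <= max_sampled:
        return list(range(n_occ))
    # Integer arithmetic keeps indices distinct and strictly below n_occ.
    return [i * n_occ // max_sampled for i in range(max_sampled)]
```

For a seed with 5000 occurrences this keeps every fifth one; a seed with 600 occurrences is kept whole.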
Heng Li
b7076848ab
r744: int overflow given MB query
2014-05-01 15:30:36 -04:00
Heng Li
fa20c71920
r742: further control the max bandwidth
...
I am looking at 6kb bandwidth...
2014-05-01 14:27:38 -04:00
Heng Li
7954e77a1b
r741: fixed segfault in rare cases
2014-05-01 11:13:05 -04:00
Heng Li
4b2441069f
r740: don't attempt merge if bandwidth too large
...
Sometimes the bandwidth can be >10k.
2014-05-01 11:01:52 -04:00
Heng Li
5aedc978d1
r739: output suboptimal hits in the PE mode
...
However, PE information is not used for suboptimal hits
2014-04-30 23:23:54 -04:00
Heng Li
c6c943f9d7
r738: output multi-map in the XA tag (SE only)
...
... PE support coming soon
2014-04-30 16:46:05 -04:00
Heng Li
d59d78838c
r737: fixed an assertion when failed to convert sa
...
A bug pointed out by Mikkel Schubert
2014-04-30 14:55:44 -04:00
Heng Li
88f89be60e
r736: improved in low-complexity regions
...
Example: GGAGGGGAAGGGTGGGCTGGAGGGGACGGGTGGGCTGGAGGGGAAGGGTGTGCTGGAGGGAAAAGGTGGACTGGAGGGGAAGGGTGGGCTGGAGGGGAAGG
This read has 5 chains, two of which are:
weight=80 26;26;0,4591439948(10:-3095894) 23;23;27,4591439957(10:-3095888) 31;31;70,4591439964(10:-3095873)
weight=50 45;45;51,4591440017(10:-3095806) 50;50;51,4591440017(10:-3095801) 31;31;70,4591440090(10:-3095747)
Extension from the 26bp seed in the 1st chain gives an alignment [0,101) <=> [4591439948,4591440067), which
contains the 50bp seed in the second chain. However, if we extend the 50bp seed, it yields a better alignment
[0,101) <=> [4591439966,4591440067) with a different starting position. The 26bp seed is wrong. This commit
adds a heuristic to fix this issue.
2014-04-30 14:14:20 -04:00
Heng Li
11698fc4e5
r735: fixed a bug caused by merge
2014-04-30 13:12:43 -04:00
Heng Li
b603fed39c
r733: bugfix - seed score unset when no -W
2014-04-29 14:58:53 -04:00
Heng Li
44754cd615
r731: separate layouter
2014-04-28 10:39:29 -04:00
Heng Li
dadd5d6281
r730: more permissive about merging overlapping
2014-04-28 10:01:54 -04:00
Heng Li
76bb49e01b
r729: halved band width; doubled patch band width
2014-04-24 16:06:01 -04:00
Heng Li
6052d3015b
r728: sorting the end in mem_sort_dedup_patch()
...
The older version does this, which is correct.
2014-04-24 15:44:59 -04:00
Heng Li
df65893fb5
r727: extend seeds with SW
2014-04-24 14:28:40 -04:00
Heng Li
b92bbb47e5
Merge branch '0.7.7-softclip' into layout
...
Conflicts:
Makefile
bwamem.h
fastmap.c
main.c
2014-04-24 12:24:49 -04:00
Heng Li
8c12ec4a4b
r725: optionally disable hard clipping
...
as is requested by the cancer group
2014-04-24 11:56:43 -04:00
Heng Li
b93fca2b2e
r723: merge adjacent hits
2014-04-16 16:38:50 -04:00
Heng Li
48847af2fc
code backup
2014-04-16 12:00:13 -04:00
Heng Li
00a07f61bf
r721: merge overlapping hits by default
2014-04-15 16:16:04 -04:00
Heng Li
45f24b4ae8
r720: improved overlap hit merging
2014-04-15 16:09:42 -04:00
Heng Li
bdb7b000cd
r719: more stringent overlap merge
...
Will consider making it the default
2014-04-15 14:52:17 -04:00
Heng Li
4e22270eba
r718: merge alnregs overlapping on both query/ref
2014-04-14 17:01:17 -04:00
Heng Li
6d4a6debdc
r716: changed -x pbread
2014-04-14 16:04:29 -04:00
Heng Li
bbcabfe342
r707: change params for pacbio-to-pacbio
2014-04-10 21:53:52 -04:00
Heng Li
658f27eae4
Merge branch 'dev' into layout
...
Conflicts:
main.c
2014-04-10 21:48:47 -04:00
Heng Li
6fda93502f
r705: pairing performed on one chr only
...
Change of versioning: the revision number is acquired with:
git rev-list --all --count
This counts the total number of commits across all branches.
2014-04-10 21:38:14 -04:00
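The versioning scheme described in the r705 entry above can be sketched as follows. The git command is the one quoted in the commit message; the helper names and the "r" prefix formatting are assumptions made for illustration.

```python
import subprocess


def count_all_commits(repo_dir="."):
    """Run `git rev-list --all --count` in repo_dir and return the count."""
    result = subprocess.run(
        ["git", "rev-list", "--all", "--count"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def revision_tag(count, prefix="r"):
    """Format a commit count as a revision label such as 'r705'."""
    return f"{prefix}{count}"
```

Because the count covers all branches, a commit on any branch advances the number, which may explain why revision numbers in this log skip values.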
Heng Li
db4b171fa6
Merge branch 'dev' into layout
...
Conflicts:
main.c
2014-04-10 21:09:47 -04:00
Heng Li
07182d9061
dev-475: -F outputs unit score, not raw score
2014-04-10 21:09:06 -04:00
Heng Li
7d25fe2de3
Merge branch 'dev' into layout
...
Conflicts:
main.c
2014-04-10 21:07:16 -04:00
Heng Li
e80bccc923
dev-474: fixed a typo
2014-04-10 21:04:02 -04:00
Heng Li
f02cd42679
dev-473: added a few assertions
...
to make sure the new change works as expected
2014-04-10 21:03:13 -04:00
Heng Li
8638cfadc8
dev-472: get rid of bwa_fix_xref()
...
This function causes all kinds of problems when the reference genome consists
of many short reads/contigs/chromosomes. Some of the problems are nearly
unfixable at the point where bwa_fix_xref() gets called. This commit attempts
to fix the problem at the root. It disallows chains spanning multiple contigs
and never retrieves sequences bridging two adjacent contigs. Thus all the
chaining, extension, SW and global alignments are confined to one contig only.
This commit brings many changes. I have tested it on a couple examples
including Peter Field's PacBio example. It works well so far.
2014-04-10 20:54:27 -04:00
Heng Li
e2d0c996e9
layout-477: output unit score, not the raw score
2014-04-10 18:03:28 -04:00
Heng Li
0eeacbbe39
Merge branch 'dev' into layout
2014-04-10 17:56:24 -04:00
Heng Li
23e0e99ec0
dev-471: fixed a compiling error from last commit
2014-04-10 11:54:17 -04:00
Heng Li
ccbbe48c4f
dev-470: don't stop on bwa_fix_xref2() failures
...
Peter Field has sent me an example caused by an alignment bridging three
adjacent chromosomes/contigs. Bwa-mem always aligns the query to the contig
covering the middle point of the alignment. In this example, it chooses the
middle contig, which should not be aligned. This leads to weird things failing
bwa_fix_xref2(), which cannot be fixed unless we build the contig boundaries
into the FM-index.
In the old code, bwa-mem halts when bwa_fix_xref2() fails. With this commit,
bwa-mem will give a warning instead of halting.
2014-04-10 11:43:17 -04:00
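The midpoint rule described in the dev-470 entry above can be illustrated with a small sketch. It assumes contigs are concatenated into one coordinate space; the helper names are hypothetical and the code is Python for illustration, not bwa's C implementation.

```python
from bisect import bisect_right
from itertools import accumulate


def contig_starts(lengths):
    """Start offset of each contig in the concatenated coordinate space."""
    return [0] + list(accumulate(lengths))[:-1]


def contig_at_midpoint(starts, begin, end):
    """Index of the contig covering the midpoint of the interval [begin, end)."""
    mid = (begin + end) // 2
    return bisect_right(starts, mid) - 1
```

With contigs of length 100, 50 and 200, an alignment over [90, 160) touches all three, and its midpoint 125 falls in the middle contig (index 1). That is the situation the commit message describes: the contig that gets picked is the short one in the middle.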
Heng Li
8220008564
an attempt to layout tool
2014-04-09 16:11:52 -04:00
Heng Li
db58392e9b
dev-469: fixed wrong command line prompt
2014-04-09 13:20:04 -04:00
Heng Li
d766591c1e
dev-468: fixed a segfault caused by NULL
2014-04-08 22:11:36 -04:00
Heng Li
99f6f9a0d1
dev-467: limit the max #chains to extend
2014-04-08 21:45:49 -04:00
Heng Li
c0a308a8b6
dev-466: simplified chain filtering
2014-04-08 17:33:07 -04:00
Heng Li
f12dfae772
dev-465: a new output format for read overlap
...
Also moved a few functions to bwamem_extra.c. File bwamem.c is becoming far too
long.
2014-04-08 16:29:36 -04:00
Heng Li
b45aeb87e1
dev-464: preset for pacbio read2read aln
2014-04-08 11:40:54 -04:00
Heng Li
172ba83241
dev-463: added option -x to change multiple params
...
I hate to copy-paste long command line options.
2014-04-07 11:29:36 -04:00
Heng Li
114901b005
dev-r462: refined setting for PacBio; weight flt
...
The recommended setting in the last commit is wrong. If we can extend a random
seed hit to the full length, we will force the read to be aligned through
breakpoints, which is wrong. The new setting is better but it may lead to a small
fraction of fragmented alignments.
In addition, I added a filter on the minimum chain weight and tied
min_HSP_score to this filter. It doubles the mapping speed.
2014-04-04 17:01:04 -04:00
Heng Li
41f720dfa7
dev-461: added a heuristic for PacBio data
...
See the comment above mem_test_chain_sw() for details.
2014-04-04 16:05:41 -04:00
Heng Li
066ec4aa95
dev-460: disallow a cigar 20M2D2I30M in extension
...
Global alignment does not allow contiguous insertions and deletions, but local
alignment and extension allow such CIGARs. The optimal global alignment may
have a lower score than extension, which actually happens often for PacBio
data. This commit disallows a CIGAR like 20M2D2I30M to fix this inconsistency.
Local alignment has not been changed.
2014-04-04 10:44:34 -04:00
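To make the CIGAR pattern in the dev-460 entry above concrete, here is a small checker that flags an insertion directly adjacent to a deletion. The function names are hypothetical; this is a Python illustration of the pattern, not the change made to the extension code.

```python
import re

_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def has_adjacent_indel(cigar):
    """True if an insertion is immediately followed by a deletion or vice versa."""
    ops = [op for _, op in parse_cigar(cigar)]
    return any({a, b} == {"I", "D"} for a, b in zip(ops, ops[1:]))
```

has_adjacent_indel("20M2D2I30M") is true, while a CIGAR whose insertion and deletion are separated by matches, such as "20M2D5M2I23M", is not flagged.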
Heng Li
b6bd33b26c
dev-459: don't hard code the drop ratio
...
In the old code, if a secondary alignment is 50% worse, it won't be outputted.
2014-04-03 18:58:49 -04:00
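A sketch of the drop-ratio filter mentioned in the dev-459 entry above, with the ratio as a parameter instead of a constant. The function name, the parameter name and the handling of scores exactly at the threshold are assumptions.

```python
def keep_secondary(best_score, secondary_scores, drop_ratio=0.5):
    """Keep secondary alignment scores that are not too far below the best hit.

    A secondary hit is dropped when its score is below best_score * drop_ratio.
    Keeping a score that lands exactly on the threshold is an assumption.
    """
    threshold = best_score * drop_ratio
    return [s for s in secondary_scores if s >= threshold]
```

With a best score of 100 and the 50% ratio, scores 90 and 60 are kept and 49 is dropped; raising the ratio to 0.8 keeps only 90.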
Heng Li
b3225581be
dev-458: simplified the smem iterator
...
simpler but less powerful.
2014-04-03 15:23:48 -04:00
Heng Li
acfe7613db
dev-457: separated interval collection and seeding
2014-04-03 15:10:50 -04:00
Heng Li
3efb7c0e91
r455: release bwa-0.7.8
2014-03-31 15:27:23 -04:00
Heng Li
127c00cc96
dev-454: wording change in command line prompt
2014-03-31 12:03:27 -04:00
Heng Li
b27bdf1ae0
dev-453: change of -A scales -TdBOELU
...
These parameters are all proportional to -A.
2014-03-31 11:52:52 -04:00
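The scaling described in the dev-453 entry above could look roughly like the sketch below. The base values are placeholders chosen for illustration, not bwa's actual defaults, and leaving explicitly set options unscaled is an assumption; the commit message only says the parameters are proportional to -A.

```python
# Placeholder base values for a match score of 1 (NOT taken from bwa).
BASE_PARAMS = {"T": 30, "d": 100, "B": 4, "O": 6, "E": 1, "L": 5, "U": 9}


def scale_by_match_score(a, user_set=None, base=BASE_PARAMS):
    """Scale score-like parameters in proportion to the match score a.

    Options the user set explicitly (user_set) are kept as given; this
    behaviour is an assumption made for the sketch.
    """
    scaled = {name: value * a for name, value in base.items()}
    scaled.update(user_set or {})
    return scaled
```

With a = 2 every placeholder doubles, so relative penalties stay the same; passing user_set={"B": 4} pins the mismatch penalty at 4 regardless of a.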
Heng Li
b7076d9023
dev-r452: allow to specify insert size at cmd
...
This is also very useful for debugging.
2014-03-31 11:21:03 -04:00
Heng Li
417c6d66c7
dev-r451: fixed a few bugs when -A!=1
...
Something is still wrong.
2014-03-31 10:52:45 -04:00
Heng Li
9ce50a4e5e
dev-450: support diff ins/del penalties. NO TEST!!
2014-03-28 14:54:06 -04:00
Heng Li
578bb55c38
dev-449: unequal ins/del in global() and extend()
2014-03-28 14:15:38 -04:00
Heng Li
0c783399e8
dev-448: different ins/del penalties
2014-03-28 10:54:23 -04:00
Heng Li
2e9463ebf1
dev-r442: suppress exact full-length matches
2014-02-26 22:04:19 -05:00
Heng Li
1c19bc630f
Released bwa-0.7.7-r441
2014-02-25 01:05:37 -05:00