Commit Graph

504 Commits (8638cfadc875a44f7db5ee822e71f7cddb5221fa)

Author SHA1 Message Date
Heng Li 8638cfadc8 dev-472: get rid of bwa_fix_xref()
This function causes all kinds of problems when the reference genome consists
of many short reads/contigs/chromosomes. Some of the problems are nearly
unfixable at the point where bwa_fix_xref() gets called. This commit attempts
to fix the problem at the root. It disallows chains spanning multiple contigs
and never retrieves sequences bridging two adjacent contigs. Thus all the
chaining, extension, SW and global alignments are confined to one contig only.

This commit brings many changes. I have tested it on a couple of examples
including Peter Field's PacBio example. It works well so far.
2014-04-10 20:54:27 -04:00
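The single-contig constraint described in dev-472 can be illustrated with a minimal sketch. It is not bwa's code: it assumes the contigs are concatenated into one coordinate space, and the names `offsets`, `lengths`, `contig_of` and `within_one_contig` are hypothetical. Strand handling is left out.

```python
from bisect import bisect_right

def contig_of(offsets, pos):
    """Index of the contig containing concatenated coordinate pos.

    offsets is the sorted list of contig start positions (hypothetical layout).
    """
    return bisect_right(offsets, pos) - 1

def within_one_contig(offsets, lengths, beg, end):
    """True if the half-open interval [beg, end) lies inside a single contig."""
    if beg < 0 or beg >= end:
        return False
    i = contig_of(offsets, beg)
    # The interval must end no later than the end of the contig it starts in.
    return end <= offsets[i] + lengths[i]

# Three contigs of lengths 100, 150 and 50, laid end to end.
offsets = [0, 100, 250]
lengths = [100, 150, 50]
crosses = within_one_contig(offsets, lengths, 90, 110)   # spans contigs 0 and 1
inside = within_one_contig(offsets, lengths, 100, 250)   # exactly contig 1
```

Under this sketch, a chain or a retrieved reference window would be rejected whenever `within_one_contig` returns False.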
Heng Li 23e0e99ec0 dev-471: fixed a compiling error from last commit 2014-04-10 11:54:17 -04:00
Heng Li ccbbe48c4f dev-470: don't stop on bwa_fix_xref2() failures
Peter Field has sent me an example caused by an alignment bridging three
adjacent chromosomes/contigs. Bwa-mem always aligns the query to the contig
covering the middle point of the alignment. In this example, it chooses the
middle contig, which should not be aligned. This leads to weird things failing
bwa_fix_xref2(), which cannot be fixed unless we build the contig boundaries
into the FM-index.

In the old code, bwa-mem halts when bwa_fix_xref2() fails. With this commit,
bwa-mem will give a warning instead of halting.
2014-04-10 11:43:17 -04:00
Heng Li db58392e9b dev-469: fixed wrong command line prompt 2014-04-09 13:20:04 -04:00
Heng Li d766591c1e dev-468: fixed a segfault caused by NULL 2014-04-08 22:11:36 -04:00
Heng Li 99f6f9a0d1 dev-467: limit the max #chains to extend 2014-04-08 21:45:49 -04:00
Heng Li c0a308a8b6 dev-466: simplified chain filtering 2014-04-08 17:33:07 -04:00
Heng Li f12dfae772 dev-465: a new output format for read overlap
Also moved a few functions to bwamem_extra.c. File bwamem.c is becoming far too
long.
2014-04-08 16:29:36 -04:00
Heng Li b45aeb87e1 dev-464: preset for pacbio read2read aln 2014-04-08 11:40:54 -04:00
Heng Li 172ba83241 dev-463: added option -x to change multiple params
I hate to copy-paste long command line options.
2014-04-07 11:29:36 -04:00
Heng Li 114901b005 dev-r462: refined setting for PacBio; weight flt
The recommended setting in the last commit is wrong. If we can extend a random
seed hit to the full length, we will force the read aligned through break
points, which is wrong. The new setting is better but it may lead to a small
fraction of fragmented alignments.

In addition, I added a filter on the minimum chain weight and tied
min_HSP_score to this filter. It doubles the mapping speed.
2014-04-04 17:01:04 -04:00
Heng Li 41f720dfa7 dev-461: added a heuristic for PacBio data
See the comment above mem_test_chain_sw() for details.
2014-04-04 16:05:41 -04:00
Heng Li 066ec4aa95 dev-460: disallow a cigar 20M2D2I30M in extension
Global alignment does not allow contiguous insertions and deletions, but local
alignment and extension allow such CIGARs. The optimal global alignment may
have a lower score than extension, which actually happens often for PacBio
data. This commit disallows a CIGAR like 20M2D2I30M to fix this inconsistency.
Local alignment has not been changed.
2014-04-04 10:44:34 -04:00
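The pattern that dev-460 disallows, an insertion directly next to a deletion, can be detected in a CIGAR string as in the sketch below. This only checks a finished CIGAR. It does not show how the extension code itself was changed, and the function names are hypothetical.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]

def has_adjacent_indels(cigar):
    """True if an insertion and a deletion sit next to each other."""
    ops = [op for _, op in parse_cigar(cigar)]
    return any({a, b} == {"I", "D"} for a, b in zip(ops, ops[1:]))

flagged = has_adjacent_indels("20M2D2I30M")    # D immediately followed by I
allowed = has_adjacent_indels("20M2I5M2D30M")  # indels separated by matches
```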
Heng Li b6bd33b26c dev-459: don't hard code the drop ratio
In the old code, if a secondary alignment is 50% worse, it is not output.
2014-04-03 18:58:49 -04:00
Heng Li b3225581be dev-458: simplified the smem iterator
Simpler but less powerful.
2014-04-03 15:23:48 -04:00
Heng Li acfe7613db dev-457: separated interval collection and seeding 2014-04-03 15:10:50 -04:00
Heng Li 9a5705289c added more debugging information
I can see a bug, but I do not know where it comes from.
2014-04-03 13:38:08 -04:00
Heng Li 3efb7c0e91 r455: release bwa-0.7.8 2014-03-31 15:27:23 -04:00
Heng Li 127c00cc96 dev-454: wording change in command line prompt 2014-03-31 12:03:27 -04:00
Heng Li b27bdf1ae0 dev-453: change of -A scales -TdBOELU
These parameters are all proportional to -A.
2014-03-31 11:52:52 -04:00
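The proportionality stated in dev-453 can be sketched as follows. The option letters come from the commit subject, but the numeric values are illustrative placeholders rather than bwa's confirmed defaults, and `scale_by_match_score` is a hypothetical helper that assumes the input values were chosen for a matching score of 1.

```python
def scale_by_match_score(params_at_a1, new_a):
    """Scale score-dependent options that were defined for -A 1 by a new -A."""
    scaled = {name: value * new_a for name, value in params_at_a1.items()}
    scaled["A"] = new_a
    return scaled

# Placeholder values for the options named in the commit subject (-TdBOELU).
params_at_a1 = {"T": 30, "d": 100, "B": 4, "O": 6, "E": 1, "L": 5, "U": 9}
doubled = scale_by_match_score(params_at_a1, 2)
```

Whether bwa rescales an option that the user has also set explicitly is not stated in the commit message, so the sketch scales everything.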
Heng Li b7076d9023 dev-r452: allow to specify insert size at cmd
This is also very useful for debugging.
2014-03-31 11:21:03 -04:00
Heng Li 417c6d66c7 dev-r451: fixed a few bugs when -A!=1
Something is still wrong.
2014-03-31 10:52:45 -04:00
Heng Li 9ce50a4e5e dev-450: support diff ins/del penalties. NO TEST!! 2014-03-28 14:54:06 -04:00
Heng Li 578bb55c38 dev-449: unequal ins/del in global() and extend() 2014-03-28 14:15:38 -04:00
Heng Li 0c783399e8 dev-448: different ins/del penalties 2014-03-28 10:54:23 -04:00
Heng Li 8f9aeef4ec Merge branch 'master' into dev
Conflicts:
	main.c
2014-03-17 00:03:52 -04:00
Heng Li e6931bec03 r445: unnecessarily large bandwidth in global 2014-03-17 00:01:00 -04:00
Heng Li 7d63e76245 r444: more debugging output in CIGAR generation
Also found a potential issue which should not affect accuracy but may hurt
speed. Will investigate later.
2014-03-16 23:25:04 -04:00
Heng Li 8929bd1c25 r443: more verbose debugging information 2014-03-16 15:18:58 -04:00
Heng Li 8ede4ffbfa Fixed clang compiling warnings 2014-03-16 15:18:22 -04:00
Heng Li 2e9463ebf1 dev-r442: suppress exact full-length matches 2014-02-26 22:04:19 -05:00
Heng Li 1c19bc630f Released bwa-0.7.7-r441 2014-02-25 01:05:37 -05:00
Heng Li e879817373 r440: a condition not work due to a typo 2014-02-20 13:06:40 -05:00
Heng Li ce026a07fc r439: expose mem_opt_t::max_matesw 2014-02-19 13:10:33 -05:00
Heng Li 17fb85a227 r438: still an issue in MD
It occurs when the global alignment disagrees with the local alignment.
2014-02-19 11:31:54 -05:00
Heng Li 52391a9855 r437: print timing for each batch of reads 2014-02-19 10:54:26 -05:00
Heng Li bdd14d2946 r436: fix rare MD/NM-CIGAR inconsistencies 2014-02-19 10:08:43 -05:00
Heng Li 4adc34eccb r435: bugfix - base not complemented on the rev 2014-02-18 10:32:24 -05:00
Heng Li 14aa43cca0 r434: added the missing bwasw/aln commands! 2014-02-12 15:39:02 -05:00
Heng Li 7c50bad567 Release bwa-0.7.6a-r433 2014-01-31 12:58:21 -05:00
Heng Li 5fdab3ae13 Released bwa-0.7.6-r432 2014-01-31 11:12:59 -05:00
Heng Li f524c7d3d8 r431: added the MD tag to bwa-mem 2014-01-29 12:05:11 -05:00
Heng Li ea3dc2f003 r430: fix a bug producing incorrect alignment
Ksw uses two rounds of SSE2-SW to find the boundaries of an alignment. If the
second round gives a different score from the first round, it will fail. The
fix checks if this happens, though I have not dug into an example to understand
why this may happen in the first place.
2014-01-29 10:51:02 -05:00
Heng Li d17ae1e808 Merge pull request #21 from bpow/fix-duplicate-pg
fix duplicate PG lines in bwape and bwase
2014-01-06 06:41:16 -08:00
Bradford Powell c26ba4e376 fix duplicate PG lines in bwape and bwase 2014-01-05 14:54:48 -05:00
Heng Li 10cb6b0507 r428: allow to change the default chain_drop_ratio 2013-12-30 16:18:45 -05:00
Heng Li 3afcdc7746 debugging code only: print seeds 2013-12-30 16:05:43 -05:00
Heng Li 74a1a53499 print debugging msg to stdout 2013-12-30 15:49:41 -05:00
Heng Li f70d80a5a2 r427: fixed bugs in backtrack
See comments in ksw_global() for details.
2013-12-30 15:40:18 -05:00
Heng Li 8b6ec74907 r424: fixed a bw bug in samse/pe 2013-11-25 15:48:04 -05:00