233 Commits (c6c943f9d73dd41209521813b7e0b5e9de47a208)

Author SHA1 Message Date
Heng Li c6c943f9d7 r738: output multi-map in the XA tag (SE only)
... PE support coming soon
2014-04-30 16:46:05 -04:00
Heng Li d59d78838c r737: fixed an assertion when failed to convert sa
A bug pointed out by Mikkle Schubert
2014-04-30 14:55:44 -04:00
Heng Li 88f89be60e r736: improved in low-complexity regions
Example: GGAGGGGAAGGGTGGGCTGGAGGGGACGGGTGGGCTGGAGGGGAAGGGTGTGCTGGAGGGAAAAGGTGGACTGGAGGGGAAGGGTGGGCTGGAGGGGAAGG

This read has 5 chains, two of which are:

weight=80  26;26;0,4591439948(10:-3095894)  23;23;27,4591439957(10:-3095888)  31;31;70,4591439964(10:-3095873)
weight=50  45;45;51,4591440017(10:-3095806) 50;50;51,4591440017(10:-3095801)  31;31;70,4591440090(10:-3095747)

Extension from the 26bp seed in the 1st chain gives an alignment [0,101) <=> [4591439948,4591440067), which
contains the 50bp seed in the second chain. However, if we extend the 50bp seed, it yields a better alignment
[0,101) <=> [4591439966,4591440067) with a different starting position. The 26bp seed is wrong. This commit
adds a heuristic to fix this issue.
2014-04-30 14:14:20 -04:00
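
The interval arithmetic in the r736 example above can be checked with a short sketch. The helper name `ref_minus_query_span` is hypothetical and not part of bwa; it only assumes the intervals are half-open, as written in the commit message.

```python
def ref_minus_query_span(query, ref):
    """Return (reference span) - (query span) for two half-open intervals."""
    q_beg, q_end = query
    r_beg, r_end = ref
    return (r_end - r_beg) - (q_end - q_beg)


# Alignment obtained by extending the 26bp seed of the first chain.
from_26bp = ref_minus_query_span((0, 101), (4591439948, 4591440067))
# Alignment obtained by extending the 50bp seed of the second chain.
from_50bp = ref_minus_query_span((0, 101), (4591439966, 4591440067))

print(from_26bp)  # 18
print(from_50bp)  # 0
```

The first alignment covers 18 more reference bases than query bases, so it has to absorb a net 18bp of gaps, while the second covers the same number of bases on both sides. This is consistent with the commit's statement that extending the 50bp seed gives the better alignment.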
Heng Li 11698fc4e5 r735: fixed a bug caused by merge 2014-04-30 13:12:43 -04:00
Heng Li b603fed39c r733: bugfix - seed score unset when no -W 2014-04-29 14:58:53 -04:00
Heng Li 44754cd615 r731: separate layouter 2014-04-28 10:39:29 -04:00
Heng Li dadd5d6281 r730: more permissive about merging overlapping 2014-04-28 10:01:54 -04:00
Heng Li 76bb49e01b r729: halved band width; doubled patch band width 2014-04-24 16:06:01 -04:00
Heng Li 6052d3015b r728: sorting the end in mem_sort_dedup_patch()
The older version does this, which is correct.
2014-04-24 15:44:59 -04:00
Heng Li df65893fb5 r727: extend seeds with SW 2014-04-24 14:28:40 -04:00
Heng Li b92bbb47e5 Merge branch '0.7.7-softclip' into layout
Conflicts:
	Makefile
	bwamem.h
	fastmap.c
	main.c
2014-04-24 12:24:49 -04:00
Heng Li 8c12ec4a4b r725: optionally disable hard clipping
as requested by the cancer group
2014-04-24 11:56:43 -04:00
Heng Li b93fca2b2e r723: merge adjacent hits 2014-04-16 16:38:50 -04:00
Heng Li 48847af2fc code backup 2014-04-16 12:00:13 -04:00
Heng Li 00a07f61bf r721: merge overlapping hits by default 2014-04-15 16:16:04 -04:00
Heng Li 45f24b4ae8 r720: improved overlap hit merging 2014-04-15 16:09:42 -04:00
Heng Li bdb7b000cd r719: more stringent overlap merge
Will consider making it the default
2014-04-15 14:52:17 -04:00
Heng Li 4e22270eba r718: merge alnregs overlapping on both query/ref 2014-04-14 17:01:17 -04:00
Heng Li 6d4a6debdc r716: changed -x pbread 2014-04-14 16:04:29 -04:00
Heng Li bbcabfe342 r707: change params for pacbio-to-pacbio 2014-04-10 21:53:52 -04:00
Heng Li 658f27eae4 Merge branch 'dev' into layout
Conflicts:
	main.c
2014-04-10 21:48:47 -04:00
Heng Li 6fda93502f r705: pairing performed on one chr only
Change of versioning: the revision number is acquired with:

  git rev-list --all --count

This counts the total number of commits across all branches.
2014-04-10 21:38:14 -04:00
Heng Li db4b171fa6 Merge branch 'dev' into layout
Conflicts:
	main.c
2014-04-10 21:09:47 -04:00
Heng Li 07182d9061 dev-475: -F outputs unit score, not raw score 2014-04-10 21:09:06 -04:00
Heng Li 7d25fe2de3 Merge branch 'dev' into layout
Conflicts:
	main.c
2014-04-10 21:07:16 -04:00
Heng Li e80bccc923 dev-474: fixed a typo 2014-04-10 21:04:02 -04:00
Heng Li f02cd42679 dev-473: added a few assertions
to make sure the new change works as expected
2014-04-10 21:03:13 -04:00
Heng Li 8638cfadc8 dev-472: get rid of bwa_fix_xref()
This function causes all kinds of problems when the reference genome consists
of many short reads/contigs/chromosomes. Some of the problems are nearly
unfixable at the point where bwa_fix_xref() gets called. This commit attempts
to fix the problem at the root. It disallows chains spanning multiple contigs
and never retrieves sequences bridging two adjacent contigs. Thus all the
chaining, extension, SW and global alignments are confined to one contig only.

This commit brings many changes. I have tested it on a couple of examples
including Peter Field's PacBio example. It works well so far.
2014-04-10 20:54:27 -04:00
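
The single-contig constraint described in dev-472 can be illustrated with a minimal sketch. It assumes the reference is a concatenation of contigs addressed by sorted start offsets; the functions `contig_index` and `within_one_contig` and the contig layout are hypothetical and do not reproduce bwa's own code.

```python
from bisect import bisect_right


def contig_index(starts, pos):
    """Index of the contig containing position pos in concatenated coordinates."""
    return bisect_right(starts, pos) - 1


def within_one_contig(starts, total_len, beg, end):
    """True if the half-open interval [beg, end) lies inside a single contig."""
    if not (0 <= beg < end <= total_len):
        return False
    return contig_index(starts, beg) == contig_index(starts, end - 1)


# Hypothetical reference: three contigs of lengths 100, 50 and 200.
starts = [0, 100, 150]
total_len = 350

print(within_one_contig(starts, total_len, 10, 90))   # True
print(within_one_contig(starts, total_len, 90, 110))  # False
```

A chain or a sequence fetch that fails this kind of check would bridge two adjacent contigs, which is the situation the commit says it disallows.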
Heng Li e2d0c996e9 layout-477: output unit score, not the raw score 2014-04-10 18:03:28 -04:00
Heng Li 0eeacbbe39 Merge branch 'dev' into layout 2014-04-10 17:56:24 -04:00
Heng Li 23e0e99ec0 dev-471: fixed a compiling error from last commit 2014-04-10 11:54:17 -04:00
Heng Li ccbbe48c4f dev-470: don't stop on bwa_fix_xref2() failures
Peter Field has sent me an example caused by an alignment bridging three
adjacent chromosomes/contigs. Bwa-mem always aligns the query to the contig
covering the middle point of the alignment. In this example, it chooses the
middle contig, which should not be aligned. This leads to weird things failing
bwa_fix_xref2(), which cannot be fixed unless we build the contig boundaries
into the FM-index.

In the old code, bwa-mem halts when bwa_fix_xref2() fails. With this commit,
bwa-mem will give a warning instead of halting.
2014-04-10 11:43:17 -04:00
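
The midpoint rule mentioned in dev-470 can be sketched as follows. The contig layout, the function names and the exact way the midpoint is computed are assumptions made for illustration; only the idea of assigning an alignment to the contig covering its middle point comes from the commit message.

```python
from bisect import bisect_right


def contig_at_midpoint(starts, beg, end):
    """Contig index covering the middle of the half-open interval [beg, end)."""
    mid = (beg + end) // 2
    return bisect_right(starts, mid) - 1


def overlaps(starts, total_len, beg, end):
    """Number of bases of [beg, end) falling on each contig."""
    bounds = starts + [total_len]
    return [
        max(0, min(end, bounds[i + 1]) - max(beg, bounds[i]))
        for i in range(len(starts))
    ]


# Hypothetical reference: a short 20bp contig between two longer ones.
starts = [0, 100, 120]
total_len = 400

chosen = contig_at_midpoint(starts, 60, 170)
per_contig = overlaps(starts, total_len, 60, 170)
print(chosen)      # 1
print(per_contig)  # [40, 20, 50]
```

In this made-up layout the alignment bridges three contigs and the midpoint rule picks the middle one, even though it carries the fewest aligned bases.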
Heng Li 8220008564 an attempt to layout tool 2014-04-09 16:11:52 -04:00
Heng Li db58392e9b dev-469: fixed wrong command line prompt 2014-04-09 13:20:04 -04:00
Heng Li d766591c1e dev-468: fixed a segfault caused by NULL 2014-04-08 22:11:36 -04:00
Heng Li 99f6f9a0d1 dev-467: limit the max #chains to extend 2014-04-08 21:45:49 -04:00
Heng Li c0a308a8b6 dev-466: simplified chain filtering 2014-04-08 17:33:07 -04:00
Heng Li f12dfae772 dev-465: a new output format for read overlap
Also moved a few functions to bwamem_extra.c. File bwamem.c is becoming far too
long.
2014-04-08 16:29:36 -04:00
Heng Li b45aeb87e1 dev-464: preset for pacbio read2read aln 2014-04-08 11:40:54 -04:00
Heng Li 172ba83241 dev-463: added option -x to change multiple params
I hate to copy-paste long command line options.
2014-04-07 11:29:36 -04:00
Heng Li 114901b005 dev-r462: refined setting for PacBio; weight flt
The recommended setting in the last commit is wrong. If we can extend a random
seed hit to the full length, we will force the read to be aligned through
breakpoints, which is wrong. The new setting is better but it may lead to a small
fraction of fragmented alignments.

In addition, I added a filter on the minimum chain weight and tied
min_HSP_score to this filter. It doubles the mapping speed.
2014-04-04 17:01:04 -04:00
Heng Li 41f720dfa7 dev-461: added a heuristic for PacBio data
See the comment above mem_test_chain_sw() for details.
2014-04-04 16:05:41 -04:00
Heng Li 066ec4aa95 dev-460: disallow a cigar 20M2D2I30M in extension
Global alignment does not allow contiguous insertions and deletions, but local
alignment and extension allow such CIGARs. The optimal global alignment may
have a lower score than extension, which actually happens often for PacBio
data. This commit disallows a CIGAR like 20M2D2I30M to fix this inconsistency.
Local alignment has not been changed.
2014-04-04 10:44:34 -04:00
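
A CIGAR such as 20M2D2I30M, which dev-460 disallows in extension, can be recognized by looking for an insertion directly next to a deletion. The sketch below is only a checker written for illustration (the name `has_adjacent_indels` is hypothetical); it does not show how bwa prevents such CIGARs during extension.

```python
import re


def has_adjacent_indels(cigar):
    """True if an insertion (I) sits directly next to a deletion (D)."""
    ops = [op for _, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    return any({a, b} == {"I", "D"} for a, b in zip(ops, ops[1:]))


print(has_adjacent_indels("20M2D2I30M"))    # True
print(has_adjacent_indels("20M2D30M"))      # False
print(has_adjacent_indels("20M2I5M2D30M"))  # False
```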
Heng Li b6bd33b26c dev-459: don't hard code the drop ratio
In the old code, if a secondary alignment is 50% worse, it won't be output.
2014-04-03 18:58:49 -04:00
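
The drop ratio from dev-459 can be pictured as a threshold on the secondary score. This sketch assumes the secondary score is compared against the primary score multiplied by the ratio; the function name `keep_secondary` and the scores are hypothetical.

```python
def keep_secondary(primary_score, secondary_score, drop_ratio=0.5):
    """Keep a secondary alignment unless it scores below drop_ratio * primary."""
    return secondary_score >= primary_score * drop_ratio


print(keep_secondary(100, 60))       # True
print(keep_secondary(100, 40))       # False
print(keep_secondary(100, 40, 0.3))  # True
```

With the ratio exposed as a parameter rather than fixed at 0.5, the same secondary alignment can be kept or dropped depending on the chosen setting.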
Heng Li b3225581be dev-458: simplified the smem iterator
Simpler but less powerful.
2014-04-03 15:23:48 -04:00
Heng Li acfe7613db dev-457: separated interval collection and seeding 2014-04-03 15:10:50 -04:00
Heng Li 3efb7c0e91 r455: release bwa-0.7.8 2014-03-31 15:27:23 -04:00
Heng Li 127c00cc96 dev-454: wording change in command line prompt 2014-03-31 12:03:27 -04:00
Heng Li b27bdf1ae0 dev-453: change of -A scales -TdBOELU
These parameters are all proportional to -A.
2014-03-31 11:52:52 -04:00
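
The proportional scaling described in dev-453 might look like the sketch below. The base values are illustrative numbers chosen for the example, not a statement of bwa's defaults, and the function `scale_by_match_score` is hypothetical. How bwa treats options that the user sets explicitly is not covered here.

```python
def scale_by_match_score(params, a):
    """Multiply every score-like parameter by the match score a."""
    return {name: value * a for name, value in params.items()}


# Illustrative values expressed for a match score of 1 (assumed, not bwa defaults).
base = {"T": 30, "B": 4, "O": 6, "E": 1}

scaled = scale_by_match_score(base, 2)
print(scaled)  # {'T': 60, 'B': 8, 'O': 12, 'E': 2}
```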
Heng Li b7076d9023 dev-r452: allow to specify insert size at cmd
This is also very useful for debugging.
2014-03-31 11:21:03 -04:00