349 Commits (main)

Author SHA1 Message Date
Heng Li 0168f39eeb r765: fixed a declaration error
Reported by Andreas Tille from Debian
2014-05-13 12:54:23 -04:00
Heng Li 08517ac09b r764: changed -c in "-x pacbio" to 500 2014-05-13 12:53:24 -04:00
Heng Li 39a6cd5bb0 r762: cleanup for the new release; unfinished
It will take time to make the documentation ready.
2014-05-11 15:15:44 -04:00
Heng Li cfe6996173 r760: removed commented code
It is slow and is not very effective. And I hate useless code.
2014-05-09 14:59:07 -04:00
Heng Li 43b498a37e r759: bugfix - frac_rep not working
Also added commented code for a 3rd round seeding. Not used.
2014-05-09 14:56:59 -04:00
Heng Li c9b33502f3 r758: fixed a typo
mostly negligible in practice
2014-05-07 15:07:29 -04:00
Heng Li 6ac8dd5840 r754: added command msg for -h 2014-05-06 16:15:14 -04:00
Heng Li ce3c198245 r749: max_hits tunable on CMD; default to 5 2014-05-04 10:17:03 -04:00
Heng Li f21d6498bc r748: reduced the default -m to 50 2014-05-02 16:49:19 -04:00
Heng Li e8f28cb529 r747: fixed a minor issue in the last (mis)commit 2014-05-02 16:17:50 -04:00
Heng Li 6db761e269 r746: tuned heuristic for GRCh38
Reduced -c to 500 by default. As a compensation, we choose up to 1000 positions
if a seed has 500 or more occurrences. In addition, a read with big portion
from such seeds will have lower mapping quality.
2014-05-02 16:06:27 -04:00
Heng Li b7076848ab r744: int overflow given MB query 2014-05-01 15:30:36 -04:00
Heng Li fa20c71920 r742: further control the max bandwidth
I am looking at 6kb bandwidth...
2014-05-01 14:27:38 -04:00
Heng Li 7954e77a1b r741: fixed segfault in rare cases 2014-05-01 11:13:05 -04:00
Heng Li 4b2441069f r740: don't attempt merge if bandwidth too large
Sometimes the bandwidth can be >10k.
2014-05-01 11:01:52 -04:00
Heng Li 5aedc978d1 r739: output suboptimal hits in the PE mode
However, PE information is not used for suboptimal hits
2014-04-30 23:23:54 -04:00
Heng Li c6c943f9d7 r738: output multi-map in the XA tag (SE only)
... PE support coming soon
2014-04-30 16:46:05 -04:00
Heng Li d59d78838c r737: fixed an assertion when failed to convert sa
A bug pointed out by Mikkel Schubert
2014-04-30 14:55:44 -04:00
Heng Li 88f89be60e r736: improved in low-complexity regions
Example: GGAGGGGAAGGGTGGGCTGGAGGGGACGGGTGGGCTGGAGGGGAAGGGTGTGCTGGAGGGAAAAGGTGGACTGGAGGGGAAGGGTGGGCTGGAGGGGAAGG

This read has 5 chains, two of which are:

weight=80  26;26;0,4591439948(10:-3095894)  23;23;27,4591439957(10:-3095888)  31;31;70,4591439964(10:-3095873)
weight=50  45;45;51,4591440017(10:-3095806) 50;50;51,4591440017(10:-3095801)  31;31;70,4591440090(10:-3095747)

Extension from the 26bp seed in the 1st chain gives an alignment [0,101) <=> [4591439948,4591440067), which
contains the 50bp seed in the second chain. However, if we extend the 50bp seed, it yields a better alignment
[0,101) <=> [4591439966,4591440067) with a different starting position. The 26bp seed is wrong. This commit
adds a heuristic to fix this issue.
2014-04-30 14:14:20 -04:00
Heng Li 11698fc4e5 r735: fixed a bug caused by merge 2014-04-30 13:12:43 -04:00
Heng Li b603fed39c r733: bugfix - seed score unset when no -W 2014-04-29 14:58:53 -04:00
Heng Li 44754cd615 r731: separate layouter 2014-04-28 10:39:29 -04:00
Heng Li dadd5d6281 r730: more permissive about merging overlapping 2014-04-28 10:01:54 -04:00
Heng Li 76bb49e01b r729: halved band width; doubled patch band width 2014-04-24 16:06:01 -04:00
Heng Li 6052d3015b r728: sorting the end in mem_sort_dedup_patch()
The older version does this, which is correct.
2014-04-24 15:44:59 -04:00
Heng Li df65893fb5 r727: extend seeds with SW 2014-04-24 14:28:40 -04:00
Heng Li b92bbb47e5 Merge branch '0.7.7-softclip' into layout
Conflicts:
	Makefile
	bwamem.h
	fastmap.c
	main.c
2014-04-24 12:24:49 -04:00
Heng Li 8c12ec4a4b r725: optionally disable hard clipping
as is requested by the cancer group
2014-04-24 11:56:43 -04:00
Heng Li b93fca2b2e r723: merge adjacent hits 2014-04-16 16:38:50 -04:00
Heng Li 48847af2fc code backup 2014-04-16 12:00:13 -04:00
Heng Li 00a07f61bf r721: merge overlapping hits by default 2014-04-15 16:16:04 -04:00
Heng Li 45f24b4ae8 r720: improved overlap hit merging 2014-04-15 16:09:42 -04:00
Heng Li bdb7b000cd r719: more stringent overlap merge
Will consider making it the default
2014-04-15 14:52:17 -04:00
Heng Li 4e22270eba r718: merge alnregs overlapping on both query/ref 2014-04-14 17:01:17 -04:00
Heng Li 6d4a6debdc r716: changed -x pbread 2014-04-14 16:04:29 -04:00
Heng Li bbcabfe342 r707: change params for pacbio-to-pacbio 2014-04-10 21:53:52 -04:00
Heng Li 658f27eae4 Merge branch 'dev' into layout
Conflicts:
	main.c
2014-04-10 21:48:47 -04:00
Heng Li 6fda93502f r705: pairing performed on one chr only
Change of versioning: the revision number is acquired with:

  git rev-list --all --count

This counts the total number of commits across all branches.
2014-04-10 21:38:14 -04:00
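A minimal sketch of the r705 versioning scheme, in Python. It assumes `git` is on the PATH and that the label is simply "r" followed by the count; the helper names `count_all_commits` and `format_revision` are hypothetical and not part of bwa.

```python
import subprocess


def count_all_commits(repo_dir="."):
    """Return the number of commits reachable from any ref in repo_dir.

    Runs the command quoted in the r705 message:
        git rev-list --all --count
    """
    result = subprocess.run(
        ["git", "rev-list", "--all", "--count"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def format_revision(count):
    """Format a commit count as an rNNN label (hypothetical helper)."""
    return f"r{count}"
```

Calling `format_revision(count_all_commits())` inside a checkout would give a label in the style used by the commit subjects in this log.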
Heng Li db4b171fa6 Merge branch 'dev' into layout
Conflicts:
	main.c
2014-04-10 21:09:47 -04:00
Heng Li 07182d9061 dev-475: -F outputs unit score, not raw score 2014-04-10 21:09:06 -04:00
Heng Li 7d25fe2de3 Merge branch 'dev' into layout
Conflicts:
	main.c
2014-04-10 21:07:16 -04:00
Heng Li e80bccc923 dev-474: fixed a typo 2014-04-10 21:04:02 -04:00
Heng Li f02cd42679 dev-473: added a few assertions
to make sure the new change works as is expected
2014-04-10 21:03:13 -04:00
Heng Li 8638cfadc8 dev-472: get rid of bwa_fix_xref()
This function causes all kinds of problems when the reference genome consists
of many short reads/contigs/chromosomes. Some of the problems are nearly
unfixable at the point where bwa_fix_xref() gets called. This commit attempts
to fix the problem at the root. It disallows chains spanning multiple contigs
and never retrieves sequences bridging two adjacent contigs. Thus all the
chaining, extension, SW and global alignments are confined to one contig only.

This commit brings many changes. I have tested it on a couple examples
including Peter Field's PacBio example. It works well so far.
2014-04-10 20:54:27 -04:00
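The rule described in dev-472, that an interval must stay inside a single contig, can be sketched roughly as follows. This is an illustration only, not the bwa code: it assumes contigs are concatenated into one coordinate system and described by a sorted list of start offsets, and the names `contig_index` and `within_one_contig` are hypothetical.

```python
from bisect import bisect_right


def contig_index(starts, pos):
    """Index of the contig containing pos, given sorted contig start offsets."""
    return bisect_right(starts, pos) - 1


def within_one_contig(starts, total_len, beg, end):
    """True if the half-open interval [beg, end) lies inside one contig.

    starts    -- sorted start offsets of the concatenated contigs (assumed layout)
    total_len -- total length of the concatenated reference
    """
    if not (0 <= beg < end <= total_len):
        return False
    return contig_index(starts, beg) == contig_index(starts, end - 1)


# Three contigs of lengths 100, 150 and 150 (made-up numbers).
starts = [0, 100, 250]
total_len = 400
inside = within_one_contig(starts, total_len, 120, 240)    # stays in contig 1
bridging = within_one_contig(starts, total_len, 90, 110)   # crosses contigs 0 and 1
```

A chain or sequence fetch for which the check fails would be rejected up front, so there is nothing left to repair afterwards.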
Heng Li e2d0c996e9 layout-477: output unit score, not the raw score 2014-04-10 18:03:28 -04:00
Heng Li 0eeacbbe39 Merge branch 'dev' into layout 2014-04-10 17:56:24 -04:00
Heng Li 23e0e99ec0 dev-471: fixed a compiling error from last commit 2014-04-10 11:54:17 -04:00
Heng Li ccbbe48c4f dev-470: don't stop on bwa_fix_xref2() failures
Peter Field has sent me an example caused by an alignment bridging three
adjacent chromosomes/contigs. Bwa-mem always aligns the query to the contig
covering the middle point of the alignment. In this example, it chooses the
middle contig, which should not be aligned. This leads to weird things failing
bwa_fix_xref2(), which cannot be fixed unless we build the contig boundaries
into the FM-index.

In the old code, bwa-mem halts when bwa_fix_xref2() fails. With this commit,
bwa-mem will give a warning instead of halting.
2014-04-10 11:43:17 -04:00
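To illustrate the failure described in dev-470, here is a small Python sketch of the "contig covering the middle point" rule. The coordinate layout, the integer midpoint, and the function names are assumptions made for this example; the numbers are invented.

```python
from bisect import bisect_right


def contig_at_midpoint(starts, beg, end):
    """Index of the contig covering the midpoint of [beg, end)."""
    mid = (beg + end) // 2
    return bisect_right(starts, mid) - 1


def contigs_touched(starts, beg, end):
    """Indices of all contigs overlapped by the half-open interval [beg, end)."""
    first = bisect_right(starts, beg) - 1
    last = bisect_right(starts, end - 1) - 1
    return list(range(first, last + 1))


# Contig 1 is only 20 bp long and sits between two longer contigs.
starts = [0, 1000, 1020]
beg, end = 950, 1080
chosen = contig_at_midpoint(starts, beg, end)
touched = contigs_touched(starts, beg, end)
```

In this made-up case the alignment overlaps three contigs and the midpoint rule selects the short middle one, even though most aligned bases fall in its neighbours.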
Heng Li 8220008564 an attempt to layout tool 2014-04-09 16:11:52 -04:00
Heng Li db58392e9b dev-469: fixed wrong command line prompt 2014-04-09 13:20:04 -04:00
Heng Li d766591c1e dev-468: fixed a segfault caused by NULL 2014-04-08 22:11:36 -04:00
Heng Li 99f6f9a0d1 dev-467: limit the max #chains to extend 2014-04-08 21:45:49 -04:00
Heng Li c0a308a8b6 dev-466: simplified chain filtering 2014-04-08 17:33:07 -04:00
Heng Li f12dfae772 dev-465: a new output format for read overlap
Also moved a few functions to bwamem_extra.c. File bwamem.c is becoming far too
long.
2014-04-08 16:29:36 -04:00
Heng Li b45aeb87e1 dev-464: preset for pacbio read2read aln 2014-04-08 11:40:54 -04:00
Heng Li 172ba83241 dev-463: added option -x to change multiple params
I hate to copy-paste long command line options.
2014-04-07 11:29:36 -04:00
Heng Li 114901b005 dev-r462: refined setting for PacBio; weight flt
The recommended setting in the last commit is wrong. If we can extend a random
seed hit to the full length, we will force the read aligned through break
points, which is wrong. The new setting is better but it may lead to a small
fraction of fragmented alignments.

In addition, I added a filter on the minimum chain weight and tied
min_HSP_score to this filter. It doubles the mapping speed.
2014-04-04 17:01:04 -04:00
Heng Li 41f720dfa7 dev-461: added a heuristic for PacBio data
See the comment above mem_test_chain_sw() for details.
2014-04-04 16:05:41 -04:00
Heng Li 066ec4aa95 dev-460: disallow a cigar 20M2D2I30M in extension
Global alignment does not allow contiguous insertions and deletions, but local
alignment and extension allow such CIGARs. The optimal global alignment may
have a lower score than extension, which actually happens often for PacBio
data. This commit disallows a CIGAR like 20M2D2I30M to fix this inconsistency.
Local alignment has not been changed.
2014-04-04 10:44:34 -04:00
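A short Python sketch of the pattern that dev-460 disallows: an insertion directly next to a deletion in a CIGAR string. This only detects the pattern after the fact; it does not show how the extension code avoids producing it. The function names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by a SAM operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs; reject malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def has_adjacent_indels(cigar):
    """True if an insertion and a deletion are adjacent in the CIGAR."""
    ops = [op for _, op in parse_cigar(cigar)]
    return any({a, b} == {"I", "D"} for a, b in zip(ops, ops[1:]))
```

For the CIGAR from the commit subject, `has_adjacent_indels("20M2D2I30M")` is true, while a CIGAR with a match between the two gaps, such as `20M2D10M2I30M`, passes.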
Heng Li b6bd33b26c dev-459: don't hard code the drop ratio
In the old code, if a secondary alignment is 50% worse, it won't be outputted.
2014-04-03 18:58:49 -04:00
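As a rough sketch of what a configurable drop ratio might look like, assuming "50% worse" means a score below half of the best score (the exact comparison in bwa is not stated in this log, and `keep_secondary` is a hypothetical name):

```python
def keep_secondary(best_score, scores, drop_ratio=0.5):
    """Keep secondary alignment scores that are at least drop_ratio * best_score.

    drop_ratio=0.5 mirrors the previously hard-coded 50% mentioned in dev-459;
    whether the boundary is inclusive is an assumption of this sketch.
    """
    cutoff = best_score * drop_ratio
    return [s for s in scores if s >= cutoff]
```

With the ratio passed as a parameter, a caller can loosen or tighten the filter without touching the code.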
Heng Li b3225581be dev-458: simplified the smem iterator
simpler but less powerful.
2014-04-03 15:23:48 -04:00
Heng Li acfe7613db dev-457: separated interval collection and seeding 2014-04-03 15:10:50 -04:00
Heng Li 3efb7c0e91 r455: release bwa-0.7.8 2014-03-31 15:27:23 -04:00
Heng Li 127c00cc96 dev-454: wording change in command line prompt 2014-03-31 12:03:27 -04:00
Heng Li b27bdf1ae0 dev-453: change of -A scales -TdBOELU
These parameters are all proportional to -A.
2014-03-31 11:52:52 -04:00
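A minimal Python sketch of the scaling in dev-453, assuming each listed option is simply multiplied by the matching score -A. The base values in `BASE_PARAMS` are illustrative assumptions for -A=1 and are not taken from this log (apart from -U 17, which r377 mentions); `scale_by_match_score` is a hypothetical name.

```python
# Assumed values at -A=1, keyed by option letter (illustrative only).
BASE_PARAMS = {"T": 30, "d": 100, "B": 4, "O": 6, "E": 1, "L": 5, "U": 17}


def scale_by_match_score(a, params=BASE_PARAMS):
    """Return a copy of params with every value multiplied by the match score a."""
    return {name: value * a for name, value in params.items()}
```

For example, `scale_by_match_score(2)` doubles every penalty and threshold, so the relative weighting between matches, mismatches and gaps stays the same.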
Heng Li b7076d9023 dev-r452: allow to specify insert size at cmd
This is also very useful for debugging.
2014-03-31 11:21:03 -04:00
Heng Li 417c6d66c7 dev-r451: fixed a few bugs when -A!=1
Something is still wrong.
2014-03-31 10:52:45 -04:00
Heng Li 9ce50a4e5e dev-450: support diff ins/del penalties. NO TEST!! 2014-03-28 14:54:06 -04:00
Heng Li 578bb55c38 dev-449: unequal ins/del in global() and extend() 2014-03-28 14:15:38 -04:00
Heng Li 0c783399e8 dev-448: different ins/del penalties 2014-03-28 10:54:23 -04:00
Heng Li 2e9463ebf1 dev-r442: suppress exact full-length matches 2014-02-26 22:04:19 -05:00
Heng Li 1c19bc630f Released bwa-0.7.7-r441 2014-02-25 01:05:37 -05:00
Heng Li e879817373 r440: a condition not work due to a typo 2014-02-20 13:06:40 -05:00
Heng Li ce026a07fc r439: expose mem_opt_t::max_matesw 2014-02-19 13:10:33 -05:00
Heng Li 17fb85a227 r438: still an issue in MD
It occurs when the global alignment disagrees with the local alignment.
2014-02-19 11:31:54 -05:00
Heng Li 52391a9855 r437: print timing for each batch of reads 2014-02-19 10:54:26 -05:00
Heng Li bdd14d2946 r436: fix rare MD/NM-CIGAR inconsistencies 2014-02-19 10:08:43 -05:00
Heng Li 4adc34eccb r435: bugfix - base not complemented on the rev 2014-02-18 10:32:24 -05:00
Heng Li 14aa43cca0 r434: added the missing bwasw/aln commands! 2014-02-12 15:39:02 -05:00
Heng Li 7c50bad567 Release bwa-0.7.6a-r433 2014-01-31 12:58:21 -05:00
Heng Li 5fdab3ae13 Released bwa-0.7.6-r432 2014-01-31 11:12:59 -05:00
Heng Li f524c7d3d8 r431: added the MD tag to bwa-mem 2014-01-29 12:05:11 -05:00
Heng Li ea3dc2f003 r430: fix a bug producing incorrect alignment
Ksw uses two rounds of SSE2-SW to find the boundaries of an alignment. If the
second round gives a different score from the first round, it will fail. The
fix checks if this happens, though I have not dug into an example to understand
why this may happen in the first place.
2014-01-29 10:51:02 -05:00
Bradford Powell c26ba4e376 fix duplicate PG lines in bwape and bwase 2014-01-05 14:54:48 -05:00
Heng Li 10cb6b0507 r428: allow to change the default chain_drop_ratio 2013-12-30 16:18:45 -05:00
Heng Li f70d80a5a2 r427: fixed bugs in backtrack
See comments in ksw_global() for details.
2013-12-30 15:40:18 -05:00
Heng Li 8b6ec74907 r424: fixed a bw bug in samse/pe 2013-11-25 15:48:04 -05:00
Heng Li 4219e58623 r423: bugfix - SE hits not random 2013-11-23 09:36:26 -05:00
Heng Li 29aa855432 r422: matesw hits not sorted 2013-11-21 14:43:50 -05:00
Heng Li ff4762f3c7 r421: bw doubling in the final alignment
In some cases, the band width used in the final alignment needs to be larger
than the band width in extension.
2013-11-20 10:04:16 -05:00
Heng Li 6e3fa0515a r420: inferred bandwidth is not used in the final 2013-11-20 09:50:46 -05:00
Heng Li ff6faf811a r419: print the @PG line 2013-11-19 11:08:45 -05:00
Heng Li deb19593aa r418: use the new mapQ estimator by default 2013-11-02 12:25:53 -04:00
Heng Li c564653b40 r416: removed a line of debugging code 2013-09-12 10:41:43 -04:00
Heng Li 7144a0cefc r415: bug in the new (optional) mapQ computation
I may use the new method as the default. Testing needed.
2013-09-09 17:51:05 -04:00
Heng Li ebb7b02e9b r414: fixed a bug caused by the last commit 2013-09-09 16:57:55 -04:00
Heng Li b51a66e4c1 r413: fixed an issue causing redundant alignment
I have seen a fosmid aligned to the same position but with two slightly
different CIGARs: 30000M and 29900M50D100M, possibly caused by tandem repeats.
0.7.5a will regard them as two distinct alignments and generate a very small
mapping quality. However, these two are essentially the same. Although there is
ambiguity in aligning the end of the fosmid, we should not penalize the entire
alignment with a small mapQ. This commit fixes this issue. More testing is
needed, though.
2013-09-09 11:36:50 -04:00
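The two CIGARs in r413 can be compared with a small Python helper that sums the query and reference lengths each one consumes, following the SAM convention for which operations consume which sequence. The helper name `cigar_lengths` is hypothetical, and this sketch does not show how bwa decides that two hits are redundant.

```python
import re


def cigar_lengths(cigar):
    """Return (query_length, reference_length) consumed by a CIGAR string."""
    qlen = rlen = 0
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n)
        if op in "MIS=X":   # operations that consume query bases
            qlen += n
        if op in "MDN=X":   # operations that consume reference bases
            rlen += n
    return qlen, rlen


first = cigar_lengths("30000M")          # (30000, 30000)
second = cigar_lengths("29900M50D100M")  # (30000, 30050)
```

Both alignments cover the same 30000 query bases from the same start; they differ only in how the last 100 bases are placed, which shifts the reference end by 50 bp.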
Heng Li 1346f03ff1 use the old mapQ by default
the new mapQ overestimates
2013-09-06 14:04:41 -04:00
Heng Li ed78df9184 Merge branch 'master' into clip2 2013-08-28 16:00:34 -04:00
Heng Li 3b84c03c1e r406: allow to use diff clipping penalties
for 5'-end or for 3'-end
2013-08-28 15:59:05 -04:00
John Marshall b88718d8f4 Reformat note for 80 columns, and fix typo 2013-06-14 14:03:08 +01:00
Heng Li 7ec8b5c9e7 Release bwa-0.7.5a 2013-05-30 16:20:16 -04:00
Heng Li ef18cb91cb Release bwa-0.7.5-r404 2013-05-29 11:49:08 -04:00
Heng Li 73619754f8 r401: bugfix - forgot to change sampe
some changes to samse should also be applied to sampe
2013-05-27 22:24:35 -04:00
Heng Li 599e840779 r397: multi changes/bugfixes to bwa-backtrack
1. Check .sai versioning
2. Keep track of #ins and #del during backtrack
3. Use info above to get accurate aligned regions; don't call SW extension any more
4. Identify alignment crossing the for-rev boundary
5. Fixed a bug in printing the XA tag: ungapped alignments missing
2013-05-24 16:28:18 -04:00
Heng Li bde5005f39 r396: er... the new tag is named SA not SP 2013-05-23 12:48:18 -04:00
Heng Li 3d2450ed97 r395: bugfix - hard clipping not applied on revaln 2013-05-23 12:45:14 -04:00
Heng Li 9441bb7f2a r394: added future plan 2013-05-22 20:02:53 -04:00
Heng Li 9a6abe51b6 r391: better method to resolve xref alignment
The old method does not work when the alignment bridges three chr. This may
actually happen often. The new method does not work all the time, either, but
should be better than the old one. It is also simpler, arguably.
2013-05-22 18:57:51 -04:00
Rob Davies e88529687f Merge branch 'master' into master_fixes. Merged up to r389.
Conflicts:
	bwamem.c
	kopen.c
2013-04-29 12:09:30 +01:00
Heng Li 1a2bd2cf91 r389: return non-zero upon errors 2013-04-27 10:08:01 -04:00
Heng Li 19cb7cd7ed r388: cleanup mem_process_seqs() interface
Print output outside the function and allow to feed insert size distribution.
2013-04-26 12:31:18 -04:00
Heng Li 8896cb942e r386: bugfix - samse/pe segfault
This happens when a read is aligned across the forward-reverse boundary.
2013-04-24 16:00:02 -04:00
Rob Davies b3d0a13b32 Merge branch 'master' into master_fixes. Merged up to release bwa-0.7.4-r385. 2013-04-23 17:31:34 +01:00
Heng Li c14aaad1ce Released bwa-0.7.4-r385 2013-04-23 11:40:56 -04:00
Heng Li 2f6897c72b r384: don't compile bwamem-lite by default 2013-04-23 11:27:30 -04:00
Heng Li 78ed00021f r384: updated NEWS 2013-04-23 11:25:46 -04:00
Rob Davies 4cb5110d03 Merge branch 'master' into master_fixes 2013-04-22 09:51:07 +01:00
Heng Li f6ae0d4d0f r382: similar treatment in bwa-sw (see r381) 2013-04-19 17:52:06 -04:00
Heng Li 3f8caef33c r381: fixed a bug when upper bound < max read len 2013-04-19 17:44:35 -04:00
Heng Li db7a98636f r380: er... another compiling error 2013-04-19 12:04:44 -04:00
Heng Li f0c94d80d1 r379: fixed compiling error 2013-04-19 12:04:00 -04:00
Heng Li be11e27e12 r378: bugfix - wrong CIGAR
This is actually caused by a bug in SSE2-SW, where the query begin may be
smaller than the true one if there is an exact tandem repeat.
2013-04-19 12:00:37 -04:00
Heng Li 2087dc162f r377: increased unpaired penalty from 9 to 17
This leads to more aggressive pairing - more properly paired reads. I have
found a few cases where, for example, read1 is unambiguously mapped to chr20
while its 100bp mate has a perfect match to another chr but has 3 mismatches
and 1 deletion when it is paired with read1 on chr20. With longer reads, it
seems that the chr20 hit is correct, although it is not obvious how this
happened in evolution.
2013-04-17 16:50:20 -04:00
Rob Davies 3dd10bd7db Merge branch 'master' into master_fixes 2013-04-12 16:20:13 +01:00
Rob Davies 90ecd344ba Merge branch 'master' into master_fixes. Merged up to master r375.
Conflicts:
	bwt.c
2013-04-11 11:15:39 +01:00
Heng Li 499cf4c00d r376: reduce wasteful seed extension
mainly for contig alignment
2013-04-10 12:18:56 -04:00
Heng Li 47520134e7 r375: fixed compiling errors by the last change 2013-04-10 11:04:32 -04:00
Heng Li 3d8a8c1e37 r374: fix - clipping penalty not always working
This only happens to gaps where mem underestimates the bandwidth without
considering the clipping penalty.
2013-04-10 01:09:37 -04:00
Heng Li 53bb846407 r373: optionally disable mate rescue 2013-04-09 16:13:55 -04:00
Heng Li d64eaa851d fixed an issue caused by a Mac/Darwin bug
On Mac/Darwin, it is not possible to read >2GB data with one fread().
2013-04-09 15:17:04 -04:00
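The usual workaround for a limit like this is to read in bounded chunks instead of issuing one huge read. The sketch below shows the idea in Python; it is not the C fix from this commit, and the 16 MiB chunk size and the name `read_fully` are assumptions.

```python
import io


def read_fully(stream, size, chunk_size=1 << 24):
    """Read up to size bytes from stream, at most chunk_size bytes per call.

    Stops early if the stream ends, so the result may be shorter than size.
    """
    buf = bytearray()
    while len(buf) < size:
        chunk = stream.read(min(chunk_size, size - len(buf)))
        if not chunk:  # end of stream
            break
        buf += chunk
    return bytes(buf)


# Small in-memory demonstration with a deliberately tiny chunk size.
data = read_fully(io.BytesIO(b"abcdefghij"), 10, chunk_size=3)
```

Because each individual read stays well below 2 GB, the total size no longer matters to the underlying call.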
Heng Li d7ca0885eb r371: extend overlapping seeds
to avoid misalignment in tandem repeats
2013-04-04 00:43:43 -04:00
Heng Li 1e118e0823 r370: suppress "D" at the end of a cigar
This is caused by seeds in tandem repeats, in which case, bwa-mem may not
extend the true seed. The change in this commit is only a temporary cure.
2013-04-03 23:57:19 -04:00
Rob Davies c89756e2b0 Merge branch 'master' into master_fixes 2013-03-19 12:11:51 +00:00
Heng Li 8437cd4edd r369: bugfix - segfault caused by the last change
Sigh... Even the simplest change can lead to new bugs.
2013-03-19 01:04:57 -04:00
Heng Li 1e3cadbfc2 r368: bugfix - wrong CIGAR when bridging 3 contigs
In this case, bwa_fix_xref() will return insane coordinates. The old version
did not check the return status and wrote a wrong CIGAR. This bug only happens
with very short assembly contigs.
2013-03-18 20:49:32 -04:00
Rob Davies c862a1a396 Merge branch 'master' into master_fixes 2013-03-18 13:35:12 +00:00
Heng Li 9346acde1b Release bwa-0.7.3a-r367
In 0.7.3, the wrong CIGAR bug was only fixed in one scenario, but not fixed
in another corner case.
2013-03-15 21:26:37 -04:00
Heng Li 7dec00c217 Release BWA-0.7.3-r366 2013-03-15 12:51:53 -04:00
Heng Li dd51177837 r365: bugfix - wrong alignment (right mapping)
The bug only happens when there is a 1bp del and 1bp ins which are close to the
end and there are no other substitutions or indels. In this case, bwa mem gave
a wrong band width.
2013-03-15 11:59:05 -04:00
Heng Li e5355fe3a0 r364: bug in mem pairing (no effect with -A=1)
Forgot to adjust for matching score. This bug has no effect when -A takes the
default value.
2013-03-14 22:01:26 -04:00
Rob Davies cca27c1ef5 Merge branch 'master' into master_fixes
Conflicts:
	bwamem.c
	bwamem_pair.c
	example.c
2013-03-13 12:12:28 +00:00
Heng Li bdf34f6ce7 r363: XA=>XP; output mapQ in XP
In BWA, XA gives hits "shadowed" by the primary hit. In BWA-MEM, we output
primary hits only. Primary hits may have non-zero mapping quality.
2013-03-12 09:56:04 -04:00
Heng Li c29b176cb6 r362: bugfix - occasionally wrong TLEN
Use the 0.7.2 way to compute TLEN
2013-03-12 00:14:36 -04:00
Heng Li aa7cdf4bb3 r361: flag proper pair even if multi-primary
Up to here, all the features in my checklist have been implemented.
2013-03-12 00:00:04 -04:00
Heng Li dab5b17c1a r360: output alternative primary alignments in XA 2013-03-11 23:43:58 -04:00
Heng Li 6c665189ad r359: identical output to 0.7.2 (without -a) 2013-03-11 23:16:18 -04:00
Rob Davies 9228e48efd Merge branch 'master' into master_fixes
Conflicts:
	Makefile
2013-03-11 13:50:49 +00:00
Heng Li 5581cb9152 Release bwa-0.7.2-r351
For the TLEN sign fix. Sorry for the significant bug in 0.7.0/0.7.1
2013-03-09 18:15:41 -05:00
Heng Li 2d01a297fb Improving 'properly paired' flag.
If one end has a low quality tail that happens to have a score-20 hit,
the pair won't be flagged as properly paired because bwa-mem thought it has
multiple hits. By filtering with -T, we won't have this problem.
2013-03-09 18:05:50 -05:00
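A small Python sketch of the idea in the last commit above: hits scoring below the -T threshold are ignored before deciding whether an end has multiple hits. The threshold of 30 and the function names are assumptions for illustration; the actual pairing logic in bwa-mem is more involved.

```python
def count_reportable_hits(scores, min_score=30):
    """Number of hits whose score reaches the minimum output score (-T)."""
    return sum(1 for s in scores if s >= min_score)


def has_multiple_hits(scores, min_score=30):
    """True if more than one hit survives the -T filter."""
    return count_reportable_hits(scores, min_score) > 1
```

With scores of 75 and 20, the score-20 hit from a low-quality tail falls below the assumed threshold, so the end counts as having a single hit.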