Commit Graph

347 Commits (dd7db7beb687d4ec3d0c06f815167cfeaa21a22c)

Author SHA1 Message Date
Heng Li ef18cb91cb Release bwa-0.7.5-r404 2013-05-29 11:49:08 -04:00
Heng Li 73619754f8 r401: bugfix - forgot to change sampe
some changes to samse should also be applied to sampe
2013-05-27 22:24:35 -04:00
Heng Li 599e840779 r397: multi changes/bugfixes to bwa-backtrack
1. Check .sai versioning
2. Keep track of #ins and #del during backtrack
3. Use info above to get accurate aligned regions; don't call SW extension any more
4. Identify alignment crossing the for-rev boundary
5. Fixed a bug in printing the XA tag: ungapped alignments missing
2013-05-24 16:28:18 -04:00
Heng Li bde5005f39 r396: er... the new tag is named SA not SP 2013-05-23 12:48:18 -04:00
Heng Li 3d2450ed97 r395: bugfix - hard clipping not applied on revaln 2013-05-23 12:45:14 -04:00
Heng Li 9441bb7f2a r394: added future plan 2013-05-22 20:02:53 -04:00
Heng Li 9a6abe51b6 r391: better method to resolve xref alignment
The old method does not work when the alignment bridges three chr. This may
actually happen often. The new method does not work all the time, either, but
should be better than the old one. It is also simpler, arguably.
2013-05-22 18:57:51 -04:00
Rob Davies e88529687f Merge branch 'master' into master_fixes. Merged up to r389.
Conflicts:
	bwamem.c
	kopen.c
2013-04-29 12:09:30 +01:00
Heng Li 1a2bd2cf91 r389: return non-zero upon errors 2013-04-27 10:08:01 -04:00
Heng Li 19cb7cd7ed r388: cleanup mem_process_seqs() interface
Print output outside the function and allow to feed insert size distribution.
2013-04-26 12:31:18 -04:00
Heng Li 8896cb942e r386: bugfix - samse/pe segfault
This happens when a read is aligned across the forward-reverse boundary.
2013-04-24 16:00:02 -04:00
Rob Davies b3d0a13b32 Merge branch 'master' into master_fixes. Merged up to release bwa-0.7.4-r385. 2013-04-23 17:31:34 +01:00
Heng Li c14aaad1ce Released bwa-0.7.4-r385 2013-04-23 11:40:56 -04:00
Heng Li 2f6897c72b r384: don't compile bwamem-lite by default 2013-04-23 11:27:30 -04:00
Heng Li 78ed00021f r384: updated NEWS 2013-04-23 11:25:46 -04:00
Rob Davies 4cb5110d03 Merge branch 'master' into master_fixes 2013-04-22 09:51:07 +01:00
Heng Li f6ae0d4d0f r382: similar treatment in bwa-sw (see r381) 2013-04-19 17:52:06 -04:00
Heng Li 3f8caef33c r381: fixed a bug when upper bound < max read len 2013-04-19 17:44:35 -04:00
Heng Li db7a98636f r380: er... another compiling error 2013-04-19 12:04:44 -04:00
Heng Li f0c94d80d1 r379: fixed compiling error 2013-04-19 12:04:00 -04:00
Heng Li be11e27e12 r378: bugfix - wrong CIGAR
This is actually caused by a bug in SSE2-SW, where the query begin may be
smaller than the true one if there is an exact tandem repeat.
2013-04-19 12:00:37 -04:00
Heng Li 2087dc162f r377: increased unpaired penalty from 9 to 17
This leads to more aggressive pairing - more properly paired reads. I have
found a few cases where, for example, read1 is unambiguously mapped to chr20
while its 100bp mate has a perfect match to another chr but has 3 mismatches
and 1 deletion when it is paired with read1 on chr20. With longer reads, it
seems that the chr20 hit is correct, although it is not obvious how this
happened in evolution.
2013-04-17 16:50:20 -04:00
Rob Davies 3dd10bd7db Merge branch 'master' into master_fixes 2013-04-12 16:20:13 +01:00
Rob Davies 90ecd344ba Merge branch 'master' into master_fixes. Merged up to master r375.
Conflicts:
	bwt.c
2013-04-11 11:15:39 +01:00
Heng Li 499cf4c00d r376: reduce wasteful seed extension
mainly for contig alignment
2013-04-10 12:18:56 -04:00
Heng Li 47520134e7 r375: fixed compiling errors by the last change 2013-04-10 11:04:32 -04:00
Heng Li 3d8a8c1e37 r374: fix - clipping penalty not always working
This only happens to gaps where mem underestimates the bandwidth without
considering the clipping penalty.
2013-04-10 01:09:37 -04:00
Heng Li 53bb846407 r373: optionally disable mate rescue 2013-04-09 16:13:55 -04:00
Heng Li d64eaa851d fixed an issue caused by a Mac/Darwin bug
On Mac/Darwin, it is not possible to read >2GB data with one fread().
2013-04-09 15:17:04 -04:00
Heng Li d7ca0885eb r371: extend overlapping seeds
to avoid misalignment in tandem repeats
2013-04-04 00:43:43 -04:00
Heng Li 1e118e0823 r370: suppress "D" at the end of a cigar
This is caused by seeds in tandem repeats, in which case, bwa-mem may not
extend the true seed. The change in this commit is only a temporary cure.
2013-04-03 23:57:19 -04:00
Rob Davies c89756e2b0 Merge branch 'master' into master_fixes 2013-03-19 12:11:51 +00:00
Heng Li 8437cd4edd r369: bugfix - segfault caused by the last change
Sigh... Even the simplest change can lead to new bugs.
2013-03-19 01:04:57 -04:00
Heng Li 1e3cadbfc2 r368: bugfix - wrong CIGAR when bridging 3 contigs
In this case, bwa_fix_xref() will return insane coordinates. The old version
did not check the return status and wrote a wrong CIGAR. This bug only happens to
very short assembly contigs.
2013-03-18 20:49:32 -04:00
Rob Davies c862a1a396 Merge branch 'master' into master_fixes 2013-03-18 13:35:12 +00:00
Heng Li 9346acde1b Release bwa-0.7.3a-r367
In 0.7.3, the wrong CIGAR bug was only fixed in one scenario, but not fixed
in another corner case.
2013-03-15 21:26:37 -04:00
Heng Li 7dec00c217 Release BWA-0.7.3-r366 2013-03-15 12:51:53 -04:00
Heng Li dd51177837 r365: bugfix - wrong alignment (right mapping)
The bug only happens when there is a 1bp del and 1bp ins which are close to the
end and there are no other substitutions or indels. In this case, bwa mem gave
a wrong band width.
2013-03-15 11:59:05 -04:00
Heng Li e5355fe3a0 r364: bug in mem pairing (no effect with -A=1)
Forgot to adjust for matching score. This bug has no effect when -A takes the
default value.
2013-03-14 22:01:26 -04:00
Rob Davies cca27c1ef5 Merge branch 'master' into master_fixes
Conflicts:
	bwamem.c
	bwamem_pair.c
	example.c
2013-03-13 12:12:28 +00:00
Heng Li bdf34f6ce7 r363: XA=>XP; output mapQ in XP
In BWA, XA gives hits "shadowed" by the primary hit. In BWA-MEM, we output
primary hits only. Primary hits may have non-zero mapping quality.
2013-03-12 09:56:04 -04:00
Heng Li c29b176cb6 r362: bugfix - occasionally wrong TLEN
Use the 0.7.2 way to compute TLEN
2013-03-12 00:14:36 -04:00
Heng Li aa7cdf4bb3 r361: flag proper pair even if multi-primary
Up to here, all the features in my checklist have been implemented.
2013-03-12 00:00:04 -04:00
Heng Li dab5b17c1a r360: output alternative primary alignments in XA 2013-03-11 23:43:58 -04:00
Heng Li 6c665189ad r359: identical output to 0.7.2 (without -a) 2013-03-11 23:16:18 -04:00
Rob Davies 9228e48efd Merge branch 'master' into master_fixes
Conflicts:
	Makefile
2013-03-11 13:50:49 +00:00
Heng Li 5581cb9152 Release bwa-0.7.2-r351
For the TLEN sign fix. Sorry for the significant bug in 0.7.0/0.7.1
2013-03-09 18:15:41 -05:00
Heng Li 2d01a297fb Improving 'properly paired' flag.
If one end has a low quality tail that happens to have a score-20 hit,
the pair won't be flagged as properly paired because bwa-mem thought it has
multiple hits. By filtering with -T, we won't have this problem.
2013-03-09 18:05:50 -05:00
Heng Li 1d132a546d Release 0.7.1-r347 2013-03-08 15:30:06 -05:00
Heng Li 66c9783daf r345: bugfix in mem - wrong mate strand for unmap
Received a clean bill from Picard
2013-03-08 13:15:43 -05:00
Heng Li 274c0ac96c r343: bugfix in mem - wrong mate info for unmap
SAM generation is always among the nastiest bits. I would need to refactor at
some point (hardly happening).
2013-03-08 12:40:31 -05:00
Heng Li 017be45407 r342: bugfix in bwasw - AS is off by one
but I do not understand why the old code does not have the same problem.
2013-03-08 12:06:45 -05:00
Rob Davies aabd990e8f Merge branch 'master' into master_fixes
Conflicts:
	Makefile
	bwape.c
	bwase.c
	bwtsw2_aux.c
	stdaln.c
2013-03-08 16:46:45 +00:00
Heng Li b5b50ac8da r341: bugfix - wrong mate position
when one end is mapped with a score less than -T. Caused by the -T option.
2013-03-07 21:35:57 -05:00
Heng Li b0a76884e8 r340: feature freeze; updated the manpage
I will stop adding new features to bwa and prepare for the next release. I will
briefly evaluate the variant calling accuracy before the release.
2013-03-07 11:51:23 -05:00
Heng Li 503ca9ed2e r339: pemerge - expose some settings to CLI 2013-03-07 11:22:19 -05:00
Heng Li 1cadfa1552 r338: pemerge - fixed memory leaks; multithreading
pemerge is actually quite slow.
2013-03-07 11:14:52 -05:00
Heng Li 3e3236dfc4 r337: mem - always read even number of reads
In the old code, we may read an odd number of reads from an interleaved fastq.
2013-03-07 11:00:15 -05:00
Heng Li 72817b664e r336: fine tuning pemerge 2013-03-06 23:38:07 -05:00
Heng Li 557d50c7e1 r335: fixed a compiling error
Caused by the last change
2013-03-06 21:57:13 -05:00
Heng Li 042e1f4442 r334: added pemerge to bwa 2013-03-06 21:55:02 -05:00
Heng Li 5fbd454682 r332: added output threshold
Otherwise there are far too many short hits
2013-03-05 22:49:38 -05:00
Heng Li 6476343a83 r331: rewrote CIGAR generation for bwa-short
When backtracking, bwa-short does not keep the detailed alignment or the exact
start and end positions. To find the boundary and the CIGAR, the old code does
a global alignment with a small end-gap penalty. It then deals with a lot of
special cases to derive the right position and CIGAR, which are actually not
always right. It is a mess.

As the new ksw.{c,h} does not support a different end-gap penalty, the old
strategy does not work. But we get something better. The new code finds the
boundaries with ksw_extend(). It is cleaner and gives more accurate CIGAR in
most cases.
2013-03-05 19:56:37 -05:00
Heng Li 98f8966750 r329: ditch stdaln.{c,h}; no changes to bwa-mem
stdaln.{c,h} was written ten years ago. Its local and SW extension code are
actually buggy (though that rarely happens and usually does not affect the
results too much). ksw.{c,h} is more concise, potentially faster, less buggy,
and richer in features.
2013-03-05 12:00:24 -05:00
Rob Davies 8a078cc16d Merge branch 'master' into master_fixes
Conflicts:
	bntseq.c
	bwamem.c
2013-03-05 10:21:07 +00:00
Heng Li efd9769b07 r324: a little code cleanup
The changes after r317 aim to improve the performance and accuracy for very
long query alignment. The short-read alignment should not be affected. The
changes include:

1) Z-dropoff. This is a variant of blast's X-dropoff. I originally thought this
   heuristic only improves speed, but now I realize it also reduces poor
   alignment with long good flanking alignments. The difference from blast's
   X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not.

2) Band width doubling. When band width is too small, we will get a poor
   alignment in the middle. Sometimes such alignments cannot be fully excluded
   with Z-dropoff. Band width doubling is an alternative heuristic. It is based
   on the observation that the existence of a close-to-boundary high score
   possibly implies inadequate band width. When we see such a signal, we double
   the band width.
2013-03-05 00:57:16 -05:00
Heng Li e0991d6a45 r323: added Z-dropoff, a variant of blast's X-drop 2013-03-05 00:34:33 -05:00
Heng Li 733410b50d r320: speed up very long sequence alignment
100-200bp read alignment should not be affected at all.
2013-03-04 14:43:49 -05:00
Heng Li 7e00dbcac5 r317: bugfix - out-of-range extension
This happens when target region crosses the forward-reverse boundary. This will
almost never happen to short-read alignment.
2013-03-04 11:35:23 -05:00
Heng Li d35f33b513 r316: don't allocate zero-length memory
It is not a bug, but Electric Fence does not like that.
2013-03-04 10:22:18 -05:00
Heng Li 35fb7f9fdf r315: move kopen.o out of libbwa.a 2013-03-01 11:47:51 -05:00
Heng Li 3e4a178e08 r314: cleanup bwamem API
Don't modify input sequences; more documentation
2013-03-01 11:14:51 -05:00
Rob Davies 6beab5f765 Merge branch 'master' into master_fixes
Merge changes to commit c5434ac (0.7.0 release)

Conflicts:
	Makefile
	bwamem.c
2013-03-01 10:22:49 +00:00
Rob Davies 3d33ab063e Merge branch 'master' into master_fixes
Merged to master version b621d3a

Conflicts:
	Makefile
	bntseq.c
	bwa.c
	bwase.c
	bwaseqio.c
	bwtaln.c
	bwtindex.c
	bwtio.c
	bwtmisc.c
	bwtsw2_aux.c
	cs2nt.c
	fastmap.c
	khash.h
	kseq.h
	ksw.c
	kvec.h
	simple_dp.c
	utils.c
	utils.h
2013-03-01 09:37:46 +00:00
Heng Li c5434ac865 r313: release bwa-0.7.0 2013-02-28 15:56:05 -05:00
Heng Li f3cff1c609 r311: even tighter bw for CIGAR 2013-02-27 23:59:50 -05:00
Heng Li 6a4d8c79d8 r309: bugfix - soft clipping missing in example.c 2013-02-27 22:45:18 -05:00
Heng Li df7c3f0000 r308: added a new API to convert region to CIGAR
and an example program demonstrating how to do single-end alignment in <50
lines of C code.
2013-02-27 22:28:29 -05:00
Heng Li 4bb0bdddca r306: introduce clipping penalty
More clipping leads to more severe reference bias. We should not clip the
alignment unless necessary.
2013-02-27 21:13:39 -05:00
Heng Li 292e92b602 r303: bugfix - wrong band width when CIGAR 2013-02-27 15:39:15 -05:00
Heng Li e620f0ff4e r302: updated the manpage 2013-02-27 13:16:22 -05:00
Heng Li b621d3ae38 r301: left-align indels
Don't know why the change is working...
2013-02-27 00:42:19 -05:00
Heng Li 65e099df34 r300: fixed an out-of-boundary bug in rare case 2013-02-27 00:37:17 -05:00
Heng Li 0b533385ef r299: better way to exclude seed 2013-02-27 00:29:11 -05:00
Heng Li acd1ab607b r297: reduce wasteful SW extension
This is particularly important for long sequences
2013-02-26 16:26:46 -05:00
Heng Li 98787f0ae0 r295: generate NM 2013-02-26 13:36:01 -05:00
Heng Li 32f2d60a2e r294: bugfix - -M not working 2013-02-26 13:14:33 -05:00
Heng Li 619ac4f93d r293: bugfix - wrong RG type in SAM output 2013-02-26 13:03:35 -05:00
Heng Li c6b226d719 r292: fixed a very stupid bug on CLI
I was thinking 0x10 or 16, but wrote 0x16...
2013-02-26 12:49:48 -05:00
Heng Li bfb2583d7f r291: summary - bwt.c micro optimization 2013-02-26 12:10:19 -05:00
Heng Li e70c7c2a71 r284: amend cross-reference hit
I really hate this: complex and twisted logic for a nasty scenario that almost
never happens to short reads - but it may become serious when the reference
genome consists of many contigs.

On toy examples, the code seems to work. Don't know if it really works...
2013-02-26 00:03:49 -05:00
Heng Li 61dd3bf13a r283: prepare for fixing cross-ref aln 2013-02-25 22:49:15 -05:00
Heng Li 77b5b586ad r282: set min split_len to read length 2013-02-25 17:29:35 -05:00
Heng Li 30cc8a95d1 fixed an unimportant memory leak 2013-02-25 16:34:19 -05:00
Heng Li d19e834d84 r280: align two ends in the same thread
Otherwise odd-number threads may be of different speed from even-number threads.
2013-02-25 15:40:15 -05:00
Heng Li 20aa848b3c r279: for PE mapq, consider the number of pairs
If there are a lot of proper pairs, it is more likely that the best pair is
wrong.
2013-02-25 13:00:35 -05:00
Heng Li 9957e04590 r278: don't perform too many mate-sw 2013-02-25 11:56:02 -05:00
Heng Li e9e5ee6a3d r277: updated the revision number 2013-02-25 11:34:06 -05:00
Heng Li 0b4a40dc25 updated revision number; to merge into master 2013-02-24 13:34:20 -05:00
Heng Li 545fb87feb removed another part related to color-space 2013-02-22 17:15:57 -05:00
Heng Li 6ad5a3c086 removed color-space support
which has been broken since 0.6.x
2013-02-12 10:21:17 -05:00
Heng Li 91debf412b move smem iterators to bwamem.{c,h} 2013-01-31 13:59:48 -05:00
Heng Li 292f9061ab r132: optionally copy FASTA/Q comment to SAM 2012-10-26 12:54:32 -04:00
Heng Li 3abfd0743a r131: r128 plus remote changes 2012-06-28 14:52:18 -04:00
Heng Li f44edd4fc9 r128: more conservative chaining filter 2012-06-28 14:51:02 -04:00
Heng Li 09ee115dcc r126: release bwa-0.6.2 2012-06-19 13:29:44 -04:00
Heng Li 29ed2d8287 rename the "api" branch as "master" 2012-06-19 13:13:29 -04:00
Heng Li d97ff6bf72 r124: updated version number 2012-04-17 20:45:07 -04:00
Heng Li 790df95e1a updated revision number 2012-04-02 11:43:32 -04:00
Heng Li bdc953cad9 Tim's suggestion suffix file name with .64 2012-03-29 12:22:51 -04:00
Heng Li 91a4a0c8ea Release bwa-0.6.1 2011-11-28 09:52:07 -05:00
Heng Li bf65b6463a fastmap: optionally output the original query seq 2011-11-24 19:44:21 -05:00
Heng Li b5170e0efa output the NM tag 2011-11-24 11:51:38 -05:00
Heng Li 196b50dde3 optionally mark multi-part hits as secondary 2011-11-23 23:39:59 -05:00
Heng Li 182cb2e89c use standard SW when no SSE2 2011-11-19 19:38:21 -05:00
Heng Li dc4008936c avoid duplicated XA tags 2011-11-19 14:52:47 -05:00
Heng Li 8f89f55484 fixed a segfault when there are too few good bases. 2011-11-17 22:13:38 -05:00
Heng Li 770a5f2ae0 Release BWA-0.6.0 2011-11-12 20:04:39 -05:00
Heng Li 7544aca718 updated revision number 2011-11-12 16:56:21 -05:00
Heng Li 8060693411 multithreading works again 2011-11-12 16:50:58 -05:00
Heng Li fa8cfe5567 bugfix: wrong mapping quality 2011-11-12 12:12:45 -05:00
Heng Li b42910ada6 proper mate information 2011-11-12 00:49:21 -05:00
Heng Li e06685db45 bwa-sw PE seems working (SAM is incorrect) 2011-11-07 00:51:43 -05:00
Heng Li 673ae4aaf8 throw an error if insufficient memory during index 2011-10-31 13:26:24 -04:00
Heng Li 02946df28a fixed an off-by-1 bug 2011-10-27 13:55:48 -04:00
Heng Li 7babb54e4c drop smem based mapping algorithm
While we can compute smems very efficiently, there is still a long way to get
the alignment. On simulated data, this smem-based algorithm is 4X faster than
bwasw and twice as fast as bowtie2, but the accuracy is far lower than bwasw
and even lower than bowtie2 in the high-mapQ range. I am kind of sure that if
we continue to increase the mapping accuracy, the speed will approach that of bwasw,
if not slower.

Smem-based mapping algorithm is still interesting, but given that I am short of
time, I will not explore it further.
2011-10-27 10:56:09 -04:00
Heng Li 7664795ffb fixed a minor issue about +/-1 2011-10-25 13:00:41 -04:00
Heng Li 7168f5c10a updated revision number 2011-10-25 12:50:19 -04:00
Heng Li 22c2252e15 added bidirectional bwt; seems buggy 2011-10-25 00:22:28 -04:00
Heng Li 7b4266a6e5 bugfix: integer overflow and strand error in sampe 2011-10-24 17:07:12 -04:00
Heng Li b59fd2bf47 fixed an integer overflow 2011-10-24 14:39:57 -04:00
Heng Li 8f3c780552 fixed a potential int overflow 2011-10-24 14:22:39 -04:00
Heng Li 1f970b4557 updated revision number 2011-10-24 14:14:42 -04:00
Heng Li 26b77eabef updated version number 2011-10-21 12:32:00 -04:00
Heng Li 46123639cf removed reverse pac; bwa is not working right now 2011-10-20 12:09:35 -04:00
Heng Li d70754e234 update revision number 2011-10-14 10:32:31 -04:00
Heng Li 72563c38f3 automatically choose the algorithm for BWT 2011-06-09 17:33:25 -04:00
Heng Li a74523a68d increase maximum barcode length limit to 63bp 2011-06-09 17:17:13 -04:00
Heng Li 243e735431 applied patches from Alec Wysoker 2011-05-04 09:46:50 -04:00
Heng Li 87664941b0 Release bwa-0.5.9 (r16) 2011-01-24 22:00:24 -05:00
Heng Li 7fd8948689 Added recommendation for PacBio reads 2011-01-22 13:20:11 -05:00
Heng Li 1d7d8be9e8 Put BC: to both ends 2011-01-18 20:16:57 -05:00
Heng Li 51d354cd28 Added barcode support 2011-01-15 15:35:39 -05:00
Heng Li 10721ca602 Added an option to accept Illumina 1.3+ fastq 2011-01-15 14:07:08 -05:00
Heng Li f335b33624 fixed a bug in bwase: no RG for unmapped read pairs 2011-01-15 10:32:45 -05:00
Heng Li 5e30884730 Update to the latest modification 0.5.9rc1-2. Update ChangeLog 2011-01-13 20:54:10 -05:00
Heng Li 007c3eb75d Imported from my local bwa repository, the master repository. 2011-01-13 20:52:12 -05:00