Commit Graph

88 Commits (1cadfa15520ae611c8324ee8528a28dea1f33f8d)

Author SHA1 Message Date
Heng Li 1cadfa1552 r338: pemerge - fixed memory leaks; multithreading
pemerge is actually quite slow.
2013-03-07 11:14:52 -05:00
Heng Li 3e3236dfc4 r337: mem - always read even number of reads
In the old code, we may read odd number of reads from an interleaved fastq.
2013-03-07 11:00:15 -05:00
Heng Li 72817b664e r336: fine tuning pemerge 2013-03-06 23:38:07 -05:00
Heng Li 557d50c7e1 r335: fixed a compiling error
Caused by the last change
2013-03-06 21:57:13 -05:00
Heng Li 042e1f4442 r334: added pemerge to bwa 2013-03-06 21:55:02 -05:00
Heng Li 5fbd454682 r332: added output threshold
Otherwise there are far too many short hits
2013-03-05 22:49:38 -05:00
Heng Li 6476343a83 r331: rewrote CIGAR generation for bwa-short
When backtracking, bwa-short does not keep the detailed alignment or the exact
start and end positions. To find the boundary and the CIGAR, the old code does
a global alignment with a small end-gap penalty. It then deals with a lot of
special cases to derive the right position and CIGAR, which are actually not
always right. It is a mess.

As the new ksw.{c,h} does not support a different end-gap penalty, the old
strategy does not work. But we get something better. The new code finds the
boundaries with ksw_extend(). It is cleaner and gives more accurate CIGAR in
most cases.
2013-03-05 19:56:37 -05:00
Heng Li 98f8966750 r329: ditch stdaln.{c,h}; no changes to bwa-mem
stdaln.{c,h} was written ten years ago. Its local and SW extension code are
actually buggy (though that rarely happens and usually does not affect the
results too much). ksw.{c,h} is more concise, potentially faster, less buggy,
and richer in features.
2013-03-05 12:00:24 -05:00
Heng Li efd9769b07 r324: a little code cleanup
The changes after r317 aim to improve the performance and accuracy for very
long query alignment. The short-read alignment should not be affected. The
changes include:

1) Z-dropoff. This is a variant of blast's X-dropoff. I orginally thought this
   heuristic only improves speed, but now I realize it also reduces poor
   alignment with long good flanking alignments. The difference from blast's
   X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not.

2) Band width doubling. When band width is too small, we will get a poor
   alignment in the middle. Sometimes such alignments cannot be fully excluded
   with Z-dropoff. Band width doubling is an alternative heuristic. It is based
   on the observation that the existing of close-to-boundary high score
   possibly implies inadequate band width. When we see such a signal, we double
   the band width.
2013-03-05 00:57:16 -05:00
Heng Li e0991d6a45 r323: added Z-dropoff, a variant of blast's X-drop 2013-03-05 00:34:33 -05:00
Heng Li 733410b50d r320: speed up very long sequence alignment
100-200bp read alignment should not be affected at all.
2013-03-04 14:43:49 -05:00
Heng Li 7e00dbcac5 r317: bugfix - out-of-range extension
This happens when target region crosses the forward-reverse boundary. This will
almost never happen to short-read alignment.
2013-03-04 11:35:23 -05:00
Heng Li d35f33b513 r316: don't allocate zero-length memory
It is not a bug, but Electric Fence does not like that.
2013-03-04 10:22:18 -05:00
Heng Li 35fb7f9fdf r315: move kopen.o out of libbwa.a 2013-03-01 11:47:51 -05:00
Heng Li 3e4a178e08 r314: cleanup bwamem API
Don't modify input sequences; more documentations
2013-03-01 11:14:51 -05:00
Heng Li c5434ac865 r313: release bwa-0.7.0 2013-02-28 15:56:05 -05:00
Heng Li f3cff1c609 r311: even tighter bw for CIGAR 2013-02-27 23:59:50 -05:00
Heng Li 6a4d8c79d8 r309: bugfix - soft clipping missing in example.c 2013-02-27 22:45:18 -05:00
Heng Li df7c3f0000 r308: added a new API to convert region to CIGAR
and an example program demonstrating how to do single-end alignment in <50
lines of C code.
2013-02-27 22:28:29 -05:00
Heng Li 4bb0bdddca r306: introduce clipping penalty
More clipping leads to more severe reference bias. We should not clip the
alignment unless necessary.
2013-02-27 21:13:39 -05:00
Heng Li 292e92b602 r303: bugfix - wrong band width when CIGAR 2013-02-27 15:39:15 -05:00
Heng Li e620f0ff4e r302: updated the manpage 2013-02-27 13:16:22 -05:00
Heng Li b621d3ae38 r301: left-align indels
Don't know why the change is working...
2013-02-27 00:42:19 -05:00
Heng Li 65e099df34 r300: fixed an out-of-boundary bug in rare case 2013-02-27 00:37:17 -05:00
Heng Li 0b533385ef r299: better way to exclude seed 2013-02-27 00:29:11 -05:00
Heng Li acd1ab607b r297: reduce wasteful SW extension
This is particularly important for long sequences
2013-02-26 16:26:46 -05:00
Heng Li 98787f0ae0 r295: generate NM 2013-02-26 13:36:01 -05:00
Heng Li 32f2d60a2e r294: bugfix - -M not working 2013-02-26 13:14:33 -05:00
Heng Li 619ac4f93d r293: bugfix - wrong RG type in SAM output 2013-02-26 13:03:35 -05:00
Heng Li c6b226d719 r292: fixed a very stupid bug on CLI
I was thinking 0x10 or 16, but wrote 0x16...
2013-02-26 12:49:48 -05:00
Heng Li bfb2583d7f r291: summary - bwt.c micro optimization 2013-02-26 12:10:19 -05:00
Heng Li e70c7c2a71 r284: amend cross-reference hit
I really hate this: complex and twisted logic for a nasty scenario that almost
never happens to short reads - but it may become serious when the reference
genome consists of many contigs.

On toy examples, the code seems to work. Don't know if it really works...
2013-02-26 00:03:49 -05:00
Heng Li 61dd3bf13a r283: prepare for fixing cross-ref aln 2013-02-25 22:49:15 -05:00
Heng Li 77b5b586ad r282: set min split_len to read length 2013-02-25 17:29:35 -05:00
Heng Li 30cc8a95d1 fixed an unimportant memory leak 2013-02-25 16:34:19 -05:00
Heng Li d19e834d84 r280: align two ends in the same thread
Otherwise odd-number threads may be of different speed from even-number threads.
2013-02-25 15:40:15 -05:00
Heng Li 20aa848b3c r279: for PE mapq, consider the number of pairs
If there are a lot of proper pairs, it is more likely that the best pair is
wrong.
2013-02-25 13:00:35 -05:00
Heng Li 9957e04590 r278: don't perform too many mate-sw 2013-02-25 11:56:02 -05:00
Heng Li e9e5ee6a3d r277: updated the revision number 2013-02-25 11:34:06 -05:00
Heng Li 0b4a40dc25 updated revision number; to merge into master 2013-02-24 13:34:20 -05:00
Heng Li 545fb87feb removed another part related to color-space 2013-02-22 17:15:57 -05:00
Heng Li 6ad5a3c086 removed color-space support
which has been broken since 0.6.x
2013-02-12 10:21:17 -05:00
Heng Li 91debf412b move smem iterators to bwamem.{c,h} 2013-01-31 13:59:48 -05:00
Heng Li 292f9061ab r132: optionally copy FASTA/Q comment to SAM 2012-10-26 12:54:32 -04:00
Heng Li 3abfd0743a r131: r128 plus remote changes 2012-06-28 14:52:18 -04:00
Heng Li f44edd4fc9 r128: more conservative chaining filter 2012-06-28 14:51:02 -04:00
Heng Li 09ee115dcc r126: release bwa-0.6.2 2012-06-19 13:29:44 -04:00
Heng Li 29ed2d8287 rename the "api" branch as "master" 2012-06-19 13:13:29 -04:00
Heng Li d97ff6bf72 r124: updated version number 2012-04-17 20:45:07 -04:00
Heng Li 790df95e1a updated revision number 2012-04-02 11:43:32 -04:00