Commit Graph

52 Commits (e6c262594fcb6c48aa4f1ced3f49827cfdb5543f)

Author SHA1 Message Date
Heng Li efd9769b07 r324: a little code cleanup
The changes after r317 aim to improve the performance and accuracy for very
long query alignment. The short-read alignment should not be affected. The
changes include:

1) Z-dropoff. This is a variant of BLAST's X-dropoff. I originally thought this
   heuristic only improved speed, but now I realize it also reduces poor
   alignments flanked by long good alignments. The difference from BLAST's
   X-dropoff is that Z-dropoff allows big gaps, whereas X-dropoff does not.

2) Band width doubling. When the band width is too small, we will get a poor
   alignment in the middle. Sometimes such alignments cannot be fully excluded
   with Z-dropoff. Band width doubling is an alternative heuristic. It is based
   on the observation that the existence of a close-to-boundary high score
   possibly implies inadequate band width. When we see such a signal, we double
   the band width.
2013-03-05 00:57:16 -05:00
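
The two heuristics described in the r324 message can be illustrated with a small C sketch. This is not the code from the commit: the function names, the callback type `extend_fn`, the toy aligner, and the three-quarters "close to boundary" threshold are all assumptions made for illustration. The Z-dropoff test forgives the gap-extension cost implied by the diagonal shift since the best cell, which is one plausible way to "allow big gaps"; the X-dropoff test does not.

```c
#include <stdlib.h>

/* X-dropoff style test: stop once the current score has fallen more than
 * xdrop below the best score seen so far. */
static int xdrop_stop(int best, int cur, int xdrop)
{
    return best - cur > xdrop;
}

/* Z-dropoff style test (sketch): before comparing with zdrop, forgive the
 * gap-extension cost implied by the diagonal shift between the best cell
 * (best_i, best_j) and the current cell (i, j), so one big gap alone does
 * not terminate the extension. */
static int zdrop_stop(int best, int best_i, int best_j,
                      int cur, int i, int j, int gap_ext, int zdrop)
{
    int diag_shift = abs((i - best_i) - (j - best_j));
    return best - cur - diag_shift * gap_ext > zdrop;
}

/* Hypothetical extension callback: aligns within band width w, returns the
 * score, and stores in *max_off the largest distance from the main diagonal
 * at which a best score was recorded. */
typedef int (*extend_fn)(int w, int *max_off, void *data);

/* Band width doubling (sketch): retry with w0, 2*w0, 4*w0, ... while the
 * best score sits close to the band boundary. "Close" is assumed here to
 * mean at or beyond three quarters of the band width. */
static int extend_with_band_doubling(extend_fn extend, void *data,
                                     int w0, int max_tries, int *final_w)
{
    int score = 0, t;
    *final_w = w0;
    for (t = 0; t < max_tries; ++t) {
        int max_off = 0, w = w0 << t;
        score = extend(w, &max_off, data);
        *final_w = w;
        if (max_off < 3 * w / 4) break; /* best path stayed inside the band */
    }
    return score;
}

/* Toy stand-in for a real banded aligner: the true alignment needs an
 * offset of *(int *)data from the diagonal; a narrower band is cut off at
 * the band edge and scores lower. */
static int toy_extend(int w, int *max_off, void *data)
{
    int need = *(int *)data;
    *max_off = need < w ? need : w;
    return need <= w ? 100 : 40;
}
```

With a best score of 100 at cell (50, 50) and a current score of 0 at cell (60, 260), `xdrop_stop` with a threshold of 50 stops, while `zdrop_stop` with the same threshold and a gap-extension cost of 1 keeps going because the 200-column diagonal shift is forgiven. With the toy aligner needing an offset of 120 and an initial band of 100, the loop doubles the band once and finishes at a width of 200.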
Heng Li e0991d6a45 r323: added Z-dropoff, a variant of blast's X-drop 2013-03-05 00:34:33 -05:00
Heng Li 59bc9341f6 code backup; more changes coming later 2013-03-04 17:29:07 -05:00
Heng Li 35fb7f9fdf r315: move kopen.o out of libbwa.a 2013-03-01 11:47:51 -05:00
Heng Li 3e4a178e08 r314: cleanup bwamem API
Don't modify input sequences; more documentation
2013-03-01 11:14:51 -05:00
Heng Li 6a4d8c79d8 r309: bugfix - soft clipping missing in example.c 2013-02-27 22:45:18 -05:00
Heng Li df7c3f0000 r308: added a new API to convert region to CIGAR
and an example program demonstrating how to do single-end alignment in <50
lines of C code.
2013-02-27 22:28:29 -05:00
Heng Li 4bb0bdddca r306: introduce clipping penalty
More clipping leads to more severe reference bias. We should not clip the
alignment unless necessary.
2013-02-27 21:13:39 -05:00
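
One way to read the r306 message is as a decision rule: keep the end-to-end extension unless clipping gains more than the penalty. The following C sketch shows that rule only; the function name, parameter names, and the exact comparison are assumptions and are not taken from the commit.

```c
/* Returns 1 if the end-to-end (unclipped) extension should be kept, and 0
 * if the alignment should be clipped at the best local score. pen_clip is
 * the clipping penalty: clipping must beat the end-to-end score by at
 * least this much to be chosen. */
static int keep_end_to_end(int local_score, int end_to_end_score, int pen_clip)
{
    return end_to_end_score > 0 && end_to_end_score > local_score - pen_clip;
}
```

For example, with a local score of 100 and a penalty of 5, an end-to-end score of 97 is kept unclipped, whereas an end-to-end score of 90 is clipped. With a penalty of 0, any lower end-to-end score is clipped.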
Heng Li c6b226d719 r292: fixed a very stupid bug on CLI
I was thinking 0x10 or 16, but wrote 0x16...
2013-02-26 12:49:48 -05:00
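
The slip described in r292 is easy to see in terms of bits: 0x10 is decimal 16 (a single bit), while 0x16 is decimal 22 (three bits). A minimal C sketch, with a hypothetical helper name, shows which unintended bits the wrong literal sets:

```c
/* Returns the bits that are set in the written constant but were not part
 * of the intended one. */
static int unintended_bits(int intended, int written)
{
    return written & ~intended;
}
```

Here `unintended_bits(0x10, 0x16)` yields 0x06, that is, the extra 0x04 and 0x02 bits.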
Heng Li 20aa848b3c r279: for PE mapq, consider the number of pairs
If there are a lot of proper pairs, it is more likely that the best pair is
wrong.
2013-02-25 13:00:35 -05:00
Heng Li 9957e04590 r278: don't perform too many mate-sw 2013-02-25 11:56:02 -05:00
Heng Li 5ead86acd3 optionally mark split hit as secondary 2013-02-25 11:18:35 -05:00
Heng Li 85775c3384 output multiple hits 2013-02-24 13:23:43 -05:00
Heng Li 6bdccf2a8a added a bit of documentation 2013-02-24 13:09:29 -05:00
Heng Li ee59a13109 simplified bwamem.h
Hide mem_seed_t and mem_chain_t. Don't expose unnecessary routines.
2013-02-24 12:17:29 -05:00
Heng Li e613195e17 moved some common code to bwa.{c,h} 2013-02-23 15:30:46 -05:00
Heng Li 17c123d65a print paired-end SAM 2013-02-22 16:38:48 -05:00
Heng Li a578688fa8 generate multiple alignments from one chain 2013-02-21 14:58:51 -05:00
Heng Li 54da54ffd4 extend more seeds (and thus slower...) 2013-02-21 12:52:00 -05:00
Heng Li 5626fe29b7 Well, at least output sth 2013-02-20 19:11:44 -05:00
Heng Li 688872fb1b code backup 2013-02-19 00:50:39 -05:00
Heng Li 66585b7982 code backup 2013-02-18 16:33:06 -05:00
Heng Li df1ff2b36e better and proper way to infer orientation 2013-02-14 12:59:32 -05:00
Heng Li 604e3d8da1 code backup; to upgrade ksw.{c,h} 2013-02-12 16:15:26 -05:00
Heng Li cd0969332f keep track of the "parent" of a secondary 2013-02-12 15:52:23 -05:00
Heng Li 22b79b3475 mark primary, instead of dropping secondary 2013-02-12 15:34:44 -05:00
Heng Li 95d18449b3 merge bseq.{h,c} to utils.{h,c}
I do not like many small files.
2013-02-12 10:36:15 -05:00
Heng Li 99907c98fb separated and improved SAM printing code
This is for the PE mode. The routines may also be useful for bwa-sw, but
probably I won't change the old code.
2013-02-11 15:29:03 -05:00
Heng Li 59eaf650ac code backup 2013-02-11 10:59:38 -05:00
Heng Li 829664d6b5 missing identical hits; improved sub_n 2013-02-08 17:55:35 -05:00
Heng Li b2c7148dc9 consider the number of suboptimal hits 2013-02-08 17:20:44 -05:00
Heng Li 39607065e0 allow more seeds to be seen (thus slower..) 2013-02-08 16:56:28 -05:00
Heng Li fdb0a7405f better dealing with microrepeat 2013-02-08 14:46:57 -05:00
Heng Li 1bf1a674a8 minor improvement to mapQ 2013-02-08 13:43:15 -05:00
Heng Li bfeb37c4de code backup 2013-02-07 13:29:01 -05:00
Heng Li 5dc398cdef start to write CLI 2013-02-07 13:13:43 -05:00
Heng Li 5a0b32bfd2 updated to the latest kseq.h 2013-02-06 14:38:40 -05:00
Heng Li a9292d674d a bit of code cleanup 2013-02-06 13:59:32 -05:00
Heng Li e65b2096f7 removed useless members 2013-02-06 12:25:49 -05:00
Heng Li a61288c768 separate CIGAR generation 2013-02-05 21:49:19 -05:00
Heng Li d6a73c9171 chain filtering apparently working 2013-02-05 00:17:20 -05:00
Heng Li 9d0cdb2d3c unfinished chain filter 2013-02-04 17:23:06 -05:00
Heng Li f27bd18f20 check if every seed is included; not used for now 2013-02-04 15:09:47 -05:00
Heng Li 5bfa45a69b write the mem_aln_t struct 2013-02-04 15:02:56 -05:00
Heng Li ba18db1a9f sw extension works for the simplest case 2013-02-04 12:37:38 -05:00
Heng Li d25a87cc50 code backup 2013-02-02 15:14:24 -05:00
Heng Li 00e5302219 routine to get subsequence from 2-bit pac 2013-02-01 16:39:50 -05:00
Heng Li f8f3b7577a code cleanup; added a missing file 2013-02-01 14:38:44 -05:00
Heng Li 620ad6e5b9 reseed long SMEMs 2013-02-01 14:20:38 -05:00
Heng Li 8977737460 basic chaining working
Definitely suboptimal in a lot of corner cases...
2013-01-31 16:26:05 -05:00