Commit Graph

125 Commits (6c665189ad628a08ee421ce408d8fc25e7684878)

Author SHA1 Message Date
Heng Li 6c665189ad r359: identical output to 0.7.2 (without -a) 2013-03-11 23:16:18 -04:00
Heng Li 0f88103d2a SAM almost identical to 0.7.2 2013-03-11 23:01:51 -04:00
Heng Li 26f4c704ed drop the old SAM writer 2013-03-11 22:24:54 -04:00
Heng Li ebb45dc42e new code works for SE 2013-03-11 21:59:15 -04:00
Heng Li c7edaa8e84 to test the new sam writer... 2013-03-11 21:55:52 -04:00
Heng Li 47952b6f3f drop an unnecessary member from mem_aln_t 2013-03-11 21:35:32 -04:00
Heng Li 8f0d439913 prepare to replace the SAM printing code
This move is dangerous as SAM printing is very complex, but it will benefit in
the long run. The planned change will reduce the redundancy, improves clarity
and most importantly makes it much easier to output multiple primary hits in an
optional tag.
2013-03-11 21:25:17 -04:00
Heng Li 9ea7f83974 Emergent bugfix: wrong TLEN sign
It is interesting that Picard did not find the issue.
2013-03-09 18:03:15 -05:00
Heng Li 66c9783daf r345: bugfix in mem - wrong mate strand for unmap
Received a clean bill from Picard
2013-03-08 13:15:43 -05:00
Heng Li af7b4d8980 gcc wrongly thinks a variable may be uninitialized
It should always be initialized. To avoid a warning, made a change.
2013-03-08 12:45:50 -05:00
Heng Li 274c0ac96c r343: bugfix in mem - wrong mate info for unmap
SAM generation is always among the nastiest bits. I would need to refactor at
some point (hardly happening).
2013-03-08 12:40:31 -05:00
Heng Li 5fbd454682 r332: added output threshold
Otherwise there are far too many short hits
2013-03-05 22:49:38 -05:00
Heng Li 07921659cf move mem_fill_scmat() to bwa.{h,c} 2013-03-05 09:38:12 -05:00
Heng Li efd9769b07 r324: a little code cleanup
The changes after r317 aim to improve the performance and accuracy for very
long query alignment. The short-read alignment should not be affected. The
changes include:

1) Z-dropoff. This is a variant of blast's X-dropoff. I orginally thought this
   heuristic only improves speed, but now I realize it also reduces poor
   alignment with long good flanking alignments. The difference from blast's
   X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not.

2) Band width doubling. When band width is too small, we will get a poor
   alignment in the middle. Sometimes such alignments cannot be fully excluded
   with Z-dropoff. Band width doubling is an alternative heuristic. It is based
   on the observation that the existing of close-to-boundary high score
   possibly implies inadequate band width. When we see such a signal, we double
   the band width.
2013-03-05 00:57:16 -05:00
Heng Li e0991d6a45 r323: added Z-dropoff, a variant of blast's X-drop 2013-03-05 00:34:33 -05:00
Heng Li d6096c3f99 bugfix: caused by the latest change 2013-03-04 18:41:57 -05:00
Heng Li 59bc9341f6 code backup; more changes coming later 2013-03-04 17:29:07 -05:00
Heng Li 733410b50d r320: speed up very long sequence alignment
100-200bp read alignment should not be affected at all.
2013-03-04 14:43:49 -05:00
Heng Li 40f1214736 change to debugging code only 2013-03-04 11:52:11 -05:00
Heng Li 7e00dbcac5 r317: bugfix - out-of-range extension
This happens when target region crosses the forward-reverse boundary. This will
almost never happen to short-read alignment.
2013-03-04 11:35:23 -05:00
Heng Li 3e4a178e08 r314: cleanup bwamem API
Don't modify input sequences; more documentations
2013-03-01 11:14:51 -05:00
Heng Li f3cff1c609 r311: even tighter bw for CIGAR 2013-02-27 23:59:50 -05:00
Heng Li a33b9c0633 tighter bw for cigar SW 2013-02-27 23:40:46 -05:00
Heng Li 6a4d8c79d8 r309: bugfix - soft clipping missing in example.c 2013-02-27 22:45:18 -05:00
Heng Li df7c3f0000 r308: added a new API to convert region to CIGAR
and an example program demonstrating how to do single-end alignment in <50
lines of C code.
2013-02-27 22:28:29 -05:00
Heng Li 4bb0bdddca r306: introduce clipping penalty
More clipping leads to more severe reference bias. We should not clip the
alignment unless necessary.
2013-02-27 21:13:39 -05:00
Heng Li 65e099df34 r300: fixed an out-of-boundary bug in rare case 2013-02-27 00:37:17 -05:00
Heng Li 0b533385ef r299: better way to exclude seed 2013-02-27 00:29:11 -05:00
Heng Li ee80fb8bd0 Test each seed to see if extension is needed
The old version wastefully extends many seeds contained in an aligned region
found before. While this wastes little time for short reads, it becomes a
serious defect for long query sequences.

This is an attempt to fix this problem, but more tuning are needed.
2013-02-26 22:55:44 -05:00
Heng Li acd1ab607b r297: reduce wasteful SW extension
This is particularly important for long sequences
2013-02-26 16:26:46 -05:00
Heng Li 98787f0ae0 r295: generate NM 2013-02-26 13:36:01 -05:00
Heng Li 32f2d60a2e r294: bugfix - -M not working 2013-02-26 13:14:33 -05:00
Heng Li 619ac4f93d r293: bugfix - wrong RG type in SAM output 2013-02-26 13:03:35 -05:00
Heng Li e70c7c2a71 r284: amend cross-reference hit
I really hate this: complex and twisted logic for a nasty scenario that almost
never happens to short reads - but it may become serious when the reference
genome consists of many contigs.

On toy examples, the code seems to work. Don't know if it really works...
2013-02-26 00:03:49 -05:00
Heng Li 77b5b586ad r282: set min split_len to read length 2013-02-25 17:29:35 -05:00
Heng Li d19e834d84 r280: align two ends in the same thread
Otherwise odd-number threads may be of different speed from even-number threads.
2013-02-25 15:40:15 -05:00
Heng Li 20aa848b3c r279: for PE mapq, consider the number of pairs
If there are a lot of proper pairs, it is more likely that the best pair is
wrong.
2013-02-25 13:00:35 -05:00
Heng Li 9957e04590 r278: don't perform too many mate-sw 2013-02-25 11:56:02 -05:00
Heng Li 5ead86acd3 optionally mark split hit as secondary 2013-02-25 11:18:35 -05:00
Heng Li 514563bd0a no poor hits with -a; reduce mapq for 2nd primary 2013-02-25 10:54:12 -05:00
Heng Li 29e41b592c bugfix: isize is off by 1 2013-02-24 23:00:51 -05:00
Heng Li 85775c3384 output multiple hits 2013-02-24 13:23:43 -05:00
Heng Li 6bdccf2a8a added a bit documentation 2013-02-24 13:09:29 -05:00
Heng Li ee59a13109 simplified bwamem.h
Hide mem_seed_t and mem_chain_t. Don't expose unnecessary routines.
2013-02-24 12:17:29 -05:00
Heng Li cda85be059 fixed a couple bugs identified by gcc
Recent gcc is better.
2013-02-23 17:15:07 -05:00
Heng Li b4c38bcc1c append fasta/q comment 2013-02-23 16:57:34 -05:00
Heng Li ee4540c394 support read group in bwa-mem 2013-02-23 16:41:44 -05:00
Heng Li 67543f19a1 code refactoring 2013-02-23 15:55:55 -05:00
Heng Li e613195e17 moved some common code to bwa.{c,h} 2013-02-23 15:30:46 -05:00
Heng Li d460f2ec9e bugfix in multi-threaded bwa-mem 2013-02-23 14:48:54 -05:00