Heng Li
274c0ac96c
r343: bugfix in mem - wrong mate info for unmap
...
SAM generation is always among the nastiest bits. I would need to refactor at
some point (hardly happening).
2013-03-08 12:40:31 -05:00
Heng Li
5fbd454682
r332: added output threshold
...
Otherwise there are far too many short hits
2013-03-05 22:49:38 -05:00
Heng Li
07921659cf
move mem_fill_scmat() to bwa.{h,c}
2013-03-05 09:38:12 -05:00
Heng Li
efd9769b07
r324: a little code cleanup
...
The changes after r317 aim to improve the performance and accuracy for very
long query alignment. The short-read alignment should not be affected. The
changes include:
1) Z-dropoff. This is a variant of blast's X-dropoff. I originally thought this
heuristic only improves speed, but now I realize it also reduces poor
alignment with long good flanking alignments. The difference from blast's
X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not.
2) Band width doubling. When band width is too small, we will get a poor
alignment in the middle. Sometimes such alignments cannot be fully excluded
with Z-dropoff. Band width doubling is an alternative heuristic. It is based
on the observation that the existence of a high score close to the band boundary
possibly implies inadequate band width. When we see such a signal, we double
the band width.
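
A minimal sketch of the two heuristics above, not the code from this commit.
The exact formulas are assumptions: the Z-dropoff test is written as an
X-dropoff test that credits back the gap-extension cost implied by the diagonal
offset, and the "close-to-boundary" test uses an assumed 3/4-of-band threshold.
All function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Classic X-dropoff: stop once the current score is more than x below the best. */
static int xdrop_stop(int best, int cur, int x)
{
    return best - cur > x;
}

/*
 * Z-dropoff (assumed formulation). The diagonal offset between the best cell
 * (best_i, best_j) and the current cell (i, j) is the minimum gap length
 * needed to connect them, so gap_ext * offset is forgiven before the drop is
 * compared against z. A long gap therefore does not trigger the stop by itself.
 */
static int zdrop_stop(int best, int best_i, int best_j,
                      int cur, int i, int j, int gap_ext, int z)
{
    int offset = abs((i - best_i) - (j - best_j));
    return best - cur - gap_ext * offset > z;
}

/*
 * Band width doubling. max_off is the largest distance from the main diagonal
 * at which a high score was seen. If it falls in the outer quarter of the band
 * (assumed threshold), return a doubled width capped at max_w; otherwise keep w.
 */
static int next_band_width(int w, int max_off, int max_w)
{
    assert(w > 0 && max_w >= w);
    if (max_off < (w >> 1) + (w >> 2)) return w; /* high score well inside the band */
    return w * 2 <= max_w ? w * 2 : max_w;
}
```

With these definitions, a drop of 60 across a 30bp gap (gap_ext = 1) passes a
Z-dropoff of 50 but fails an X-dropoff of 50, which is the difference item 1
describes.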
2013-03-05 00:57:16 -05:00
Heng Li
e0991d6a45
r323: added Z-dropoff, a variant of blast's X-drop
2013-03-05 00:34:33 -05:00
Heng Li
d6096c3f99
bugfix: caused by the latest change
2013-03-04 18:41:57 -05:00
Heng Li
59bc9341f6
code backup; more changes coming later
2013-03-04 17:29:07 -05:00
Heng Li
733410b50d
r320: speed up very long sequence alignment
...
100-200bp read alignment should not be affected at all.
2013-03-04 14:43:49 -05:00
Heng Li
40f1214736
change to debugging code only
2013-03-04 11:52:11 -05:00
Heng Li
7e00dbcac5
r317: bugfix - out-of-range extension
...
This happens when target region crosses the forward-reverse boundary. This will
almost never happen to short-read alignment.
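
A sketch of one way to guard against this, not the actual patch. It assumes the
forward strand occupies [0, l_pac) and the reverse strand [l_pac, 2*l_pac) in
one concatenated coordinate system; clamp_to_strand and its parameters are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp the target window [*rb, *re) so that it stays on the strand that
 * contains the seed start seed_beg.
 * Assumed layout: forward strand at [0, l_pac), reverse at [l_pac, 2*l_pac).
 */
static void clamp_to_strand(int64_t l_pac, int64_t seed_beg,
                            int64_t *rb, int64_t *re)
{
    assert(*rb <= *re);
    if (*rb < 0) *rb = 0;
    if (*re > 2 * l_pac) *re = 2 * l_pac;
    if (*rb < l_pac && l_pac < *re) {      /* window straddles the boundary */
        if (seed_beg < l_pac) *re = l_pac; /* seed on forward strand: cut the tail */
        else *rb = l_pac;                  /* seed on reverse strand: cut the head */
    }
}
```

Long queries need wide target windows, which is why they are far more likely to
reach the boundary than short reads.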
2013-03-04 11:35:23 -05:00
Heng Li
3e4a178e08
r314: cleanup bwamem API
...
Don't modify input sequences; more documentations
2013-03-01 11:14:51 -05:00
Heng Li
f3cff1c609
r311: even tighter bw for CIGAR
2013-02-27 23:59:50 -05:00
Heng Li
a33b9c0633
tighter bw for cigar SW
2013-02-27 23:40:46 -05:00
Heng Li
6a4d8c79d8
r309: bugfix - soft clipping missing in example.c
2013-02-27 22:45:18 -05:00
Heng Li
df7c3f0000
r308: added a new API to convert region to CIGAR
...
and an example program demonstrating how to do single-end alignment in <50
lines of C code.
2013-02-27 22:28:29 -05:00
Heng Li
4bb0bdddca
r306: introduce clipping penalty
...
More clipping leads to more severe reference bias. We should not clip the
alignment unless necessary.
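
A sketch of how a clipping penalty can enter the decision, under assumed names
(extend_to_end, local_score, end_score, pen_clip) that are not taken from the
commit. The extension reaches the read end unless clipping wins by more than
the penalty.

```c
#include <assert.h>

/*
 * Return 1 to keep the extension that reaches the end of the read,
 * 0 to soft-clip at the best local score.
 *   local_score: best score of the local extension
 *   end_score:   score when the extension is forced to reach the read end
 *   pen_clip:    clipping penalty (hypothetical parameter name)
 */
static int extend_to_end(int local_score, int end_score, int pen_clip)
{
    assert(pen_clip >= 0);
    if (end_score <= 0) return 0; /* end-to-end extension is not viable */
    return end_score > local_score - pen_clip;
}
```

With pen_clip = 0 this reduces to plain local alignment; a larger penalty keeps
more read ends aligned and so clips less.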
2013-02-27 21:13:39 -05:00
Heng Li
65e099df34
r300: fixed an out-of-boundary bug in rare case
2013-02-27 00:37:17 -05:00
Heng Li
0b533385ef
r299: better way to exclude seed
2013-02-27 00:29:11 -05:00
Heng Li
ee80fb8bd0
Test each seed to see if extension is needed
...
The old version wastefully extends many seeds contained in an aligned region
found before. While this wastes little time for short reads, it becomes a
serious defect for long query sequences.
This is an attempt to fix this problem, but more tuning is needed.
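
A simplified sketch of such a test, assuming a seed is skipped when both its
query and reference intervals lie inside a region found earlier. The types and
names are hypothetical, and the real test may use further criteria: a seed
inside the bounding box of a region is not necessarily on its alignment path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types: an aligned region and an exact-match seed. */
typedef struct { int qb, qe; int64_t rb, re; } region_t;     /* half-open intervals */
typedef struct { int qbeg; int64_t rbeg; int len; } seed_t;

/* 1 if the seed lies inside the region on both the query and the reference. */
static int seed_is_contained(const seed_t *s, const region_t *r)
{
    return s->qbeg >= r->qb && s->qbeg + s->len <= r->qe
        && s->rbeg >= r->rb && s->rbeg + s->len <= r->re;
}

/* 1 if no previously found region contains the seed, so it should be extended. */
static int seed_needs_extension(const seed_t *s, const region_t *regs, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; ++i)
        if (seed_is_contained(s, &regs[i])) return 0;
    return 1;
}
```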
2013-02-26 22:55:44 -05:00
Heng Li
acd1ab607b
r297: reduce wasteful SW extension
...
This is particularly important for long sequences
2013-02-26 16:26:46 -05:00
Heng Li
98787f0ae0
r295: generate NM
2013-02-26 13:36:01 -05:00
Heng Li
32f2d60a2e
r294: bugfix - -M not working
2013-02-26 13:14:33 -05:00
Heng Li
619ac4f93d
r293: bugfix - wrong RG type in SAM output
2013-02-26 13:03:35 -05:00
Heng Li
e70c7c2a71
r284: amend cross-reference hit
...
I really hate this: complex and twisted logic for a nasty scenario that almost
never happens to short reads - but it may become serious when the reference
genome consists of many contigs.
On toy examples, the code seems to work. Don't know if it really works...
2013-02-26 00:03:49 -05:00
Heng Li
77b5b586ad
r282: set min split_len to read length
2013-02-25 17:29:35 -05:00
Heng Li
d19e834d84
r280: align two ends in the same thread
...
Otherwise odd-numbered threads may run at a different speed from even-numbered
threads.
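
A sketch of the idea, assuming interleaved input (the two ends of pair k at
read indices 2k and 2k+1) and round-robin assignment of work to threads. Both
assumptions and both function names are illustrative.

```c
#include <assert.h>

/* Round-robin by read: with an even thread count, each thread sees only
 * first ends or only second ends. */
static int owner_by_read(int read_idx, int n_threads)
{
    assert(read_idx >= 0 && n_threads > 0);
    return read_idx % n_threads;
}

/* Round-robin by pair: both ends of a pair land on the same thread. */
static int owner_by_pair(int read_idx, int n_threads)
{
    assert(read_idx >= 0 && n_threads > 0);
    return (read_idx >> 1) % n_threads;
}
```

If one end of each pair is systematically slower to align than the other, the
per-read scheme loads threads unevenly while the per-pair scheme does not.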
2013-02-25 15:40:15 -05:00
Heng Li
20aa848b3c
r279: for PE mapq, consider the number of pairs
...
If there are a lot of proper pairs, it is more likely that the best pair is
wrong.
2013-02-25 13:00:35 -05:00
Heng Li
9957e04590
r278: don't perform too many mate-sw
2013-02-25 11:56:02 -05:00
Heng Li
5ead86acd3
optionally mark split hit as secondary
2013-02-25 11:18:35 -05:00
Heng Li
514563bd0a
no poor hits with -a; reduce mapq for 2nd primary
2013-02-25 10:54:12 -05:00
Heng Li
29e41b592c
bugfix: isize is off by 1
2013-02-24 23:00:51 -05:00
Heng Li
85775c3384
output multiple hits
2013-02-24 13:23:43 -05:00
Heng Li
6bdccf2a8a
added a bit of documentation
2013-02-24 13:09:29 -05:00
Heng Li
ee59a13109
simplified bwamem.h
...
Hide mem_seed_t and mem_chain_t. Don't expose unnecessary routines.
2013-02-24 12:17:29 -05:00
Heng Li
cda85be059
fixed a couple bugs identified by gcc
...
Recent gcc is better.
2013-02-23 17:15:07 -05:00
Heng Li
b4c38bcc1c
append fasta/q comment
2013-02-23 16:57:34 -05:00
Heng Li
ee4540c394
support read group in bwa-mem
2013-02-23 16:41:44 -05:00
Heng Li
67543f19a1
code refactoring
2013-02-23 15:55:55 -05:00
Heng Li
e613195e17
moved some common code to bwa.{c,h}
2013-02-23 15:30:46 -05:00
Heng Li
d460f2ec9e
bugfix in multi-threaded bwa-mem
2013-02-23 14:48:54 -05:00
Heng Li
904c3205c0
removed a few unused variables
...
These variables have been assigned but never actually used. Reported by
gcc-4.7. Older versions do not give such warnings.
2013-02-23 13:26:50 -05:00
Heng Li
17c123d65a
print paired-end SAM
2013-02-22 16:38:48 -05:00
Heng Li
ba15b787cb
rework PE mapq; don't know if better
2013-02-22 14:47:57 -05:00
Heng Li
c5ce72f593
scoring pairs by score, not by errors
...
This is important for bwa-mem which does local alignment. A short exact match
is worse than a long inexact match. Also fixed a bug in approximating mapping
quality.
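
A small worked example of why the two rankings disagree. The scoring values
used in the note below (match 1, mismatch penalty 4) are illustrative, and
hit_t and hit_score are hypothetical names; gaps are ignored for brevity.

```c
#include <assert.h>

/* Hypothetical summary of an ungapped local hit. */
typedef struct { int len, mismatches; } hit_t;

/* Alignment score: reward matched bases, penalize mismatched ones. */
static int hit_score(const hit_t *h, int match, int mismatch_pen)
{
    assert(h->mismatches >= 0 && h->mismatches <= h->len);
    return (h->len - h->mismatches) * match - h->mismatches * mismatch_pen;
}
```

A 30bp exact hit scores 30, while a 100bp hit with 2 mismatches scores
98 - 8 = 90. Ranking by error count prefers the first hit (0 errors against 2);
ranking by score prefers the second.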
2013-02-22 12:10:20 -05:00
Heng Li
d4cf6d97a6
bugfix: memory leak
2013-02-21 15:04:31 -05:00
Heng Li
a578688fa8
generate multiple alignments from one chain
2013-02-21 14:58:51 -05:00
Heng Li
cfbc4c89e3
perform extension when there are, say, 20bp tandem
2013-02-21 14:34:10 -05:00
Heng Li
54da54ffd4
extend more seeds (and thus slower...)
2013-02-21 12:52:00 -05:00
Heng Li
f8829318cf
weakened the chain filter
2013-02-21 12:25:20 -05:00
Heng Li
84a328764a
bugfix: mis-chaining caused by integer overflow
...
I really need to rewrite kbtree some time.
2013-02-21 11:42:30 -05:00