Heng Li
499cf4c00d
r376: reduce wasteful seed extension
...
mainly for contig alignment
2013-04-10 12:18:56 -04:00
Heng Li
3d8a8c1e37
r374: fix - clipping penalty not always working
...
This only happens to gaps where mem underestimates the bandwidth without
considering the clipping penalty.
2013-04-10 01:09:37 -04:00
Heng Li
d7ca0885eb
r371: extend overlapping seeds
...
to avoid misalignment in tandem repeats
2013-04-04 00:43:43 -04:00
Heng Li
1e118e0823
r370: suppress "D" at the end of a cigar
...
This is caused by seeds in tandem repeats, in which case, bwa-mem may not
extend the true seed. The change in this commit is only a temporary cure.
2013-04-03 23:57:19 -04:00
Rob Davies
c89756e2b0
Merge branch 'master' into master_fixes
2013-03-19 12:11:51 +00:00
Heng Li
8437cd4edd
r369: bugfix - segfault caused by the last change
...
Sigh... Even the simplest change can lead to new bugs.
2013-03-19 01:04:57 -04:00
Heng Li
1e3cadbfc2
r368: bugfix - wrong CIGAR when bridging 3 contigs
...
In this case, bwa_fix_xref() will return insane coordinates. The old version
did not check the return status and wrote a wrong CIGAR. This bug only happens to
very short assembly contigs.
2013-03-18 20:49:32 -04:00
Rob Davies
c862a1a396
Merge branch 'master' into master_fixes
2013-03-18 13:35:12 +00:00
Heng Li
9346acde1b
Release bwa-0.7.3a-r367
...
In 0.7.3, the wrong CIGAR bug was only fixed in one scenario, but not fixed
in another corner case.
2013-03-15 21:26:37 -04:00
Heng Li
dd51177837
r365: bugfix - wrong alignment (right mapping)
...
The bug only happens when there is a 1bp del and 1bp ins which are close to the
end and there are no other substitutions or indels. In this case, bwa mem gave
a wrong band width.
2013-03-15 11:59:05 -04:00
Rob Davies
cca27c1ef5
Merge branch 'master' into master_fixes
...
Conflicts:
bwamem.c
bwamem_pair.c
example.c
2013-03-13 12:12:28 +00:00
Heng Li
bdf34f6ce7
r363: XA=>XP; output mapQ in XP
...
In BWA, XA gives hits "shadowed" by the primary hit. In BWA-MEM, we output
primary hits only. Primary hits may have non-zero mapping quality.
2013-03-12 09:56:04 -04:00
Heng Li
c29b176cb6
r362: bugfix - occasionally wrong TLEN
...
Use the 0.7.2 way to compute TLEN
2013-03-12 00:14:36 -04:00
Heng Li
dab5b17c1a
r360: output alternative primary alignments in XA
2013-03-11 23:43:58 -04:00
Heng Li
6c665189ad
r359: identical output to 0.7.2 (without -a)
2013-03-11 23:16:18 -04:00
Heng Li
0f88103d2a
SAM almost identical to 0.7.2
2013-03-11 23:01:51 -04:00
Heng Li
26f4c704ed
drop the old SAM writer
2013-03-11 22:24:54 -04:00
Heng Li
ebb45dc42e
new code works for SE
2013-03-11 21:59:15 -04:00
Heng Li
c7edaa8e84
to test the new sam writer...
2013-03-11 21:55:52 -04:00
Heng Li
47952b6f3f
drop an unnecessary member from mem_aln_t
2013-03-11 21:35:32 -04:00
Heng Li
8f0d439913
prepare to replace the SAM printing code
...
This move is dangerous as SAM printing is very complex, but it will benefit in
the long run. The planned change will reduce redundancy, improve clarity
and, most importantly, make it much easier to output multiple primary hits in an
optional tag.
2013-03-11 21:25:17 -04:00
Rob Davies
9228e48efd
Merge branch 'master' into master_fixes
...
Conflicts:
Makefile
2013-03-11 13:50:49 +00:00
Heng Li
9ea7f83974
Emergency bugfix: wrong TLEN sign
...
It is interesting that Picard did not find the issue.
2013-03-09 18:03:15 -05:00
Heng Li
66c9783daf
r345: bugfix in mem - wrong mate strand for unmap
...
Received a clean bill from Picard
2013-03-08 13:15:43 -05:00
Heng Li
af7b4d8980
gcc wrongly thinks a variable may be uninitialized
...
It should always be initialized. To avoid a warning, made a change.
2013-03-08 12:45:50 -05:00
Heng Li
274c0ac96c
r343: bugfix in mem - wrong mate info for unmap
...
SAM generation is always among the nastiest bits. I would need to refactor at
some point (hardly happening).
2013-03-08 12:40:31 -05:00
Rob Davies
aabd990e8f
Merge branch 'master' into master_fixes
...
Conflicts:
Makefile
bwape.c
bwase.c
bwtsw2_aux.c
stdaln.c
2013-03-08 16:46:45 +00:00
Heng Li
5fbd454682
r332: added output threshold
...
Otherwise there are far too many short hits
2013-03-05 22:49:38 -05:00
Heng Li
07921659cf
move mem_fill_scmat() to bwa.{h,c}
2013-03-05 09:38:12 -05:00
Rob Davies
8a078cc16d
Merge branch 'master' into master_fixes
...
Conflicts:
bntseq.c
bwamem.c
2013-03-05 10:21:07 +00:00
Heng Li
efd9769b07
r324: a little code cleanup
...
The changes after r317 aim to improve the performance and accuracy for very
long query alignment. The short-read alignment should not be affected. The
changes include:
1) Z-dropoff. This is a variant of blast's X-dropoff. I originally thought this
heuristic only improves speed, but now I realize it also reduces poor
alignment with long good flanking alignments. The difference from blast's
X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not.
2) Band width doubling. When band width is too small, we will get a poor
alignment in the middle. Sometimes such alignments cannot be fully excluded
with Z-dropoff. Band width doubling is an alternative heuristic. It is based
on the observation that the existence of a close-to-boundary high score
possibly implies inadequate band width. When we see such a signal, we double
the band width.
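
The two heuristics above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code from this commit: the function names, the parameters (`gap_ext`, `w_cap`) and the "within the outer quarter of the band" threshold are all hypothetical. The point it shows is that an X-dropoff test looks only at how far the score has fallen, while a Z-dropoff style test first forgives the part of the fall that a single long gap (a shift between diagonals) would explain.

```c
#include <stdlib.h>

/* X-dropoff style test: stop once the current score m has fallen more
 * than xdrop below the best score max seen so far. */
static int xdrop_stop(int max, int m, int xdrop)
{
    return max - m > xdrop;
}

/* Z-dropoff style test (sketch). (max_i, max_j) is the cell holding the
 * best score so far and (i, j) the cell holding the current score m.
 * The diagonal shift between the two cells is the length of the gap
 * needed to connect them; its extension cost (gap_ext per base, an
 * assumed parameter) is subtracted before comparing with zdrop, so a
 * big gap alone does not terminate the extension. */
static int zdrop_stop(int max, int max_i, int max_j,
                      int m, int i, int j, int gap_ext, int zdrop)
{
    int shift = abs((i - max_i) - (j - max_j));
    return max - m - shift * gap_ext > zdrop;
}

/* Band width doubling (sketch). max_off is the largest distance from the
 * main diagonal at which a best score was observed. If it lies in the
 * outer quarter of the band (an assumed threshold), the band may have
 * been too narrow, so the width is doubled, up to the cap w_cap. */
static int next_band_width(int w, int max_off, int w_cap)
{
    if (max_off < w - w / 4)
        return w;                       /* best score well inside the band */
    return 2 * w < w_cap ? 2 * w : w_cap;
}
```

With a best score of 100 at cell (50, 50) and a current score of 40 at cell (60, 110), the X-dropoff test with a threshold of 20 stops, whereas the Z-dropoff test with the same threshold and `gap_ext` of 1 continues, because 50 of the 60 lost points are attributed to the 50 bp diagonal shift.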
2013-03-05 00:57:16 -05:00
Heng Li
e0991d6a45
r323: added Z-dropoff, a variant of blast's X-drop
2013-03-05 00:34:33 -05:00
Heng Li
d6096c3f99
bugfix: caused by the latest change
2013-03-04 18:41:57 -05:00
Heng Li
59bc9341f6
code backup; more changes coming later
2013-03-04 17:29:07 -05:00
Heng Li
733410b50d
r320: speed up very long sequence alignment
...
100-200bp read alignment should not be affected at all.
2013-03-04 14:43:49 -05:00
Heng Li
40f1214736
change to debugging code only
2013-03-04 11:52:11 -05:00
Heng Li
7e00dbcac5
r317: bugfix - out-of-range extension
...
This happens when target region crosses the forward-reverse boundary. This will
almost never happen to short-read alignment.
2013-03-04 11:35:23 -05:00
Heng Li
3e4a178e08
r314: cleanup bwamem API
...
Don't modify input sequences; more documentation
2013-03-01 11:14:51 -05:00
Rob Davies
6beab5f765
Merge branch 'master' into master_fixes
...
Merge changes to commit c5434ac (0.7.0 release)
Conflicts:
Makefile
bwamem.c
2013-03-01 10:22:49 +00:00
Rob Davies
3d33ab063e
Merge branch 'master' into master_fixes
...
Merged to master version b621d3a
Conflicts:
Makefile
bntseq.c
bwa.c
bwase.c
bwaseqio.c
bwtaln.c
bwtindex.c
bwtio.c
bwtmisc.c
bwtsw2_aux.c
cs2nt.c
fastmap.c
khash.h
kseq.h
ksw.c
kvec.h
simple_dp.c
utils.c
utils.h
2013-03-01 09:37:46 +00:00
Heng Li
f3cff1c609
r311: even tighter bw for CIGAR
2013-02-27 23:59:50 -05:00
Heng Li
a33b9c0633
tighter bw for cigar SW
2013-02-27 23:40:46 -05:00
Heng Li
6a4d8c79d8
r309: bugfix - soft clipping missing in example.c
2013-02-27 22:45:18 -05:00
Heng Li
df7c3f0000
r308: added a new API to convert region to CIGAR
...
and an example program demonstrating how to do single-end alignment in <50
lines of C code.
2013-02-27 22:28:29 -05:00
Heng Li
4bb0bdddca
r306: introduce clipping penalty
...
More clipping leads to more severe reference bias. We should not clip the
alignment unless necessary.
2013-02-27 21:13:39 -05:00
Heng Li
65e099df34
r300: fixed an out-of-boundary bug in rare case
2013-02-27 00:37:17 -05:00
Heng Li
0b533385ef
r299: better way to exclude seed
2013-02-27 00:29:11 -05:00
Heng Li
ee80fb8bd0
Test each seed to see if extension is needed
...
The old version wastefully extends many seeds contained in an aligned region
found before. While this wastes little time for short reads, it becomes a
serious defect for long query sequences.
This is an attempt to fix this problem, but more tuning is needed.
2013-02-26 22:55:44 -05:00
Heng Li
acd1ab607b
r297: reduce wasteful SW extension
...
This is particularly important for long sequences
2013-02-26 16:26:46 -05:00
Heng Li
98787f0ae0
r295: generate NM
2013-02-26 13:36:01 -05:00
Heng Li
32f2d60a2e
r294: bugfix - -M not working
2013-02-26 13:14:33 -05:00
Heng Li
619ac4f93d
r293: bugfix - wrong RG type in SAM output
2013-02-26 13:03:35 -05:00
Heng Li
e70c7c2a71
r284: amend cross-reference hit
...
I really hate this: complex and twisted logic for a nasty scenario that almost
never happens to short reads - but it may become serious when the reference
genome consists of many contigs.
On toy examples, the code seems to work. Don't know if it really works...
2013-02-26 00:03:49 -05:00
Heng Li
77b5b586ad
r282: set min split_len to read length
2013-02-25 17:29:35 -05:00
Heng Li
d19e834d84
r280: align two ends in the same thread
...
Otherwise odd-number threads may be of different speed from even-number threads.
2013-02-25 15:40:15 -05:00
Heng Li
20aa848b3c
r279: for PE mapq, consider the number of pairs
...
If there are a lot of proper pairs, it is more likely that the best pair is
wrong.
2013-02-25 13:00:35 -05:00
Heng Li
9957e04590
r278: don't perform too many mate-sw
2013-02-25 11:56:02 -05:00
Heng Li
5ead86acd3
optionally mark split hit as secondary
2013-02-25 11:18:35 -05:00
Heng Li
514563bd0a
no poor hits with -a; reduce mapq for 2nd primary
2013-02-25 10:54:12 -05:00
Heng Li
29e41b592c
bugfix: isize is off by 1
2013-02-24 23:00:51 -05:00
Heng Li
85775c3384
output multiple hits
2013-02-24 13:23:43 -05:00
Heng Li
6bdccf2a8a
added a bit of documentation
2013-02-24 13:09:29 -05:00
Heng Li
ee59a13109
simplified bwamem.h
...
Hide mem_seed_t and mem_chain_t. Don't expose unnecessary routines.
2013-02-24 12:17:29 -05:00
Heng Li
cda85be059
fixed a couple bugs identified by gcc
...
Recent gcc is better.
2013-02-23 17:15:07 -05:00
Heng Li
b4c38bcc1c
append fasta/q comment
2013-02-23 16:57:34 -05:00
Heng Li
ee4540c394
support read group in bwa-mem
2013-02-23 16:41:44 -05:00
Heng Li
67543f19a1
code refactoring
2013-02-23 15:55:55 -05:00
Heng Li
e613195e17
moved some common code to bwa.{c,h}
2013-02-23 15:30:46 -05:00
Heng Li
d460f2ec9e
bugfix in multi-threaded bwa-mem
2013-02-23 14:48:54 -05:00
Heng Li
904c3205c0
removed a few unused variables
...
These variables have been assigned but never actually used. Reported by
gcc-4.7. Lower version cannot give such warnings.
2013-02-23 13:26:50 -05:00
Heng Li
17c123d65a
print paired-end SAM
2013-02-22 16:38:48 -05:00
Heng Li
ba15b787cb
rework PE mapq; don't know if better
2013-02-22 14:47:57 -05:00
Heng Li
c5ce72f593
scoring pairs by score, not by errors
...
This is important for bwa-mem which does local alignment. A short exact match
is worse than a long inexact match. Also fixed a bug in approximating mapping
quality.
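
A small C sketch of why ranking by score differs from ranking by error count for local alignments. The helper name and the scoring values used in the note below (match +1, mismatch penalty 4) are assumptions chosen for illustration; the commit does not state the parameters in use at this point.

```c
/* Hypothetical helper: score of a gap-free local alignment with n_match
 * matching bases and n_mismatch mismatching bases, under a simple
 * match-reward / mismatch-penalty scheme. */
static int local_score(int n_match, int n_mismatch,
                       int match, int mismatch_pen)
{
    return n_match * match - n_mismatch * mismatch_pen;
}
```

Under these assumed values, a 20 bp exact match has 0 errors and scores 20, while a 100 bp alignment with 2 mismatches has 2 errors and scores 98 - 8 = 90. Ranking by errors would prefer the short hit; ranking by score prefers the long one.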
2013-02-22 12:10:20 -05:00
Heng Li
d4cf6d97a6
bugfix: memory leak
2013-02-21 15:04:31 -05:00
Heng Li
a578688fa8
generate multiple alignments from one chain
2013-02-21 14:58:51 -05:00
Heng Li
cfbc4c89e3
perform extension when there are, say, 20bp tandem
2013-02-21 14:34:10 -05:00
Heng Li
54da54ffd4
extend more seeds (and thus slower...)
2013-02-21 12:52:00 -05:00
Heng Li
f8829318cf
weakened the chain filter
2013-02-21 12:25:20 -05:00
Heng Li
84a328764a
bugfix: mis-chaining caused by integer overflow
...
I really need to rewrite kbtree some time.
2013-02-21 11:42:30 -05:00
Heng Li
ea8f4f4d34
clean bill from valgrind
2013-02-20 20:26:57 -05:00
Heng Li
5626fe29b7
Well, at least output sth
2013-02-20 19:11:44 -05:00
Heng Li
a7d574d125
backup comments
2013-02-20 01:11:38 -05:00
Heng Li
688872fb1b
code backup
2013-02-19 00:50:39 -05:00
Heng Li
66585b7982
code backup
2013-02-18 16:33:06 -05:00
Heng Li
ea9fc7df48
keep the number of SW performed
2013-02-16 11:03:27 -05:00
Heng Li
5f8c6efbc3
forbid x-boundary bns_get_seq(); code backup
2013-02-16 09:48:44 -05:00
Heng Li
604e3d8da1
code backup; to upgrade ksw.{c,h}
2013-02-12 16:15:26 -05:00
Heng Li
325ba8213b
move mark primary to worker1()
2013-02-12 15:54:55 -05:00
Heng Li
cd0969332f
keep track of the "parent" of a secondary
2013-02-12 15:52:23 -05:00
Heng Li
22b79b3475
mark primary, instead of dropping secondary
2013-02-12 15:34:44 -05:00
Heng Li
2fc469d0c9
code backup
2013-02-12 12:09:36 -05:00
Heng Li
95d18449b3
merge bseq.{h,c} to utils.{h,c}
...
I do not like many small files.
2013-02-12 10:36:15 -05:00
Heng Li
13288e2dcd
code backup
2013-02-12 09:22:47 -05:00
Heng Li
99907c98fb
separated and improved SAM printing code
...
This is for the PE mode. The routines may also be useful for bwa-sw, but
probably I won't change the old code.
2013-02-11 15:29:03 -05:00
Heng Li
987d4b4205
fixed a stupid bug in fastq reading
2013-02-11 11:27:35 -05:00
Heng Li
59eaf650ac
code backup
2013-02-11 10:59:38 -05:00
Heng Li
f4c0672800
move sort_and_dedup() to worker1()
2013-02-10 12:55:19 -05:00
Heng Li
c310fb7424
a little refactoring for PE support
2013-02-10 12:24:33 -05:00
Heng Li
829664d6b5
missing identical hits; improved sub_n
2013-02-08 17:55:35 -05:00
Heng Li
b2c7148dc9
consider the number of suboptimal hits
2013-02-08 17:20:44 -05:00
Heng Li
39607065e0
allow more seeds to be seen (thus slower..)
2013-02-08 16:56:28 -05:00
Heng Li
2848d3045a
more accurate chain weight
2013-02-08 15:34:25 -05:00
Heng Li
220fc39e9d
the previous change does not work... Fixed.
2013-02-08 14:51:24 -05:00
Heng Li
fdb0a7405f
better dealing with microrepeat
2013-02-08 14:46:57 -05:00
Heng Li
057b292dde
exclude identical hits
2013-02-08 14:18:39 -05:00
Heng Li
1bf1a674a8
minor improvement to mapQ
2013-02-08 13:43:15 -05:00
Heng Li
245505deed
minor improvement to mapQ approx.
...
That is not good enough, but I am tired and need rest...
2013-02-07 22:09:58 -05:00
Heng Li
d8e4d57956
Don't use narrow band.
...
I may retry this feature if the profiler indicates that this greatly helps.
2013-02-07 21:22:54 -05:00
Heng Li
d890c7997c
better treatment for micro-repeat
2013-02-07 21:20:36 -05:00
Heng Li
45b0d3423a
bugfix: when no seed hits found
2013-02-07 20:07:31 -05:00
Heng Li
cd6bd524d4
discard internal seeds shorter than half
2013-02-07 19:50:37 -05:00
Heng Li
83a49f3210
compute mapQ; extend from the longest seed
2013-02-07 17:15:45 -05:00
Heng Li
6ba11ab68c
no effective changes
2013-02-07 16:42:01 -05:00
Heng Li
ff3fea115c
write soft clip; added debugging code
2013-02-07 16:27:11 -05:00
Heng Li
27fdf6397d
single-end working! no mapQ, though
2013-02-07 15:52:36 -05:00
Heng Li
49f2bcc015
CIGAR is wrong, but the rest is okay
2013-02-07 14:57:22 -05:00
Heng Li
1fd51fc3f7
code backup
2013-02-07 14:36:18 -05:00
Heng Li
bfeb37c4de
code backup
2013-02-07 13:29:01 -05:00
Heng Li
5dc398cdef
start to write CLI
2013-02-07 13:13:43 -05:00
Heng Li
5a0b32bfd2
updated to the latest kseq.h
2013-02-06 14:38:40 -05:00
Heng Li
a9292d674d
a bit code cleanup
2013-02-06 13:59:32 -05:00
Heng Li
797a8c147e
sorting chains while filtering chains
2013-02-05 21:58:33 -05:00
Heng Li
a61288c768
separate CIGAR generation
2013-02-05 21:49:19 -05:00
Heng Li
14e6a7bdb9
fixed a silly bug in ksw_extend()
...
Query return value is assigned to the target variable and vice versa...
2013-02-05 17:29:03 -05:00
Heng Li
1e16f3e701
calling ksw_global(); ksw_extend() is buggy!
2013-02-05 17:13:12 -05:00
Heng Li
7067af833d
fixed a silly bug on sorted merge
2013-02-05 00:41:07 -05:00
Heng Li
d6a73c9171
chain filtering apparently working
2013-02-05 00:17:20 -05:00
Heng Li
9d0cdb2d3c
unfinished chain filter
2013-02-04 17:23:06 -05:00
Heng Li
c589b42fb5
minor tuning for fewer identical hits
2013-02-04 16:48:11 -05:00
Heng Li
29c8546679
better ref extraction
2013-02-04 16:08:00 -05:00
Heng Li
788e9d1e3d
fixed a couple of leaks; buggy atm
2013-02-04 15:40:26 -05:00
Heng Li
f27bd18f20
check if every seed is included; not used for now
2013-02-04 15:09:47 -05:00
Heng Li
5bfa45a69b
write the mem_aln_t struct
2013-02-04 15:02:56 -05:00
Heng Li
666638a953
changed the default scoring
2013-02-04 14:51:51 -05:00
Heng Li
ba18db1a9f
sw extension works for the simplest case
2013-02-04 12:37:38 -05:00
Heng Li
d25a87cc50
code backup
2013-02-02 15:14:24 -05:00
Heng Li
00e5302219
routine to get subsequence from 2-bit pac
2013-02-01 16:39:50 -05:00
Heng Li
7ab4b3321f
bugfix: memory leak
2013-02-01 15:26:34 -05:00
Heng Li
f8f3b7577a
code cleanup; added a missing file
2013-02-01 14:38:44 -05:00
Heng Li
620ad6e5b9
reseed long SMEMs
2013-02-01 14:20:38 -05:00
Heng Li
5d372cef65
bugfix: wrong B-tree comparison
2013-01-31 16:39:24 -05:00
Heng Li
8977737460
basic chaining working
...
Definitely suboptimal in a lot of corner cases...
2013-01-31 16:26:05 -05:00
Heng Li
6c19c9640c
code backup
2013-01-31 15:55:22 -05:00
Heng Li
91debf412b
move smem iterators to bwamem.{c,h}
2013-01-31 13:59:48 -05:00