Commit Graph

251 Commits (8d2986ece28fe7774afbf65a15d64abc5d62d346)

Author SHA1 Message Date
Heng Li 5581cb9152 Release bwa-0.7.2-r351
For the TLEN sign fix. Sorry for the significant bug in 0.7.0/0.7.1
2013-03-09 18:15:41 -05:00
Heng Li 2d01a297fb Improving 'properly paired' flag.
If one end has a low quality tail that happens to have a score-20 hit,
the pair won't be flagged as properly paired because bwa-mem thought it has
multiple hits. By filtering with -T, we won't have this problem.
2013-03-09 18:05:50 -05:00
Heng Li 1d132a546d Release 0.7.1-r347 2013-03-08 15:30:06 -05:00
Heng Li 66c9783daf r345: bugfix in mem - wrong mate strand for unmap
Received a clean bill from Picard
2013-03-08 13:15:43 -05:00
Heng Li 274c0ac96c r343: bugfix in mem - wrong mate info for unmap
SAM generation is always among the nastiest bits. I would need to refactor at
some point (hardly happening).
2013-03-08 12:40:31 -05:00
Heng Li 017be45407 r342: bugfix in bwasw - AS is off by one
but I do not understand why the old code does not have the same problem.
2013-03-08 12:06:45 -05:00
Rob Davies aabd990e8f Merge branch 'master' into master_fixes
Conflicts:
	Makefile
	bwape.c
	bwase.c
	bwtsw2_aux.c
	stdaln.c
2013-03-08 16:46:45 +00:00
Heng Li b5b50ac8da r341: bugfix - wrong mate position
when one end is mapped with a score less than -T. Caused by the -T option.
2013-03-07 21:35:57 -05:00
Heng Li b0a76884e8 r340: feature freeze; updated the manpage
I will stop adding new features to bwa and prepare for the next release. I will
briefly evaluate the variant calling accuracy before the release.
2013-03-07 11:51:23 -05:00
Heng Li 503ca9ed2e r339: pemerge - expose some settings to CLI 2013-03-07 11:22:19 -05:00
Heng Li 1cadfa1552 r338: pemerge - fixed memory leaks; multithreading
pemerge is actually quite slow.
2013-03-07 11:14:52 -05:00
Heng Li 3e3236dfc4 r337: mem - always read even number of reads
In the old code, we might read an odd number of reads from an interleaved fastq.
2013-03-07 11:00:15 -05:00
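
A minimal sketch of the idea behind r337, written in Python rather than bwa's C and not taken from the bwa source. When reads are batched from an interleaved FASTQ, a batch is closed only once it is large enough and also holds an even number of reads, so the two mates of a pair never land in different batches. The names `read_chunk`, `records`, and `chunk_bases` are hypothetical.

```python
def read_chunk(records, chunk_bases):
    """Pull one batch of (name, seq) tuples from an iterator of interleaved reads.

    The batch ends when it holds at least `chunk_bases` bases AND an even
    number of reads, so a pair is never split across two batches.
    """
    chunk, size = [], 0
    for name, seq in records:
        chunk.append((name, seq))
        size += len(seq)
        # The size test alone could stop after the first mate of a pair;
        # the parity test delays the cut until the second mate is in.
        if size >= chunk_bases and len(chunk) % 2 == 0:
            break
    return chunk


# Three interleaved pairs of 4 bp reads (illustrative data only).
reads = iter([
    ("r1/1", "ACGT"), ("r1/2", "ACGT"),
    ("r2/1", "ACGT"), ("r2/2", "ACGT"),
    ("r3/1", "ACGT"), ("r3/2", "ACGT"),
])
first = read_chunk(reads, chunk_bases=10)
second = read_chunk(reads, chunk_bases=10)
print(len(first), len(second))
```

With a 10-base target, the size limit is reached after the third read, but the batch is extended to the fourth so that pair r2 stays together.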
Heng Li 72817b664e r336: fine tuning pemerge 2013-03-06 23:38:07 -05:00
Heng Li 557d50c7e1 r335: fixed a compiling error
Caused by the last change
2013-03-06 21:57:13 -05:00
Heng Li 042e1f4442 r334: added pemerge to bwa 2013-03-06 21:55:02 -05:00
Heng Li 5fbd454682 r332: added output threshold
Otherwise there are far too many short hits
2013-03-05 22:49:38 -05:00
Heng Li 6476343a83 r331: rewrote CIGAR generation for bwa-short
When backtracking, bwa-short does not keep the detailed alignment or the exact
start and end positions. To find the boundary and the CIGAR, the old code does
a global alignment with a small end-gap penalty. It then deals with a lot of
special cases to derive the right position and CIGAR, which are actually not
always right. It is a mess.

As the new ksw.{c,h} does not support a different end-gap penalty, the old
strategy does not work. But we get something better. The new code finds the
boundaries with ksw_extend(). It is cleaner and gives more accurate CIGAR in
most cases.
2013-03-05 19:56:37 -05:00
Heng Li 98f8966750 r329: ditch stdaln.{c,h}; no changes to bwa-mem
stdaln.{c,h} was written ten years ago. Its local and SW extension code are
actually buggy (though that rarely happens and usually does not affect the
results too much). ksw.{c,h} is more concise, potentially faster, less buggy,
and richer in features.
2013-03-05 12:00:24 -05:00
Rob Davies 8a078cc16d Merge branch 'master' into master_fixes
Conflicts:
	bntseq.c
	bwamem.c
2013-03-05 10:21:07 +00:00
Heng Li efd9769b07 r324: a little code cleanup
The changes after r317 aim to improve the performance and accuracy for very
long query alignment. The short-read alignment should not be affected. The
changes include:

1) Z-dropoff. This is a variant of blast's X-dropoff. I originally thought this
   heuristic only improves speed, but now I realize it also reduces poor
   alignment with long good flanking alignments. The difference from blast's
   X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not.

2) Band width doubling. When band width is too small, we will get a poor
   alignment in the middle. Sometimes such alignments cannot be fully excluded
   with Z-dropoff. Band width doubling is an alternative heuristic. It is based
   on the observation that the existence of a close-to-boundary high score
   possibly implies inadequate band width. When we see such a signal, we double
   the band width.
2013-03-05 00:57:16 -05:00
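
The two heuristics described in r324 can be sketched as follows. This is an illustration in Python under stated assumptions, not the code in ksw.c: the log does not give the formulas, so the gap discount in `z_drop_stop` (drop minus diagonal shift times gap-extension cost) and the three-quarters threshold in `align_with_band_doubling` are assumptions, and every function and parameter name here is hypothetical.

```python
def x_drop_stop(best, cur, xdrop):
    """X-dropoff: stop as soon as the score falls more than xdrop below the best."""
    return best - cur > xdrop


def z_drop_stop(best, cur, best_i, best_j, i, j, gap_ext, zdrop):
    """Z-dropoff (assumed form): discount the drop by the cost of extending a
    gap as long as the shift between diagonals, so one big gap is tolerated."""
    diag_shift = abs((i - best_i) - (j - best_j))
    return best - cur - diag_shift * gap_ext > zdrop


def align_with_band_doubling(align, w, max_w):
    """Re-run `align(w)` with a doubled band while the best cell hugs the band edge.

    `align` is a hypothetical callable returning (score, max_off), where
    max_off is how far the best-scoring cell lies from the main diagonal.
    """
    score, max_off = align(w)
    while max_off >= 0.75 * w and w * 2 <= max_w:  # 0.75 is an assumed threshold
        w *= 2
        score, max_off = align(w)
    return score, w


# Best score 100 at cell (100, 100); score 55 at cell (141, 101) after a 40-base gap.
print(x_drop_stop(100, 55, 20))                          # stops the extension
print(z_drop_stop(100, 55, 100, 100, 141, 101, 1, 20))   # keeps extending


def fake_align(w):
    """Stand-in aligner: the true alignment needs an offset of 40 from the diagonal."""
    return (50, w) if w < 40 else (90, 40)


print(align_with_band_doubling(fake_align, 10, 1000))
```

In the example, the 45-point drop exceeds the threshold of 20 under X-dropoff, but 40 of those points are attributed to the gap under the Z-dropoff rule, so the extension continues. A drop of the same size with no diagonal shift would stop under both rules.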
Heng Li e0991d6a45 r323: added Z-dropoff, a variant of blast's X-drop 2013-03-05 00:34:33 -05:00
Heng Li 733410b50d r320: speed up very long sequence alignment
100-200bp read alignment should not be affected at all.
2013-03-04 14:43:49 -05:00
Heng Li 7e00dbcac5 r317: bugfix - out-of-range extension
This happens when target region crosses the forward-reverse boundary. This will
almost never happen to short-read alignment.
2013-03-04 11:35:23 -05:00
Heng Li d35f33b513 r316: don't allocate zero-length memory
It is not a bug, but Electric Fence does not like that.
2013-03-04 10:22:18 -05:00
Heng Li 35fb7f9fdf r315: move kopen.o out of libbwa.a 2013-03-01 11:47:51 -05:00
Heng Li 3e4a178e08 r314: cleanup bwamem API
Don't modify input sequences; more documentation
2013-03-01 11:14:51 -05:00
Rob Davies 6beab5f765 Merge branch 'master' into master_fixes
Merge changes to commit c5434ac (0.7.0 release)

Conflicts:
	Makefile
	bwamem.c
2013-03-01 10:22:49 +00:00
Rob Davies 3d33ab063e Merge branch 'master' into master_fixes
Merged to master version b621d3a

Conflicts:
	Makefile
	bntseq.c
	bwa.c
	bwase.c
	bwaseqio.c
	bwtaln.c
	bwtindex.c
	bwtio.c
	bwtmisc.c
	bwtsw2_aux.c
	cs2nt.c
	fastmap.c
	khash.h
	kseq.h
	ksw.c
	kvec.h
	simple_dp.c
	utils.c
	utils.h
2013-03-01 09:37:46 +00:00
Heng Li c5434ac865 r313: release bwa-0.7.0 2013-02-28 15:56:05 -05:00
Heng Li f3cff1c609 r311: even tighter bw for CIGAR 2013-02-27 23:59:50 -05:00
Heng Li 6a4d8c79d8 r309: bugfix - soft clipping missing in example.c 2013-02-27 22:45:18 -05:00
Heng Li df7c3f0000 r308: added a new API to convert region to CIGAR
and an example program demonstrating how to do single-end alignment in <50
lines of C code.
2013-02-27 22:28:29 -05:00
Heng Li 4bb0bdddca r306: introduce clipping penalty
More clipping leads to more severe reference bias. We should not clip the
alignment unless necessary.
2013-02-27 21:13:39 -05:00
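
One way to read the clipping penalty introduced in r306, sketched in Python. This is an assumption about how such a penalty can be applied, not the bwa-mem source: the alignment is carried to the end of the read unless the best local (clipped) score beats the end-to-end score by at least the penalty. The names `extend_to_end`, `local_score`, `end_to_end_score`, and `clip_penalty` are hypothetical.

```python
def extend_to_end(local_score, end_to_end_score, clip_penalty):
    """Return True if the alignment should reach the read end instead of being clipped.

    Clipping is chosen only when it gains at least `clip_penalty` points over
    the end-to-end alignment, which biases the aligner against clipping.
    """
    return end_to_end_score > 0 and end_to_end_score > local_score - clip_penalty


# Clipping would gain 3 points, less than the penalty of 5: keep the full alignment.
print(extend_to_end(local_score=50, end_to_end_score=47, clip_penalty=5))
# Clipping would gain 10 points, more than the penalty: clip.
print(extend_to_end(local_score=50, end_to_end_score=40, clip_penalty=5))
```

A larger penalty keeps more mismatching read ends inside the alignment, which is the direction the commit message argues for in order to limit reference bias.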
Heng Li 292e92b602 r303: bugfix - wrong band width when CIGAR 2013-02-27 15:39:15 -05:00
Heng Li e620f0ff4e r302: updated the manpage 2013-02-27 13:16:22 -05:00
Heng Li b621d3ae38 r301: left-align indels
Don't know why the change is working...
2013-02-27 00:42:19 -05:00
Heng Li 65e099df34 r300: fixed an out-of-boundary bug in rare case 2013-02-27 00:37:17 -05:00
Heng Li 0b533385ef r299: better way to exclude seed 2013-02-27 00:29:11 -05:00
Heng Li acd1ab607b r297: reduce wasteful SW extension
This is particularly important for long sequences
2013-02-26 16:26:46 -05:00
Heng Li 98787f0ae0 r295: generate NM 2013-02-26 13:36:01 -05:00
Heng Li 32f2d60a2e r294: bugfix - -M not working 2013-02-26 13:14:33 -05:00
Heng Li 619ac4f93d r293: bugfix - wrong RG type in SAM output 2013-02-26 13:03:35 -05:00
Heng Li c6b226d719 r292: fixed a very stupid bug on CLI
I was thinking 0x10 or 16, but wrote 0x16...
2013-02-26 12:49:48 -05:00
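
The slip described in r292 is easy to make because 0x10 and 16 are the same value, while 0x16 is 22 and sets three bits instead of one. The snippet below is a generic Python illustration. The log does not say which option or flag was affected, so the names `INTENDED`, `TYPO`, and `bits_set` are hypothetical.

```python
INTENDED = 0x10  # decimal 16: a single bit
TYPO = 0x16      # decimal 22: 16 + 4 + 2, three bits


def bits_set(mask):
    """List the individual bit values that are set in `mask`, lowest first."""
    return [1 << i for i in range(mask.bit_length()) if mask & (1 << i)]


print(INTENDED, bits_set(INTENDED))  # 16 [16]
print(TYPO, bits_set(TYPO))          # 22 [2, 4, 16]
```

Used as a bit mask, the mistyped constant would also match the bits with values 2 and 4.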
Heng Li bfb2583d7f r291: summary - bwt.c micro optimization 2013-02-26 12:10:19 -05:00
Heng Li e70c7c2a71 r284: amend cross-reference hit
I really hate this: complex and twisted logic for a nasty scenario that almost
never happens to short reads - but it may become serious when the reference
genome consists of many contigs.

On toy examples, the code seems to work. Don't know if it really works...
2013-02-26 00:03:49 -05:00
Heng Li 61dd3bf13a r283: prepare for fixing cross-ref aln 2013-02-25 22:49:15 -05:00
Heng Li 77b5b586ad r282: set min split_len to read length 2013-02-25 17:29:35 -05:00
Heng Li 30cc8a95d1 fixed an unimportant memory leak 2013-02-25 16:34:19 -05:00
Heng Li d19e834d84 r280: align two ends in the same thread
Otherwise odd-number threads may be of different speed from even-number threads.
2013-02-25 15:40:15 -05:00
Heng Li 20aa848b3c r279: for PE mapq, consider the number of pairs
If there are a lot of proper pairs, it is more likely that the best pair is
wrong.
2013-02-25 13:00:35 -05:00
Heng Li 9957e04590 r278: don't perform too many mate-sw 2013-02-25 11:56:02 -05:00
Heng Li e9e5ee6a3d r277: updated the revision number 2013-02-25 11:34:06 -05:00
Heng Li 0b4a40dc25 updated revision number; to merge into master 2013-02-24 13:34:20 -05:00
Heng Li 545fb87feb removed another part related to color-space 2013-02-22 17:15:57 -05:00
Heng Li 6ad5a3c086 removed color-space support
which has been broken since 0.6.x
2013-02-12 10:21:17 -05:00
Heng Li 91debf412b move smem iterators to bwamem.{c,h} 2013-01-31 13:59:48 -05:00
Heng Li 292f9061ab r132: optionally copy FASTA/Q comment to SAM 2012-10-26 12:54:32 -04:00
Heng Li 3abfd0743a r131: r128 plus remote changes 2012-06-28 14:52:18 -04:00
Heng Li f44edd4fc9 r128: more conservative chaining filter 2012-06-28 14:51:02 -04:00
Heng Li 09ee115dcc r126: release bwa-0.6.2 2012-06-19 13:29:44 -04:00
Heng Li 29ed2d8287 rename the "api" branch as "master" 2012-06-19 13:13:29 -04:00
Heng Li d97ff6bf72 r124: updated version number 2012-04-17 20:45:07 -04:00
Heng Li 790df95e1a updated revision number 2012-04-02 11:43:32 -04:00
Heng Li bdc953cad9 Tim's suggestion: suffix file name with .64 2012-03-29 12:22:51 -04:00
Heng Li 91a4a0c8ea Release bwa-0.6.1 2011-11-28 09:52:07 -05:00
Heng Li bf65b6463a fastmap: optionally output the original query seq 2011-11-24 19:44:21 -05:00
Heng Li b5170e0efa output the NM tag 2011-11-24 11:51:38 -05:00
Heng Li 196b50dde3 optionally mark multi-part hits as secondary 2011-11-23 23:39:59 -05:00
Heng Li 182cb2e89c use standard SW when no SSE2 2011-11-19 19:38:21 -05:00
Heng Li dc4008936c avoid duplicated XA tags 2011-11-19 14:52:47 -05:00
Heng Li 8f89f55484 fixed a segfault when there are too few good bases. 2011-11-17 22:13:38 -05:00
Heng Li 770a5f2ae0 Release BWA-0.6.0 2011-11-12 20:04:39 -05:00
Heng Li 7544aca718 updated revision number 2011-11-12 16:56:21 -05:00
Heng Li 8060693411 multithreading works again 2011-11-12 16:50:58 -05:00
Heng Li fa8cfe5567 bugfix: wrong mapping quality 2011-11-12 12:12:45 -05:00
Heng Li b42910ada6 proper mate information 2011-11-12 00:49:21 -05:00
Heng Li e06685db45 bwa-sw PE seems working (SAM is incorrect) 2011-11-07 00:51:43 -05:00
Heng Li 673ae4aaf8 throw an error if insufficient memory during index 2011-10-31 13:26:24 -04:00
Heng Li 02946df28a fixed an off-by-1 bug 2011-10-27 13:55:48 -04:00
Heng Li 7babb54e4c drop smem based mapping algorithm
While we can compute smems very efficiently, there is still a long way to get
the alignment. On simulated data, this smem-based algorithm is 4X faster than
bwasw and twice as fast as bowtie2, but the accuracy is far lower than bwasw
and even lower than bowtie2 in the high-mapQ range. I am fairly sure that if
we continue to increase the mapping accuracy, the speed will approach that of
bwasw, if not fall below it.

Smem-based mapping algorithm is still interesting, but given that I am short of
time, I will not explore it further.
2011-10-27 10:56:09 -04:00
Heng Li 7664795ffb fixed a minor issue about +/-1 2011-10-25 13:00:41 -04:00
Heng Li 7168f5c10a updated revision number 2011-10-25 12:50:19 -04:00
Heng Li 22c2252e15 added bidirectional bwt; seems buggy 2011-10-25 00:22:28 -04:00
Heng Li 7b4266a6e5 bugfix: integer overflow and strand error in sampe 2011-10-24 17:07:12 -04:00
Heng Li b59fd2bf47 fixed an integer overflow 2011-10-24 14:39:57 -04:00
Heng Li 8f3c780552 fixed a potential int overflow 2011-10-24 14:22:39 -04:00
Heng Li 1f970b4557 updated revision number 2011-10-24 14:14:42 -04:00
Heng Li 26b77eabef updated version number 2011-10-21 12:32:00 -04:00
Heng Li 46123639cf removed reverse pac; bwa is not working right now 2011-10-20 12:09:35 -04:00
Heng Li d70754e234 update revision number 2011-10-14 10:32:31 -04:00
Heng Li 72563c38f3 automatically choose the algorithm for BWT 2011-06-09 17:33:25 -04:00
Heng Li a74523a68d increase maximum barcode length limit to 63bp 2011-06-09 17:17:13 -04:00
Heng Li 243e735431 applied patches from Alec Wysoker 2011-05-04 09:46:50 -04:00
Heng Li 87664941b0 Release bwa-0.5.9 (r16) 2011-01-24 22:00:24 -05:00
Heng Li 7fd8948689 Added recommendation for PacBio reads 2011-01-22 13:20:11 -05:00
Heng Li 1d7d8be9e8 Put BC: to both ends 2011-01-18 20:16:57 -05:00
Heng Li 51d354cd28 Added barcode support 2011-01-15 15:35:39 -05:00
Heng Li 10721ca602 Added an option to accept Illumina 1.3+ fastq 2011-01-15 14:07:08 -05:00
Heng Li f335b33624 fixed a bug in bwase: no RG for unmapped read pairs 2011-01-15 10:32:45 -05:00
Heng Li 5e30884730 Update to the latest modification 0.5.9rc1-2. Update ChangeLog 2011-01-13 20:54:10 -05:00
Heng Li 007c3eb75d Imported from my local bwa repository, the master repository. 2011-01-13 20:52:12 -05:00