Commit Graph

369 Commits (c29b176cb6412fc9b2f6fafaa11e7bd4606334ff)

Author SHA1 Message Date
Heng Li c29b176cb6 r362: bugfix - occasionally wrong TLEN
Use the 0.7.2 way to compute TLEN
2013-03-12 00:14:36 -04:00
Heng Li aa7cdf4bb3 r361: flag proper pair even if multi-primary
Up to here, all the features in my checklist have been implemented.
2013-03-12 00:00:04 -04:00
Heng Li dab5b17c1a r360: output alternative primary alignments in XA 2013-03-11 23:43:58 -04:00
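A minimal sketch of how alternative hits might be serialized into an XA tag. It assumes the per-hit layout `chr,±pos,CIGAR,NM;`, where the sign of the position encodes the strand. The `Hit` record and `format_xa` function are hypothetical illustrations, not bwa code.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one alternative alignment."""
    rname: str    # reference name
    pos: int      # 1-based leftmost mapping position
    is_rev: bool  # True if the hit is on the reverse strand
    cigar: str
    nm: int       # edit distance to the reference


def format_xa(hits: List[Hit]) -> str:
    """Serialize alternative hits as an XA:Z tag, one 'chr,+/-pos,CIGAR,NM;' per hit."""
    if not hits:
        return ""  # emit no tag at all when there are no alternatives
    body = "".join(
        f"{h.rname},{'-' if h.is_rev else '+'}{h.pos},{h.cigar},{h.nm};" for h in hits
    )
    return "XA:Z:" + body


print(format_xa([Hit("chr1", 1000, False, "100M", 0),
                 Hit("chr2", 52, True, "50M1I49M", 2)]))
# XA:Z:chr1,+1000,100M,0;chr2,-52,50M1I49M,2;
```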
Heng Li 6c665189ad r359: identical output to 0.7.2 (without -a) 2013-03-11 23:16:18 -04:00
Heng Li 0f88103d2a SAM almost identical to 0.7.2 2013-03-11 23:01:51 -04:00
Heng Li 26f4c704ed drop the old SAM writer 2013-03-11 22:24:54 -04:00
Heng Li 0b0455ca51 replace PE; BUGGY right now!! 2013-03-11 22:18:23 -04:00
Heng Li ebb45dc42e new code works for SE 2013-03-11 21:59:15 -04:00
Heng Li c7edaa8e84 to test the new sam writer... 2013-03-11 21:55:52 -04:00
Heng Li 47952b6f3f drop an unnecessary member from mem_aln_t 2013-03-11 21:35:32 -04:00
Heng Li 8f0d439913 prepare to replace the SAM printing code
This move is dangerous as SAM printing is very complex, but it will pay off in
the long run. The planned change will reduce redundancy, improve clarity and,
most importantly, make it much easier to output multiple primary hits in an
optional tag.
2013-03-11 21:25:17 -04:00
Heng Li 5581cb9152 Release bwa-0.7.2-r351
For the TLEN sign fix. Sorry for the significant bug in 0.7.0/0.7.1
2013-03-09 18:15:41 -05:00
Heng Li 2d01a297fb Improving 'properly paired' flag.
If one end has a low-quality tail that happens to have a score-20 hit, the
pair won't be flagged as properly paired because bwa-mem thinks that end has
multiple hits. Filtering with -T avoids this problem.
2013-03-09 18:05:50 -05:00
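The -T fix above can be illustrated with a small sketch: weak hits are dropped before deciding whether an end is uniquely mapped. The function names are hypothetical, and the threshold of 30 is chosen purely for illustration.

```python
from typing import List


def significant_hits(scores: List[int], min_score: int) -> List[int]:
    """Drop hits scoring below the output threshold (the role -T plays)."""
    return [s for s in scores if s >= min_score]


def is_unique(scores: List[int], min_score: int) -> bool:
    """An end counts as uniquely mapped only if exactly one hit survives the filter."""
    return len(significant_hits(scores, min_score)) == 1


# A 95-point hit plus a spurious 20-point hit from a low-quality tail.
end_scores = [95, 20]
print(is_unique(end_scores, min_score=0))   # False: the tail hit looks like a second locus
print(is_unique(end_scores, min_score=30))  # True once weak hits are filtered out
```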
Heng Li 740d2c1314 Match to 'N' costs -1, instead of 0.
This is to prevent alignment through 'N'.
2013-03-09 18:03:57 -05:00
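To see why a -1 cost for N discourages alignment through N, here is a hypothetical 5x5 substitution matrix in which any pair involving N (code 4), including N against N in this sketch, scores -1. With a cost of 0, a run of Ns was free to traverse; now each N lowers the score. The encoding, names and match/mismatch values are illustrative assumptions, not taken from the bwa source.

```python
from typing import List

N = 4  # nucleotide codes: A=0, C=1, G=2, T=3, N=4


def fill_scoring_matrix(match: int, mismatch: int) -> List[List[int]]:
    """Build a 5x5 substitution matrix where any pair involving N scores -1."""
    mat = [[0] * 5 for _ in range(5)]
    for i in range(5):
        for j in range(5):
            if i == N or j == N:
                mat[i][j] = -1          # was 0 before this change
            elif i == j:
                mat[i][j] = match
            else:
                mat[i][j] = -mismatch
    return mat


def score_ungapped(query: List[int], target: List[int], mat: List[List[int]]) -> int:
    """Score an ungapped alignment of two equal-length encoded sequences."""
    return sum(mat[a][b] for a, b in zip(query, target))


mat = fill_scoring_matrix(match=1, mismatch=4)
# Aligning five query bases against a run of five Ns now loses 5 points instead of 0.
print(score_ungapped([0, 1, 2, 3, 0], [N] * 5, mat))  # -5
```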
Heng Li 9ea7f83974 Emergency bugfix: wrong TLEN sign
It is interesting that Picard did not find the issue.
2013-03-09 18:03:15 -05:00
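For reference, the sign rule at stake in the TLEN fixes can be sketched using the SAM specification's definition. The absolute value is the span from the leftmost to the rightmost mapped base of the pair, and the leftmost read gets the positive value. This is not bwa's code, and "the 0.7.2 way" may differ in details such as tie-breaking. The function name is hypothetical.

```python
from typing import Tuple


def tlen_pair(pos1: int, end1: int, pos2: int, end2: int,
              same_ref: bool = True) -> Tuple[int, int]:
    """Signed TLEN for (read1, read2) from 1-based inclusive alignment spans.

    The absolute value is the distance from the leftmost to the rightmost
    mapped base of the pair; the leftmost read gets the positive value.
    """
    if not same_ref:
        return 0, 0  # TLEN is 0 when the mates map to different references
    span = max(end1, end2) - min(pos1, pos2) + 1
    return (span, -span) if pos1 <= pos2 else (-span, span)


print(tlen_pair(100, 199, 300, 399))  # (300, -300)
print(tlen_pair(300, 399, 100, 199))  # (-300, 300)
```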
Heng Li 1d132a546d Release 0.7.1-r347 2013-03-08 15:30:06 -05:00
Heng Li 5370bb23a3 Updated NEWS; added stddef.h for size_t
I thought size_t was defined in stdlib.h, but that is not always the case.
2013-03-08 14:14:42 -05:00
Heng Li 66c9783daf r345: bugfix in mem - wrong mate strand for unmap
Received a clean bill from Picard
2013-03-08 13:15:43 -05:00
Heng Li af7b4d8980 gcc wrongly thinks a variable may be uninitialized
It is always initialized in practice; changed the code anyway to silence the warning.
2013-03-08 12:45:50 -05:00
Heng Li 274c0ac96c r343: bugfix in mem - wrong mate info for unmap
SAM generation is always among the nastiest bits. I will need to refactor it at
some point (though that is unlikely to happen soon).
2013-03-08 12:40:31 -05:00
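The two mate-info fixes above concern the fields a writer must set when one end is unmapped. The sketch below follows the SAM specification's recommended practice: the unmapped read takes its mate's RNAME and POS, both records point at each other, and 0x20 on the unmapped read mirrors the mapped mate's strand. The record type and function name are hypothetical and only the mate-related fields are modeled.

```python
from dataclasses import dataclass

FLAG_UNMAP = 0x4      # this read is unmapped
FLAG_MUNMAP = 0x8     # the mate is unmapped
FLAG_REVERSE = 0x10   # this read is on the reverse strand
FLAG_MREVERSE = 0x20  # the mate is on the reverse strand


@dataclass
class SamRecord:
    """Hypothetical minimal SAM record holding only the mate-related fields."""
    flag: int
    rname: str = "*"
    pos: int = 0      # 1-based; 0 means unset
    rnext: str = "*"
    pnext: int = 0


def place_unmapped_mate(mapped: SamRecord, unmapped: SamRecord) -> None:
    """Give an unmapped read its mapped mate's coordinates and fix both mate fields."""
    unmapped.flag |= FLAG_UNMAP
    unmapped.rname, unmapped.pos = mapped.rname, mapped.pos
    # Mate fields on the unmapped read describe the mapped read, including its strand.
    unmapped.rnext, unmapped.pnext = mapped.rname, mapped.pos
    if mapped.flag & FLAG_REVERSE:
        unmapped.flag |= FLAG_MREVERSE
    else:
        unmapped.flag &= ~FLAG_MREVERSE
    # The unmapped mate now sits at the mapped read's own position.
    mapped.flag |= FLAG_MUNMAP
    mapped.rnext, mapped.pnext = mapped.rname, mapped.pos


r1 = SamRecord(flag=0x1 | 0x40 | FLAG_REVERSE, rname="chr1", pos=500)
r2 = SamRecord(flag=0x1 | 0x80)
place_unmapped_mate(r1, r2)
print(r2)
```

When printed as SAM text, RNEXT would be written as "=" here, since it equals RNAME.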
Heng Li 017be45407 r342: bugfix in bwasw - AS is off by one
but I do not understand why the old code does not have the same problem.
2013-03-08 12:06:45 -05:00
Heng Li b5b50ac8da r341: bugfix - wrong mate position
when one end is mapped with a score less than -T. Caused by the -T option.
2013-03-07 21:35:57 -05:00
Heng Li b0a76884e8 r340: feature freeze; updated the manpage
I will stop adding new features to bwa and prepare for the next release. I will
briefly evaluate the variant calling accuracy before the release.
2013-03-07 11:51:23 -05:00
Heng Li 503ca9ed2e r339: pemerge - expose some settings to CLI 2013-03-07 11:22:19 -05:00
Heng Li 1cadfa1552 r338: pemerge - fixed memory leaks; multithreading
pemerge is actually quite slow.
2013-03-07 11:14:52 -05:00
Heng Li 3e3236dfc4 r337: mem - always read an even number of reads
In the old code, we might read an odd number of reads from an interleaved fastq.
2013-03-07 11:00:15 -05:00
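One way to guarantee even-sized batches from an interleaved fastq is to hold back a trailing unpaired record for the next batch. The sketch below shows this, assuming reads arrive as a flat list. The function name is hypothetical, and bwa may instead keep reading until the count is even.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split_even_batch(records: List[T]) -> Tuple[List[T], List[T]]:
    """Split a batch from an interleaved fastq so that pairs are never broken.

    Returns (batch, carry): `batch` has an even length, and `carry` (at most one
    record) should be prepended to the next batch.
    """
    cut = len(records) - len(records) % 2
    return records[:cut], records[cut:]


batch, carry = split_even_batch(["r1/1", "r1/2", "r2/1", "r2/2", "r3/1"])
print(batch)  # ['r1/1', 'r1/2', 'r2/1', 'r2/2']
print(carry)  # ['r3/1']
```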
Heng Li 72817b664e r336: fine tuning pemerge 2013-03-06 23:38:07 -05:00
Heng Li 557d50c7e1 r335: fixed a compiling error
Caused by the last change
2013-03-06 21:57:13 -05:00
Heng Li 042e1f4442 r334: added pemerge to bwa 2013-03-06 21:55:02 -05:00
Heng Li 773b86331b De-overlap paired-end reads 2013-03-06 19:23:45 -05:00
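As a rough illustration of de-overlapping a read pair, the sketch below compares a suffix of read 1 with a prefix of the reverse complement of read 2. It takes the longest overlap with few mismatches and returns the merged fragment. This is a simplified stand-in, not how pemerge works: base qualities are ignored, fragments shorter than the reads are not handled, and `merge_pair`, `min_overlap` and `max_mismatch` are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch: int = 1) -> Optional[str]:
    """Merge two reads whose 3' ends overlap into one fragment, or return None.

    A suffix of r1 is compared with a prefix of revcomp(r2); the longest
    overlap with at most `max_mismatch` mismatches wins.
    """
    r2rc = revcomp(r2)
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        suffix, prefix = r1[len(r1) - olen:], r2rc[:olen]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch:
            return r1 + r2rc[olen:]
    return None


fragment = "GATTACAGCTTGACCGTAAGCTTGAGGCAT"
print(merge_pair(fragment[:20], revcomp(fragment[10:])) == fragment)  # True
```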
Heng Li 5fbd454682 r332: added output threshold
Otherwise there are far too many short hits
2013-03-05 22:49:38 -05:00
Heng Li 6476343a83 r331: rewrote CIGAR generation for bwa-short
When backtracking, bwa-short does not keep the detailed alignment or the exact
start and end positions. To find the boundary and the CIGAR, the old code does
a global alignment with a small end-gap penalty. It then deals with a lot of
special cases to derive the right position and CIGAR, which are actually not
always right. It is a mess.

As the new ksw.{c,h} does not support a different end-gap penalty, the old
strategy does not work. But we get something better. The new code finds the
boundaries with ksw_extend(). It is cleaner and gives more accurate CIGAR in
most cases.
2013-03-05 19:56:37 -05:00
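The idea of finding alignment boundaries by extension can be sketched as an anchored dynamic-programming alignment. Alignment starts at the anchor, for example the end of a seed carrying score `h0`, and the cell with the best score marks where the alignment should end. This is loosely modeled on what the message says ksw_extend() is used for. It is not ksw's API, it uses linear gap costs with no band, and the function name and default scores are hypothetical.

```python
from typing import Tuple


def extend(query: str, target: str, match: int = 1, mismatch: int = 4,
           gap: int = 2, h0: int = 0) -> Tuple[int, int, int]:
    """Anchored extension alignment with linear gap costs.

    Both sequences start at the anchor. Returns (best_score, query_end,
    target_end): the cell where the best-scoring prefix alignment ends.
    """
    n, m = len(query), len(target)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        H[i][0] = h0 - gap * i
    for j in range(m + 1):
        H[0][j] = h0 - gap * j
    best = (h0, 0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else -mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,  # base against base
                          H[i - 1][j] - gap,    # query base against a gap
                          H[i][j - 1] - gap)    # target base against a gap
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


print(extend("ACGTAC", "ACGTTTTT"))  # (4, 4, 4)
```

The returned end points bound the alignment on this side of the anchor. How bwa then turns the bounded region into a CIGAR is not shown here.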
Heng Li a76b75f41e Merge pull request #14 from drkeoni/master
Small fix for possible compile problem on Ubuntu systems
2013-03-05 12:57:10 -08:00
Jon Sorenson 25366c7220 Fixing problem with linking to libm on some Ubuntu systems (I see this on a machine running 11.04, kernel 3.0.0-14-virtual). Changing the order of -lm on the command line seems to do the trick and should be tolerated in other environments. 2013-03-05 20:48:16 +00:00
Heng Li 98f8966750 r329: ditch stdaln.{c,h}; no changes to bwa-mem
stdaln.{c,h} was written ten years ago. Its local alignment and SW extension
code is actually buggy (though the bugs are rarely triggered and usually do not
affect the results much). ksw.{c,h} is more concise, potentially faster, less
buggy, and richer in features.
2013-03-05 12:00:24 -05:00
Heng Li bb37e14d02 replace aln_global in bwase.c 2013-03-05 10:38:47 -05:00
Heng Li e6c262594f bwa-sw: ditch stdaln 2013-03-05 10:12:38 -05:00
Heng Li 086c9d0e7d bwa-sw: use bwa_gen_cigar() for cigar generation 2013-03-05 09:54:49 -05:00
Heng Li 07921659cf move mem_fill_scmat() to bwa.{h,c} 2013-03-05 09:38:12 -05:00
Heng Li efd9769b07 r324: a little code cleanup
The changes after r317 aim to improve the performance and accuracy for very
long query alignment. The short-read alignment should not be affected. The
changes include:

1) Z-dropoff. This is a variant of blast's X-dropoff. I originally thought this
   heuristic only improves speed, but now I realize it also reduces poor
   alignments sandwiched between long, good flanking alignments. The difference
   from blast's X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not.

2) Band width doubling. When the band width is too small, we will get a poor
   alignment in the middle. Sometimes such alignments cannot be fully excluded
   with Z-dropoff. Band width doubling is an alternative heuristic. It is based
   on the observation that a high score close to the band boundary possibly
   implies an inadequate band width. When we see such a signal, we double
   the band width.
2013-03-05 00:57:16 -05:00
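Band width doubling can be sketched as follows. A banded extension is run, and if its best cell lies on the outermost diagonal of the band, the band is doubled and the extension retried. The "on the edge" test is a simplification of the "close-to-boundary high score" signal described above, not bwa's exact criterion. The function names and scores (match 1, mismatch 4, linear gap 1) are hypothetical.

```python
from typing import Tuple

NEG_INF = -10**9  # marks cells outside the band


def banded_extend(query: str, target: str, w: int, match: int = 1,
                  mismatch: int = 4, gap: int = 1) -> Tuple[int, int, int]:
    """Anchored extension restricted to diagonals |i - j| <= w.

    Returns (best_score, query_end, target_end).
    """
    n, m = len(query), len(target)
    H = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    H[0][0] = 0
    best = (0, 0, 0)
    for i in range(n + 1):
        for j in range(max(0, i - w), min(m, i + w) + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0 and H[i - 1][j - 1] > NEG_INF:
                s = match if query[i - 1] == target[j - 1] else -mismatch
                cands.append(H[i - 1][j - 1] + s)
            if i > 0 and H[i - 1][j] > NEG_INF:
                cands.append(H[i - 1][j] - gap)
            if j > 0 and H[i][j - 1] > NEG_INF:
                cands.append(H[i][j - 1] - gap)
            if cands:
                H[i][j] = max(cands)
                if H[i][j] > best[0]:
                    best = (H[i][j], i, j)
    return best


def extend_with_doubling(query: str, target: str, w: int = 1,
                         max_w: int = 64) -> Tuple[int, int, int, int]:
    """Retry with a doubled band while the best cell sits on the band's edge."""
    while True:
        score, qe, te = banded_extend(query, target, w)
        if abs(qe - te) < w or w >= max_w:
            return score, qe, te, w
        w *= 2


# Two inserted bases at the start of the query need a band of at least 2.
print(extend_with_doubling("AACCCCCC", "CCCCCC"))  # (4, 8, 6, 4)
```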
Heng Li e0991d6a45 r323: added Z-dropoff, a variant of blast's X-drop 2013-03-05 00:34:33 -05:00
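A sketch contrasting the two stopping rules, using one common formulation of Z-drop. Extension stops when the score has dropped below the best score so far by more than `zdrop` plus `gap_ext` times the difference in diagonal between the current cell and the best cell. A drop that a long gap can explain is therefore tolerated, while plain X-drop would give up. The parameter names and numbers are made up for illustration.

```python
def x_drop_triggered(max_score: int, score: int, xdrop: int) -> bool:
    """Classic X-drop: stop once the score falls more than xdrop below the best."""
    return max_score - score > xdrop


def z_drop_triggered(max_score: int, max_i: int, max_j: int,
                     score: int, i: int, j: int, zdrop: int, gap_ext: int) -> bool:
    """Z-drop: like X-drop, but a drop explained by a long gap is tolerated.

    (max_i, max_j) is the cell of the best score so far and (i, j) the current
    cell; the allowance grows with the difference in diagonal between them.
    """
    diagonal_shift = abs((i - max_i) - (j - max_j))
    return max_score - score > zdrop + gap_ext * diagonal_shift


# Best score 80 at (100, 100).
# Case 1: after a 40bp insertion, the score is 40 at (140, 100).
print(x_drop_triggered(80, 40, xdrop=30))                   # True: X-drop gives up
print(z_drop_triggered(80, 100, 100, 40, 140, 100, 50, 1))  # False: Z-drop keeps going
# Case 2: a poor ungapped stretch drops the score to 20 at (140, 140).
print(z_drop_triggered(80, 100, 100, 20, 140, 140, 50, 1))  # True
```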
Heng Li d6096c3f99 bugfix: caused by the latest change 2013-03-04 18:41:57 -05:00
Heng Li 59bc9341f6 code backup; more changes coming later 2013-03-04 17:29:07 -05:00
Heng Li 733410b50d r320: speed up very long sequence alignment
100-200bp read alignment should not be affected at all.
2013-03-04 14:43:49 -05:00
Heng Li 40f1214736 change to debugging code only 2013-03-04 11:52:11 -05:00
Heng Li 7e00dbcac5 r317: bugfix - out-of-range extension
This happens when the target region crosses the forward-reverse boundary. This
will almost never happen in short-read alignment.
2013-03-04 11:35:23 -05:00
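A sketch of the kind of guard this fix needs. It assumes the index concatenates the forward sequence with its reverse complement, so coordinates in [0, l_pac) are forward and [l_pac, 2*l_pac) are reverse. An extension window is clipped so that it never spans both strands. `l_pac` is used here as a plain integer parameter, and `clip_to_strand` and `anchor` are hypothetical.

```python
from typing import Tuple


def clip_to_strand(beg: int, end: int, l_pac: int, anchor: int) -> Tuple[int, int]:
    """Clip the half-open window [beg, end) on a doubled sequence of length 2*l_pac.

    Coordinates [0, l_pac) are the forward strand and [l_pac, 2*l_pac) its reverse
    complement; the window is kept on whichever strand contains `anchor`.
    """
    lo, hi = (0, l_pac) if anchor < l_pac else (l_pac, 2 * l_pac)
    return max(beg, lo), min(end, hi)


print(clip_to_strand(950, 1100, 1000, anchor=990))   # (950, 1000)
print(clip_to_strand(950, 1100, 1000, anchor=1010))  # (1000, 1100)
```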
Heng Li 1a451df800 prepare to ditch stdaln.{h,c} 2013-03-04 10:32:33 -05:00
Heng Li d35f33b513 r316: don't allocate zero-length memory
It is not a bug, but Electric Fence does not like that.
2013-03-04 10:22:18 -05:00
Heng Li 35fb7f9fdf r315: move kopen.o out of libbwa.a 2013-03-01 11:47:51 -05:00
Heng Li 3e4a178e08 r314: cleanup bwamem API
Don't modify input sequences; more documentation
2013-03-01 11:14:51 -05:00