fast-bwa

Commit Graph

Author	SHA1	Message	Date
Heng Li	2087dc162f	r377: increased unpaired penalty from 9 to 17. This leads to more aggressive pairing - more properly paired reads. I have found a few cases where, for example, read1 is unambiguously mapped to chr20 while its 100bp mate has a perfect match to another chr but has 3 mismatches and 1 deletion when it is paired with read1 on chr20. With longer reads, it seems that the chr20 hit is correct, although it is not obvious how this happened in evolution.	2013-04-17 16:50:20 -04:00
Heng Li	499cf4c00d	r376: reduce wasteful seed extension mainly for contig alignment	2013-04-10 12:18:56 -04:00
Heng Li	3d8a8c1e37	r374: fix - clipping penalty not always working. This only happens to gaps where mem underestimates the bandwidth without considering the clipping penalty.	2013-04-10 01:09:37 -04:00
Heng Li	d7ca0885eb	r371: extend overlapping seeds to avoid misalignment in tandem repeats	2013-04-04 00:43:43 -04:00
Heng Li	1e118e0823	r370: suppress "D" at the end of a cigar. This is caused by seeds in tandem repeats, in which case, bwa-mem may not extend the true seed. The change in this commit is only a temporary cure.	2013-04-03 23:57:19 -04:00
Heng Li	8437cd4edd	r369: bugfix - segfault caused by the last change. Sigh... Even the simplest change can lead to new bugs.	2013-03-19 01:04:57 -04:00
Heng Li	1e3cadbfc2	r368: bugfix - wrong CIGAR when bridging 3 contigs. In this case, bwa_fix_xref() will return insane coordinates. The old version did not check the return status and wrote a wrong CIGAR. This bug only happens with very short assembly contigs.	2013-03-18 20:49:32 -04:00
Heng Li	9346acde1b	Release bwa-0.7.3a-r367. In 0.7.3, the wrong CIGAR bug was only fixed in one scenario, but not fixed in another corner case.	2013-03-15 21:26:37 -04:00
Heng Li	dd51177837	r365: bugfix - wrong alignment (right mapping). The bug only happens when there is a 1bp del and 1bp ins which are close to the end and there are no other substitutions or indels. In this case, bwa mem gave a wrong band width.	2013-03-15 11:59:05 -04:00
Heng Li	bdf34f6ce7	r363: XA=>XP; output mapQ in XP. In BWA, XA gives hits "shadowed" by the primary hit. In BWA-MEM, we output primary hits only. Primary hits may have non-zero mapping quality.	2013-03-12 09:56:04 -04:00
Heng Li	c29b176cb6	r362: bugfix - occasionally wrong TLEN. Use the 0.7.2 way to compute TLEN.	2013-03-12 00:14:36 -04:00
Heng Li	dab5b17c1a	r360: output alternative primary alignments in XA	2013-03-11 23:43:58 -04:00
Heng Li	6c665189ad	r359: identical output to 0.7.2 (without -a)	2013-03-11 23:16:18 -04:00
Heng Li	0f88103d2a	SAM almost identical to 0.7.2	2013-03-11 23:01:51 -04:00
Heng Li	26f4c704ed	drop the old SAM writer	2013-03-11 22:24:54 -04:00
Heng Li	ebb45dc42e	new code works for SE	2013-03-11 21:59:15 -04:00
Heng Li	c7edaa8e84	to test the new sam writer...	2013-03-11 21:55:52 -04:00
Heng Li	47952b6f3f	drop an unnecessary member from mem_aln_t	2013-03-11 21:35:32 -04:00
Heng Li	8f0d439913	prepare to replace the SAM printing code. This move is dangerous as SAM printing is very complex, but it will be beneficial in the long run. The planned change will reduce the redundancy, improve clarity and, most importantly, make it much easier to output multiple primary hits in an optional tag.	2013-03-11 21:25:17 -04:00
Heng Li	9ea7f83974	Emergent bugfix: wrong TLEN sign. It is interesting that Picard did not find the issue.	2013-03-09 18:03:15 -05:00
Heng Li	66c9783daf	r345: bugfix in mem - wrong mate strand for unmap. Received a clean bill from Picard.	2013-03-08 13:15:43 -05:00
Heng Li	af7b4d8980	gcc wrongly thinks a variable may be uninitialized. It should always be initialized. To avoid a warning, made a change.	2013-03-08 12:45:50 -05:00
Heng Li	274c0ac96c	r343: bugfix in mem - wrong mate info for unmap. SAM generation is always among the nastiest bits. I would need to refactor at some point (hardly happening).	2013-03-08 12:40:31 -05:00
Heng Li	5fbd454682	r332: added output threshold. Otherwise there are far too many short hits.	2013-03-05 22:49:38 -05:00
Heng Li	07921659cf	move mem_fill_scmat() to bwa.{h,c}	2013-03-05 09:38:12 -05:00
Heng Li	efd9769b07	r324: a little code cleanup. The changes after r317 aim to improve the performance and accuracy for very long query alignment. The short-read alignment should not be affected. The changes include: 1) Z-dropoff. This is a variant of blast's X-dropoff. I originally thought this heuristic only improves speed, but now I realize it also reduces poor alignment with long good flanking alignments. The difference from blast's X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not. 2) Band width doubling. When band width is too small, we will get a poor alignment in the middle. Sometimes such alignments cannot be fully excluded with Z-dropoff. Band width doubling is an alternative heuristic. It is based on the observation that the existence of a close-to-boundary high score possibly implies inadequate band width. When we see such a signal, we double the band width.	2013-03-05 00:57:16 -05:00
Heng Li	e0991d6a45	r323: added Z-dropoff, a variant of blast's X-drop	2013-03-05 00:34:33 -05:00
Heng Li	d6096c3f99	bugfix: caused by the latest change	2013-03-04 18:41:57 -05:00
Heng Li	59bc9341f6	code backup; more changes coming later	2013-03-04 17:29:07 -05:00
Heng Li	733410b50d	r320: speed up very long sequence alignment. 100-200bp read alignment should not be affected at all.	2013-03-04 14:43:49 -05:00
Heng Li	40f1214736	change to debugging code only	2013-03-04 11:52:11 -05:00
Heng Li	7e00dbcac5	r317: bugfix - out-of-range extension. This happens when target region crosses the forward-reverse boundary. This will almost never happen to short-read alignment.	2013-03-04 11:35:23 -05:00
Heng Li	3e4a178e08	r314: cleanup bwamem API. Don't modify input sequences; more documentation.	2013-03-01 11:14:51 -05:00
Heng Li	f3cff1c609	r311: even tighter bw for CIGAR	2013-02-27 23:59:50 -05:00
Heng Li	a33b9c0633	tighter bw for cigar SW	2013-02-27 23:40:46 -05:00
Heng Li	6a4d8c79d8	r309: bugfix - soft clipping missing in example.c	2013-02-27 22:45:18 -05:00
Heng Li	df7c3f0000	r308: added a new API to convert region to CIGAR and an example program demonstrating how to do single-end alignment in <50 lines of C code.	2013-02-27 22:28:29 -05:00
Heng Li	4bb0bdddca	r306: introduce clipping penalty. More clipping leads to more severe reference bias. We should not clip the alignment unless necessary.	2013-02-27 21:13:39 -05:00
Heng Li	65e099df34	r300: fixed an out-of-boundary bug in rare case	2013-02-27 00:37:17 -05:00
Heng Li	0b533385ef	r299: better way to exclude seed	2013-02-27 00:29:11 -05:00
Heng Li	ee80fb8bd0	Test each seed to see if extension is needed. The old version wastefully extends many seeds contained in an aligned region found before. While this wastes little time for short reads, it becomes a serious defect for long query sequences. This is an attempt to fix this problem, but more tuning is needed.	2013-02-26 22:55:44 -05:00
Heng Li	acd1ab607b	r297: reduce wasteful SW extension. This is particularly important for long sequences.	2013-02-26 16:26:46 -05:00
Heng Li	98787f0ae0	r295: generate NM	2013-02-26 13:36:01 -05:00
Heng Li	32f2d60a2e	r294: bugfix - -M not working	2013-02-26 13:14:33 -05:00
Heng Li	619ac4f93d	r293: bugfix - wrong RG type in SAM output	2013-02-26 13:03:35 -05:00
Heng Li	e70c7c2a71	r284: amend cross-reference hit. I really hate this: complex and twisted logic for a nasty scenario that almost never happens to short reads - but it may become serious when the reference genome consists of many contigs. On toy examples, the code seems to work. Don't know if it really works...	2013-02-26 00:03:49 -05:00
Heng Li	77b5b586ad	r282: set min split_len to read length	2013-02-25 17:29:35 -05:00
Heng Li	d19e834d84	r280: align two ends in the same thread. Otherwise odd-number threads may be of different speed from even-number threads.	2013-02-25 15:40:15 -05:00
Heng Li	20aa848b3c	r279: for PE mapq, consider the number of pairs. If there are a lot of proper pairs, it is more likely that the best pair is wrong.	2013-02-25 13:00:35 -05:00
Heng Li	9957e04590	r278: don't perform too many mate-sw	2013-02-25 11:56:02 -05:00

137 Commits (8896cb942e320ccda75dd77d8b9aa5e009e80b21)
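
The r324 message describes two heuristics in prose: Z-dropoff (an X-dropoff variant that tolerates big gaps) and band width doubling (widen the band when a high score sits close to the band boundary). The following Python sketch illustrates one plausible reading of both ideas. It is not code from this repository: every name here (`x_drop_stop`, `z_drop_stop`, `widen_band`, the `align` callback) is hypothetical, and both the gap discount in `z_drop_stop` and the 0.75 boundary fraction in `widen_band` are assumptions made for illustration.

```python
def x_drop_stop(best, current, x):
    """Plain X-dropoff: stop once the score falls more than x below the best."""
    return best - current > x


def z_drop_stop(best, current, best_pos, pos, gap_ext, z):
    """Z-dropoff sketch (assumed form): before comparing with z, forgive the
    part of the score drop that a single long gap would explain.

    best_pos and pos are (query_index, target_index) pairs for the best-scoring
    cell and the current cell. The diagonal shift between them is the minimum
    gap length needed to get from one to the other.
    """
    gap_len = abs((pos[0] - best_pos[0]) - (pos[1] - best_pos[1]))
    return best - current - gap_len * gap_ext > z


def widen_band(align, w, max_w, boundary_frac=0.75):
    """Band width doubling sketch. `align` is a hypothetical callback that runs
    a banded alignment with band width w and returns (score, max_off), where
    max_off is how far off the main diagonal the best score was found.
    A best score close to the band boundary suggests the band is too narrow,
    so the band is doubled and the alignment is retried.
    """
    score, max_off = align(w)
    while max_off >= boundary_frac * w and 2 * w <= max_w:
        w *= 2
        score, max_off = align(w)
    return w, score
```

For example, with a best score of 100 at cell (50, 50) and a current score of 70 at cell (80, 60), the drop is 30 but the diagonal shift is 20. With a gap extension penalty of 1 and a threshold of 25, `x_drop_stop` would terminate the extension while `z_drop_stop` would let it continue, which matches the message's statement that Z-dropoff allows big gaps.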