fast-bwa

Commit Graph

Author	SHA1	Message	Date
Heng Li	23e0e99ec0	dev-471: fixed a compiling error from last commit	2014-04-10 11:54:17 -04:00
Heng Li	ccbbe48c4f	dev-470: don't stop on bwa_fix_xref2() failures Peter Field has sent me an example caused by an alignment bridging three adjacent chromosomes/contigs. Bwa-mem always aligns the query to the contig covering the middle point of the alignment. In this example, it chooses the middle contig, which should not be aligned. This leads to weird things failing bwa_fix_xref2(), which cannot be fixed unless we build the contig boundaries into the FM-index. In the old code, bwa-mem halts when bwa_fix_xref2() fails. With this commit, bwa-mem will give a warning instead of halting.	2014-04-10 11:43:17 -04:00
Heng Li	99f6f9a0d1	dev-467: limit the max #chains to extend	2014-04-08 21:45:49 -04:00
Heng Li	c0a308a8b6	dev-466: simplified chain filtering	2014-04-08 17:33:07 -04:00
Heng Li	f12dfae772	dev-465: a new output format for read overlap Also moved a few functions to bwamem_extra.c. File bwamem.c is becoming far too long.	2014-04-08 16:29:36 -04:00
Heng Li	172ba83241	dev-463: added option -x to change multiple params I hate to copy-paste long command line options.	2014-04-07 11:29:36 -04:00
Heng Li	114901b005	dev-r462: refined setting for PacBio; weight flt The recommended setting in the last commit is wrong. If we can extend a random seed hit to the full length, we will force the read aligned through break points, which is wrong. The new setting is better but it may lead to a small fraction of fragmented alignments. In addition, I added a filter on the minimum chain weight and tied min_HSP_score to this filter. It doubles the mapping speed.	2014-04-04 17:01:04 -04:00
Heng Li	41f720dfa7	dev-461: added a heuristic for PacBio data See the comment above mem_test_chain_sw() for details.	2014-04-04 16:05:41 -04:00
Heng Li	b6bd33b26c	dev-459: don't hard code the drop ratio In the old code, if a secondary alignment is 50% worse, it won't be outputted.	2014-04-03 18:58:49 -04:00
Heng Li	b3225581be	dev-458: simplified the smem iterator simpler but less powful.	2014-04-03 15:23:48 -04:00
Heng Li	acfe7613db	dev-457: separated interval collection and seeding	2014-04-03 15:10:50 -04:00
Heng Li	9a5705289c	added more debugging infomation I can see a bug, but I do not know where it comes from.	2014-04-03 13:38:08 -04:00
Heng Li	9ce50a4e5e	dev-450: support diff ins/del penalties. NO TEST!!	2014-03-28 14:54:06 -04:00
Heng Li	8f9aeef4ec	Merge branch 'master' into dev Conflicts: main.c	2014-03-17 00:03:52 -04:00
Heng Li	e6931bec03	r445: unnecessarily large bandwidth in global	2014-03-17 00:01:00 -04:00
Heng Li	7d63e76245	r444: more debugging output in CIGAR generation Also found a potential issue which should not affect accuracy but may hurt speed. Will investigate later.	2014-03-16 23:25:04 -04:00
Heng Li	8929bd1c25	r443: more verbose debugging information	2014-03-16 15:18:58 -04:00
Heng Li	2e9463ebf1	dev-r442: suppress exact full-length matches	2014-02-26 22:04:19 -05:00
Heng Li	52391a9855	r437: print timing for each batch of reads	2014-02-19 10:54:26 -05:00
Heng Li	f524c7d3d8	r431: added the MD tag to bwa-mem	2014-01-29 12:05:11 -05:00
Heng Li	ea3dc2f003	r430: fix a bug producing incorrect alignment Ksw uses two rounds of SSE2-SW to find the boundaries of an alignment. If the second round gives a different score from the first round, it will fail. The fix checks if this happens, though I have not dig into an example to understand why this may happen in the first place.	2014-01-29 10:51:02 -05:00
Heng Li	10cb6b0507	r428: allow to change the default chain_drop_ratio	2013-12-30 16:18:45 -05:00
Heng Li	3afcdc7746	debugging code only: print seeds	2013-12-30 16:05:43 -05:00
Heng Li	74a1a53499	print debugging msg to stdout	2013-12-30 15:49:41 -05:00
Heng Li	4219e58623	r423: bugfix - SE hits not random	2013-11-23 09:36:26 -05:00
Heng Li	ff4762f3c7	r421: bw doubling in the final alignment In some cases, the band width used in the final alignment needs to be larger than the band width in extension.	2013-11-20 10:04:16 -05:00
Heng Li	6e3fa0515a	r420: inferred bandwidth is not used in the final	2013-11-20 09:50:46 -05:00
Heng Li	deb19593aa	r418: use the new mapQ estimator by default	2013-11-02 12:25:53 -04:00
Heng Li	19d33faa30	use kthread for multi-threading Bwa-mem should have better performance with many CPU cores.	2013-11-02 12:13:11 -04:00
Heng Li	7144a0cefc	r415: bug in the new (optional) mapQ computation I may use the new method as the default. Testing needed.	2013-09-09 17:51:05 -04:00
Heng Li	ebb7b02e9b	r414: fixed a bug caused by the last commit	2013-09-09 16:57:55 -04:00
Heng Li	b51a66e4c1	r413: fixed an issue causing redundant alignment I have seen a fosmid aligned to the same position but with two slightly different CIGARs: 30000M and 29900M50D100M, possibly caused by tandem repeats. 0.7.5a will regard them as two distinct alignments and generates a very small mapping quality. However, these two are essentially the same. Although there is ambiguity in aligning the end of the fosmid, we should not penalize the entire alignment with a small mapQ. This commit fixes this issue. More testing is needed, though.	2013-09-09 11:36:50 -04:00
Heng Li	1e2cff20ba	more conservative mapQ	2013-09-09 08:57:45 -04:00
Heng Li	1346f03ff1	use the old mapQ by default the new mapQ overestimate	2013-09-06 14:04:41 -04:00
Heng Li	451d60f3be	slight modification	2013-09-06 12:37:38 -04:00
Heng Li	623da055e1	alternative way to estimate mapQ the old mapQ estimate is too conservative	2013-09-06 12:31:47 -04:00
Heng Li	3b84c03c1e	r406: allow to use diff clipping penalties for 5'-end or for 3'-end	2013-08-28 15:59:05 -04:00
Heng Li	bde5005f39	r396: er... the new tag is named SA not SP	2013-05-23 12:48:18 -04:00
Heng Li	3d2450ed97	r395: bugfix - hard clipping not applied on revaln	2013-05-23 12:45:14 -04:00
Heng Li	9441bb7f2a	r394: added future plan	2013-05-22 20:02:53 -04:00
Heng Li	0e759bc1f5	removed a redundant flag	2013-05-22 19:55:07 -04:00
Heng Li	9735d7a31a	conform to the latest (unpublished) SAM spec for chimeric alignments	2013-05-22 19:45:16 -04:00
Heng Li	9a6abe51b6	r391: better method to resolve xref alignment The old method does not work when the alignment bridges three chr. This may actually happen often. The new method does not work all the time, either, but should be better than the old one. It is also simpler, arguably.	2013-05-22 18:57:51 -04:00
Rob Davies	96e445d9e4	Reduce dependency on utils.h - new malloc wrapping scheme. Remove xmalloc, xcalloc, xrealloc and xstrdup from utils.h and revert calls to the normal malloc, calloc, realloc, strdup. Add new files malloc_wrap.[ch] with the wrapper functions. malloc_wrap.h #defines malloc etc. to the wrapper, but only if USE_MALLOC_WRAPPERS has been defined. Put #include "malloc_wrap.h" in any file that uses *alloc or strdup. This is also in a #ifdef USE_MALLOC_WRAPPERS ... #endif block to make using the wrappers optional. Add -DUSE_MALLOC_WRAPPERS into the makefile so they should normally get added. This is an improvement on the previous method as we now don't need to worry about stray function calls that were not changed to the wrapped version and the code will still work even if the wrapping is disabled. Other possible methods of doing this are using malloc_hook (glibc-specific), adding -include malloc_wrap.h to the gcc command-line (somewhat gcc-specific) or making our own malloc function and using dlopen (scary). This way is probably the most portable.	2013-05-02 15:12:01 +01:00
Rob Davies	e88529687f	Merge branch 'master' into master_fixes. Merged up to r389. Conflicts: bwamem.c kopen.c	2013-04-29 12:09:30 +01:00
Heng Li	19cb7cd7ed	r388: cleanup mem_process_seqs() interface Print output outside the function and allow to feed insert size distribution.	2013-04-26 12:31:18 -04:00
Rob Davies	4cb5110d03	Merge branch 'master' into master_fixes	2013-04-22 09:51:07 +01:00
Heng Li	2087dc162f	r377: increased unpaired penalty from 9 to 17 This leads to more aggressive pairing - more properly paired reads. I have found a few cases where, for example, read1 is umambiguously mapped to chr20 while its 100bp mate has a perfect match to another chr but has 3 mismatches and 1 deletion when it is paired with read1 on chr20. With longer reads, it seems that the chr20 hit is correct, although it is not obvious how this happened in evolution.	2013-04-17 16:50:20 -04:00
Rob Davies	3dd10bd7db	Merge branch 'master' into master_fixes	2013-04-12 16:20:13 +01:00
Rob Davies	90ecd344ba	Merge branch 'master' into master_fixes. Merged up to master r375. Conflicts: bwt.c	2013-04-11 11:15:39 +01:00
Heng Li	499cf4c00d	r376: reduce wasteful seed extension mainly for contig alignment	2013-04-10 12:18:56 -04:00
Heng Li	3d8a8c1e37	r374: fix - clipping penalty not always working This only happens to gaps where mem underestimates the bandwidth without considering the clipping penalty.	2013-04-10 01:09:37 -04:00
Heng Li	d7ca0885eb	r371: extend overlapping seeds to avoid misalignment in tandem repeats	2013-04-04 00:43:43 -04:00
Heng Li	1e118e0823	r370: suppress "D" at the end of a cigar This is caused by seeds in tandem repeats, in which case, bwa-mem may not extend the true seed. The change in this commit is only a temporary cure.	2013-04-03 23:57:19 -04:00
Rob Davies	c89756e2b0	Merge branch 'master' into master_fixes	2013-03-19 12:11:51 +00:00
Heng Li	8437cd4edd	r369: bugfix - segfault caused by the last change Sigh... Even the simplest change can lead to new bugs.	2013-03-19 01:04:57 -04:00
Heng Li	1e3cadbfc2	r368: bugfix - wrong CIGAR when bridging 3 contigs In this case, bwa_fix_xref() will return insane coordinates. The old version did not check the return status and write wrong CIGAR. This bug only happen to very short assembly contigs.	2013-03-18 20:49:32 -04:00
Rob Davies	c862a1a396	Merge branch 'master' into master_fixes	2013-03-18 13:35:12 +00:00
Heng Li	9346acde1b	Release bwa-0.7.3a-r367 In 0.7.3, the wrong CIGAR bug was only fixed in one scenario, but not fixed in another corner case.	2013-03-15 21:26:37 -04:00
Heng Li	dd51177837	r365: bugfix - wrong alignment (right mapping) The bug only happens when there is a 1bp del and 1bp ins which are close to the end and there are no other substitutions or indels. In this case, bwa mem gave a wrong band width.	2013-03-15 11:59:05 -04:00
Rob Davies	cca27c1ef5	Merge branch 'master' into master_fixes Conflicts: bwamem.c bwamem_pair.c example.c	2013-03-13 12:12:28 +00:00
Heng Li	bdf34f6ce7	r363: XA=>XP; output mapQ in XP In BWA, XA gives hits "shadowed" by the primary hit. In BWA-MEM, we output primary hits only. Primary hits may have non-zero mapping quality.	2013-03-12 09:56:04 -04:00
Heng Li	c29b176cb6	r362: bugfix - occasionally wrong TLEN Use the 0.7.2 way to compute TLEN	2013-03-12 00:14:36 -04:00
Heng Li	dab5b17c1a	r360: output alternative primary alignments in XA	2013-03-11 23:43:58 -04:00
Heng Li	6c665189ad	r359: identical output to 0.7.2 (without -a)	2013-03-11 23:16:18 -04:00
Heng Li	0f88103d2a	SAM almost identical to 0.7.2	2013-03-11 23:01:51 -04:00
Heng Li	26f4c704ed	drop the old SAM writer	2013-03-11 22:24:54 -04:00
Heng Li	ebb45dc42e	new code works for SE	2013-03-11 21:59:15 -04:00
Heng Li	c7edaa8e84	to test the new sam writer...	2013-03-11 21:55:52 -04:00
Heng Li	47952b6f3f	drop an unnecessary member from mem_aln_t	2013-03-11 21:35:32 -04:00
Heng Li	8f0d439913	prepare to replace the SAM printing code This move is dangerous as SAM printing is very complex, but it will benefit in the long run. The planned change will reduce the redundancy, improves clarity and most importantly makes it much easier to output multiple primary hits in an optional tag.	2013-03-11 21:25:17 -04:00
Rob Davies	9228e48efd	Merge branch 'master' into master_fixes Conflicts: Makefile	2013-03-11 13:50:49 +00:00
Heng Li	9ea7f83974	Emergent bugfix: wrong TLEN sign It is interesting that Picard did not find the issue.	2013-03-09 18:03:15 -05:00
Heng Li	66c9783daf	r345: bugfix in mem - wrong mate strand for unmap Received a clean bill from Picard	2013-03-08 13:15:43 -05:00
Heng Li	af7b4d8980	gcc wrongly thinks a variable may be uninitialized It should always be initialized. To avoid a warning, made a change.	2013-03-08 12:45:50 -05:00
Heng Li	274c0ac96c	r343: bugfix in mem - wrong mate info for unmap SAM generation is always among the nastiest bits. I would need to refactor at some point (hardly happening).	2013-03-08 12:40:31 -05:00
Rob Davies	aabd990e8f	Merge branch 'master' into master_fixes Conflicts: Makefile bwape.c bwase.c bwtsw2_aux.c stdaln.c	2013-03-08 16:46:45 +00:00
Heng Li	5fbd454682	r332: added output threshold Otherwise there are far too many short hits	2013-03-05 22:49:38 -05:00
Heng Li	07921659cf	move mem_fill_scmat() to bwa.{h,c}	2013-03-05 09:38:12 -05:00
Rob Davies	8a078cc16d	Merge branch 'master' into master_fixes Conflicts: bntseq.c bwamem.c	2013-03-05 10:21:07 +00:00
Heng Li	efd9769b07	r324: a little code cleanup The changes after r317 aim to improve the performance and accuracy for very long query alignment. The short-read alignment should not be affected. The changes include: 1) Z-dropoff. This is a variant of blast's X-dropoff. I orginally thought this heuristic only improves speed, but now I realize it also reduces poor alignment with long good flanking alignments. The difference from blast's X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not. 2) Band width doubling. When band width is too small, we will get a poor alignment in the middle. Sometimes such alignments cannot be fully excluded with Z-dropoff. Band width doubling is an alternative heuristic. It is based on the observation that the existing of close-to-boundary high score possibly implies inadequate band width. When we see such a signal, we double the band width.	2013-03-05 00:57:16 -05:00
Heng Li	e0991d6a45	r323: added Z-dropoff, a variant of blast's X-drop	2013-03-05 00:34:33 -05:00
Heng Li	d6096c3f99	bugfix: caused by the latest change	2013-03-04 18:41:57 -05:00
Heng Li	59bc9341f6	code backup; more changes coming later	2013-03-04 17:29:07 -05:00
Heng Li	733410b50d	r320: speed up very long sequence alignment 100-200bp read alignment should not be affected at all.	2013-03-04 14:43:49 -05:00
Heng Li	40f1214736	change to debugging code only	2013-03-04 11:52:11 -05:00
Heng Li	7e00dbcac5	r317: bugfix - out-of-range extension This happens when target region crosses the forward-reverse boundary. This will almost never happen to short-read alignment.	2013-03-04 11:35:23 -05:00
Heng Li	3e4a178e08	r314: cleanup bwamem API Don't modify input sequences; more documentations	2013-03-01 11:14:51 -05:00
Rob Davies	6beab5f765	Merge branch 'master' into master_fixes Merge changes to commit `c5434ac` (0.7.0 release) Conflicts: Makefile bwamem.c	2013-03-01 10:22:49 +00:00
Rob Davies	3d33ab063e	Merge branch 'master' into master_fixes Merged to master version `b621d3a` Conflicts: Makefile bntseq.c bwa.c bwase.c bwaseqio.c bwtaln.c bwtindex.c bwtio.c bwtmisc.c bwtsw2_aux.c cs2nt.c fastmap.c khash.h kseq.h ksw.c kvec.h simple_dp.c utils.c utils.h	2013-03-01 09:37:46 +00:00
Heng Li	f3cff1c609	r311: even tighter bw for CIGAR	2013-02-27 23:59:50 -05:00
Heng Li	a33b9c0633	tighter bw for cigar SW	2013-02-27 23:40:46 -05:00
Heng Li	6a4d8c79d8	r309: bugfix - soft clipping missing in example.c	2013-02-27 22:45:18 -05:00
Heng Li	df7c3f0000	r308: added a new API to convert region to CIGAR and an example program demonstrating how to do single-end alignment in <50 lines of C code.	2013-02-27 22:28:29 -05:00
Heng Li	4bb0bdddca	r306: introduce clipping penalty More clipping leads to more severe reference bias. We should not clip the alignment unless necessary.	2013-02-27 21:13:39 -05:00
Heng Li	65e099df34	r300: fixed an out-of-boundary bug in rare case	2013-02-27 00:37:17 -05:00
Heng Li	0b533385ef	r299: better way to exclude seed	2013-02-27 00:29:11 -05:00
Heng Li	ee80fb8bd0	Test each seed to see if extension is needed The old version wastefully extends many seeds contained in an aligned region found before. While this wastes little time for short reads, it becomes a serious defect for long query sequences. This is an attempt to fix this problem, but more tuning are needed.	2013-02-26 22:55:44 -05:00
Heng Li	acd1ab607b	r297: reduce wasteful SW extension This is particularly important for long sequences	2013-02-26 16:26:46 -05:00
Heng Li	98787f0ae0	r295: generate NM	2013-02-26 13:36:01 -05:00
Heng Li	32f2d60a2e	r294: bugfix - -M not working	2013-02-26 13:14:33 -05:00
Heng Li	619ac4f93d	r293: bugfix - wrong RG type in SAM output	2013-02-26 13:03:35 -05:00
Heng Li	e70c7c2a71	r284: amend cross-reference hit I really hate this: complex and twisted logic for a nasty scenario that almost never happens to short reads - but it may become serious when the reference genome consists of many contigs. On toy examples, the code seems to work. Don't know if it really works...	2013-02-26 00:03:49 -05:00
Heng Li	77b5b586ad	r282: set min split_len to read length	2013-02-25 17:29:35 -05:00
Heng Li	d19e834d84	r280: align two ends in the same thread Otherwise odd-number threads may be of different speed from even-number threads.	2013-02-25 15:40:15 -05:00
Heng Li	20aa848b3c	r279: for PE mapq, consider the number of pairs If there are a lot of proper pairs, it is more likely that the best pair is wrong.	2013-02-25 13:00:35 -05:00
Heng Li	9957e04590	r278: don't perform too many mate-sw	2013-02-25 11:56:02 -05:00
Heng Li	5ead86acd3	optionally mark split hit as secondary	2013-02-25 11:18:35 -05:00
Heng Li	514563bd0a	no poor hits with -a; reduce mapq for 2nd primary	2013-02-25 10:54:12 -05:00
Heng Li	29e41b592c	bugfix: isize is off by 1	2013-02-24 23:00:51 -05:00
Heng Li	85775c3384	output multiple hits	2013-02-24 13:23:43 -05:00
Heng Li	6bdccf2a8a	added a bit documentation	2013-02-24 13:09:29 -05:00
Heng Li	ee59a13109	simplified bwamem.h Hide mem_seed_t and mem_chain_t. Don't expose unnecessary routines.	2013-02-24 12:17:29 -05:00
Heng Li	cda85be059	fixed a couple bugs identified by gcc Recent gcc is better.	2013-02-23 17:15:07 -05:00
Heng Li	b4c38bcc1c	append fasta/q comment	2013-02-23 16:57:34 -05:00
Heng Li	ee4540c394	support read group in bwa-mem	2013-02-23 16:41:44 -05:00
Heng Li	67543f19a1	code refactoring	2013-02-23 15:55:55 -05:00
Heng Li	e613195e17	moved some common code to bwa.{c,h}	2013-02-23 15:30:46 -05:00
Heng Li	d460f2ec9e	bugfix in multi-threaded bwa-mem	2013-02-23 14:48:54 -05:00
Heng Li	904c3205c0	removed a few unused variables These variables have been assigned but never actually used. Reported by gcc-4.7. Lower version cannot give such warnings.	2013-02-23 13:26:50 -05:00
Heng Li	17c123d65a	pring paired-end SAM	2013-02-22 16:38:48 -05:00
Heng Li	ba15b787cb	rework PE mapq; don't know if better	2013-02-22 14:47:57 -05:00
Heng Li	c5ce72f593	scoring pairs by score, not by errors This is important for bwa-mem which does local alignment. A short exact match is worse than a long inexact match. Also fixed a bug in approximating mapping quality.	2013-02-22 12:10:20 -05:00
Heng Li	d4cf6d97a6	bugfix: memory leak	2013-02-21 15:04:31 -05:00
Heng Li	a578688fa8	generate multiple alignments from one chain	2013-02-21 14:58:51 -05:00
Heng Li	cfbc4c89e3	perform extension when there are, say, 20bp tandem	2013-02-21 14:34:10 -05:00
Heng Li	54da54ffd4	extend more seeds (and thus slower...)	2013-02-21 12:52:00 -05:00
Heng Li	f8829318cf	weakened the chain filter	2013-02-21 12:25:20 -05:00
Heng Li	84a328764a	bugfix: mis-chaining caused by integer overflow I really need to rewrite kbtree some time.	2013-02-21 11:42:30 -05:00
Heng Li	ea8f4f4d34	clean bill from valgrind	2013-02-20 20:26:57 -05:00
Heng Li	5626fe29b7	Well, at least output sth	2013-02-20 19:11:44 -05:00
Heng Li	a7d574d125	backup comments	2013-02-20 01:11:38 -05:00
Heng Li	688872fb1b	code backup	2013-02-19 00:50:39 -05:00
Heng Li	66585b7982	code backup	2013-02-18 16:33:06 -05:00
Heng Li	ea9fc7df48	keep the number of SW performed	2013-02-16 11:03:27 -05:00
Heng Li	5f8c6efbc3	forbid x-bounary bns_get_seq(); code backup	2013-02-16 09:48:44 -05:00
Heng Li	604e3d8da1	code backup; to upgrade ksw.{c,h}	2013-02-12 16:15:26 -05:00
Heng Li	325ba8213b	move mark primary to worker1()	2013-02-12 15:54:55 -05:00
Heng Li	cd0969332f	keep track of the "parent" of a secondary	2013-02-12 15:52:23 -05:00
Heng Li	22b79b3475	mark primary, instead of dropping secondary	2013-02-12 15:34:44 -05:00
Heng Li	2fc469d0c9	code backup	2013-02-12 12:09:36 -05:00
Heng Li	95d18449b3	merge bseq.{h,c} to utils.{h,c} I do not like many small files.	2013-02-12 10:36:15 -05:00
Heng Li	13288e2dcd	code backup	2013-02-12 09:22:47 -05:00
Heng Li	99907c98fb	separated and improved SAM printing code This is for the PE mode. The routines may also be useful for bwa-sw, but probably I won't change the old code.	2013-02-11 15:29:03 -05:00
Heng Li	987d4b4205	fixed a stupid bug in fastq reading	2013-02-11 11:27:35 -05:00
Heng Li	59eaf650ac	code backup	2013-02-11 10:59:38 -05:00
Heng Li	f4c0672800	move sort_and_dedup() to worker1()	2013-02-10 12:55:19 -05:00
Heng Li	c310fb7424	a little refactoring for PE support	2013-02-10 12:24:33 -05:00
Heng Li	829664d6b5	missing identical hits; improved sub_n	2013-02-08 17:55:35 -05:00
Heng Li	b2c7148dc9	consider the number of suboptimal hits	2013-02-08 17:20:44 -05:00

1 2 3 4 5 ...

294 Commits (main)