fast-bwa

Commit Graph

Author	SHA1	Message	Date
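zzh	7d085962a2	Started switching to the sbwa-style batch mode	2024-03-07 18:23:21 +08:00
zzh	6e1dd08fb6	Changed the seed and extend stages to batch mode; it does not seem to have much effect	2024-02-23 01:09:08 +08:00
zzh	463f7da138	Implemented the smem1 function using the fmt structure; results are basically correct	2024-02-07 22:08:51 +08:00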
Heng Li	02a9add042	added MIT license to some non-GPL source files	2020-07-01 23:02:01 -04:00
Heng Li	11d53b06fb	Merge branch 'dev' into XB	2018-04-02 10:47:45 -04:00
Heng Li	eb7dbc1429	optionally write XB to include alignment score request from 4DN-DCIC	2018-04-02 10:43:41 -04:00
Heng Li	b582816211	r1187: a typo in command line help	2017-09-26 09:35:19 -04:00
Heng Li	f123871451	r1142: added option -5 for Hi-C data	2016-05-31 11:01:36 -04:00
Heng Li	1247dc2346	removed the pbread mode It is not working well. Not at all.	2016-01-26 22:37:01 -05:00
Heng Li	80e4ecfa79	r998: smart pairing; allow mixture of SE/PE reads	2014-11-18 14:30:22 -05:00
Heng Li	76a15ea91b	r933: with bwa-postalt ready, drop option -g	2014-10-21 00:23:14 -04:00
Heng Li	a03d01f944	r878: XA is given to the best alignment Non-ALT hits may get ALT hits in the XA tag. This will simplify haplotype assignment.	2014-09-30 13:50:51 -04:00
Heng Li	9af36064e8	r867: fixed a few bugs; added ALT hits to XA	2014-09-19 16:50:21 -04:00
Heng Li	c982443210	r854: improved the calculation of pa and build pa filtering into BWA-MEM	2014-09-17 16:26:28 -04:00
Heng Li	6f37c14f26	r848: tag alignments with primary ALT	2014-09-16 18:52:49 -04:00
Heng Li	ca61fe3ad5	code backup	2014-09-08 08:52:02 -04:00
Heng Li	35ac99b4f7	r815: optionally output ref fasta header Also fixed a bug in reading .ann files	2014-08-29 10:51:23 -04:00
Heng Li	b5cba257c1	r809: new strategy for the -a mode	2014-08-25 11:59:27 -04:00
Heng Li	39a6cd5bb0	r762: cleanup for the new release; unfinished It will take to make the documentation ready.	2014-05-11 15:15:44 -04:00
Heng Li	43b498a37e	r759: bugfix - frac_rep not working Also added commented code for a 3rd round seeding. Not used.	2014-05-09 14:56:59 -04:00
Heng Li	6db761e269	r746: tuned heuristic for GRCh38 Reduced -c to 500 by default. As a compensation, we choose up to 1000 positions if a seed has 500 or more occurrences. In addition, a read with big portion from such seeds will have lower mapping quality.	2014-05-02 16:06:27 -04:00
Heng Li	c6c943f9d7	r738: output multi-map in the XA tag (SE only) ... PE support coming soon	2014-04-30 16:46:05 -04:00
Heng Li	88f89be60e	r736: improved in low-complexity regions Example: GGAGGGGAAGGGTGGGCTGGAGGGGACGGGTGGGCTGGAGGGGAAGGGTGTGCTGGAGGGAAAAGGTGGACTGGAGGGGAAGGGTGGGCTGGAGGGGAAGG This read has 5 chains, two of which are: weight=80 26;26;0,4591439948(10:-3095894) 23;23;27,4591439957(10:-3095888) 31;31;70,4591439964(10:-3095873) weight=50 45;45;51,4591440017(10:-3095806) 50;50;51,4591440017(10:-3095801) 31;31;70,4591440090(10:-3095747) Extension from the 26bp seed in the 1st chain gives an alignment [0,101) <=> [4591439948,4591440067), which contains the 50bp seed in the second chain. However, if we extend the 50bp seed, it yields a better alignment [0,101) <=> [4591439966,4591440067) with a different starting position. The 26bp seed is wrong. This commit adds a heuristic to fix this issue.	2014-04-30 14:14:20 -04:00
Heng Li	b92bbb47e5	Merge branch '0.7.7-softclip' into layout Conflicts: Makefile bwamem.h fastmap.c main.c	2014-04-24 12:24:49 -04:00
Heng Li	8c12ec4a4b	r725: optionally disable hard clipping as is requested by the cancer group	2014-04-24 11:56:43 -04:00
Heng Li	b93fca2b2e	r723: merge adjacent hits	2014-04-16 16:38:50 -04:00
Heng Li	00a07f61bf	r721: merge overlapping hits by default	2014-04-15 16:16:04 -04:00
Heng Li	4e22270eba	r718: merge alnregs overlapping on both query/ref	2014-04-14 17:01:17 -04:00
Heng Li	8638cfadc8	dev-472: get rid of bwa_fix_xref() This function causes all kinds of problems when the reference genome consists of many short reads/contigs/chromosomes. Some of the problems are nearly unfixable at the point where bwa_fix_xref() gets called. This commit attempts to fix the problem at the root. It disallows chains spanning multiple contigs and never retrieves sequences bridging two adjacent contigs. Thus all the chaining, extension, SW and global alignments are confined to one contig only. This commit brings many changes. I have tested it on a couple examples including Peter Field's PacBio example. It works well so far.	2014-04-10 20:54:27 -04:00
Heng Li	ccbbe48c4f	dev-470: don't stop on bwa_fix_xref2() failures Peter Field has sent me an example caused by an alignment bridging three adjacent chromosomes/contigs. Bwa-mem always aligns the query to the contig covering the middle point of the alignment. In this example, it chooses the middle contig, which should not be aligned. This leads to weird things failing bwa_fix_xref2(), which cannot be fixed unless we build the contig boundaries into the FM-index. In the old code, bwa-mem halts when bwa_fix_xref2() fails. With this commit, bwa-mem will give a warning instead of halting.	2014-04-10 11:43:17 -04:00
Heng Li	99f6f9a0d1	dev-467: limit the max #chains to extend	2014-04-08 21:45:49 -04:00
Heng Li	f12dfae772	dev-465: a new output format for read overlap Also moved a few functions to bwamem_extra.c. File bwamem.c is becoming far too long.	2014-04-08 16:29:36 -04:00
Heng Li	114901b005	dev-r462: refined setting for PacBio; weight flt The recommended setting in the last commit is wrong. If we can extend a random seed hit to the full length, we will force the read aligned through break points, which is wrong. The new setting is better but it may lead to a small fraction of fragmented alignments. In addition, I added a filter on the minimum chain weight and tied min_HSP_score to this filter. It doubles the mapping speed.	2014-04-04 17:01:04 -04:00
Heng Li	41f720dfa7	dev-461: added a heuristic for PacBio data See the comment above mem_test_chain_sw() for details.	2014-04-04 16:05:41 -04:00
Heng Li	b3225581be	dev-458: simplified the smem iterator simpler but less powerful.	2014-04-03 15:23:48 -04:00
Heng Li	9ce50a4e5e	dev-450: support diff ins/del penalties. NO TEST!!	2014-03-28 14:54:06 -04:00
Heng Li	2e9463ebf1	dev-r442: suppress exact full-length matches	2014-02-26 22:04:19 -05:00
Heng Li	4219e58623	r423: bugfix - SE hits not random	2013-11-23 09:36:26 -05:00
Heng Li	b51a66e4c1	r413: fixed an issue causing redundant alignment I have seen a fosmid aligned to the same position but with two slightly different CIGARs: 30000M and 29900M50D100M, possibly caused by tandem repeats. 0.7.5a will regard them as two distinct alignments and generates a very small mapping quality. However, these two are essentially the same. Although there is ambiguity in aligning the end of the fosmid, we should not penalize the entire alignment with a small mapQ. This commit fixes this issue. More testing is needed, though.	2013-09-09 11:36:50 -04:00
Heng Li	623da055e1	alternative way to estimate mapQ the old mapQ estimate is too conservative	2013-09-06 12:31:47 -04:00
Heng Li	3b84c03c1e	r406: allow to use diff clipping penalties for 5'-end or for 3'-end	2013-08-28 15:59:05 -04:00
Heng Li	9735d7a31a	conform to the latest (unpublished) SAM spec for chimeric alignments	2013-05-22 19:45:16 -04:00
Heng Li	19cb7cd7ed	r388: cleanup mem_process_seqs() interface Print output outside the function and allow to feed insert size distribution.	2013-04-26 12:31:18 -04:00
Heng Li	53bb846407	r373: optionally disable mate rescue	2013-04-09 16:13:55 -04:00
Heng Li	9346acde1b	Release bwa-0.7.3a-r367 In 0.7.3, the wrong CIGAR bug was only fixed in one scenario, but not fixed in another corner case.	2013-03-15 21:26:37 -04:00
Heng Li	26f4c704ed	drop the old SAM writer	2013-03-11 22:24:54 -04:00
Heng Li	47952b6f3f	drop an unnecessary member from mem_aln_t	2013-03-11 21:35:32 -04:00
Heng Li	8f0d439913	prepare to replace the SAM printing code This move is dangerous as SAM printing is very complex, but it will benefit in the long run. The planned change will reduce the redundancy, improves clarity and most importantly makes it much easier to output multiple primary hits in an optional tag.	2013-03-11 21:25:17 -04:00
Heng Li	5fbd454682	r332: added output threshold Otherwise there are far too many short hits	2013-03-05 22:49:38 -05:00
Heng Li	efd9769b07	r324: a little code cleanup The changes after r317 aim to improve the performance and accuracy for very long query alignment. The short-read alignment should not be affected. The changes include: 1) Z-dropoff. This is a variant of blast's X-dropoff. I originally thought this heuristic only improves speed, but now I realize it also reduces poor alignment with long good flanking alignments. The difference from blast's X-dropoff is that Z-dropoff allows big gaps, but X-dropoff does not. 2) Band width doubling. When band width is too small, we will get a poor alignment in the middle. Sometimes such alignments cannot be fully excluded with Z-dropoff. Band width doubling is an alternative heuristic. It is based on the observation that the existence of a close-to-boundary high score possibly implies inadequate band width. When we see such a signal, we double the band width.	2013-03-05 00:57:16 -05:00
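
The r324 entry above contrasts Z-dropoff with blast's X-dropoff. The following is a minimal sketch of that contrast, not the code from the commit: the type `best_cell_t`, both function names, and all parameters are hypothetical and introduced only for illustration. It assumes that Z-dropoff forgives the drop in score that a single long gap between the best cell and the current cell would explain, charged at the gap-extension cost, while X-dropoff looks at the raw score drop only.

```c
#include <assert.h>

/* Hypothetical record of the best-scoring cell seen so far:
 * score plus its target row (i) and query column (j). */
typedef struct {
    int score;
    int i;
    int j;
} best_cell_t;

/* X-dropoff style test (assumed form): stop as soon as the current
 * row's best score falls more than xdrop below the global best. */
int xdrop_should_stop(int best_score, int row_max, int xdrop)
{
    return best_score - row_max > xdrop;
}

/* Z-dropoff style test (assumed form): before comparing against zdrop,
 * subtract the cost of extending one gap that spans the diagonal shift
 * between the best cell and the current cell (i, j). A big gap therefore
 * does not trigger termination on its own. */
int zdrop_should_stop(best_cell_t best, int row_max, int i, int j,
                      int e_del, int e_ins, int zdrop)
{
    int di = i - best.i; /* progress along the target since the best cell */
    int dj = j - best.j; /* progress along the query since the best cell */
    int gap_len, gap_ext;

    assert(e_del >= 0 && e_ins >= 0);
    if (zdrop <= 0) return 0; /* heuristic disabled */

    if (di > dj) {            /* target advanced further: deletion-like shift */
        gap_len = di - dj;
        gap_ext = e_del;
    } else {                  /* query advanced further: insertion-like shift */
        gap_len = dj - di;
        gap_ext = e_ins;
    }
    return best.score - row_max - gap_len * gap_ext > zdrop;
}
```

With a best cell of score 100 at (50, 50), a current cell at (120, 60) with row maximum 40, unit gap-extension costs, and a threshold of 50, the X-dropoff test stops (the drop of 60 exceeds 50) while the Z-dropoff test continues, because the 60-base diagonal shift accounts for the whole drop. On the same diagonal, at (60, 60), both tests stop.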

101 Commits (7d085962a26cab160a07a42a461378f575ff011a)