
104 Commits (8c2a90e3e24bb1f25bcccbbb8c61fe51ce82f9a7)

Author SHA1 Message Date
zzh 8c2a90e3e2 Fixed a bug in two-threaded decompression and reading/writing of fastq.gz; modified some test code 2024-06-24 18:11:54 +08:00
zzh 1e3965cb7d Added a pipeline that can read and write concurrently; improved the timing statistics 2024-03-24 04:40:09 +08:00
zzh 856a0e0c01 Switched to batch mode; it helps on na12878 2024-03-09 11:39:40 +08:00
zzh 7d085962a2 Started switching to the sbwa-style batch mode 2024-03-07 18:23:21 +08:00
zzh 6e1dd08fb6 Changed the seed and extend stages to batch mode; it does not seem to have much effect 2024-02-23 01:09:08 +08:00
zzh 463f7da138 Reimplemented the smem1 function using the fmt structure; results are mostly correct 2024-02-07 22:08:51 +08:00
Heng Li 02a9add042 added MIT license to some non-GPL source files 2020-07-01 23:02:01 -04:00
Heng Li 11d53b06fb Merge branch 'dev' into XB 2018-04-02 10:47:45 -04:00
Heng Li eb7dbc1429 optionally write XB to include alignment score
request from 4DN-DCIC
2018-04-02 10:43:41 -04:00
Heng Li b582816211 r1187: a typo in command line help 2017-09-26 09:35:19 -04:00
Heng Li f123871451 r1142: added option -5 for Hi-C data 2016-05-31 11:01:36 -04:00
Heng Li 1247dc2346 removed the pbread mode
It is not working well. Not at all.
2016-01-26 22:37:01 -05:00
Heng Li 80e4ecfa79 r998: smart pairing; allow mixture of SE/PE reads 2014-11-18 14:30:22 -05:00
Heng Li 76a15ea91b r933: with bwa-postalt ready, drop option -g 2014-10-21 00:23:14 -04:00
Heng Li a03d01f944 r878: XA is given to the best alignment
Non-ALT hits may get ALT hits in the XA tag. This will simplify haplotype
assignment.
2014-09-30 13:50:51 -04:00
Heng Li 9af36064e8 r867: fixed a few bugs; added ALT hits to XA 2014-09-19 16:50:21 -04:00
Heng Li c982443210 r854: improved the calculation of pa
and built pa filtering into BWA-MEM
2014-09-17 16:26:28 -04:00
Heng Li 6f37c14f26 r848: tag alignments with primary ALT 2014-09-16 18:52:49 -04:00
Heng Li ca61fe3ad5 code backup 2014-09-08 08:52:02 -04:00
Heng Li 35ac99b4f7 r815: optionally output ref fasta header
Also fixed a bug in reading .ann files
2014-08-29 10:51:23 -04:00
Heng Li b5cba257c1 r809: new strategy for the -a mode 2014-08-25 11:59:27 -04:00
Heng Li 39a6cd5bb0 r762: cleanup for the new release; unfinished
It will take some time to make the documentation ready.
2014-05-11 15:15:44 -04:00
Heng Li 43b498a37e r759: bugfix - frac_rep not working
Also added commented code for a 3rd round seeding. Not used.
2014-05-09 14:56:59 -04:00
Heng Li 6db761e269 r746: tuned heuristic for GRCh38
Reduced -c to 500 by default. As a compensation, we choose up to 1000 positions
if a seed has 500 or more occurrences. In addition, a read with a large portion
from such seeds will have lower mapping quality.
2014-05-02 16:06:27 -04:00
Heng Li c6c943f9d7 r738: output multi-map in the XA tag (SE only)
... PE support coming soon
2014-04-30 16:46:05 -04:00
Heng Li 88f89be60e r736: improved in low-complexity regions
Example: GGAGGGGAAGGGTGGGCTGGAGGGGACGGGTGGGCTGGAGGGGAAGGGTGTGCTGGAGGGAAAAGGTGGACTGGAGGGGAAGGGTGGGCTGGAGGGGAAGG

This read has 5 chains, two of which are:

weight=80  26;26;0,4591439948(10:-3095894)  23;23;27,4591439957(10:-3095888)  31;31;70,4591439964(10:-3095873)
weight=50  45;45;51,4591440017(10:-3095806) 50;50;51,4591440017(10:-3095801)  31;31;70,4591440090(10:-3095747)

Extension from the 26bp seed in the 1st chain gives an alignment [0,101) <=> [4591439948,4591440067), which
contains the 50bp seed in the second chain. However, if we extend the 50bp seed, it yields a better alignment
[0,101) <=> [4591439966,4591440067) with a different starting position. The 26bp seed is wrong. This commit
adds a heuristic to fix this issue.
2014-04-30 14:14:20 -04:00
Heng Li b92bbb47e5 Merge branch '0.7.7-softclip' into layout
Conflicts:
	Makefile
	bwamem.h
	fastmap.c
	main.c
2014-04-24 12:24:49 -04:00
Heng Li 8c12ec4a4b r725: optionally disable hard clipping
as requested by the cancer group
2014-04-24 11:56:43 -04:00
Heng Li b93fca2b2e r723: merge adjacent hits 2014-04-16 16:38:50 -04:00
Heng Li 00a07f61bf r721: merge overlapping hits by default 2014-04-15 16:16:04 -04:00
Heng Li 4e22270eba r718: merge alnregs overlapping on both query/ref 2014-04-14 17:01:17 -04:00
Heng Li 8638cfadc8 dev-472: get rid of bwa_fix_xref()
This function causes all kinds of problems when the reference genome consists
of many short reads/contigs/chromosomes. Some of the problems are nearly
unfixable at the point where bwa_fix_xref() gets called. This commit attempts
to fix the problem at the root. It disallows chains spanning multiple contigs
and never retrieves sequences bridging two adjacent contigs. Thus all the
chaining, extension, SW and global alignments are confined to one contig only.

This commit brings many changes. I have tested it on a couple of examples
including Peter Field's PacBio example. It works well so far.
2014-04-10 20:54:27 -04:00
Heng Li ccbbe48c4f dev-470: don't stop on bwa_fix_xref2() failures
Peter Field has sent me an example caused by an alignment bridging three
adjacent chromosomes/contigs. Bwa-mem always aligns the query to the contig
covering the middle point of the alignment. In this example, it chooses the
middle contig, to which the query should not be aligned. This leads to weird
failures in bwa_fix_xref2(), which cannot be fixed unless we build the contig boundaries
into the FM-index.

In the old code, bwa-mem halts when bwa_fix_xref2() fails. With this commit,
bwa-mem will give a warning instead of halting.
2014-04-10 11:43:17 -04:00
Heng Li 99f6f9a0d1 dev-467: limit the max #chains to extend 2014-04-08 21:45:49 -04:00
Heng Li f12dfae772 dev-465: a new output format for read overlap
Also moved a few functions to bwamem_extra.c. File bwamem.c is becoming far too
long.
2014-04-08 16:29:36 -04:00
Heng Li 114901b005 dev-r462: refined setting for PacBio; weight flt
The recommended setting in the last commit is wrong. If we can extend a random
seed hit to the full length, we will force the read to be aligned through break
points, which is wrong. The new setting is better but it may lead to a small
fraction of fragmented alignments.

In addition, I added a filter on the minimum chain weight and tied
min_HSP_score to this filter. It doubles the mapping speed.
2014-04-04 17:01:04 -04:00
Heng Li 41f720dfa7 dev-461: added a heuristic for PacBio data
See the comment above mem_test_chain_sw() for details.
2014-04-04 16:05:41 -04:00
Heng Li b3225581be dev-458: simplified the smem iterator
Simpler but less powerful.
2014-04-03 15:23:48 -04:00
Heng Li 9ce50a4e5e dev-450: support diff ins/del penalties. NO TEST!! 2014-03-28 14:54:06 -04:00
Heng Li 2e9463ebf1 dev-r442: suppress exact full-length matches 2014-02-26 22:04:19 -05:00
Heng Li 4219e58623 r423: bugfix - SE hits not random 2013-11-23 09:36:26 -05:00
Heng Li b51a66e4c1 r413: fixed an issue causing redundant alignment
I have seen a fosmid aligned to the same position but with two slightly
different CIGARs: 30000M and 29900M50D100M, possibly caused by tandem repeats.
0.7.5a will regard them as two distinct alignments and generate a very small
mapping quality. However, these two are essentially the same. Although there is
ambiguity in aligning the end of the fosmid, we should not penalize the entire
alignment with a small mapQ. This commit fixes this issue. More testing is
needed, though.
2013-09-09 11:36:50 -04:00
Heng Li 623da055e1 alternative way to estimate mapQ
the old mapQ estimate is too conservative
2013-09-06 12:31:47 -04:00
Heng Li 3b84c03c1e r406: allow to use diff clipping penalties
for 5'-end or for 3'-end
2013-08-28 15:59:05 -04:00
Heng Li 9735d7a31a conform to the latest (unpublished) SAM spec
for chimeric alignments
2013-05-22 19:45:16 -04:00
Heng Li 19cb7cd7ed r388: cleanup mem_process_seqs() interface
Print output outside the function and allow to feed insert size distribution.
2013-04-26 12:31:18 -04:00
Heng Li 53bb846407 r373: optionally disable mate rescue 2013-04-09 16:13:55 -04:00
Heng Li 9346acde1b Release bwa-0.7.3a-r367
In 0.7.3, the wrong CIGAR bug was only fixed in one scenario, but not fixed
in another corner case.
2013-03-15 21:26:37 -04:00
Heng Li 26f4c704ed drop the old SAM writer 2013-03-11 22:24:54 -04:00
Heng Li 47952b6f3f drop an unnecessary member from mem_aln_t 2013-03-11 21:35:32 -04:00