
259 Commits (d58beeee56d2ee9b6b4f0f7eb2396d0337aa85fb)

Author SHA1 Message Date
Heng Li 3d129be642 r943: change the default -y to 20, but ...
for GRCh38 ALT, this is not enough. We need -y of at least 40 to get high accuracy
because a locus on chr19 has 35 copies.
2014-10-22 12:42:58 -04:00
Heng Li 4177d6c2c7 r942: ignore ALT hits when counting n_sub for ...
non-ALT hits. Counting them leads to underestimated mapQ.
2014-10-22 10:24:16 -04:00
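A minimal sketch of the r942 rule, assuming a hypothetical `hit_t` record and a hypothetical score tolerance `tol` (the log does not show BWA's own `n_sub` bookkeeping). When the best hit lies on a primary (non-ALT) contig, ALT hits are left out of the suboptimal-hit count, so ALT copies of the same locus no longer drag mapQ down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hit record for illustration; not BWA's internal type. */
typedef struct {
    int score;  /* alignment score */
    int is_alt; /* nonzero if the hit lies on an ALT contig */
} hit_t;

/* Count suboptimal hits scoring within `tol` of the best hit.
 * If the best hit is non-ALT, ALT hits are skipped (the r942 rule). */
int count_sub(const hit_t *hits, size_t n, size_t best, int tol)
{
    int n_sub = 0;
    assert(best < n && tol >= 0);
    for (size_t i = 0; i < n; ++i) {
        if (i == best) continue;
        if (!hits[best].is_alt && hits[i].is_alt) continue; /* ignore ALT copies */
        if (hits[i].score >= hits[best].score - tol) ++n_sub;
    }
    return n_sub;
}
```

With hits scoring 100 (primary), 98 (ALT) and 97 (primary) and `tol` = 5, only the 97 counts, so `n_sub` is 1 instead of 2.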
Heng Li 60b728487a r941: set a min length for 3rd-round seeding 2014-10-21 13:15:42 -04:00
Heng Li 282130a64e r940: fixed a bug - missing primary hit 2014-10-21 12:57:49 -04:00
Heng Li 76a15ea91b r933: with bwa-postalt ready, drop option -g 2014-10-21 00:23:14 -04:00
Heng Li a6b5a30dab r930: use 3rd round seeding by default
This strategy is similar to the seeding heuristic used by LAST. When it is used
alone, it is not as accurate as the current seeding strategy at least for short
reads. However, it may do a better job for a long contig mapped to multiple ALT
contigs. This seeding strategy is also relatively cheap to perform.
2014-10-20 17:34:15 -04:00
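A simplified sketch of LAST-like (adaptive) seeding as described in r930, combined with the minimum seed length from r941 and the occurrence cutoff that -y controls per r943. Assumptions: occurrences are counted by brute-force scanning of a plain string rather than with an FM-index, and `adaptive_seeds()`, `min_len` and `max_occ` are hypothetical names. From each start position the seed grows until it occurs fewer than `max_occ` times and is at least `min_len` long; the next seed starts right after it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { size_t qbeg, len, occ; } seed_t; /* hypothetical seed record */

/* Naive occurrence count of pat[0,plen) in ref (stands in for an FM-index). */
static size_t count_occ(const char *ref, size_t rlen, const char *pat, size_t plen)
{
    size_t c = 0;
    if (plen > rlen) return 0;
    for (size_t j = 0; j + plen <= rlen; ++j)
        if (memcmp(ref + j, pat, plen) == 0) ++c;
    return c;
}

/* Greedy adaptive seeding; returns the number of seeds written to out. */
size_t adaptive_seeds(const char *ref, const char *query, size_t min_len,
                      size_t max_occ, seed_t *out, size_t cap)
{
    size_t rlen = strlen(ref), qlen = strlen(query), n = 0, x = 0;
    assert(min_len > 0 && max_occ > 0);
    while (x < qlen && n < cap) {
        size_t end = qlen; /* if no seed is found, stop at the query end */
        for (size_t l = 1; x + l <= qlen; ++l) {
            size_t occ = count_occ(ref, rlen, query + x, l);
            if (occ == 0) { end = x + l; break; } /* no match: skip past it */
            if (occ < max_occ && l >= min_len) {  /* rare enough and long enough */
                out[n].qbeg = x; out[n].len = l; out[n].occ = occ; ++n;
                end = x + l;
                break;
            }
        }
        x = end; /* next seed starts right after this one */
    }
    return n;
}
```

In this sketch, a seed drawn from a locus with 35 copies keeps growing while `max_occ` is 20, and can only stop inside the repeat once `max_occ` exceeds 35, which matches the reasoning in r943 for raising -y.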
Heng Li 038af2a551 r929: added simplified LAST-like seeding 2014-10-20 17:00:31 -04:00
Heng Li 3370ae9e35 r926: prepare to move -g to bwa-postalt.js 2014-10-19 20:43:53 -04:00
Heng Li 76a365a95f r907: revert to -g.8 by default 2014-10-16 15:56:33 -04:00
Heng Li d8d8b230d1 r906: don't reduce non-ALT mapQ by default 2014-10-16 15:15:23 -04:00
Heng Li 2a18fa114f r895: increase the default max_XA_hits_alt to 200
Because there are >100 HLA haplotypes
2014-10-14 16:58:42 -04:00
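A hedged sketch of how two XA limits could interact, using the defaults from r749 (5) and r895 (200). The rule below, which caps primary hits at `max_hits` and primary plus ALT hits at `max_hits_alt`, is an assumption for illustration; the log only states the two defaults and the HLA motivation.

```c
#include <assert.h>
#include <stddef.h>

/* Decide whether to emit an XA tag.
 * n_pri: other hits on primary contigs; n_alt: other hits on ALT contigs. */
int emit_xa(size_t n_pri, size_t n_alt, size_t max_hits, size_t max_hits_alt)
{
    assert(max_hits <= max_hits_alt);
    if (n_pri + n_alt == 0) return 0;     /* nothing to report */
    if (n_pri > max_hits) return 0;       /* too repetitive on the primary assembly */
    return n_pri + n_alt <= max_hits_alt; /* ALT haplotypes get a larger budget */
}
```

Under this rule, a read with 3 primary hits and 150 HLA ALT hits still gets an XA tag, whereas 6 primary hits suppress it.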
Heng Li a03d01f944 r878: XA is given to the best alignment
Non-ALT hits may get ALT hits in the XA tag. This will simplify haplotype
assignment.
2014-09-30 13:50:51 -04:00
Heng Li dae4ca3ced r875: invalid SAM output for ALT hits 2014-09-26 15:29:08 -04:00
Heng Li 7426a750ec r868: use soft clip for ALT hits 2014-09-19 16:58:18 -04:00
Heng Li 9af36064e8 r867: fixed a few bugs; added ALT hits to XA 2014-09-19 16:50:21 -04:00
Heng Li a41afe4c97 These files were committed on the wrong branch 2014-09-18 10:49:35 -04:00
Heng Li c982443210 r854: improved the calculation of pa
and built pa filtering into BWA-MEM
2014-09-17 16:26:28 -04:00
Heng Li 825ae92e58 r849: the pa tag now gives a number
... which is the ratio of this hit to the best ALT hit.
2014-09-17 13:05:35 -04:00
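A minimal sketch of the pa value from r849 and a filter of the kind r854 mentions. Assumptions: pa is computed from alignment scores, and hits are kept when pa reaches a hypothetical threshold `min_pa`; the log does not give BWA's actual threshold or filtering direction.

```c
#include <assert.h>
#include <stddef.h>

/* pa = score of this hit / score of the best ALT hit. */
double pa_ratio(int score, int best_alt_score)
{
    assert(best_alt_score > 0);
    return (double)score / (double)best_alt_score;
}

/* Keep scores whose pa is at least min_pa, compacting them in place.
 * Returns the number of scores kept. */
size_t filter_by_pa(int *scores, size_t n, int best_alt_score, double min_pa)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i)
        if (pa_ratio(scores[i], best_alt_score) >= min_pa)
            scores[kept++] = scores[i];
    return kept;
}
```

For scores {100, 90, 70, 50} against a best ALT score of 100 and `min_pa` = 0.8, the first two survive.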
Heng Li 6f37c14f26 r848: tag alignments with primary ALT 2014-09-16 18:52:49 -04:00
Heng Li 4b6eeb34c8 r830: optionally fixed chunk size 2014-09-15 23:42:24 -04:00
Heng Li 624687b072 r829: killed a harmless gcc warning 2014-09-15 23:33:22 -04:00
Heng Li b07587f806 r827: an alt hit as good as a pri hit as supp 2014-09-15 16:07:51 -04:00
Heng Li aee53f1334 r824: ALT mapping seems to be working 2014-09-15 00:29:05 -04:00
Heng Li 015ab3f6c3 r823: towards ALT support 2014-09-14 16:41:14 -04:00
Heng Li 8116bcc786 Merge branch 'dev' into alt 2014-09-14 15:40:52 -04:00
Heng Li 8d2b93156b r821: more relaxed on containing seeds 2014-09-12 10:35:49 -04:00
Heng Li 6739b713dd Merge branch 'hotfix-utgaln' into dev
Conflicts:
	main.c
2014-09-08 12:44:42 -04:00
Heng Li f4aedddee6 r819: bugfix - added too many sub-SMEMs 2014-09-08 11:32:48 -04:00
Heng Li ca61fe3ad5 code backup 2014-09-08 08:52:02 -04:00
Heng Li 1934f0cf24 code backup 2014-09-05 13:20:52 -04:00
Heng Li 35ac99b4f7 r815: optionally output ref fasta header
Also fixed a bug in reading .ann files
2014-08-29 10:51:23 -04:00
Heng Li b5cba257c1 r809: new strategy for the -a mode 2014-08-25 11:59:27 -04:00
Heng Li 7fd6a11569 r788: segfault when the last ref is "weird"
mem_patch_reg() did not check if two hits are on the same strand, which may
lead to an alignment bridging the forward-backward boundary.
2014-07-10 10:53:56 -04:00
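A sketch of the missing check in r788, assuming the usual BWA layout in which the forward reference occupies [0, l_pac) and its reverse complement occupies [l_pac, 2*l_pac) of one concatenated sequence. `reg_t` and `patch_ok()` are hypothetical; the point is that two hits may be joined only if they sit on the same strand and the joined interval does not straddle l_pac.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int64_t rb, re; } reg_t; /* hypothetical hit: [rb, re) on the concatenated reference */

static int on_rev(int64_t pos, int64_t l_pac) { return pos >= l_pac; }

/* Return 1 if hits a and b can be patched into one alignment. */
int patch_ok(const reg_t *a, const reg_t *b, int64_t l_pac)
{
    int64_t beg, end;
    assert(a->rb < a->re && b->rb < b->re);
    if (on_rev(a->rb, l_pac) != on_rev(b->rb, l_pac)) return 0; /* different strands */
    beg = a->rb < b->rb ? a->rb : b->rb;
    end = a->re > b->re ? a->re : b->re;
    /* the merged interval must not bridge the forward-backward boundary */
    return on_rev(beg, l_pac) == on_rev(end - 1, l_pac);
}
```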
Heng Li cffff4338f r787: use mem_seed_sw() also for non-PacBio reads
In the previous version, mem_seed_sw() is only used for PacBio reads to filter
bad seeds. For non-PacBio long queries, bwa-mem uses mem_chain2aln_short() for
a similar purpose. However, it turns out that mem_chain2aln_short() is not
effective given long near-tandem repeats. Bwa-mem still wastes a lot of time
on futile ref substrings and extensions.

In this commit, mem_chain2aln_short() has been removed. mem_seed_sw() is used
if the query sequence is long enough (~700bp). For shorter reads, the results
should be almost identical to the previous version.
2014-07-10 10:30:22 -04:00
Heng Li e4752b321b Release bwa-0.7.9-r782 2014-05-19 09:08:07 -04:00
Heng Li f00cc94e1d r779: fixed a memory leak in SE 2014-05-16 00:06:34 -04:00
Heng Li a5ad0cff7f r778: reduced the number of alloc() calls a bit 2014-05-15 23:23:04 -04:00
Heng Li 061c63f36a r766: removed useless code 2014-05-13 13:09:29 -04:00
Heng Li 39a6cd5bb0 r762: cleanup for the new release; unfinished
It will take time to make the documentation ready.
2014-05-11 15:15:44 -04:00
Heng Li cfe6996173 r760: removed commented code
It is slow and is not very effective. And I hate useless code.
2014-05-09 14:59:07 -04:00
Heng Li 43b498a37e r759: bugfix - frac_rep not working
Also added commented code for a 3rd round seeding. Not used.
2014-05-09 14:56:59 -04:00
Heng Li c9b33502f3 r758: fixed a typo
mostly negligible in practice
2014-05-07 15:07:29 -04:00
Heng Li ce3c198245 r749: max_hits tunable on CMD; default to 5 2014-05-04 10:17:03 -04:00
Heng Li f21d6498bc r748: reduced the default -m to 50 2014-05-02 16:49:19 -04:00
Heng Li e8f28cb529 r747: fixed a minor issue in the last (mis)commit 2014-05-02 16:17:50 -04:00
Heng Li 6db761e269 r746: tuned heuristic for GRCh38
Reduced -c to 500 by default. As compensation, we choose up to 1000 positions
if a seed has 500 or more occurrences. In addition, a read with a large portion
covered by such seeds will have lower mapping quality.
2014-05-02 16:06:27 -04:00
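A sketch of the two parts of r746 under stated assumptions: positions of a highly repetitive seed are subsampled with an even stride up to a cap (`max_pos`, here 1000), and mapQ is scaled down by `frac_rep`, the fraction of the read covered by such seeds (the name `frac_rep` appears in r759). The stride rule and the `(1 - frac_rep)` scaling are illustrative choices, and `sample_positions()` and `scale_mapq()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick at most max_pos of the occ positions in pos[], using an even stride. */
size_t sample_positions(const int64_t *pos, size_t occ, size_t max_pos, int64_t *out)
{
    size_t step, n = 0;
    assert(max_pos > 0);
    step = occ > max_pos ? occ / max_pos : 1;
    for (size_t k = 0; k < occ && n < max_pos; k += step)
        out[n++] = pos[k];
    return n;
}

/* Reduce mapQ in proportion to the repetitive part of the read.
 * l_rep: query bases covered by high-occurrence seeds. */
int scale_mapq(int mapq, int l_rep, int l_query)
{
    double frac_rep;
    assert(l_query > 0 && l_rep >= 0 && l_rep <= l_query);
    frac_rep = (double)l_rep / l_query;
    return (int)(mapq * (1.0 - frac_rep) + 0.499); /* round to nearest */
}
```

A seed with 2500 occurrences is sampled every 2nd position, giving 1000 positions; a read that is half repetitive has its mapQ of 60 reduced to 30.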
Heng Li fa20c71920 r742: further control the max bandwidth
I am looking at 6kb bandwidth...
2014-05-01 14:27:38 -04:00
Heng Li 4b2441069f r740: don't attempt merge if bandwidth too large
Sometimes the bandwidth can be >10k.
2014-05-01 11:01:52 -04:00
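A hedged sketch of the guard in r740/r742. Assumption: the band needed to join two colinear seeds is estimated as the difference between the reference gap and the query gap (the net indel length), and merging is skipped when it exceeds a hypothetical `max_bw`. BWA's own bandwidth estimate may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Band needed to connect seed 1 at (q1, r1) to seed 2 at (q2, r2). */
int64_t join_bandwidth(int64_t q1, int64_t r1, int64_t q2, int64_t r2)
{
    int64_t d = (r2 - r1) - (q2 - q1);
    assert(q2 >= q1 && r2 >= r1); /* seed 2 follows seed 1 on both sequences */
    return d < 0 ? -d : d;
}

/* Only attempt the merge when the band stays within max_bw. */
int should_merge(int64_t q1, int64_t r1, int64_t q2, int64_t r2, int64_t max_bw)
{
    return join_bandwidth(q1, r1, q2, r2) <= max_bw;
}
```

Two seeds 50bp apart on the query but 10,050bp apart on the reference would need a 10kb band, so the merge is not attempted.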
Heng Li c6c943f9d7 r738: output multi-map in the XA tag (SE only)
... PE support coming soon
2014-04-30 16:46:05 -04:00
Heng Li 88f89be60e r736: improved in low-complexity regions
Example: GGAGGGGAAGGGTGGGCTGGAGGGGACGGGTGGGCTGGAGGGGAAGGGTGTGCTGGAGGGAAAAGGTGGACTGGAGGGGAAGGGTGGGCTGGAGGGGAAGG

This read has 5 chains, two of which are:

weight=80  26;26;0,4591439948(10:-3095894)  23;23;27,4591439957(10:-3095888)  31;31;70,4591439964(10:-3095873)
weight=50  45;45;51,4591440017(10:-3095806) 50;50;51,4591440017(10:-3095801)  31;31;70,4591440090(10:-3095747)

Extension from the 26bp seed in the 1st chain gives an alignment [0,101) <=> [4591439948,4591440067), which
contains the 50bp seed in the second chain. However, if we extend the 50bp seed, it yields a better alignment
[0,101) <=> [4591439966,4591440067) with a different starting position. The 26bp seed is wrong. This commit
adds a heuristic to fix this issue.
2014-04-30 14:14:20 -04:00
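A sketch of one heuristic that resolves the example, not necessarily the one BWA adopted: a seed is skipped only if it lies inside an existing alignment and also on (nearly) the same diagonal (ref start minus query start) as that alignment. The seed fields above are read as len;score;qbeg,rbeg, and `chain_seed_t`, `aln_t`, `skip_seed()` and `max_diag_diff` are hypothetical. With those numbers, the 50bp seed has diagonal 4591440017 - 51 = 4591439966, 18bp away from the alignment's 4591439948, so it would still be extended.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int64_t qbeg, rbeg, len; } chain_seed_t; /* hypothetical seed */
typedef struct { int64_t qb, qe, rb, re; } aln_t;          /* hypothetical: [qb,qe) <=> [rb,re) */

static int seed_contained(const chain_seed_t *s, const aln_t *a)
{
    return s->qbeg >= a->qb && s->qbeg + s->len <= a->qe
        && s->rbeg >= a->rb && s->rbeg + s->len <= a->re;
}

/* Return 1 if the seed is redundant: contained in the alignment and within
 * max_diag_diff of the alignment's starting diagonal. */
int skip_seed(const chain_seed_t *s, const aln_t *a, int64_t max_diag_diff)
{
    int64_t ds, da, d;
    assert(s->len > 0 && a->qb < a->qe && a->rb < a->re);
    if (!seed_contained(s, a)) return 0;
    ds = s->rbeg - s->qbeg; /* seed diagonal */
    da = a->rb - a->qb;     /* alignment starting diagonal */
    d = ds > da ? ds - da : da - ds;
    return d <= max_diag_diff;
}
```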