
349 Commits (main)

Author SHA1 Message Date
Heng Li c280274331 r911: check before attempting adding 2014-10-17 12:13:09 -04:00
Heng Li 76a365a95f r907: revert to -g.8 by default 2014-10-16 15:56:33 -04:00
Heng Li d8d8b230d1 r906: don't reduce non-ALT mapQ by default 2014-10-16 15:15:23 -04:00
Heng Li e318d8e7e5 r905: lower peak RAM for "shm -f" 2014-10-16 11:22:09 -04:00
Heng Li ad0da1418f r904: optionally create tmp files for shm staging 2014-10-16 10:52:59 -04:00
Heng Li 97a3102c89 r903: updated revision number 2014-10-16 10:26:25 -04:00
Heng Li 6a0952948d shared memory 2014-10-15 14:44:08 -04:00
Heng Li c5e859b49f r898: read the index into a single memory block
Prepare for shared memory. Not used now.
2014-10-15 12:27:45 -04:00
Heng Li 71277f0fea r896: more flexible ALT reading 2014-10-14 23:37:24 -04:00
Heng Li 2a18fa114f r895: increase the default max_XA_hits_alt to 200
Because there are >100 HLA haplotypes
2014-10-14 16:58:42 -04:00
Heng Li df20911110 r890: bns_intv2rid() may wrongly return -1 2014-10-14 14:49:53 -04:00
Heng Li 7b62fbb4ba r880: bug in writing .ann file 2014-10-02 15:34:49 -04:00
Heng Li a03d01f944 r878: XA is given to the best alignment
Non-ALT hits may get ALT hits in the XA tag. This will simplify haplotype
assignment.
2014-09-30 13:50:51 -04:00
Heng Li 0a2cf98293 r876: optionally ignore idxbase.alt file 2014-09-27 23:21:50 -04:00
Heng Li dae4ca3ced r875: invalid SAM output for ALT hits 2014-09-26 15:29:08 -04:00
Heng Li 7426a750ec r868: use soft clip for ALT hits 2014-09-19 16:58:18 -04:00
Heng Li 9af36064e8 r867: fixed a few bugs; added ALT hits to XA 2014-09-19 16:50:21 -04:00
Heng Li a41afe4c97 These files were committed on the wrong branch 2014-09-18 10:49:35 -04:00
Heng Li a32d44d8d6 r855: show ALT hits in the PE mode, too
In the previous version, it did not
2014-09-17 23:07:56 -04:00
Heng Li c982443210 r854: improved the calculation of pa
and built pa filtering into BWA-MEM
2014-09-17 16:26:28 -04:00
Heng Li 825ae92e58 r849: the pa tag now gives a number
... which is the ratio of this hit to the best ALT hit.
2014-09-17 13:05:35 -04:00
Heng Li 6f37c14f26 r848: tag alignments with primary ALT 2014-09-16 18:52:49 -04:00
Heng Li a458442b24 r845: updated NEWS
I will use the new version for a while and then release it.
2014-09-16 14:38:41 -04:00
Heng Li 92bc6849a3 r844: added intra-species contig mapping mode 2014-09-16 10:53:07 -04:00
Heng Li 90518f11e3 r843: presetting for ONT 2d reads
Somewhat working for 1d reads, but not very well
2014-09-16 10:38:15 -04:00
Heng Li 4b6eeb34c8 r830: optionally fixed chunk size 2014-09-15 23:42:24 -04:00
Heng Li 624687b072 r829: killed a harmless gcc warning 2014-09-15 23:33:22 -04:00
Heng Li 5d26ab0ee3 r828: changed the default scoring for pacbio 2014-09-15 23:22:05 -04:00
Heng Li b07587f806 r827: an alt hit as good as a pri hit as supp 2014-09-15 16:07:51 -04:00
Heng Li bd85af08ab r826: improved alt mapping for PE 2014-09-15 12:13:04 -04:00
Heng Li aee53f1334 r824: ALT mapping seems working 2014-09-15 00:29:05 -04:00
Heng Li 015ab3f6c3 r823: towards ALT support 2014-09-14 16:41:14 -04:00
Heng Li 8d2b93156b r821: more relaxed about containing seeds 2014-09-12 10:35:49 -04:00
Heng Li f4aedddee6 r819: bugfix - added too many sub-SMEMs 2014-09-08 11:32:48 -04:00
Heng Li 35ac99b4f7 r815: optionally output ref fasta header
Also fixed a bug in reading .ann files
2014-08-29 10:51:23 -04:00
Heng Li 1e611b235c r810: add err_puts()
puts() adds '\n', but fputs() does not.
2014-08-26 11:07:24 -04:00
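The distinction noted in r810 is standard C stdio behaviour: puts() appends a newline, while fputs() writes the string unchanged. Below is a minimal sketch of checked wrappers in the same spirit, assuming the goal is to abort on any write error. The names checked_puts() and checked_fputs() are hypothetical; this is not bwa's own err_puts() code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Report the failing stdio call and abort (hypothetical helper). */
static void fatal_io(const char *func)
{
	fprintf(stderr, "[%s] %s\n", func, strerror(errno));
	exit(EXIT_FAILURE);
}

/* puts() writes s to stdout AND appends a newline. */
int checked_puts(const char *s)
{
	int ret;
	assert(s != NULL);
	ret = puts(s);
	if (ret == EOF) fatal_io("puts");
	return ret;
}

/* fputs() writes s to fp as-is; no newline is appended. */
int checked_fputs(const char *s, FILE *fp)
{
	int ret;
	assert(s != NULL && fp != NULL);
	ret = fputs(s, fp);
	if (ret == EOF) fatal_io("fputs");
	return ret;
}
```

Because the two calls differ only in the trailing newline, a wrapper for each keeps call sites from silently changing output format when error checking is added.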
Heng Li b5cba257c1 r809: new strategy for the -a mode 2014-08-25 11:59:27 -04:00
Heng Li bf7d1d46ca r808: a minor bug with the new index -b 2014-08-25 10:36:24 -04:00
Heng Li 1bba5ef20e r807: allow changing the block size in bwt_gen
For a very large reference genome, the default is too small.
2014-08-25 10:31:54 -04:00
Heng Li 705aa53894 Released 0.7.10 2014-07-13 22:57:27 -04:00
Heng Li 7fd6a11569 r788: segfault when the last ref is "weird"
mem_patch_reg() did not check if two hits are on the same strand, which may
lead to an alignment bridging the forward-backward boundary.
2014-07-10 10:53:56 -04:00
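A minimal sketch of the missing same-strand check, assuming the BWA convention that hits are coordinates on a concatenated forward + reverse-complement reference of length 2*l_pac, with positions >= l_pac on the reverse strand. The hit_t type and can_patch() are hypothetical illustrations, not mem_patch_reg() itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hit: half-open interval [rb, re) on the concatenated
 * forward + reverse-complement reference of total length 2*l_pac. */
typedef struct {
	int64_t rb, re;
} hit_t;

/* 0 = forward strand, 1 = reverse strand (assumed convention). */
static int hit_strand(const hit_t *h, int64_t l_pac)
{
	assert(h->rb < h->re && h->re <= 2 * l_pac);
	return h->rb >= l_pac;
}

/* Two hits may be patched only if they are on the same strand and the
 * merged interval does not cross the forward/reverse boundary at l_pac. */
int can_patch(const hit_t *a, const hit_t *b, int64_t l_pac)
{
	int64_t rb = a->rb < b->rb ? a->rb : b->rb;
	int64_t re = a->re > b->re ? a->re : b->re;
	if (hit_strand(a, l_pac) != hit_strand(b, l_pac)) return 0;
	return !(rb < l_pac && re > l_pac);
}
```

Checking the merged span as well as the strands also rejects a single hit that already straddles l_pac.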
Heng Li cffff4338f r787: use mem_seed_sw() also for non-PacBio reads
In the previous version, mem_seed_sw() was only used for PacBio reads to filter
bad seeds. For non-PacBio long queries, bwa-mem used mem_chain2aln_short() for
a similar purpose. However, it turns out that mem_chain2aln_short() is not
effective given long near-tandem repeats: bwa-mem still wasted a lot of time
on futile retrieval of ref substrings and extensions.

In this commit, mem_chain2aln_short() has been removed. mem_seed_sw() is used
if the query sequence is long enough (~700bp). For shorter reads, the results
should be almost identical to the previous version.
2014-07-10 10:30:22 -04:00
Heng Li 3efc33160c 0.7.9a-r786: fixed a segfault in a rare case
More likely to happen given a circular genome
2014-05-19 16:47:25 -04:00
Heng Li 031d3d83ce Wrong release number: 0.7.8 => 0.7.9 2014-05-19 09:49:26 -04:00
Heng Li be74dbc00c Release bwa-0.7.9-r783 2014-05-19 09:09:11 -04:00
Heng Li e4752b321b Release bwa-0.7.9-r782 2014-05-19 09:08:07 -04:00
Heng Li f00cc94e1d r779: fixed a memory leak in SE 2014-05-16 00:06:34 -04:00
Heng Li a5ad0cff7f r778: reduced the number of alloc() calls a bit 2014-05-15 23:23:04 -04:00
Heng Li 8d2986ece2 r770: fixed a compiling warning 2014-05-14 14:44:03 -04:00
Heng Li 061c63f36a r766: removed useless code 2014-05-13 13:09:29 -04:00
Heng Li 0168f39eeb r765: fixed a declaration error
Reported by Andreas Tile from Debian
2014-05-13 12:54:23 -04:00
Heng Li 08517ac09b r764: changed -c in "-x pacbio" to 500 2014-05-13 12:53:24 -04:00
Heng Li 39a6cd5bb0 r762: cleanup for the new release; unfinished
It will take some time to make the documentation ready.
2014-05-11 15:15:44 -04:00
Heng Li cfe6996173 r760: removed commented code
It is slow and is not very effective. And I hate useless code.
2014-05-09 14:59:07 -04:00
Heng Li 43b498a37e r759: bugfix - frac_rep not working
Also added commented code for a 3rd round seeding. Not used.
2014-05-09 14:56:59 -04:00
Heng Li c9b33502f3 r758: fixed a typo
mostly negligible in practice
2014-05-07 15:07:29 -04:00
Heng Li 6ac8dd5840 r754: added command msg for -h 2014-05-06 16:15:14 -04:00
Heng Li ce3c198245 r749: max_hits tunable on CMD; default to 5 2014-05-04 10:17:03 -04:00
Heng Li f21d6498bc r748: reduced the default -m to 50 2014-05-02 16:49:19 -04:00
Heng Li e8f28cb529 r747: fixed a minor issue in the last (mis)commit 2014-05-02 16:17:50 -04:00
Heng Li 6db761e269 r746: tuned heuristic for GRCh38
Reduced -c to 500 by default. To compensate, we choose up to 1000 positions
if a seed has 500 or more occurrences. In addition, a read with a large portion
covered by such seeds will have lower mapping quality.
2014-05-02 16:06:27 -04:00
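A sketch of the two ideas in r746, under stated assumptions: sample_occurrences() spreads at most max_keep picks (1000 in the message) evenly over a seed's occurrences, and scale_mapq() lowers mapping quality in proportion to frac_rep, taken here to be the fraction of the read covered by such seeds. Both functions and the linear scaling rule are hypothetical; the message does not say exactly how bwa samples positions or reduces mapQ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: choose at most max_keep of a seed's occ occurrences,
 * evenly spaced over [0, occ). idx must have room for max_keep entries.
 * Returns the number of indices written. */
int sample_occurrences(int64_t occ, int max_keep, int64_t *idx)
{
	int i, n = occ < max_keep ? (int)occ : max_keep;
	assert(occ >= 0 && max_keep > 0);
	for (i = 0; i < n; ++i)
		idx[i] = (int64_t)i * occ / n; /* identity when n == occ */
	return n;
}

/* Assumed down-weighting: scale mapQ by the fraction of the read NOT
 * covered by highly repetitive seeds, rounding to the nearest integer. */
int scale_mapq(int mapq, double frac_rep)
{
	assert(frac_rep >= 0.0 && frac_rep <= 1.0);
	return (int)(mapq * (1.0 - frac_rep) + 0.5);
}
```

Even spacing (rather than taking the first 1000 hits) avoids biasing the kept positions toward one end of the suffix-array interval.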
Heng Li b7076848ab r744: int overflow given MB query 2014-05-01 15:30:36 -04:00
Heng Li fa20c71920 r742: further control the max bandwidth
I am looking at 6kb bandwidth...
2014-05-01 14:27:38 -04:00
Heng Li 7954e77a1b r741: fixed segfault in rare cases 2014-05-01 11:13:05 -04:00
Heng Li 4b2441069f r740: don't attempt merge if bandwidth too large
Sometimes the bandwidth can be >10k.
2014-05-01 11:01:52 -04:00
Heng Li 5aedc978d1 r739: output suboptimal hits in the PE mode
However, PE information is not used for suboptimal hits
2014-04-30 23:23:54 -04:00
Heng Li c6c943f9d7 r738: output multi-map in the XA tag (SE only)
... PE support coming soon
2014-04-30 16:46:05 -04:00
Heng Li d59d78838c r737: fixed an assertion when failed to convert sa
A bug pointed out by Mikkle Schubert
2014-04-30 14:55:44 -04:00
Heng Li 88f89be60e r736: improved in low-complexity regions
Example: GGAGGGGAAGGGTGGGCTGGAGGGGACGGGTGGGCTGGAGGGGAAGGGTGTGCTGGAGGGAAAAGGTGGACTGGAGGGGAAGGGTGGGCTGGAGGGGAAGG

This read has 5 chains, two of which are:

weight=80  26;26;0,4591439948(10:-3095894)  23;23;27,4591439957(10:-3095888)  31;31;70,4591439964(10:-3095873)
weight=50  45;45;51,4591440017(10:-3095806) 50;50;51,4591440017(10:-3095801)  31;31;70,4591440090(10:-3095747)

Extension from the 26bp seed in the 1st chain gives an alignment [0,101) <=> [4591439948,4591440067), which
contains the 50bp seed in the second chain. However, if we extend the 50bp seed, it yields a better alignment
[0,101) <=> [4591439966,4591440067) with a different starting position. The 26bp seed is wrong. This commit
adds a heuristic to fix this issue.
2014-04-30 14:14:20 -04:00
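The containment described in r736 can be checked with the numbers from this example, assuming the seed fields read length;score;query-start,ref-start. The 50bp seed's diagonal (rbeg - qbeg) is 4591439966, exactly the start of the better alignment, whereas the 26bp seed sits on diagonal 4591439948. This sketch only shows the containment test and diagonal arithmetic; seed_t, region_t and both functions are hypothetical, and this is not the heuristic r736 adds, which the message does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical seed: length, query start, reference start. */
typedef struct { int len, qbeg; int64_t rbeg; } seed_t;

/* Hypothetical alignment region: [qb,qe) on the query, [rb,re) on the ref. */
typedef struct { int qb, qe; int64_t rb, re; } region_t;

/* Nonzero if the seed lies entirely inside the region on both sequences. */
int seed_contained(const seed_t *s, const region_t *r)
{
	assert(s->len > 0 && r->qb <= r->qe && r->rb <= r->re);
	return s->qbeg >= r->qb && s->qbeg + s->len <= r->qe
	    && s->rbeg >= r->rb && s->rbeg + s->len <= r->re;
}

/* Reference position the seed implies for query position 0 (its diagonal). */
int64_t seed_diagonal(const seed_t *s)
{
	return s->rbeg - s->qbeg;
}
```

A contained seed whose diagonal disagrees with the alignment's start is a signal that the alignment may have been anchored on the wrong seed.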
Heng Li 11698fc4e5 r735: fixed a bug caused by merge 2014-04-30 13:12:43 -04:00
Heng Li b603fed39c r733: bugfix - seed score unset when no -W 2014-04-29 14:58:53 -04:00
Heng Li 44754cd615 r731: separate layouter 2014-04-28 10:39:29 -04:00
Heng Li dadd5d6281 r730: more permissive about merging overlapping 2014-04-28 10:01:54 -04:00
Heng Li 76bb49e01b r729: halved band width; doubled patch band width 2014-04-24 16:06:01 -04:00
Heng Li 6052d3015b r728: sorting the end in mem_sort_dedup_patch()
The older version does this, which is correct.
2014-04-24 15:44:59 -04:00
Heng Li df65893fb5 r727: extend seeds with SW 2014-04-24 14:28:40 -04:00
Heng Li b92bbb47e5 Merge branch '0.7.7-softclip' into layout
Conflicts:
	Makefile
	bwamem.h
	fastmap.c
	main.c
2014-04-24 12:24:49 -04:00
Heng Li 8c12ec4a4b r725: optionally disable hard clipping
as requested by the cancer group
2014-04-24 11:56:43 -04:00
Heng Li b93fca2b2e r723: merge adjacent hits 2014-04-16 16:38:50 -04:00
Heng Li 48847af2fc code backup 2014-04-16 12:00:13 -04:00
Heng Li 00a07f61bf r721: merge overlapping hits by default 2014-04-15 16:16:04 -04:00
Heng Li 45f24b4ae8 r720: improved overlap hit merging 2014-04-15 16:09:42 -04:00
Heng Li bdb7b000cd r719: more stringent overlap merge
Will consider making it the default
2014-04-15 14:52:17 -04:00
Heng Li 4e22270eba r718: merge alnregs overlapping on both query/ref 2014-04-14 17:01:17 -04:00
Heng Li 6d4a6debdc r716: changed -x pbread 2014-04-14 16:04:29 -04:00
Heng Li bbcabfe342 r707: change params for pacbio-to-pacbio 2014-04-10 21:53:52 -04:00
Heng Li 658f27eae4 Merge branch 'dev' into layout
Conflicts:
	main.c
2014-04-10 21:48:47 -04:00
Heng Li 6fda93502f r705: pairing performed on one chr only
Change of versioning: the revision number is acquired with:

  git rev-list --all --count

This counts the total number of commits across all branches.
2014-04-10 21:38:14 -04:00
Heng Li db4b171fa6 Merge branch 'dev' into layout
Conflicts:
	main.c
2014-04-10 21:09:47 -04:00
Heng Li 07182d9061 dev-475: -F outputs unit score, not raw score 2014-04-10 21:09:06 -04:00
Heng Li 7d25fe2de3 Merge branch 'dev' into layout
Conflicts:
	main.c
2014-04-10 21:07:16 -04:00
Heng Li e80bccc923 dev-474: fixed a typo 2014-04-10 21:04:02 -04:00
Heng Li f02cd42679 dev-473: added a few assertions
to make sure the new change works as expected
2014-04-10 21:03:13 -04:00
Heng Li 8638cfadc8 dev-472: get rid of bwa_fix_xref()
This function causes all kinds of problems when the reference genome consists
of many short reads/contigs/chromosomes. Some of the problems are nearly
unfixable at the point where bwa_fix_xref() gets called. This commit attempts
to fix the problem at the root. It disallows chains spanning multiple contigs
and never retrieves sequences bridging two adjacent contigs. Thus all the
chaining, extension, SW and global alignments are confined to one contig only.

This commit brings many changes. I have tested it on a couple of examples
including Peter Field's PacBio example. It works well so far.
2014-04-10 20:54:27 -04:00
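The rule that no chain or retrieved sequence may span two contigs reduces to asking whether an interval maps to a single contig. A minimal sketch, assuming contig start offsets are kept in an ascending array on the forward reference with offs[0] == 0; pos_to_ctg() and intv_to_ctg() are hypothetical names, and bwa's own helper, bns_intv2rid(), may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: offs[0..n_ctg-1] are ascending contig start offsets,
 * offs[0] == 0. Returns the index of the last contig starting <= pos. */
static int pos_to_ctg(const int64_t *offs, int n_ctg, int64_t pos)
{
	int lo = 0, hi = n_ctg - 1;
	while (lo < hi) { /* binary search with upper midpoint */
		int mid = lo + (hi - lo + 1) / 2;
		if (offs[mid] <= pos) lo = mid;
		else hi = mid - 1;
	}
	return lo;
}

/* Contig index if [beg, end) lies within one contig, otherwise -1.
 * l_pac is the total forward reference length (used for sanity checks). */
int intv_to_ctg(const int64_t *offs, int n_ctg, int64_t l_pac, int64_t beg, int64_t end)
{
	int a, b;
	assert(n_ctg > 0 && 0 <= beg && beg < end && end <= l_pac);
	a = pos_to_ctg(offs, n_ctg, beg);
	b = pos_to_ctg(offs, n_ctg, end - 1); /* end is exclusive */
	return a == b ? a : -1;
}
```

A caller would discard or split any chain for which this returns -1, instead of patching coordinates afterwards as bwa_fix_xref() did.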
Heng Li e2d0c996e9 layout-477: output unit score, not the raw score 2014-04-10 18:03:28 -04:00
Heng Li 0eeacbbe39 Merge branch 'dev' into layout 2014-04-10 17:56:24 -04:00
Heng Li 23e0e99ec0 dev-471: fixed a compiling error from last commit 2014-04-10 11:54:17 -04:00
Heng Li ccbbe48c4f dev-470: don't stop on bwa_fix_xref2() failures
Peter Field has sent me an example caused by an alignment bridging three
adjacent chromosomes/contigs. Bwa-mem always aligns the query to the contig
covering the middle point of the alignment. In this example, it chooses the
middle contig, which should not be aligned. This leads to weird results that make
bwa_fix_xref2() fail; this cannot be fixed unless we build the contig boundaries
into the FM-index.

In the old code, bwa-mem halts when bwa_fix_xref2() fails. With this commit,
bwa-mem will give a warning instead of halting.
2014-04-10 11:43:17 -04:00
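The selection rule described in dev-470 (assign the alignment to the contig covering its midpoint) can be reproduced in a few lines, and toy numbers show how it lands on a short middle contig. With contigs starting at 0, 100 and 120, an alignment over [90,140) has midpoint 115 and goes to contig 1, although 30 of its 50bp lie on contigs 0 and 2. The function name and offset layout are hypothetical assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: offs[] holds ascending contig start offsets (offs[0] == 0).
 * Returns the contig covering the midpoint of the alignment [rb, re). */
int ctg_at_midpoint(const int64_t *offs, int n_ctg, int64_t rb, int64_t re)
{
	int64_t mid = rb + (re - rb) / 2;
	int i = n_ctg - 1;
	assert(n_ctg > 0 && 0 <= rb && rb < re);
	while (i > 0 && offs[i] > mid) --i; /* linear scan; fine for a sketch */
	return i;
}
```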
Heng Li 8220008564 an attempt at a layout tool 2014-04-09 16:11:52 -04:00
Heng Li db58392e9b dev-469: fixed wrong command line prompt 2014-04-09 13:20:04 -04:00