97 Commits (ee80fb8bd07451f0eba4d7dc9f76d507ed325a13)

Author SHA1 Message Date
Heng Li ee80fb8bd0 Test each seed to see if extension is needed
The old version wastefully extends many seeds contained in an aligned region
found before. While this wastes little time for short reads, it becomes a
serious defect for long query sequences.

This is an attempt to fix this problem, but more tuning is needed.
2013-02-26 22:55:44 -05:00
Heng Li acd1ab607b r297: reduce wasteful SW extension
This is particularly important for long sequences
2013-02-26 16:26:46 -05:00
Heng Li 98787f0ae0 r295: generate NM 2013-02-26 13:36:01 -05:00
Heng Li 32f2d60a2e r294: bugfix - -M not working 2013-02-26 13:14:33 -05:00
Heng Li 619ac4f93d r293: bugfix - wrong RG type in SAM output 2013-02-26 13:03:35 -05:00
Heng Li e70c7c2a71 r284: amend cross-reference hit
I really hate this: complex and twisted logic for a nasty scenario that almost
never happens to short reads - but it may become serious when the reference
genome consists of many contigs.

On toy examples, the code seems to work. Don't know if it really works...
2013-02-26 00:03:49 -05:00
Heng Li 77b5b586ad r282: set min split_len to read length 2013-02-25 17:29:35 -05:00
Heng Li d19e834d84 r280: align two ends in the same thread
Otherwise odd-numbered threads may run at a different speed from even-numbered threads.
2013-02-25 15:40:15 -05:00
Heng Li 20aa848b3c r279: for PE mapq, consider the number of pairs
If there are a lot of proper pairs, it is more likely that the best pair is
wrong.
2013-02-25 13:00:35 -05:00
Heng Li 9957e04590 r278: don't perform too many mate-sw 2013-02-25 11:56:02 -05:00
Heng Li 5ead86acd3 optionally mark split hit as secondary 2013-02-25 11:18:35 -05:00
Heng Li 514563bd0a no poor hits with -a; reduce mapq for 2nd primary 2013-02-25 10:54:12 -05:00
Heng Li 29e41b592c bugfix: isize is off by 1 2013-02-24 23:00:51 -05:00
Heng Li 85775c3384 output multiple hits 2013-02-24 13:23:43 -05:00
Heng Li 6bdccf2a8a added a bit of documentation 2013-02-24 13:09:29 -05:00
Heng Li ee59a13109 simplified bwamem.h
Hide mem_seed_t and mem_chain_t. Don't expose unnecessary routines.
2013-02-24 12:17:29 -05:00
Heng Li cda85be059 fixed a couple bugs identified by gcc
Recent gcc is better.
2013-02-23 17:15:07 -05:00
Heng Li b4c38bcc1c append fasta/q comment 2013-02-23 16:57:34 -05:00
Heng Li ee4540c394 support read group in bwa-mem 2013-02-23 16:41:44 -05:00
Heng Li 67543f19a1 code refactoring 2013-02-23 15:55:55 -05:00
Heng Li e613195e17 moved some common code to bwa.{c,h} 2013-02-23 15:30:46 -05:00
Heng Li d460f2ec9e bugfix in multi-threaded bwa-mem 2013-02-23 14:48:54 -05:00
Heng Li 904c3205c0 removed a few unused variables
These variables have been assigned but never actually used. Reported by
gcc-4.7. Lower versions cannot give such warnings.
2013-02-23 13:26:50 -05:00
Heng Li 17c123d65a print paired-end SAM 2013-02-22 16:38:48 -05:00
Heng Li ba15b787cb rework PE mapq; don't know if better 2013-02-22 14:47:57 -05:00
Heng Li c5ce72f593 scoring pairs by score, not by errors
This is important for bwa-mem which does local alignment. A short exact match
is worse than a long inexact match. Also fixed a bug in approximating mapping
quality.
2013-02-22 12:10:20 -05:00
Heng Li d4cf6d97a6 bugfix: memory leak 2013-02-21 15:04:31 -05:00
Heng Li a578688fa8 generate multiple alignments from one chain 2013-02-21 14:58:51 -05:00
Heng Li cfbc4c89e3 perform extension when there are, say, 20bp tandem 2013-02-21 14:34:10 -05:00
Heng Li 54da54ffd4 extend more seeds (and thus slower...) 2013-02-21 12:52:00 -05:00
Heng Li f8829318cf weakened the chain filter 2013-02-21 12:25:20 -05:00
Heng Li 84a328764a bugfix: mis-chaining caused by integer overflow
I really need to rewrite kbtree some time.
2013-02-21 11:42:30 -05:00
Heng Li ea8f4f4d34 clean bill from valgrind 2013-02-20 20:26:57 -05:00
Heng Li 5626fe29b7 Well, at least output sth 2013-02-20 19:11:44 -05:00
Heng Li a7d574d125 backup comments 2013-02-20 01:11:38 -05:00
Heng Li 688872fb1b code backup 2013-02-19 00:50:39 -05:00
Heng Li 66585b7982 code backup 2013-02-18 16:33:06 -05:00
Heng Li ea9fc7df48 keep the number of SW performed 2013-02-16 11:03:27 -05:00
Heng Li 5f8c6efbc3 forbid x-boundary bns_get_seq(); code backup 2013-02-16 09:48:44 -05:00
Heng Li 604e3d8da1 code backup; to upgrade ksw.{c,h} 2013-02-12 16:15:26 -05:00
Heng Li 325ba8213b move mark primary to worker1() 2013-02-12 15:54:55 -05:00
Heng Li cd0969332f keep track of the "parent" of a secondary 2013-02-12 15:52:23 -05:00
Heng Li 22b79b3475 mark primary, instead of dropping secondary 2013-02-12 15:34:44 -05:00
Heng Li 2fc469d0c9 code backup 2013-02-12 12:09:36 -05:00
Heng Li 95d18449b3 merge bseq.{h,c} to utils.{h,c}
I do not like many small files.
2013-02-12 10:36:15 -05:00
Heng Li 13288e2dcd code backup 2013-02-12 09:22:47 -05:00
Heng Li 99907c98fb separated and improved SAM printing code
This is for the PE mode. The routines may also be useful for bwa-sw, but
probably I won't change the old code.
2013-02-11 15:29:03 -05:00
Heng Li 987d4b4205 fixed a stupid bug in fastq reading 2013-02-11 11:27:35 -05:00
Heng Li 59eaf650ac code backup 2013-02-11 10:59:38 -05:00
Heng Li f4c0672800 move sort_and_dedup() to worker1() 2013-02-10 12:55:19 -05:00