
261 Commits (aa2d9d4e1bf9ec8ec2a16f371d18c87ec2aa89a5)

Author SHA1 Message Date
Heng Li c672690564 r218: increase the frequency of SW slightly 2017-07-28 13:30:42 -04:00
Heng Li f4fee60188 r217: ignore tandem seeds during alignment
This helps a tiny bit.
2017-07-28 12:26:56 -04:00
Heng Li 254280b8af r216: a bit cleanup; identical output to r215 2017-07-28 11:54:18 -04:00
Heng Li fc965805f7 r215: bring back a log gap component
Otherwise chaining may more often break a long gap into several gaps.
2017-07-28 00:17:19 -04:00
Heng Li 2c79580649 r213: more careful solution to wrong seeds
a little better, but not good enough!
2017-07-27 13:19:11 -04:00
Heng Li b927838495 r212: better heuristic to fix wrong seeding
but not good enough. Will explore more.
2017-07-27 11:24:51 -04:00
Heng Li 371e20cc7c r211: a better heuristic to reduce false seeds 2017-07-26 23:56:38 -04:00
Heng Li a01d758af6 r206: mapq penalize short chains further
The old code penalized at the log() scale. Now added a linear-scaled factor. If
the chain consists of few minimizers, its quality is really not good.
2017-07-26 11:50:04 -04:00
Heng Li e9dc1ce2b6 r205: when computing mapq, consider min_chain_sc
Not doing this was a mistake.
2017-07-26 11:34:14 -04:00
Heng Li 00c6db5073 r203: check more subopt aln if score small 2017-07-25 20:02:44 -04:00
Heng Li f2ef48878a r202: trim bad chain ends before extension
This fixes a few more FP long INDELs towards the end of alignments.
2017-07-25 19:53:19 -04:00
Heng Li 21ca564112 r201: fixed a minor chaining issue
Chaining looked at the end of a chain, but the end may not be the best. We now
go back to find the max.
2017-07-25 18:26:51 -04:00
Heng Li 215e92ed7b r200: reduce long gaps in chaining
Every seed can initiate a chain.
2017-07-25 17:32:54 -04:00
Heng Li b530ade333 r199: changed to linear gap cost for chaining
The old cost doesn't penalize long gaps enough. Will also drop seeds close to
the edge in the next commit.
2017-07-25 15:35:10 -04:00
Heng Li f81f37fef1 r197: allocate index seq names from kalloc
to reduce malloc() overhead.
2017-07-24 19:36:05 -04:00
Heng Li 5c4d040b13 r191: warning if CLI index opt diff from prebuilt
Also added index testing API (moved from main.c to index.c)
2017-07-19 10:25:11 -04:00
Heng Li 4aff301ef4 r190: default -k to 15; added -x map-ont 2017-07-19 10:11:14 -04:00
Heng Li 470021fd27 r189: sync with ksw2 (no effective changes) 2017-07-19 09:28:25 -04:00
Heng Li 71c988f6ab r188: renamed bseq* to mm_bseq*
to avoid naming collisions between minimap2 and bwa/fermi-lite/etc
2017-07-19 09:26:46 -04:00
Heng Li b9b0b6f49c r187: fixed non-terminated sam output (#3)
Only happens for unmapped reads with quality, in the SAM output
2017-07-18 15:20:29 -04:00
Heng Li 495a78e40a Get documentation ready for release 2017-07-18 11:04:09 -04:00
Heng Li 71e2a97a4c r180: changed -x asm5 settings 2017-07-18 00:00:36 -04:00
Heng Li 941059292e r179: changed the preset for assembly alignment 2017-07-17 22:41:46 -04:00
Heng Li 38aa66fa30 r178: fixed integer overflow in mapq calculation 2017-07-16 21:45:39 -04:00
Heng Li b4280d186f r176: removed seedcov_ratio; changed default opt
min_seedcov_ratio is not used
2017-07-12 12:47:46 -04:00
Heng Li 52caf79395 r175: halved max-chain-skip in the ava mode 2017-07-12 10:42:19 -04:00
Heng Li eeeb2ffb68 r174: make max-chain-skip work
The max-chain-skip heuristic did not work due to a bug. Without this
heuristic, chaining is too slow for long-read overlap.
2017-07-12 10:08:06 -04:00
Heng Li 33451aba45 r173: changed the debugging output format 2017-07-11 15:23:28 -04:00
Heng Li cfa083a98b r172: separated PacBio and ONT read overlapping
HPC k-mer works better for PacBio, but worse for ONT. Interesting...
2017-07-11 15:12:35 -04:00
Heng Li 7598809577 r171: reduced log gap cost at chaining
The cost is so large that it discards too many valid seeds without HPC k-mers.
This change may introduce false long gaps to reference mapping. We have another
mechanism mm_filter_bad_seeds() to protect against this. In addition, minimap2
is not that bad to have long gaps. Some other aligners are worse.

Still need tuning in future.
2017-07-11 14:57:49 -04:00
Heng Li 826c8ba892 r170: added a debugging flag
something wrong with chaining
2017-07-11 14:47:35 -04:00
Heng Li 801bc84b01 r169: output more accurate col. 10&11 to PAF
In r168, col.10 is smaller than what it should be. This confuses miniasm.
2017-07-11 14:09:51 -04:00
Heng Li 782449975d r168: fixed a bug in long join: a[] not sorted
Also added length requirement for long join and changed -g in the ava mode
2017-07-09 12:14:20 -04:00
Heng Li 1ac48556ae r167: long join threshold depends on gap
also caught a bug for reverse strand join
2017-07-09 10:38:51 -04:00
Heng Li 4ee3202539 r164: unmapped read not properly flagged 2017-07-08 18:16:18 -04:00
Heng Li 42846ce65d r163: reduced long join score requirement
because the chaining score is generally smaller with the last few commits.
2017-07-08 15:51:52 -04:00
Heng Li 3f6a0b0b5c r162: improved chaining accuracy 2017-07-08 14:29:36 -04:00
Heng Li 38b2830e18 r161: filter bad seeds; changed default -g/-r 2017-07-08 13:31:27 -04:00
Heng Li 1fee5f8edc r160: -O and -E accept two numbers 2017-07-08 11:34:52 -04:00
Heng Li cc554aee43 r159: use two-piece gap penalty 2017-07-08 10:26:00 -04:00
Heng Li 9823317e8f r158: optionally ignore base quality 2017-07-05 18:23:50 -04:00
Heng Li e07daad7ad r153: sam primary record not set sometimes 2017-07-03 13:18:57 -04:00
Heng Li a94bc31311 r151: documentations 2017-07-03 12:11:07 -04:00
Heng Li b625247300 r150: mm_sync_regs() doesn't work with negative id 2017-07-03 11:36:34 -04:00
Heng Li 53c4bf5e4f r149: introduced debugging flags on CLI 2017-07-03 11:02:32 -04:00
Heng Li 2e4fd9f1d0 r148: revamped regs handling after cigar 2017-07-03 10:44:26 -04:00
Heng Li e06c342659 r146: in filtering, drop children if parent out
This has been causing several segfaults.
2017-07-03 00:28:12 -04:00
Heng Li 51cfb60520 r145: changed default -p from 2 to 0.8
For long reads, secondary alignments can be very informative.
2017-07-02 22:51:45 -04:00
Heng Li 632b8638d2 r144: adjust primary aln after cigar 2017-07-02 22:43:02 -04:00
Heng Li 2b45ba7a0b r143: fixed a segfault and incorrect .parent 2017-07-02 19:56:21 -04:00
Heng Li 74d306a596 fixed bug when retaining 2ndary aln; still buggy 2017-07-02 19:08:30 -04:00
Heng Li da90b614db r141: replaced -b with -a (for SAM output)
-b sounds like BAM. I like -a better.
2017-07-01 16:54:59 -04:00
Heng Li 2338e887d9 finished the first draft of manpage 2017-07-01 11:25:54 -04:00
Heng Li a9f089f0aa r131: wrong EOF test; make mb_size <= batch_size 2017-07-01 09:26:09 -04:00
Heng Li 41efd03d7a r129: fixed memory leak caused by qualities 2017-06-30 23:48:00 -04:00
Heng Li 426c2975f6 r126: filter by fraction of seed coverage
otherwise we may get too many poor overlap mappings.
2017-06-30 22:15:45 -04:00
Heng Li d73bb28097 r125: changed CLI options 2017-06-30 19:08:47 -04:00
Heng Li b08591c7a0 r124: a bit better CLI prompt 2017-06-30 15:46:52 -04:00
Heng Li 3a5486325a r123: fixed a mem leak; more presets 2017-06-30 15:39:05 -04:00
Heng Li 646a746cdc r122: filter contained aln after DP extension 2017-06-30 15:23:30 -04:00
Heng Li fce87ce7bd r121: output QUAL and unmapped to SAM 2017-06-30 14:40:54 -04:00
Heng Li d11049eb32 r120: use max-scoring seg to control output
much better now
2017-06-30 14:21:44 -04:00
Heng Li 08a61c3cfc r119: fixed a bug hidden by a previous bug 2017-06-30 13:27:47 -04:00
Heng Li 1a903486b9 r118: bugfix - regs unsorted before filtering 2017-06-30 12:52:28 -04:00
Heng Li 5dcd8f8965 r117: fixed a bug in logic 2017-06-30 11:52:42 -04:00
Heng Li 91e1c4d6db r116: fixed another bug caused by refactoring 2017-06-30 00:03:45 -04:00
Heng Li 52b4d8e2c9 r115: set primary tag; still buggy 2017-06-29 23:48:35 -04:00
Heng Li c4871f380c r114: make SAM output better 2017-06-29 23:08:41 -04:00
Heng Li 03267e8fa7 r113: fixed a sam header bug 2017-06-29 22:43:06 -04:00
Heng Li 11167f511b r112: output z-drop 2017-06-29 22:08:46 -04:00
Heng Li 3825feeeac r111: changed the default z-drop to 200 2017-06-29 21:37:56 -04:00
Heng Li e2b86d0332 r110: fixed a bug caused by refactoring 2017-06-29 21:12:31 -04:00
Heng Li 08cbb09fcc r109: changed the default scoring 2017-06-29 20:21:57 -04:00
Heng Li 4cd456b9ba r108: refactoring, move reg1 routines to hit.c 2017-06-29 19:44:11 -04:00
Heng Li 337c2a21cd r105: fixed a bug in repeated right ext when zdrop 2017-06-29 15:45:07 -04:00
Heng Li b9075d39a8 r104: long gap patching 2017-06-29 14:54:54 -04:00
Heng Li 9fbf7e41e1 r99: report progress 2017-06-28 23:56:33 -04:00
Heng Li 38070e8a05 r98: fixed segfault for certain scoring
due to unsigned comparisons between -1 and chromosome length
2017-06-28 22:18:51 -04:00
Heng Li a25866c25c r96: min_cnt still wrong in chaining 2017-06-28 11:03:03 -04:00
Heng Li bf0e8199e2 r94: min_cnt is tested in a wrong way in chain 2017-06-28 10:39:27 -04:00
Heng Li bcd9b1c621 r93: fixed various small issues 2017-06-28 10:35:21 -04:00
Heng Li cdc2a1e29f r92: fixed a bug for overlapping alignment
On the PBcR example E. coli reads, miniasm gives one circular unitig.
2017-06-27 22:03:31 -04:00
Heng Li 51057ab673 expose scoring 2017-06-27 21:37:25 -04:00
Heng Li 533150d49d r90: revert default band width to 1000
10000 is excessively tolerant with bad hits.
2017-06-27 20:29:39 -04:00
Heng Li fa80177e58 r89: added minimal number of minimizer counts 2017-06-27 18:43:15 -04:00
Heng Li 8977f07269 r88: fixed an out-of-boundary bug in ksw2 2017-06-27 14:50:31 -04:00
Heng Li 42283ef10c r87: fixed a bug in ksw2 2017-06-27 13:29:48 -04:00
Heng Li c02ff4662c r85: two-round z-drop 2017-06-27 10:36:24 -04:00
Heng Li 99c57b86c5 r79: drop bad hits 2017-06-26 15:28:04 -04:00
Heng Li 5b614ae828 r78: fixed a split bug 2017-06-26 14:45:23 -04:00
Heng Li de54c9dac2 r77: fixed an index loading bug (offset not set) 2017-06-26 13:56:25 -04:00
Heng Li 10644f2165 r76: missing header file 2017-06-26 12:36:37 -04:00
Heng Li 4b8e88a5f4 use long options 2017-06-26 12:31:36 -04:00
Heng Li 640b1a1727 command-line option to control CIGAR output 2017-06-26 11:41:09 -04:00
Heng Li b1077ff14c sam output 2017-06-25 22:05:20 -04:00
Heng Li 72dfb0c99e fixed a bug in ksw2 2017-06-25 10:22:13 -04:00
Heng Li b04e4b9215 r36: bring back primary; don't output all mappings 2017-06-08 15:28:19 -04:00
Heng Li 19e43571c1 r34: removed a bit unused code 2017-06-07 14:35:57 -04:00
Heng Li d816e48fce fixed a bug in chaining 2017-06-06 14:33:43 -04:00
Heng Li 6d4348db44 dp chaining mostly works, but fails sometimes
which means there are bugs that need to be fixed
2017-06-06 14:19:50 -04:00
Heng Li 1a9fc04cf0 backup 2017-06-06 10:16:33 -04:00
Heng Li acc7382a30 backup 2017-06-04 16:09:45 -04:00
Heng Li 06adabd0dc clean bill from valgrind 2017-05-04 12:44:49 +08:00
Heng Li f2ae8eb670 mostly debugging code 2017-05-01 16:50:09 +08:00
Heng Li 7b7fabef4d added idx_stat 2017-04-26 22:52:28 +08:00
Heng Li de367a340c compilable again 2017-04-26 19:36:46 +08:00
Heng Li 56723ad580 moved `sum_len` out of the index
as it can be inferred.
2017-04-19 11:06:24 -04:00
Heng Li f35e152e99 fixed a few memory leaks 2017-04-13 23:05:19 -04:00
Heng Li 79c9478f46 backup 2017-04-09 14:59:39 -04:00
Heng Li 8c230563cc can be compiled 2017-04-07 15:56:10 -04:00
Heng Li f5cdd3f72f is_hpc is a property of the index 2017-04-07 15:42:33 -04:00