Commit Graph

178 Commits (6205fa6f219e1ddeacd2faba70f8a96986f6dd4f)

Author SHA1 Message Date
Heng Li 7e0d70bfd3 r445: pair coordinate adjustment working
Next: mapq adjustment, which will be tricky...
2017-09-27 15:38:18 -04:00
Heng Li a349d85280 r444: changed the way orientation is specified
The old model doesn't work with RF or RR orientation. The new model only works
with paired-end reads. For >2 segments, only FF is supported.
2017-09-27 12:33:10 -04:00
Heng Li f611edf6f2 r443: don't filter small cm for split seg 2017-09-26 16:17:58 -04:00
Heng Li 1b1dd0cd57 r442: default max_gap to 200 in the sr mode 2017-09-26 13:31:01 -04:00
Heng Li 55d1e4f638 r440: better chain filtering for PE reads 2017-09-26 11:03:36 -04:00
Heng Li 8f25cfa36e r437: fixed uninitialized memory on rep_len 2017-09-25 14:22:45 -04:00
Heng Li 81008dd371 r436: working on short reads
The result is mixed - lots of room for tuning
2017-09-25 14:06:29 -04:00
Heng Li 3bb66e1ed3 multi-seg working on toy examples 2017-09-25 13:42:04 -04:00
Heng Li a742f10164 get multi-seg code ready; probably not working yet 2017-09-24 15:17:17 -04:00
Heng Li f0951141a1 allow reading multiple files interleaved 2017-09-24 14:33:05 -04:00
Heng Li 19d8eca3a1 moved array shrinking into chain_dp() 2017-09-20 14:58:57 -04:00
Heng Li 9943e5fdd0 backup 2017-09-20 14:35:46 -04:00
Heng Li 5b39a1b34b Merge branch 'master' into sr 2017-09-20 12:24:08 -04:00
Heng Li e3b5802b2e r424: reduce memory for long query seqs 2017-09-20 12:22:13 -04:00
Heng Li 03d6894517 backup 2017-09-20 11:47:46 -04:00
Heng Li 645db3350e Merge branch 'master' into sr 2017-09-20 11:15:14 -04:00
Heng Li 75e6bbc9f6 r421: removed the MM_F_SPLICE_BOTH mode
In the default splice mode, minimap2 applies two rounds of spliced alignment:
first assuming GT-AG to be the splice signal across all splicing sites and then
assuming CT-AC to be the signal. This is the ideal strategy.

In the MM_F_SPLICE_BOTH mode, minimap2 applies one round of spliced alignment,
assuming GT-AG and CT-AC to be the splice signals AT THE SAME TIME. This will
be faster but less accurate. I don't think anyone would like to run minimap2 in
this mode, so I am removing it for clarity.
2017-09-20 11:11:53 -04:00
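To make the two signals concrete, here is a minimal C sketch that classifies a candidate intron by its terminal dinucleotides. The names `splice_signal` and `intron_matches` are hypothetical and not part of minimap2; the sketch only checks the motif, whereas real spliced alignment folds the signal into the DP scoring. CT-AC is GT-AG read on the opposite strand, so the two rounds differ only in which motif is favored.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not minimap2's API): classify the splice signal of a
 * candidate intron ref[st, en) (0-based, half-open) by its end dinucleotides.
 *   +1  GT...AG, the canonical signal for a forward-strand transcript
 *   -1  CT...AC, the reverse complement of GT-AG (reverse-strand transcript)
 *    0  neither, or an intron shorter than 4 bp */
int splice_signal(const char *ref, int st, int en)
{
    char d0, d1, a0, a1;
    assert(ref != NULL && st >= 0 && en <= (int)strlen(ref));
    if (en - st < 4) return 0;
    d0 = (char)toupper((unsigned char)ref[st]);
    d1 = (char)toupper((unsigned char)ref[st + 1]);
    a0 = (char)toupper((unsigned char)ref[en - 2]);
    a1 = (char)toupper((unsigned char)ref[en - 1]);
    if (d0 == 'G' && d1 == 'T' && a0 == 'A' && a1 == 'G') return 1;
    if (d0 == 'C' && d1 == 'T' && a0 == 'A' && a1 == 'C') return -1;
    return 0;
}

/* strand = +1: only GT-AG counts as canonical (first round of the default mode);
 * strand = -1: only CT-AC counts (second round);
 * strand =  0: either signal in a single pass, like the removed MM_F_SPLICE_BOTH. */
int intron_matches(const char *ref, int st, int en, int strand)
{
    int s = splice_signal(ref, st, en);
    return strand == 0 ? s != 0 : s == strand;
}
```

In the single-pass variant, a GT-AG intron on one strand and a CT-AC intron on the other can both be accepted within one alignment, which is one reason that mode is less accurate.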
Heng Li 7a9b4db874 replaced --approx-ext with --sr
--sr disables Z-drop and may come with other heuristics
2017-09-20 10:51:18 -04:00
Heng Li fd14618e61 no effective changes 2017-09-20 10:11:05 -04:00
Heng Li 56014ba3db avoid assertion failure given 0-length reads 2017-09-19 22:30:32 -04:00
Heng Li b99c22840f r414: avoid assertion failure for 0-length reads 2017-09-19 22:21:27 -04:00
Heng Li c04420698e fixed an uninitialized value 2017-09-19 16:21:21 -04:00
Heng Li fb1bcc0084 early exploration 2017-09-19 16:18:28 -04:00
Heng Li e2823d4aee r367: index reader optionally writes index 2017-09-14 21:18:13 -04:00
Heng Li eb00521d9b redesigned indexing and option APIs 2017-09-14 17:02:01 -04:00
Heng Li 4d3768bf26 r364: improved the mapq heuristics
* use repetitive seed lengths, not counts
* compute n_sub to higher accuracy
* use bwa-mem mapq heuristic as a backup

For short single-end reads, minimap2's ROC is not as good as bwa-mem's, but is
close.
2017-09-14 12:37:03 -04:00
Heng Li 6a82a21dee r361: improved mapq for short reads 2017-09-13 15:32:39 -04:00
Heng Li 3c91d652dd r360: allow setting an integer max occ 2017-09-13 11:37:00 -04:00
Heng Li c7c3585531 r347: merged mm_map_frag() into mm_map()
mm_map_frag() was separated due to an earlier design that has been rejected.
2017-09-10 15:02:55 -04:00
Heng Li 59c822b722 removed some commented code
which *might* return at some time later
2017-09-09 08:38:39 -04:00
Heng Li f422175e4e r344: avoid unnecessary refName retrieval 2017-09-08 22:44:14 -04:00
Heng Li 101b8bb97d r335: report an error if query can't be opened 2017-09-03 11:54:38 -04:00
Heng Li 0fe1a224ab r309: improved SAM header output 2017-08-25 10:35:58 +08:00
Heng Li 993a2bb521 r301: separate introns from deletions
When an intron is adjacent to a deletion, the old code counted both as introns,
which led to an inaccurate exon boundary.
2017-08-18 15:31:15 +08:00
Heng Li 64c1389e1a Merge branch 'master' into splice 2017-08-17 23:39:27 +08:00
Heng Li bbb37d95f2 support inserting RG lines 2017-08-17 23:34:09 +08:00
Heng Li 2cde8d257c r297: bidirectional RNA alignment 2017-08-17 06:02:44 -04:00
Heng Li d240318741 r287: refined CLI options and manpage 2017-08-12 12:26:04 -04:00
Heng Li 0f4c823b0c r286: ignore introns when computing max seg score 2017-08-12 10:58:16 -04:00
Heng Li c59b0781bc r280: output introns as "N" in the cdna mode 2017-08-09 11:45:02 -04:00
Heng Li 1a7d782131 r273: cdna mapping mode for testing
Differences from the typical mapping mode:

* banded alignment disabled
* log gap cost during chaining
* zero long-gap extension during alignment
* up to 100kb (by default) reference gap
* bad seeding not filtered (to tune later)
2017-08-08 11:31:49 -04:00
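One plausible reading of "log gap cost during chaining" is sketched below by contrasting a linear and a logarithmic chaining gap cost. This is an assumption-laden illustration, not minimap2's code: the function names are hypothetical, the linear term follows the 0.01 x average-seed-span x gap form described in the minimap2 paper, and the log term uses an integer floor(log2) so no math library is needed. The r273 cDNA mode may use different constants.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(x)) for x > 0; 0 for x == 0 (no libm needed) */
static int ilog2_u32(uint32_t x)
{
    int r = 0;
    while (x >>= 1) ++r;
    return r;
}

/* Hypothetical chaining gap costs for two anchors whose reference and query
 * distances differ by d bases; avg_span is the average seed span. */

/* Linear cost, roughly 0.01 * avg_span * d: a 10 kb intron is very expensive. */
int gap_cost_linear(uint32_t d, uint32_t avg_span)
{
    return (int)((uint64_t)d * avg_span / 100);
}

/* Log cost, roughly 0.5 * log2(d): long reference gaps stay affordable. */
int gap_cost_log(uint32_t d)
{
    return ilog2_u32(d) / 2;
}
```

With an average seed span of 15, a 100 kb gap costs 15000 under the linear form but only 8 under the log form, which is why a log cost lets chaining bridge introns up to the 100 kb reference gap mentioned above.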
Heng Li 4c0713ee14 r235: optionally output tag cs in PAF
cs encodes the query, the reference sequence and CIGAR.
2017-07-31 12:06:49 -04:00
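As a hedged illustration of how cs combines the query, the reference and the CIGAR, the sketch below builds a short-form cs string using the operators documented for later minimap2 releases: `:N` for a run of N identical bases, `*xy` for reference base x replaced by query base y, `+seq` for an insertion and `-seq` for a deletion. The r235 encoding may differ in details, introns (`~`) and clipping are not handled, and `make_cs` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append one character to out (capacity cap, current length *n). */
static int put_c(char *out, size_t cap, size_t *n, char c)
{
    if (*n + 1 >= cap) return -1; /* keep room for the terminating NUL */
    out[(*n)++] = c;
    out[*n] = '\0';
    return 0;
}

/* Append a run of identical bases as ":<len>". */
static int put_run(char *out, size_t cap, size_t *n, long run)
{
    char buf[24];
    int k, len = snprintf(buf, sizeof buf, ":%ld", run);
    for (k = 0; k < len; ++k)
        if (put_c(out, cap, n, buf[k]) < 0) return -1;
    return 0;
}

/* Hypothetical: build a short-form cs string from the aligned reference and
 * query segments and a CIGAR containing only M, I and D. Returns the length
 * of the cs string, or -1 on an unsupported/malformed CIGAR, a CIGAR that
 * runs past either sequence, or a too-small output buffer. */
int make_cs(const char *ref, const char *qry, const char *cigar,
            char *out, size_t cap)
{
    size_t n = 0, r = 0, q = 0, rlen, qlen;
    const char *p = cigar;
    assert(ref && qry && cigar && out && cap > 0);
    rlen = strlen(ref), qlen = strlen(qry);
    out[0] = '\0';
    while (*p) {
        char *end;
        char op;
        long i, run = 0, len = strtol(p, &end, 10);
        if (end == p || len <= 0 || *end == '\0') return -1;
        op = *end, p = end + 1;
        if (op == 'M') { /* identical runs and substitutions */
            if (r + len > rlen || q + len > qlen) return -1;
            for (i = 0; i < len; ++i, ++r, ++q) {
                char a = (char)tolower((unsigned char)ref[r]);
                char b = (char)tolower((unsigned char)qry[q]);
                if (a == b) { ++run; continue; }
                if (run > 0 && put_run(out, cap, &n, run) < 0) return -1;
                run = 0;
                if (put_c(out, cap, &n, '*') < 0 || put_c(out, cap, &n, a) < 0 ||
                    put_c(out, cap, &n, b) < 0) return -1;
            }
            if (run > 0 && put_run(out, cap, &n, run) < 0) return -1;
        } else if (op == 'I' || op == 'D') { /* bases present in one sequence only */
            const char *s = op == 'I' ? qry + q : ref + r;
            size_t *pos = op == 'I' ? &q : &r;
            size_t lim = op == 'I' ? qlen : rlen;
            if (*pos + len > lim) return -1;
            if (put_c(out, cap, &n, op == 'I' ? '+' : '-') < 0) return -1;
            for (i = 0; i < len; ++i)
                if (put_c(out, cap, &n, (char)tolower((unsigned char)s[i])) < 0) return -1;
            *pos += len;
        } else {
            return -1; /* S, H, N, =, X are not handled in this sketch */
        }
    }
    return (int)n;
}
```

For example, reference `ACGTTGCA`, query `ACATTTGCA` and CIGAR `3M1I5M` yield `:2*ga+t:5`. Because cs records every differing base, the reference segment can be reconstructed from the query and cs alone.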
Heng Li 19d6ec885e r224: inversion alignment around Z-drop break 2017-07-29 13:09:10 -04:00
Heng Li 2179e9e24b r221: output SA in the SAM output 2017-07-28 23:08:39 -04:00
Heng Li 254280b8af r216: a bit cleanup; identical output to r215 2017-07-28 11:54:18 -04:00
Heng Li b927838495 r212: better heuristic to fix wrong seeding
but not good enough. Will explore more.
2017-07-27 11:24:51 -04:00
Heng Li e9dc1ce2b6 r205: when computing mapq, consider min_chain_sc
Not doing this was a mistake.
2017-07-26 11:34:14 -04:00
Heng Li 00c6db5073 r203: check more subopt aln if score small 2017-07-25 20:02:44 -04:00
Heng Li 71c988f6ab r188: renamed bseq* to mm_bseq*
to avoid naming collisions between minimap2 and bwa/fermi-lite/etc
2017-07-19 09:26:46 -04:00
Heng Li 71e2a97a4c r180: changed -x asm5 settings 2017-07-18 00:00:36 -04:00
Heng Li b4280d186f r176: removed seedcov_ratio; changed default opt
min_seedcov_ratio is not used
2017-07-12 12:47:46 -04:00
Heng Li 52caf79395 r175: halved max-chain-skip in the ava mode 2017-07-12 10:42:19 -04:00
Heng Li eeeb2ffb68 r174: make max-chain-skip work
The max-chain-skip heuristic did not work due to a bug. Without this
heuristic, chaining is too slow for long-read overlap.
2017-07-12 10:08:06 -04:00
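The idea behind a max-skip heuristic can be sketched with a simplified chaining DP. Everything here is hypothetical: the anchor type, the linear gap cost, and the rule that every non-improving predecessor counts as a skip are simplifications, and minimap2's actual chaining differs in its scoring and bookkeeping. The point is the early `break`, which bounds the work done per anchor.

```c
#include <assert.h>

typedef struct { int x, y; } anchor_t; /* reference and query end of a seed */

/* Simplified chaining DP with a max-skip early stop (hypothetical sketch).
 * a[] must be sorted by x. w: seed length; max_dist: largest gap allowed;
 * max_iter: how many predecessors to look back at; max_skip: stop scanning
 * after this many consecutive predecessors that fail to improve f[i].
 * Writes the best chain score ending at each anchor to f[] and its
 * predecessor (or -1) to p[]. Returns the best score overall. */
int chain_dp(const anchor_t *a, int n, int w, int max_dist,
             int max_iter, int max_skip, int *f, int *p)
{
    int i, j, best = 0;
    assert(n >= 0 && w > 0 && max_skip >= 0);
    for (i = 0; i < n; ++i) {
        int max_f = w, max_j = -1, n_skip = 0, st = i - max_iter;
        if (st < 0) st = 0;
        for (j = i - 1; j >= st; --j) {
            int dx = a[i].x - a[j].x, dy = a[i].y - a[j].y, dd, gain, sc;
            if (dx > max_dist) break;         /* sorted by x: earlier j are even farther */
            if (dx <= 0 || dy <= 0 || dy > max_dist) continue;
            dd = dx > dy ? dx - dy : dy - dx; /* diagonal shift between the seeds */
            gain = dx < dy ? dx : dy;         /* newly covered bases, capped at w */
            if (gain > w) gain = w;
            sc = f[j] + gain - dd;            /* hypothetical linear gap cost */
            if (sc > max_f) {
                max_f = sc, max_j = j;
                n_skip = 0;
            } else if (++n_skip > max_skip) {
                break;                        /* the early stop: give up on this anchor */
            }
        }
        f[i] = max_f, p[i] = max_j;
        if (max_f > best) best = max_f;
    }
    return best;
}
```

On the anchors (10,10) (20,20) (29,21) (30,30) with w = 5, max_skip = 25 finds a chain score of 15, while max_skip = 0 stops at the off-diagonal anchor and only reaches 10. This is the speed versus sensitivity trade-off such a parameter controls.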
Heng Li 33451aba45 r173: changed the debugging output format 2017-07-11 15:23:28 -04:00
Heng Li 826c8ba892 r170: added a debugging flag
something wrong with chaining
2017-07-11 14:47:35 -04:00
Heng Li 1ac48556ae r167: long join threshold depends on gap
also caught a bug for reverse strand join
2017-07-09 10:38:51 -04:00
Heng Li 42846ce65d r163: reduced long join score requirement
because the chaining score is generally smaller with the last few commits.
2017-07-08 15:51:52 -04:00
Heng Li 38b2830e18 r161: filter bad seeds; changed default -g/-r 2017-07-08 13:31:27 -04:00
Heng Li cc554aee43 r159: use two-piece gap penalty 2017-07-08 10:26:00 -04:00
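A two-piece gap penalty takes the cheaper of two affine lines, so long gaps are charged a smaller per-base extension cost. The sketch below assumes the form min(q + k*e, q2 + k*e2) documented for minimap2's `-O`/`-E` options in later releases; the function names are hypothetical, and the example values (4,2 and 24,1) are for illustration only, not necessarily the defaults chosen in r159.

```c
#include <assert.h>

/* Two-piece affine gap cost for a gap of length k: min(q + k*e, q2 + k*e2).
 * With q < q2 and e > e2, short gaps follow the first (steeper) line and
 * long gaps the second (flatter) one. */
int gap_cost_2piece(int k, int q, int e, int q2, int e2)
{
    int c1 = q + k * e, c2 = q2 + k * e2;
    assert(k > 0);
    return c1 < c2 ? c1 : c2;
}

/* Smallest gap length at which the second piece is no more expensive:
 * q2 + k*e2 <= q + k*e  <=>  k >= (q2 - q) / (e - e2), rounded up. */
int gap_crossover(int q, int e, int q2, int e2)
{
    assert(q2 > q && e > e2);
    return (q2 - q + (e - e2) - 1) / (e - e2);
}
```

With q=4, e=2, q2=24, e2=1, a 1-bp gap costs 6, a 100-bp gap costs 124 instead of 204, and the two pieces meet at k = 20.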
Heng Li 9823317e8f r158: optionally ignore base quality 2017-07-05 18:23:50 -04:00
Heng Li e07daad7ad r153: sam primary record not set sometimes 2017-07-03 13:18:57 -04:00
Heng Li b625247300 r150: mm_sync_regs() doesn't work with negative id 2017-07-03 11:36:34 -04:00
Heng Li 53c4bf5e4f r149: introduced debugging flags on CLI 2017-07-03 11:02:32 -04:00
Heng Li 2e4fd9f1d0 r148: revamped regs handling after cigar 2017-07-03 10:44:26 -04:00
Heng Li 51cfb60520 r145: changed default -p from 2 to 0.8
For long reads, secondary alignments can be very informative.
2017-07-02 22:51:45 -04:00
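Assuming `-p` is the minimal secondary-to-primary score ratio, as it is documented in later minimap2 releases, the retention rule can be sketched as follows. The function name and the `best_n` cap on the number of secondaries are hypothetical, and the sketch only counts hits; it does not model how minimap2 actually marks or outputs secondaries.

```c
#include <assert.h>

/* Hypothetical: score[] holds alignment scores sorted in descending order,
 * score[0] being the primary. Returns how many hits would be reported:
 * the primary plus every secondary scoring at least pri_ratio * score[0],
 * with at most best_n secondaries. */
int n_kept(const int *score, int n, double pri_ratio, int best_n)
{
    int i;
    assert(n >= 0 && best_n >= 0);
    if (n == 0) return 0;
    for (i = 1; i < n && i <= best_n; ++i)
        if (score[i] < pri_ratio * score[0]) break; /* sorted: the rest are lower */
    return i;
}
```

For scores 100, 90, 81, 79 and 50 with a ratio of 0.8, the hits scoring 90 and 81 are kept as secondaries, so the function returns 3; a ratio above 1 would suppress all secondaries.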
Heng Li 74d306a596 fixed bug when retaining 2ndary aln; still buggy 2017-07-02 19:08:30 -04:00
Heng Li 41efd03d7a r129: fixed memory leak caused by qualities 2017-06-30 23:48:00 -04:00
Heng Li 426c2975f6 r126: filter by fraction of seed coverage
otherwise we may get too many poor overlap mappings.
2017-06-30 22:15:45 -04:00
Heng Li 646a746cdc r122: filter contained aln after DP extension 2017-06-30 15:23:30 -04:00
Heng Li fce87ce7bd r121: output QUAL and unmapped to SAM 2017-06-30 14:40:54 -04:00
Heng Li d11049eb32 r120: use max-scoring seg to control output
much better now
2017-06-30 14:21:44 -04:00
Heng Li 1a903486b9 r118: bugfix - regs unsorted before filtering 2017-06-30 12:52:28 -04:00
Heng Li 03267e8fa7 r113: fixed a sam header bug 2017-06-29 22:43:06 -04:00
Heng Li 3825feeeac r111: changed the default z-drop to 200 2017-06-29 21:37:56 -04:00
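For context on this default, the Z-drop rule as described in the minimap2 paper stops extension when the best score seen so far exceeds the current score by more than Z plus a gap-extension term proportional to the diagonal shift. The C sketch below expresses only that test; the function name is hypothetical, and real aligners such as ksw2 evaluate it inside the DP loop rather than as a separate call.

```c
#include <assert.h>

/* Hypothetical Z-drop test for alignment extension. (max_i, max_j) is the
 * cell holding the best score seen so far and (i, j) the current cell, with
 * i indexing the reference and j the query. The threshold grows by e per
 * base of diagonal shift, so a score drop explained by one long gap does not
 * stop the extension, while a drop along the same diagonal does. */
int zdrop_triggered(int max_sc, int max_i, int max_j,
                    int cur_sc, int i, int j, int zdrop, int e)
{
    int di = i - max_i, dj = j - max_j;
    int diag = di > dj ? di - dj : dj - di;
    assert(i >= max_i && j >= max_j && zdrop >= 0 && e >= 0);
    return max_sc - cur_sc > zdrop + e * diag;
}
```

With Z = 200 and e = 1, a drop of 300 along the diagonal triggers the stop, but the same drop accompanied by a 250-bp diagonal shift does not.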
Heng Li 08cbb09fcc r109: changed the default scoring 2017-06-29 20:21:57 -04:00
Heng Li 4cd456b9ba r108: refactoring, move reg1 routines to hit.c 2017-06-29 19:44:11 -04:00
Heng Li ecedfe5788 code refactoring for mm_reg1_t 2017-06-29 19:35:38 -04:00
Heng Li cc67f1b781 compute mapq; not working for z-split yet 2017-06-29 17:52:48 -04:00
Heng Li b9075d39a8 r104: long gap patching 2017-06-29 14:54:54 -04:00
Heng Li 17944a75c2 merge into map.c 2017-06-29 14:20:14 -04:00
Heng Li c8d122bcdb backup 2017-06-29 11:11:15 -04:00
Heng Li 9fbf7e41e1 r99: report progress 2017-06-28 23:56:33 -04:00
Heng Li b696d5fe5e disable kalloc in the debugging mode 2017-06-28 10:50:27 -04:00
Heng Li bcd9b1c621 r93: fixed various small issues 2017-06-28 10:35:21 -04:00
Heng Li cdc2a1e29f r92: fixed a bug for overlapping alignment
On the PBcR example E. coli reads, miniasm gives one circular unitig.
2017-06-27 22:03:31 -04:00
Heng Li 533150d49d r90: revert default band width to 1000
10000 is excessively tolerant with bad hits.
2017-06-27 20:29:39 -04:00
Heng Li fa80177e58 r89: added minimal number of minimizer counts 2017-06-27 18:43:15 -04:00
Heng Li 99c57b86c5 r79: drop bad hits 2017-06-26 15:28:04 -04:00
Heng Li 5b614ae828 r78: fixed a split bug 2017-06-26 14:45:23 -04:00
Heng Li de54c9dac2 r77: fixed an index loading bug (offset not set) 2017-06-26 13:56:25 -04:00
Heng Li 640b1a1727 command-line option to control CIGAR output 2017-06-26 11:41:09 -04:00
Heng Li b1077ff14c sam output 2017-06-25 22:05:20 -04:00
Heng Li 39083be9ab separated formatting from printing
for SAM output in the future, and for performance
2017-06-25 16:13:54 -04:00
Heng Li f20d550a59 fixed the NM bug
due to reversed CIGAR
2017-06-25 11:24:39 -04:00
Heng Li ef5dd318ca implemented chain splitting; NOT tested!!! 2017-06-24 22:57:43 -04:00
Heng Li aa5881e7bb backup 2017-06-24 22:51:31 -04:00
Heng Li 2d8fda9586 an alternative fix strategy for HPC
it makes the result better, but still not quite right.
2017-06-24 09:26:24 -04:00
Heng Li 0c274280ae output the original score 2017-06-23 22:56:50 -04:00
Heng Li 2987be288c output cigar 2017-06-23 22:53:47 -04:00
Heng Li 35b84f88c6 backup 2017-06-23 22:42:15 -04:00