r538: fixed a long-standing bug in HPC k-mers (#47)

This bug may lead to a wrong minimizer when an HPC k-mer spans 256bp or more.
When a seed match involves such a wrong HPC k-mer, the underlying seed
sequences do not actually match. This violates an assumption in align.c and
subsequently causes a segfault, which is what #47 caught. The bug has lurked
in the earliest piece of code and affects all minimap2 versions released so
far. It is extremely rare and does not affect the prebuilt GRCh37/38 indices.
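To make the failure mode concrete, here is a simplified model (not minimap2 code) of the forward HPC k-mer loop. Before r538, the `continue` taken when the span reached 256bp jumped over the rolling k-mer update, so bases were silently dropped from the encoding; once the long homopolymer left the window, a k-mer that no longer matched the sequence could be emitted. `nt4()` and `last_hpc_kmer()` are hypothetical helpers, and strand, ambiguous-base handling and minimizer selection are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for minimap2's nucleotide table: A=0, C=1, G=2, T=3, other=4. */
static int nt4(char c)
{
	switch (c) {
	case 'A': case 'a': return 0;
	case 'C': case 'c': return 1;
	case 'G': case 'g': return 2;
	case 'T': case 't': return 3;
	default: return 4;
	}
}

/* Returns the 2-bit forward HPC k-mer after the last base of str and stores
 * the span (in bp of the original sequence) of its k HPC bases in *span_out.
 * buggy != 0 mimics the pre-r538 loop: skip the rolling update once span >= 256. */
static uint64_t last_hpc_kmer(const char *str, int k, int buggy, int *span_out)
{
	uint64_t mask, kmer = 0;
	int runs[28], n = 0, span = 0, len = (int)strlen(str), i;
	assert(k > 0 && k <= 28); /* keeps the 2k-bit mask well inside 64 bits */
	mask = (1ULL << 2 * k) - 1;
	for (i = 0; i < len; ++i) {
		int c = nt4(str[i]), run = 1;
		if (c > 3) continue; /* ambiguous bases are not modelled, just skipped */
		while (i + 1 < len && nt4(str[i + 1]) == c) ++i, ++run; /* collapse the homopolymer */
		span += run;
		if (n >= k) span -= runs[n % k]; /* drop the run that left the k-base window */
		runs[n % k] = run;
		++n;
		if (buggy && span >= 256) continue; /* pre-r538: k-mer left stale */
		kmer = (kmer << 2 | (uint64_t)c) & mask; /* r538: always updated */
	}
	if (span_out) *span_out = span;
	return kmer;
}
```

For example, with k = 3 on `C` followed by 300 `A`s and then `GTCA`, both variants report a 3bp span for the final HPC 3-mer, but the fixed loop returns 52 (`TCA`) while the buggy loop returns 20 (`CCA`). That short span passes the 256bp check, so the wrong k-mer would be used as a seed.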
Heng Li 2017-10-28 19:21:10 -04:00
parent 79b0caca95
commit f22a94e868
2 changed files with 3 additions and 3 deletions

main.c

@@ -6,7 +6,7 @@
 #include "mmpriv.h"
 #include "getopt.h"
-#define MM_VERSION "2.3-r537-dirty"
+#define MM_VERSION "2.3-r538-dirty"
 #ifdef __linux__
 #include <sys/resource.h>


@@ -101,13 +101,13 @@ void mm_sketch(void *km, const char *str, int len, int w, int k, uint32_t rid, i
 				tq_push(&tq, skip_len);
 				kmer_span += skip_len;
 				if (tq.count > k) kmer_span -= tq_shift(&tq);
-				if (kmer_span >= 256) continue; // make sure $kmer_span does not take more than 8 bits
 			} else kmer_span = l + 1 < k? l + 1 : k;
 			kmer[0] = (kmer[0] << 2 | c) & mask; // forward k-mer
 			kmer[1] = (kmer[1] >> 2) | (3ULL^c) << shift1; // reverse k-mer
 			if (kmer[0] == kmer[1]) continue; // skip "symmetric k-mers" as we don't know it strand
 			z = kmer[0] < kmer[1]? 0 : 1; // strand
-			if (++l >= k) {
+			++l;
+			if (l >= k && kmer_span < 256) {
 				info.x = hash64(kmer[z], mask) << 8 | kmer_span;
 				info.y = (uint64_t)rid<<32 | (uint32_t)i<<1 | z;
 			}
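The 256bp limit itself is still needed after the fix, because `info.x = hash64(kmer[z], mask) << 8 | kmer_span` stores the span in the low 8 bits. The sketch below illustrates that packing; `pack_x()`, `unpack_span()` and `unpack_hash()` are hypothetical helpers, and `h` stands in for the output of `hash64()`, assumed here to fit in 56 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring `hash << 8 | kmer_span`; h is assumed to fit in 56 bits. */
static uint64_t pack_x(uint64_t h, uint32_t span)
{
	assert(h >> 56 == 0);  /* leave the top 8 bits free so the shift loses nothing */
	return h << 8 | span;  /* no masking: a span >= 256 spills into the hash bits */
}

static uint32_t unpack_span(uint64_t x) { return (uint32_t)(x & 0xff); } /* low 8 bits */
static uint64_t unpack_hash(uint64_t x) { return x >> 8; }                 /* bits 8..63 */
```

A span of 255 round-trips unchanged, whereas a span of 300 (0x12C) sets bit 8: it is read back as 44, and a hash of `0x1234` is read back as `0x1235`. Gating emission on `kmer_span < 256`, instead of skipping the rolling update, avoids this overflow without corrupting the k-mer.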