Many Intel intrinsics have a corresponding Neon equivalent.
Other cases are more interesting:
* Neon's vmaxvq directly selects the maximum entry in a vector,
so can be used to implement both the __max_16/__max_8 macros
and the _mm_movemask_epi8 early loop exit. Introduce additional
helper macros alongside __max_16/__max_8 so that the early loop
exit can similarly be implemented differently on the two platforms.
* Full-width shifts can be done via vextq. This is defined close to
the ksw_u8()/ksw_i16() functions (rather than in neon_sse.h) as it
implicitly uses one of their local variables.
* ksw_i16() uses saturating *signed* 16-bit operations apart from
_mm_subs_epu16; presumably the data is effectively still signed but
we wish to keep it non-negative. The ARM intrinsics are more careful
about type checking, so this requires an extra U16() helper macro.
The previous code implicitly caused a load; change it so the load
intrinsic is explicitly invoked, as the others are. (This in fact
makes no difference to the generated code.)
1. the first cell in a row is not always right
2. prevent from H->H extension from H=0 cells
3. replaced the band narrowing heuristic with always correct one
Global alignment does not allow contiguous insertions and deletions, but local
alignment and extension allow such CIGARs. The optimal global alignment may
have a lower score than extension, which actually happens often for PacBio
data. This commit disallows a CIGAR like 20M2D2I30M to fix this inconsistency.
Local alignment has not been changed.
Remove xmalloc, xcalloc, xrealloc and xstrdup from utils.h and revert calls
to the normal malloc, calloc, realloc, strdup. Add new files malloc_wrap.[ch]
with the wrapper functions. malloc_wrap.h #defines malloc etc. to the
wrapper, but only if USE_MALLOC_WRAPPERS has been defined.
Put #include "malloc_wrap.h" in any file that uses *alloc or strdup. This
is also in a #ifdef USE_MALLOC_WRAPPERS ... #endif block to make using the
wrappers optional. Add -DUSE_MALLOC_WRAPPERS into the makefile so they
should normally get added.
This is an improvement on the previous method as we now don't need to
worry about stray function calls that were not changed to the wrapped version
and the code will still work even if the wrapping is disabled.
Other possible methods of doing this are using malloc_hook (glibc-specific),
adding -include malloc_wrap.h to the gcc command-line (somewhat
gcc-specific) or making our own malloc function and using dlopen (scary).
This way is probably the most portable.
Added a new utils.c wrapper err_gzclose and changed gzclose calls to use it.
Put in some more err_fflush calls before files being written are closed.
Made err_fflush call fsync. This is useful for remote filesystems where
errors may not be reported on fflush or fclose as problems at the server
end may only be detected after they have returned. If bwa is being used
only to write to local filesystems, calling fsync is not really necessary.
To disable it, comment out #define FSYNC_ON_FLUSH in utils.c.