Many Intel intrinsics have a corresponding Neon equivalent.
Other cases are more interesting:
* Neon's vmaxvq directly selects the maximum entry in a vector,
so it can be used to implement both the __max_16/__max_8 macros
and the _mm_movemask_epi8 early loop exit. Introduce additional
helper macros alongside __max_16/__max_8 so that the early loop
exit can similarly be implemented differently on the two platforms.
* Full-width shifts can be done via vextq. This is defined close to
the ksw_u8()/ksw_i16() functions (rather than in neon_sse.h) as it
implicitly uses one of their local variables.
* ksw_i16() uses saturating *signed* 16-bit operations apart from
_mm_subs_epu16; presumably the data is effectively still signed but
we wish to keep it non-negative. The ARM intrinsics are more careful
about type checking, so this requires an extra U16() helper macro.
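
As a rough illustration of the first bullet, the sketch below writes a
horizontal maximum and an all-zero test once per platform. The names
hmax_u8() and all_zero_u8() are hypothetical and are not the macros used
in ksw.c. The AArch64 branch uses vmaxvq_u8, the SSE2 branch uses the
usual shift-and-max reduction plus _mm_movemask_epi8, and a scalar
fallback is included only so the example builds on other targets.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Neon: vmaxvq_u8 reduces all 16 lanes to their maximum in one call. */
static int hmax_u8(const uint8_t *p)
{
    return vmaxvq_u8(vld1q_u8(p));
}

/* A vector of unsigned bytes is all zero exactly when its maximum is zero. */
static int all_zero_u8(const uint8_t *p)
{
    return vmaxvq_u8(vld1q_u8(p)) == 0;
}

#elif defined(__SSE2__)
#include <emmintrin.h>

/* SSE2: shift right by 8, 4, 2 and 1 bytes, taking the byte-wise maximum
 * each time, so that lane 0 ends up holding the maximum of all 16 lanes. */
static int hmax_u8(const uint8_t *p)
{
    __m128i x = _mm_loadu_si128((const __m128i *)p);
    x = _mm_max_epu8(x, _mm_srli_si128(x, 8));
    x = _mm_max_epu8(x, _mm_srli_si128(x, 4));
    x = _mm_max_epu8(x, _mm_srli_si128(x, 2));
    x = _mm_max_epu8(x, _mm_srli_si128(x, 1));
    return _mm_extract_epi16(x, 0) & 0xff;
}

/* SSE2: compare every byte with zero and inspect the 16-bit lane mask. */
static int all_zero_u8(const uint8_t *p)
{
    __m128i x = _mm_loadu_si128((const __m128i *)p);
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xffff;
}

#else

/* Scalar fallback, present only so the sketch compiles everywhere. */
static int hmax_u8(const uint8_t *p)
{
    int i, m = 0;
    for (i = 0; i < 16; ++i)
        if (p[i] > m) m = p[i];
    return m;
}

static int all_zero_u8(const uint8_t *p)
{
    return hmax_u8(p) == 0;
}

#endif
```

Both functions take a pointer to 16 bytes, so a caller can use the same
loop-exit test on either platform without knowing which branch was built.
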
The bwa makefile doesn't set CPPFLAGS or LDFLAGS itself, but the environment
or make command line might set any of CC/CPPFLAGS/CFLAGS/LDFLAGS/LIBS.
Use $(CPPFLAGS) when compiling and $(LDFLAGS) when linking so they can
be used to customise the build. Remove $(DFLAGS) from link commands as
these preprocessor options are irrelevant for linking.
The old method does not work when the alignment bridges three chromosomes.
This may actually happen often. The new method does not always work either,
but it should be better than the old one. It is also arguably simpler.
bamlite.c now includes some wrappers for gzopen/gzread/gzclose that print
messages when errors occur. They do not attempt to quit the program but
pass on the return code. bwaseqio.c now checks the return codes from
bam_open, bam_close and bam_read1.
Code in bwt_gen.c now checks for IO errors itself instead of using the
wrappers. A benefit of this is that it can now report which file had a problem.
Removed call to err_fatal_simple in is_bwt and unnecessary inclusion of
malloc_wrap.h in ksw.h.
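
A minimal sketch of the kind of check described for bwt_gen.c, assuming a
hypothetical helper read_exact() that is not part of bwa. It calls fread
directly and names the file in its message. Returning an error code rather
than exiting is a choice made for this example only.

```c
#include <stdio.h>

/* Hypothetical helper: read exactly n bytes from fp. On failure, print a
 * message that names the file (fn) and return -1; return 0 on success. */
static int read_exact(FILE *fp, void *buf, size_t n, const char *fn)
{
    if (fread(buf, 1, n, fp) != n) {
        fprintf(stderr, "[read_exact] %s while reading %s\n",
                ferror(fp) ? "I/O error" : "unexpected end of file", fn);
        return -1;
    }
    return 0;
}
```

Because the caller passes the file name along with the stream, the message
can identify the file, which a generic fread wrapper cannot do.
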
Remove xmalloc, xcalloc, xrealloc and xstrdup from utils.h and revert calls
to the normal malloc, calloc, realloc, strdup. Add new files malloc_wrap.[ch]
with the wrapper functions. malloc_wrap.h #defines malloc etc. to the
wrapper, but only if USE_MALLOC_WRAPPERS has been defined.
Put #include "malloc_wrap.h" in any file that uses *alloc or strdup. This
is also in a #ifdef USE_MALLOC_WRAPPERS ... #endif block to make using the
wrappers optional. Add -DUSE_MALLOC_WRAPPERS into the makefile so they
should normally get added.
This is an improvement on the previous method as we no longer need to
worry about stray function calls that were not changed to the wrapped version
and the code will still work even if the wrapping is disabled.
Other possible methods of doing this are using malloc_hook (glibc-specific),
adding -include malloc_wrap.h to the gcc command-line (somewhat
gcc-specific) or making our own malloc function and using dlopen (scary).
This way is probably the most portable.
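
The macro approach can be sketched in a single file as follows. This is an
illustration, not the contents of malloc_wrap.[ch]: the wrapper's
signature, its message format and the decision to exit on failure are
assumptions, and make_counts() is a hypothetical caller. In a real split
across files, the wrapper's own source file must not see the #define, or
the wrapper would call itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed wrapper: call the real malloc and report the call site on failure.
 * It is defined before the #define below, so it still calls the real malloc. */
static void *wrap_malloc(size_t size, const char *file, unsigned int line,
                         const char *func)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "[%s] %s:%u: failed to allocate %lu bytes\n",
                func, file, line, (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Redirect malloc to the wrapper only when wrapping has been requested,
 * for example by building with -DUSE_MALLOC_WRAPPERS. */
#ifdef USE_MALLOC_WRAPPERS
#define malloc(s) wrap_malloc((s), __FILE__, __LINE__, __func__)
#endif

/* Hypothetical caller: it is written against plain malloc, so it compiles
 * and behaves the same whether or not the wrapper is enabled. */
static int *make_counts(size_t n)
{
    size_t i;
    int *a = malloc(n * sizeof *a);
    if (a == NULL) return NULL; /* only reachable when wrapping is disabled */
    for (i = 0; i < n; ++i) a[i] = 0;
    return a;
}
```

The caller never names the wrapper, which is why a call site that was
missed during conversion cannot exist with this scheme.
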
stdaln.{c,h} was written ten years ago. Its local and SW extension code is
actually buggy (though the bugs are rarely triggered and usually do not affect
the results too much). ksw.{c,h} is more concise, potentially faster, less
buggy, and richer in features.
1. Removed bwa.{h,c}. I am not going to finish them anyway.
2. Updated to the latest khash.h, which should be faster.
3. Define 64-bit vector and 128-bit integer/vector in utils.h.
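
For item 3, one plausible shape for such definitions is sketched below. The
type names and field layout are assumptions made for illustration, and
uint64_v_push() is a hypothetical helper that may not match what utils.h
provides.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: a 128-bit integer stored as two 64-bit halves. */
typedef struct { uint64_t x, y; } pair64_t;

/* Assumed layout: growable vectors holding n used and m allocated slots. */
typedef struct { size_t n, m; uint64_t *a; } uint64_v;
typedef struct { size_t n, m; pair64_t *a; } pair64_v;

/* Hypothetical helper: append x, doubling the capacity when full.
 * Returns 0 on success and -1 if the reallocation fails. */
static int uint64_v_push(uint64_v *v, uint64_t x)
{
    if (v->n == v->m) {
        size_t m = v->m ? v->m * 2 : 4;
        uint64_t *a = realloc(v->a, m * sizeof *a);
        if (a == NULL) return -1;
        v->a = a;
        v->m = m;
    }
    v->a[v->n++] = x;
    return 0;
}
```

A vector starts as {0, 0, NULL} and is released with free(v.a) when no
longer needed.
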