fast-bwa

Commit Graph

Author	SHA1	Message	Date

John Marshall

b64ccddda7

On ARM, rewrite SSE2 SIMD calls using Neon intrinsics

Many Intel intrinsics have a corresponding Neon equivalent.
Other cases are more interesting:

* Neon's vmaxvq directly selects the maximum entry in a vector,
  so can be used to implement both the __max_16/__max_8 macros
  and the _mm_movemask_epi8 early loop exit. Introduce additional
  helper macros alongside __max_16/__max_8 so that the early loop
  exit can similarly be implemented differently on the two platforms.

* Full-width shifts can be done via vextq. This is defined close to
  the ksw_u8()/ksw_i16() functions (rather than in neon_sse.h) as it
  implicitly uses one of their local variables.

* ksw_i16() uses saturating *signed* 16-bit operations apart from
  _mm_subs_epu16; presumably the data is effectively still signed but
  we wish to keep it non-negative. The ARM intrinsics are more careful
  about type checking, so this requires an extra U16() helper macro.
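
As a rough illustration of the first point above, the sketch below shows how a horizontal maximum and an "any lane still larger?" early-exit test might be written once per platform behind a common name. It is not the code from this commit: `vec_u8`, `load_u8`, `hmax_u8`, `any_gt_u8` and `demo_hmax` are hypothetical names chosen for the example, and a scalar fallback is included only so the sketch builds everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
typedef uint8x16_t vec_u8;
static inline vec_u8 load_u8(const uint8_t *p) { return vld1q_u8(p); }
/* A single across-lanes instruction yields the maximum directly. */
static inline uint8_t hmax_u8(vec_u8 v) { return vmaxvq_u8(v); }
/* Some a[i] > b[i] exactly when the saturating difference has a nonzero lane. */
static inline int any_gt_u8(vec_u8 a, vec_u8 b) {
    return vmaxvq_u8(vqsubq_u8(a, b)) != 0;
}

#elif defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vec_u8;
static inline vec_u8 load_u8(const uint8_t *p) {
    return _mm_loadu_si128((const __m128i *)p);
}
/* Fold the vector in half four times, then read the lowest byte. */
static inline uint8_t hmax_u8(vec_u8 v) {
    v = _mm_max_epu8(v, _mm_srli_si128(v, 8));
    v = _mm_max_epu8(v, _mm_srli_si128(v, 4));
    v = _mm_max_epu8(v, _mm_srli_si128(v, 2));
    v = _mm_max_epu8(v, _mm_srli_si128(v, 1));
    return (uint8_t)(_mm_cvtsi128_si32(v) & 0xff);
}
/* movemask gathers one bit per byte; 0xffff means every difference is zero. */
static inline int any_gt_u8(vec_u8 a, vec_u8 b) {
    __m128i is_zero = _mm_cmpeq_epi8(_mm_subs_epu8(a, b), _mm_setzero_si128());
    return _mm_movemask_epi8(is_zero) != 0xffff;
}

#else /* portable scalar fallback, for illustration only */
typedef struct { uint8_t lane[16]; } vec_u8;
static inline vec_u8 load_u8(const uint8_t *p) {
    vec_u8 v;
    memcpy(v.lane, p, 16);
    return v;
}
static inline uint8_t hmax_u8(vec_u8 v) {
    uint8_t m = 0;
    for (int i = 0; i < 16; i++) if (v.lane[i] > m) m = v.lane[i];
    return m;
}
static inline int any_gt_u8(vec_u8 a, vec_u8 b) {
    for (int i = 0; i < 16; i++) if (a.lane[i] > b.lane[i]) return 1;
    return 0;
}
#endif

/* Usage sketch with arbitrary values. */
static inline void demo_hmax(void) {
    const uint8_t f[16] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3};
    const uint8_t h[16] = {9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9};
    assert(hmax_u8(load_u8(f)) == 9);
    /* f <= h in every lane, so a loop guarded by this test could stop early. */
    assert(!any_gt_u8(load_u8(f), load_u8(h)));
}
```

Keeping the per-platform differences behind two small helpers means the calling loop can stay identical on both architectures, which appears to be the motivation for the extra helper macros the message mentions.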

2022-06-20 20:43:17 +01:00
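
The second and third points of the commit message above (full-width shifts via vextq, and an unsigned saturating subtract applied to otherwise signed 16-bit data) can be sketched as follows. This is a minimal sketch under stated assumptions, not the commit's actual macros: `vec_i16`, `load_i16`, `store_i16`, `shift_up1_i16`, `subs_u16`, `AS_U16` and `demo_i16` are hypothetical names, and the scalar branch exists only so the example builds on any platform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
typedef int16x8_t vec_i16;
static inline vec_i16 load_i16(const int16_t *p) { return vld1q_s16(p); }
static inline void store_i16(int16_t *p, vec_i16 v) { vst1q_s16(p, v); }
/* vextq_s16(zero, v, 7) yields lane 7 of zero followed by lanes 0..6 of v,
   so every lane moves up by one and lane 0 becomes 0. */
static inline vec_i16 shift_up1_i16(vec_i16 v) {
    return vextq_s16(vdupq_n_s16(0), v, 7);
}
/* Hypothetical helper: Neon types are distinct, so signed lanes must be
   reinterpreted explicitly before an unsigned operation is applied. */
#define AS_U16(x) vreinterpretq_u16_s16(x)
static inline vec_i16 subs_u16(vec_i16 a, vec_i16 b) {
    return vreinterpretq_s16_u16(vqsubq_u16(AS_U16(a), AS_U16(b)));
}

#elif defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vec_i16; /* one untyped 128-bit type serves every lane width */
static inline vec_i16 load_i16(const int16_t *p) {
    return _mm_loadu_si128((const __m128i *)p);
}
static inline void store_i16(int16_t *p, vec_i16 v) {
    _mm_storeu_si128((__m128i *)p, v);
}
/* A 2-byte whole-register shift moves each 16-bit lane up by one. */
static inline vec_i16 shift_up1_i16(vec_i16 v) { return _mm_slli_si128(v, 2); }
static inline vec_i16 subs_u16(vec_i16 a, vec_i16 b) { return _mm_subs_epu16(a, b); }

#else /* portable scalar fallback, for illustration only */
typedef struct { int16_t lane[8]; } vec_i16;
static inline vec_i16 load_i16(const int16_t *p) {
    vec_i16 v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
}
static inline void store_i16(int16_t *p, vec_i16 v) { memcpy(p, v.lane, sizeof v.lane); }
static inline vec_i16 shift_up1_i16(vec_i16 v) {
    for (int i = 7; i > 0; i--) v.lane[i] = v.lane[i - 1];
    v.lane[0] = 0;
    return v;
}
static inline vec_i16 subs_u16(vec_i16 a, vec_i16 b) {
    vec_i16 r;
    for (int i = 0; i < 8; i++) {
        uint16_t x = (uint16_t)a.lane[i], y = (uint16_t)b.lane[i];
        r.lane[i] = (int16_t)(uint16_t)(x > y ? x - y : 0); /* clamp at zero */
    }
    return r;
}
#endif

/* Usage sketch with arbitrary non-negative values. */
static inline void demo_i16(void) {
    const int16_t h[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int16_t g[8] = {3, 3, 3, 3, 3, 3, 3, 3};
    int16_t out[8];
    store_i16(out, shift_up1_i16(load_i16(h)));
    assert(out[0] == 0 && out[1] == 1 && out[7] == 7);
    store_i16(out, subs_u16(load_i16(h), load_i16(g)));
    assert(out[2] == 0 && out[3] == 1 && out[7] == 5); /* never below zero */
}
```

On SSE2 the single `__m128i` type hides the signed/unsigned distinction, whereas the Neon branch has to spell out the reinterpretation; that difference is what makes a small cast helper useful on ARM.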

1 Commit (b64ccddda7ddc73053f116841079d421d2ae990f)