In the bwa.c and bwase.c calls, rlen is an int64_t returned from
bns_get_seq() and is the number of reference bases covered by the
alignment; l_query/len is an int and the query length of the alignment;
and the result is an int given to an int parameter of ksw_global[2]().
As even the result is int and as rlen is effectively bounded by the
maximum length of a reference sequence, we maintain the status quo in
this code and simply cast rlen to int to silence Clang's "use llabs()"
warning (llabs() would not be a great answer given an int64_t anyway).
The bwtsw2_pair.c call needs to remain fabs() so both divisions are
done in floating point; cast to double to prevent Clang suggesting
changing the call to integer abs().
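
The two kinds of cast described above can be sketched roughly as follows.
The helper names and parameters (len_diff, within_ratio, max_ratio) are
hypothetical and are not taken from bwa.c, bwase.c or bwtsw2_pair.c; only
the casting pattern is meant to match.

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: rlen is int64_t but is assumed to be bounded by the
 * length of a reference sequence, so narrowing it to int before abs() is
 * safe under that assumption. */
static int len_diff(int64_t rlen, int l_query)
{
    return abs((int)rlen - l_query);
}

/* Hypothetical helper: the integer difference is cast to double so that
 * fabs() is given a floating-point argument and both divisions below are
 * carried out in floating point rather than as integer divisions. */
static int within_ratio(int a, int b, int len_a, int len_b, double max_ratio)
{
    double d = fabs((double)(a - b));
    return d / len_a <= max_ratio && d / len_b <= max_ratio;
}
```

With integer abs() and an int result, 9 / 10 would truncate to 0 and pass
any non-negative threshold, which is the behaviour the fabs() call avoids.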
1. Check .sai versioning
2. Keep track of #ins and #del during backtrack
3. Use info above to get accurate aligned regions; don't call SW extension any more
4. Identify alignment crossing the for-rev boundary
5. Fixed a bug in printing the XA tag: ungapped alignments missing
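
As a rough illustration of items 2 and 3, the reference span of an
alignment can be derived from the query length and the indel counts alone.
The function name ref_span is hypothetical, and the sketch assumes the whole
query is aligned (no clipping).

```c
#include <stdint.h>

/* Hypothetical helper: inserted bases consume query but not reference,
 * deleted bases consume reference but not query. */
static int64_t ref_span(int query_len, int n_ins, int n_del)
{
    return (int64_t)query_len - n_ins + n_del;
}
```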
Remove xmalloc, xcalloc, xrealloc and xstrdup from utils.h and revert calls
to the normal malloc, calloc, realloc, strdup. Add new files malloc_wrap.[ch]
with the wrapper functions. malloc_wrap.h #defines malloc etc. to the
wrapper, but only if USE_MALLOC_WRAPPERS has been defined.
Put #include "malloc_wrap.h" in any file that uses *alloc or strdup. This
is also in a #ifdef USE_MALLOC_WRAPPERS ... #endif block to make using the
wrappers optional. Add -DUSE_MALLOC_WRAPPERS to the makefile so the
wrappers are normally enabled.
This is an improvement on the previous method as we now don't need to
worry about stray function calls that were not changed to the wrapped version
and the code will still work even if the wrapping is disabled.
Other possible methods of doing this are using malloc_hook (glibc-specific),
adding -include malloc_wrap.h to the gcc command-line (somewhat
gcc-specific) or making our own malloc function and using dlopen (scary).
This way is probably the most portable.
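
A minimal single-file sketch of the idea, assuming a wrapper named
wrap_malloc with the signature shown (the real malloc_wrap.[ch] may differ
in names, arguments and error reporting). The wrapper is defined before the
macro so that its own call to malloc() is not redirected.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed wrapper: report the failing call site and exit instead of
 * returning NULL to the caller. */
void *wrap_malloc(size_t size, const char *file, unsigned int line,
                  const char *func)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "[%s] %s:%u: failed to allocate %zu bytes\n",
                func, file, line, size);
        exit(EXIT_FAILURE);
    }
    return p;
}

#ifdef USE_MALLOC_WRAPPERS
/* Redirect plain malloc() calls only when the wrappers are enabled. */
#  define malloc(s) wrap_malloc((s), __FILE__, __LINE__, __func__)
#endif

/* Caller code is written against plain malloc() and compiles unchanged
 * with or without -DUSE_MALLOC_WRAPPERS. */
int *make_counts(size_t n)
{
    int *counts = malloc(n * sizeof *counts);
    if (counts != NULL) {
        for (size_t i = 0; i < n; i++)
            counts[i] = 0;
    }
    return counts;
}
```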
Added missing default cases in option scanning.
Ensure exit value is 1 if bwa_idx_load or bwa_idx_infer_prefix fail.
These changes extend the previous one, which only fixed the mem aligner.
When backtracking, bwa-short does not keep the detailed alignment or the exact
start and end positions. To find the boundary and the CIGAR, the old code does
a global alignment with a small end-gap penalty. It then deals with a lot of
special cases to derive the right position and CIGAR, which are actually not
always right. It is a mess.
As the new ksw.{c,h} does not support a different end-gap penalty, the old
strategy does not work. But we get something better. The new code finds the
boundaries with ksw_extend(). It is cleaner and gives more accurate CIGAR in
most cases.
stdaln.{c,h} was written ten years ago. Its local alignment and SW extension
code is actually buggy (though the bugs are rarely triggered and usually do
not affect the results much). ksw.{c,h} is more concise, potentially faster,
less buggy, and richer in features.
Added wrappers err_fputc and err_fputs to catch failures in fputc and fputs.
Macros err_putchar and err_puts call the new wrappers and can be used in
place of putchar and puts.
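
A sketch of what such wrappers might look like. The error handling shown
here (perror followed by exit) is an assumption, not necessarily what
utils.c does.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write one character; treat EOF from fputc() as a fatal I/O error. */
int err_fputc(int c, FILE *stream)
{
    int ret = fputc(c, stream);
    if (ret == EOF) {
        perror("fputc");
        exit(EXIT_FAILURE);
    }
    return ret;
}

/* Write a string; treat EOF from fputs() as a fatal I/O error. */
int err_fputs(const char *s, FILE *stream)
{
    int ret = fputs(s, stream);
    if (ret == EOF) {
        perror("fputs");
        exit(EXIT_FAILURE);
    }
    return ret;
}

/* Convenience macros for stdout. Note that in this sketch err_puts does
 * not append the newline that puts() would add. */
#define err_putchar(c) err_fputc((c), stdout)
#define err_puts(s)    err_fputs((s), stdout)
```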
To avoid having to make millions of function calls when printing out
sequences, the code to print them in bwa_print_sam1 using putchar has
been replaced by a new version in bwa_print_seq that puts the sequence
into a buffer and then outputs the lot with err_fwrite. In testing, the
new code was slightly faster than the old version, with the added benefit
that it will stop promptly if IO problems are detected.
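
The buffering approach can be sketched as below. The function name
print_seq, its parameters and the 0..4 base encoding are assumptions made
for illustration; plain fwrite with a length check stands in for err_fwrite.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: decode base codes (0..3 = ACGT, anything else = N)
 * into a buffer, then emit the buffer with a single fwrite() call instead
 * of one putchar() call per base. Returns 0 on success, -1 on failure. */
int print_seq(FILE *out, const uint8_t *codes, size_t len)
{
    char *buf = malloc(len ? len : 1);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < len; i++)
        buf[i] = "ACGTN"[codes[i] > 4 ? 4 : codes[i]];
    size_t written = fwrite(buf, 1, len, out);
    free(buf);
    /* A short write signals an I/O problem, so the caller can stop early. */
    return written == len ? 0 : -1;
}
```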