On FreeBSD, as well as on the Solaris < 10, weak pthread_once stub is
always exported from libc. But it does nothing, which means that if
threaded library is not loaded, then pthread_once() call do not actually
call the initializer finction. The construct
if (likely (pthread_once != 0))
{
pthread_once(&trace_cache_once, &trace_cache_init_once);
then fails to initialize the trace cache on x86_64.
Work around by checking that the initializer was indeed called.
Note that this can break if libthr is loaded dynamically, but my belief
is that there is no platforms which allow dynamic loading of the threading
library.
Insert static branch prediction predicates in useful places and avoid
unnecessary code in the hottest paths. Bypass unnecessary indirect
calls, in particular to access_mem(), when known to be safe.
Dropping the extra frame for unw_backtrace itself using unw_step is
approximately 15% slower than skipping the frame in tdep_trace. So
drop the frame in the latter, and make the function a private
implementation detail for libunwind, not an exported interface.
Also moves unw_getcontext call back into unw_backtrace to avoid an
extra call frame in case slow_backtrace does not get inlined into
unw_backtrace.
Adds new function to perform a pure stack walk without unwinding,
functionally similar to backtrace() but accelerated by an address
attribute cache the caller maintains across calls.