Insert static branch prediction predicates in useful places and avoid
unnecessary code in the hottest paths. Bypass unnecessary indirect
calls, in particular to access_mem(), when known to be safe.
Dropping the extra frame for unw_backtrace itself using unw_step is
approximately 15% slower than skipping the frame in tdep_trace. So
drop the frame in the latter, and make the function a private
implementation detail for libunwind, not an exported interface.
Also moves unw_getcontext call back into unw_backtrace to avoid an
extra call frame in case slow_backtrace does not get inlined into
unw_backtrace.
Adds new function to perform a pure stack walk without unwinding,
functionally similar to backtrace() but accelerated by an address
attribute cache the caller maintains across calls.