GCC versions 4.9~current will often generate stack alignment prologues like:
lea 0x8(%rsp),%r10
and $0xfffffffffffffff0,%rsp
...
push %rbp
mov %rsp, %rbp
push %r10
resulting in dwarf expressions:
DW_CFA_def_cfa_expression (DW_OP_breg6: -8; DW_OP_deref)
DW_CFA_expression: r6 (rbp) (DW_OP_breg6: 0)
These prologues seem to be generated for SSE/AVX code, but sometimes
other times as well.
tdep_trace fastpath currently falls back to the slow dwarf parsing path
if it encounters any cfa_expressions. Unfortunately this is happening
often enough in our codebase to cause perf issues. We could also fix the
fallback path (make the rs cache bigger, lock-free instead of locking, etc),
but that seems like a separate issue, and it will ever be as fast as the tracing
code. Our binaries each have at least ~100 functions in them like this.
This patch teaches the tdep_trace about the two specific cfa_expressions,
which really just result in a single extra memory dereference of the stack
at a fixed offset from rbp.
Ubuntu's libc-bin (2.15-0ubuntu20.2) on x86_64 uses DW_CFA_val_expression
in describing the pthread spinlock operations __lll_unlock_wake() and
__lll_lock_wait(). libunwind 1.1 doesn't understand that opcode and
so backtraces from those operations are truncated.
This changeset adds basic support for it, by adding a new type to
dwarf_loc_t that describes the register's actual contents rather than
its location. I've only implemented the new type for x86_64, and
stubbed it out for all other architectures -- it looks like a lot
of that code is duplicated so oughtn't to be that hard, but I don't
have test cases for them.
Tested that DW_CFA_val_expression works on x86_64 (by using
https://code.google.com/p/gperftools/ on a lock-heavy program).
Build-tested on x86, x86_64 and arm. The unit tests don't pass for me
on any of those archs, but this cset doesn't break anything that was
passing before.
Signed-off-by: Tim Deegan <tjd@phlegethon.org>
- Add tdep macro for {dwarf,ia64}_find_unwind_table so that ia64
doesn't try to use dwarf code.
- Fix extraneous #if.
- Fix mistyped filename in Makefile.am.
- Link ia64-specific tests with correct libraries.
Signed-off-by: Martin Milata <mmilata@redhat.com>
Insert static branch prediction predicates in useful places and avoid
unnecessary code in the hottest paths. Bypass unnecessary indirect
calls, in particular to access_mem(), when known to be safe.
Since the fast unwinding code path doesn't need the full context,
a faster target dependent getcontext is implemented.
Signed-off-by: Lassi Tuura <lat@cern.ch>
Dropping the extra frame for unw_backtrace itself using unw_step is
approximately 15% slower than skipping the frame in tdep_trace. So
drop the frame in the latter, and make the function a private
implementation detail for libunwind, not an exported interface.
Also moves unw_getcontext call back into unw_backtrace to avoid an
extra call frame in case slow_backtrace does not get inlined into
unw_backtrace.
Adds new function to perform a pure stack walk without unwinding,
functionally similar to backtrace() but accelerated by an address
attribute cache the caller maintains across calls.
bad/missing unwind information, which could result in libunwind
dereferencing bad pointers. This mechanism is based on msync(2) system
call and significantly reduces the chances of a bad pointer
dereference in libunwind.
The original idea was to turn this mechanism on only when necessary
i.e. libunwind didn't find proper unwind information for a IP.
There are a couple of problems in the current implementation.
* The flag is global and is modified without locking
* The flag isn't reset when starting a new unwind
The attached patch makes ->validate a per-thread setting by moving it
into struct cursor from unw_local_addr_space and resets it to false
when starting a new unwind. As a result, cursor->as_arg points to the
cursor itself instead of the ucontext (for the local case).
This was found to reduce the number of msync() system calls from an
application using libunwind significantly.
Signed-off-by: Paul Pluzhnikov <ppluzhnikov@google.com>
Signed-off-by: Arun Sharma <arun.sharma@google.com>
include/dwarf.h: Declare dwarf_reg_state_pool and dwarf_cie_info_pool.
include/dwarf_i.h: Include libunwind_i.h instead of tdep.h.
Make dwarf_to_unw_regnum() a macro so it doesn't get compiled
into an object file merely because it include dwarf_i.h (important
when optimization is turned off).
(dwarf_read_encoded_pointer_inlined): New function.
include/tdep-x86/libunwind_i.h: Add include of "mempool.h".
include/tdep-x86_64/libunwind_i.h: Add include of "mempool.h".