The ARM Thumb implementation of unw_tdep_getcontext switches to ARM
mode (".code 32"), but doesn't switch back to Thumb mode.
In gcc, this is fine (it automatically switches back to Thumb mode
at the end of an asm block), but in clang, this causes bad assembly
output (thumb instructions generated by C/C++ code later on are
interpreted as ARM mode assembly, which can't work).
Switching back to Thumb mode manually fixes clang, and is a no-op for gcc.
Ben Avison (bavison@riscopen.org) has observed that when a synthetic
eh_frame_hdr is generated, there is no space in it for the eh_frame,
so the eh_frame value is written to, and later read from, memory that
is not assigned to this purpose, with unpredictable results.
This change adds a new field to the dwarf_eh_frame_hdr type, to
make room for that value, and adds the (packed) attribute to the
struct defintion to avoid a problem with unused space in the struct.
This allows libunwind to work in situations with limited stack size (ie. fibers). 512 is still more than enough for storing the cursor with some slack. (#25)
Rather than using a copy of dwarf_find_proc_info that differs from it slightly.
By using dwarf_find_proc_info, a potential search of the di table is
allowed, where it is omitted now. Also, for ARM, avoid runtime
checks about which kind of unwind table search to use after dl_iterate_phdr.
A couple of Debug() warnings about ip lookup failure are lost here.
The dwarf callback struct defintion is moved to Gfind_proc_info-lsb.c,
which becomes the only source file that needs it.
so that it will get saved and restored with the register state. Initialize
the rs_state version of ret_addr_column at the some time the dwarf_cursor
version is initialized, and don't bother copying ret_addr_column explicitly
from cursor to cache since it's copied implicitly as part of reg_state.
Use the reg_state version in apply_reg_state, instead of the cursor version.
Which brings up the question: why do we have ret_addr_column in the dwarf_cursor?
We call find_reg_state before calling apply_reg_state, so the value of ret_addr_column
in the cursor when dwarf_step gets called gets overwritten before it is used. So
it's initial value doesn't matter. But some architectures do funky things with
cursor->ret_addr_column, even though I don't see how they matter.
So I'm not deleting dwarf_cursor->ret_addr_column, even though I suspect this
patch makes it useless.
In src/ppc64/Gstep.c, we use fetch32 to fetch instruction from the
inferior process. In UNW_REMOTE case, fetch32 asserts that the
address we are fetching from is aligned.
But if the inferior is corrupt, we can get unaligned IP, and hit the assert.
Attached patch removes the assert, and makes fetch32 (and fetch16)
return -UNW_EINVAL instead.
The attached patch fixes a problem with Xorg on MIPS64 n32 which is explained here:
https://bugs.freedesktop.org/show_bug.cgi?id=79939
Basically, libunwind is using a word size of 64-bit for all MIPS variants.
Then, Xorg does a casting to (void *) of a 64-bit variable provided by
libunwind. Given that the size of the pointers in MIPS64 n32 is 32-bit wide,
that casting causes an error like this one:
backtrace.c:90:20: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
libunwind already had support for local unwind on a MIPS. This patch makes
support for remote unwinding on a MIPS.
I should add a few words to the changes to _UPT_access_mem.c: On MIPS, an
unw_word_t is defined as a 64-bit integer whether it's compiled for a 32- or a
64-bit MIPS.
When doing remote unwinding using the default _UPT_accessors, dwarf_readu8()
therefore expects _UPT_access_mem() to return a 64-bit integer. However, if
compiled on a 32-bit MIPS, only 32 bits are valid upon return from
_UPT_access_mem(). The patch detects this and will in this case perform two
calls to ptrace(PTRACE_POKE/PEEK_DATA) and organize the return value according
to endianness.
Add interface for configurable dwarf cache size
* Use item size and round up to nearest power of 2.
* Initial cache still exists in BSS. Without this, it means we would fail
backtrace when out of memory. The test-mem test fails without this
GCC versions 4.9~current will often generate stack alignment prologues like:
lea 0x8(%rsp),%r10
and $0xfffffffffffffff0,%rsp
...
push %rbp
mov %rsp, %rbp
push %r10
resulting in dwarf expressions:
DW_CFA_def_cfa_expression (DW_OP_breg6: -8; DW_OP_deref)
DW_CFA_expression: r6 (rbp) (DW_OP_breg6: 0)
These prologues seem to be generated for SSE/AVX code, but sometimes
other times as well.
tdep_trace fastpath currently falls back to the slow dwarf parsing path
if it encounters any cfa_expressions. Unfortunately this is happening
often enough in our codebase to cause perf issues. We could also fix the
fallback path (make the rs cache bigger, lock-free instead of locking, etc),
but that seems like a separate issue, and it will ever be as fast as the tracing
code. Our binaries each have at least ~100 functions in them like this.
This patch teaches the tdep_trace about the two specific cfa_expressions,
which really just result in a single extra memory dereference of the stack
at a fixed offset from rbp.
By default, the start_ip_offset in libunwind's table_entry struct is
relative to the unw_dyn_info_t's segbase. This presents a problem
for us in conjunction with using LLVM's MCJIT because it likes to
spread text sections and the corresponding eh_frame sections quite
far apart. This represents my attempt to support this use case in the
simplest manner that is backwards compatible, by adding a new format
kind (UNW_INFO_FORMAT_REMOTE_TABLE2) that indicates that the
`start_ip_offset` should be interpreted as relative to `start_ip`
rather than segbase.
Due to a bug in the gold linker[1], the .eh_frame and .eh_frame_hdr
sections contains garbage. When dwarf_extract_proc_info_from_fde tried
to look up the begin of the CIE subsection, it would underflow the
.eh_frame segment, resulting in a crash[2].
This patch avoids that crash by checking whether the CIE pointer is
located after the begin of the .eh_frame section. The variable "base"
was misused in various places as a boolean (decode as .debug_frame or
decode as .eh_frame). These instances have been renamed to
is_debug_frame where applicable.
Tested on Linux x86_64.
[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=17639
[2]: http://lists.nongnu.org/archive/html/libunwind-devel/2014-11/msg00009.html
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
After the following change the Lrs-race test case starts to intermittently
fails:
eac65dc Add basic support for the QNX operating system
When we include "config.h" into the "libunwind_i.h" we undefine
the HAVE___THREAD macro in a few lines below in #include "config.h"
pragma. The change eac65dc includes "config.h" into the "dwarf.h"
but forgets to undefine HAVE___THREAD. So now this macro has inconsistent
state among the code. Somewhere it is defined, somewhere not. In particular
it becomes defined in the mi/Gset_caching_policy.c and we do not replace
UNW_CACHE_PER_THREAD caching policy by the UNW_CACHE_GLOBAL.
The fix is rather dirty. It adds the code to undefine HAVE___THREAD in
the "dwarf.h" like we do that in the "libunwind_i.h". Probably the ideal
solution should fix per-thread caching implementation or turned it off
at all on platforms where it is not completely and correctly supported.
Signed-off-by: Simon Atanasyan <simon@atanasyan.com>
Hi,
there is a compilation issue with Clang and latest libunwind - It's
about "typedef struct unw_tdep_save_loc" and one more struct:
include/libunwind-x86_64.h:111:9: error: empty struct has size 0 in C,
size 1 in C++ [-Werror,-Wextern-c-compat]
The solution is very simple:
Ubuntu's libc-bin (2.15-0ubuntu20.2) on x86_64 uses DW_CFA_val_expression
in describing the pthread spinlock operations __lll_unlock_wake() and
__lll_lock_wait(). libunwind 1.1 doesn't understand that opcode and
so backtraces from those operations are truncated.
This changeset adds basic support for it, by adding a new type to
dwarf_loc_t that describes the register's actual contents rather than
its location. I've only implemented the new type for x86_64, and
stubbed it out for all other architectures -- it looks like a lot
of that code is duplicated so oughtn't to be that hard, but I don't
have test cases for them.
Tested that DW_CFA_val_expression works on x86_64 (by using
https://code.google.com/p/gperftools/ on a lock-heavy program).
Build-tested on x86, x86_64 and arm. The unit tests don't pass for me
on any of those archs, but this cset doesn't break anything that was
passing before.
Signed-off-by: Tim Deegan <tjd@phlegethon.org>
This patch adds support for the powerpc64le-linux platform. It consists
of two main features:
- Support little-endian byte order
This is done via a "big_endian" member of struct unw_addr_space,
which is evaluated by common code via the dwarf_is_big_endian
macro, and also in endian-aware code in unw_is_signal_frame.
- Support the ELFv2 ABI
This is done via an "abi" member of struct unw_addr_space. This
is currently only needed in tdep_get_func_addr, since the ELFv2
ABI does not use function descriptors.
Both new members are initialized in unw_create_addr_space and
ppc64_local_addr_space_init, following the mips precedent.
Since ppc32 and ppc64 now no longer share the unw_create_addr_space
implementation, the file is duplicated from the ppc directory into
ppc32/ppc64.
Tested on powerpc64-linux and powerpc64le-linux. Support on LE
seems to be as good as existing BE support; I have not attempted to
fix the existing shortcomings of PPC support that already cause a
number to tests to fail due to unimplemented features.
Signed-off-by: Ulrich Weigand <uweigand@de.ibm.com>