dwarf: If the dwarf_readu8 call that reads op fails while register
states are pushed onto the stack, the stack is not emptied before the function
returns. This change fixes that.
Most of the rest of the change eliminates ‘goto fail’ from the code.
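A minimal sketch of the fix pattern, assuming a toy op stream instead of real CFI: the names reg_state_node, push_state, empty_stack, read_u8 and run_program are hypothetical, and read_u8 merely stands in for a failing dwarf_readu8(). Every exit path funnels through one cleanup call, so no ‘goto fail’ label is needed and no pushed state leaks.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a remembered register state
   (pushed by DW_CFA_remember_state in real CFI). */
struct reg_state_node {
    struct reg_state_node *next;
};

static int live_nodes; /* outstanding allocations, for illustration only */

static int push_state(struct reg_state_node **top) {
    struct reg_state_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->next = *top;
    *top = n;
    live_nodes++;
    return 0;
}

/* Pop and free every state still on the stack. */
static void empty_stack(struct reg_state_node **top) {
    while (*top != NULL) {
        struct reg_state_node *n = *top;
        *top = n->next;
        free(n);
        live_nodes--;
    }
}

/* Hypothetical byte reader: fails once the input is exhausted. */
static int read_u8(const unsigned char **p, const unsigned char *end,
                   unsigned char *op) {
    if (*p >= end)
        return -1;
    *op = *(*p)++;
    return 0;
}

/* Run nops toy ops (1 = push state, anything else = no-op).
   Success and error both reach the same cleanup call. */
static int run_program(const unsigned char *buf, size_t len, size_t nops) {
    const unsigned char *p = buf, *end = buf + len;
    struct reg_state_node *stack = NULL;
    int ret = 0;

    for (size_t i = 0; i < nops && ret == 0; i++) {
        unsigned char op;
        ret = read_u8(&p, end, &op);
        if (ret == 0 && op == 1)
            ret = push_state(&stack);
    }
    empty_stack(&stack); /* never return with states still pushed */
    return ret;
}
```

Asking for more ops than the buffer holds makes the read fail after two pushes, and both states are still freed.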
Zero-valued hints were being used even when they were just initial values
with no meaning as hints, and I think there were some other problems as well.
This patch cleans this up and stores hint values with 1 added, so that
0-valued hints can be ignored.
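A small sketch of the "store hint + 1" encoding, assuming a 16-bit hint slot; hint_slot, set_hint and get_hint are hypothetical names, not libunwind's. A freshly zeroed slot then reads as "no hint", while a genuine hint of 0 is still representable.

```c
#include <stdint.h>

/* Hypothetical slot: stored == 0 means "no hint yet";
   otherwise the real hint is stored - 1. */
struct hint_slot {
    uint16_t stored;
};

/* Record a hint; index must be < UINT16_MAX so that index + 1 fits. */
static void set_hint(struct hint_slot *s, uint16_t index) {
    s->stored = (uint16_t)(index + 1);
}

/* Returns 1 and writes *index if a real hint is present, 0 otherwise. */
static int get_hint(const struct hint_slot *s, uint16_t *index) {
    if (s->stored == 0)
        return 0; /* initial value, not a hint: ignore it */
    *index = (uint16_t)(s->stored - 1);
    return 1;
}
```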
unw_get_proc_info must always load the unwind info so that unw_resume
works with GNU_args_size expressions, but must not update
use_prev_expr unless we are unw_step()ing.
blame rev: 4b63a536ee
reported-by: Doug Moore <dougm@rice.edu>
It is possible to have multiple DW_CFA_GNU_args_size adjustments for a single
frame. If a DW_CFA_GNU_args_size adjustment immediately follows the
return from a function that can raise an exception, the stack pointer can
be adjusted incorrectly. Consider the following:
...
.cfi_escape 0x2e, 0x00
call f
.Ltmp:
.cfi_escape 0x2e, 0x10
lea label@GOTOFF(%ebx), %eax
...
Because we process the CFI program up to and *INCLUDING* IP, and here the
IP is the return address (RA), we would process the DW_CFA_GNU_args_size
associated with the post-call instruction. The result would be a
DW_CFA_GNU_args_size of 0x10 rather than 0x00, leading to an incorrect
stack adjustment.
Handle this by still processing the CFI operation but not updating the state
record unless the operation's location is below the current IP.
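A minimal sketch of the "below the current IP" rule, assuming the CFI has already been reduced to a list of (location, args_size) pairs; args_size_op and args_size_at are hypothetical and skip the real CFI interpreter. With the call at 0x0 and the return address (.Ltmp) at 0x5, the op at 0x5 is visited but ignored, so the call keeps args_size 0x00.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-decoded DW_CFA_GNU_args_size op: the value takes
   effect for instructions at or after loc. */
struct args_size_op {
    uintptr_t loc;
    uintptr_t args_size;
};

/* args_size in effect for the call whose return address is ip.
   Ops located at ip (or later) describe the post-call instruction,
   so they are walked over but do not update the result. */
static uintptr_t args_size_at(const struct args_size_op *ops, size_t n,
                              uintptr_t ip) {
    uintptr_t args_size = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].loc < ip)
            args_size = ops[i].args_size;
    }
    return args_size;
}
```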
Add interface for configurable dwarf cache size
* Use the item size and round up to the nearest power of 2.
* The initial cache still lives in BSS. Without it, backtraces would fail
when out of memory; the test-mem test fails without this.
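A sketch of the sizing rule, assuming the requested size is given in bytes and converted to an entry count; round_up_pow2 and cache_entries_for are hypothetical helpers, not the libunwind interface. A power-of-two entry count lets the cache index with a mask instead of a modulo.

```c
#include <stddef.h>

/* Round n up to the next power of two (0 yields 1).
   Assumes n <= SIZE_MAX / 2 + 1, otherwise the shift overflows. */
static size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical: convert a byte budget into a power-of-two number of
   cache entries, each item_size bytes (item_size must be nonzero). */
static size_t cache_entries_for(size_t bytes, size_t item_size) {
    size_t entries = (bytes + item_size - 1) / item_size; /* ceil */
    return round_up_pow2(entries);
}
```

For example, a 1100-byte budget with 64-byte items needs 18 entries, which rounds up to 32.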
When resuming execution, the DW_CFA_GNU_args_size from the current frame
must be added back to the stack pointer. Clang now generates these frequently
at -O3. A simple reproducer for x86_64 that crashes with clang ~3.9 or newer:
void f(int, int, int, int, int, int, int, int, int);
int main() {
    try {
        f(0, 1, 2, 3, 4, 5, 6, 7, 8);
    } catch (int) {
        return 0;
    }
    return 1;
}
Here f is a function that throws an int, defined in a different translation
unit to prevent optimization.
This results in CFI directives before the call:
.cfi_escape 0x2e, 0x20
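The escape bytes above are opcode 0x2e (DW_CFA_GNU_args_size) followed by a ULEB128 operand. A minimal sketch of decoding them and applying the resume-time adjustment, assuming the raw escape bytes are at hand; read_uleb128 and resume_sp are hypothetical helpers, not libunwind functions.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_CFA_GNU_args_size 0x2e

/* Decode an unsigned LEB128 value.
   Returns bytes consumed, or 0 if truncated or too long. */
static size_t read_uleb128(const uint8_t *p, size_t len, uint64_t *out) {
    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len; i++) {
        if (shift >= 64)
            return 0;
        result |= (uint64_t)(p[i] & 0x7f) << shift;
        if ((p[i] & 0x80) == 0) { /* high bit clear: last byte */
            *out = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;
}

/* Stack pointer to use when resuming in this frame: sp + args_size.
   Returns sp unchanged if esc is not a well-formed args_size op. */
static uint64_t resume_sp(const uint8_t *esc, size_t len, uint64_t sp) {
    uint64_t args_size;
    if (len < 2 || esc[0] != DW_CFA_GNU_args_size)
        return sp;
    if (read_uleb128(esc + 1, len - 1, &args_size) == 0)
        return sp;
    return sp + args_size;
}
```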
Obtaining args_size requires fully parsing the CFI in the current frame, which
is unfortunate because it nearly doubles the work at each step. The logic to
obtain args_size could live in either unw_step or get_proc_info (the latter is
always called before resuming during stack unwinding). Putting it in
get_proc_info keeps the more common unw_step path fast.
It would potentially fit in nicely with a proc info cache (as mentioned in the
#if 0 comment block).
Ubuntu's libc-bin (2.15-0ubuntu20.2) on x86_64 uses DW_CFA_val_expression
in describing the pthread spinlock operations __lll_unlock_wake() and
__lll_lock_wait(). libunwind 1.1 doesn't understand that opcode and
so backtraces from those operations are truncated.
This changeset adds basic support for it by adding a new type to
dwarf_loc_t that describes the register's actual contents rather than
its location. I've only implemented the new type for x86_64, and
stubbed it out for all other architectures -- a lot of that code looks
duplicated, so it oughtn't to be hard to port, but I don't
have test cases for them.
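A simplified sketch of the idea, assuming a two-kind location record; loc_kind, LOC_MEM, LOC_VAL and get_reg are hypothetical and much smaller than the real dwarf_loc_t. For DW_CFA_val_expression the computed value is the register's contents, so reading it must not dereference anything.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical location kinds: LOC_MEM holds the address of a saved
   register slot; LOC_VAL holds the register's value itself
   (what DW_CFA_val_expression produces). */
enum loc_kind { LOC_MEM, LOC_VAL };

struct loc {
    enum loc_kind kind;
    uintptr_t val; /* address for LOC_MEM, contents for LOC_VAL */
};

static uintptr_t get_reg(struct loc l) {
    if (l.kind == LOC_VAL)
        return l.val; /* already the value: no memory access */
    uintptr_t word;
    memcpy(&word, (const void *)l.val, sizeof word); /* load saved slot */
    return word;
}
```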
Tested that DW_CFA_val_expression works on x86_64 (by using
https://code.google.com/p/gperftools/ on a lock-heavy program).
Build-tested on x86, x86_64 and arm. The unit tests don't pass for me
on any of those archs, but this cset doesn't break anything that was
passing before.
Signed-off-by: Tim Deegan <tjd@phlegethon.org>
The DWARF code allocates its unwind_info objects out of a
memory pool, so the code that frees them calls the mempool
freeing code. However, in some cases the free code runs on an
unwind_info that was allocated through a different mechanism
(e.g. an ARM exidx table entry). Such objects must not be freed
through the mempool code.
To correct this, a check was added to ensure that the unwind_info
is of the appropriate type before passing the object along to the
mempool to be freed.
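A sketch of the type check, assuming each unwind_info carries a tag saying where its data came from; the enum values, put_unwind_info and mempool_free_stub here are hypothetical stand-ins (the stub simply calls free() and counts calls). Only pool-allocated data is handed back to the pool.

```c
#include <stdlib.h>

/* Hypothetical origin tags for unwind_info data. */
enum info_format { FORMAT_DWARF_TABLE, FORMAT_ARM_EXIDX };

struct unwind_info {
    enum info_format format;
    void *data; /* pool-owned only for FORMAT_DWARF_TABLE */
};

static int pool_frees; /* counts calls into the stand-in pool */

/* Stand-in for the real mempool free routine. */
static void mempool_free_stub(void *p) {
    free(p);
    pool_frees++;
}

static void put_unwind_info(struct unwind_info *ui) {
    if (ui->data == NULL)
        return;
    /* An exidx entry points into a table we do not own:
       drop the reference, but never give it to the pool. */
    if (ui->format == FORMAT_DWARF_TABLE)
        mempool_free_stub(ui->data);
    ui->data = NULL;
}
```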
Currently the expression evaluation always appears to succeed,
and a possible error is not propagated to the caller:
the ',' operator makes the condition always evaluate to 0.
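A self-contained illustration of this class of bug; eval_expr, check_buggy and check_fixed are hypothetical, and the exact shape of the original condition is an assumption. The comma operator evaluates the assignment, discards its value, and yields the constant 0, so the error branch is dead.

```c
/* Hypothetical evaluator that always fails. */
static int eval_expr(unsigned long *val) {
    *val = 0;
    return -1;
}

/* Buggy: "(ret = ...), 0" evaluates to 0, so the error is lost. */
static int check_buggy(void) {
    unsigned long val;
    int ret;
    if ((ret = eval_expr(&val)), 0)
        return ret;
    return 0;
}

/* Fixed: test the returned status and propagate it. */
static int check_fixed(void) {
    unsigned long val;
    int ret;
    if ((ret = eval_expr(&val)) < 0)
        return ret;
    return 0;
}
```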
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Paul Pluzhnikov <ppluzhnikov@google.com>
Add a new function that performs a pure stack walk without unwinding.
It is functionally similar to backtrace(), but accelerated by an address
attribute cache that the caller maintains across calls.
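A minimal sketch of a caller-owned address attribute cache, assuming a direct-mapped table keyed by IP that stores one hypothetical attribute (a CFA offset); addr_cache, lookup_attr and demo_slow_lookup are illustrative names, not the new libunwind function. Repeated walks through the same code pay for the slow lookup (e.g. full CFI parsing) only once per address.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64 /* power of two: index is a cheap mask */

/* Hypothetical cached attribute for one code address. */
struct addr_attr {
    uintptr_t ip;    /* 0 marks an empty slot, so ip must be nonzero */
    int32_t cfa_off; /* e.g. CFA distance from SP at this ip */
};

/* Owned by the caller and kept across stack walks. */
struct addr_cache {
    struct addr_attr slot[CACHE_SLOTS];
};

/* Slow path, e.g. full CFI parsing. */
typedef int32_t (*slow_lookup_fn)(uintptr_t ip);

static int32_t lookup_attr(struct addr_cache *c, uintptr_t ip,
                           slow_lookup_fn slow) {
    struct addr_attr *e = &c->slot[(ip >> 2) & (CACHE_SLOTS - 1)];
    if (e->ip != ip) { /* miss: compute once and remember */
        e->ip = ip;
        e->cfa_off = slow(ip);
    }
    return e->cfa_off;
}

/* Demo slow path for illustration: counts how often it runs. */
static int slow_calls;
static int32_t demo_slow_lookup(uintptr_t ip) {
    slow_calls++;
    return (int32_t)(ip & 0xff);
}
```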
For a normal frame, the IP points to the instruction after the call, so
libunwind uses IP-1 to look up unwind information. However, this is not
necessary for interrupted frames such as signal frames (or interrupt
frames) in the kernel context.
This patch handles both cases correctly.
Based on work by Mark Wielaard <mwielaard@redhat.com>
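A sketch of the lookup-address choice, assuming the caller already knows whether the frame was interrupted; lookup_ip is a hypothetical helper. A return address may already lie past the end of the calling function (for example after a call to a noreturn function), so stepping back one byte lands inside the call instruction, while a signal frame's IP is the interrupted instruction itself.

```c
#include <stdint.h>

/* Address to use when searching for unwind info.
   Normal frame: ip is a return address, so step back into the call.
   Signal/interrupt frame: ip is the faulting or interrupted
   instruction, so use it unchanged. */
static uintptr_t lookup_ip(uintptr_t ip, int is_signal_frame) {
    return is_signal_frame ? ip : ip - 1;
}
```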
The original code accessed rs_cache memory without holding the lock
in some cases. Under sufficient cache pressure, the entry being
accessed could be overwritten by another thread, resulting in a data
race.
We now make a thread-local copy of the data before releasing the
lock. If we end up properly supporting UNW_CACHE_PER_THREAD
in the future, this memcpy should become unnecessary.
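A sketch of the copy-under-lock pattern with POSIX threads, assuming a fixed 16-entry cache; reg_state, rs_cache and snapshot_entry are simplified hypothetical types, not libunwind's. Once the entry is copied while the lock is held, an eviction by another thread can no longer affect the copy, and expensive work on it can run without the lock.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical cached register-state entry. */
struct reg_state {
    unsigned long cfa_offset;
    unsigned long ret_addr_col;
};

struct rs_cache {
    pthread_mutex_t lock;
    struct reg_state entry[16];
};

/* Copy entry i into caller-owned storage while holding the lock.
   After unlocking, cache->entry[i] may be overwritten, but *out is
   private to this thread. */
static void snapshot_entry(struct rs_cache *cache, unsigned i,
                           struct reg_state *out) {
    pthread_mutex_lock(&cache->lock);
    memcpy(out, &cache->entry[i % 16], sizeof *out);
    pthread_mutex_unlock(&cache->lock);
}
```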
Greetings,
Attached patch gets rid of an additional unnecessary branch (rs_get_cache
cannot return NULL unless caching_policy is UNW_CACHE_NONE), gets rid of
gotos, and makes apply_reg_state (a major CPU consumer) execute without the
cache lock held (before the patch, apply_reg_state was called with the lock
held for newly inserted entries, but not for entries found in the cache).
Tested on Linux/x86_64 with no regressions.
Thanks,
--
Paul Pluzhnikov
Greetings,
Attached patch is rather on the obvious side:
- rs1 can't be NULL since it's assigned on the previous line
- rs_new never returns NULL, and if it ever did, we'd crash on the memcpy that
precedes the NULL check.
Tested on Linux/x86_64 with no regressions.
Thanks,
--
Paul Pluzhnikov
Previously, whether an unwind would wait or abort depended on the locking
primitive chosen by the user. This change makes the API behavior consistent
and independent of the locking primitive.
Greetings,
We use libunwind just for stack traces (I suspect many others do as well).
The use pattern is:
#define UNW_LOCAL_ONLY
#include <libunwind.h>

int GetStackTrace(void** result, int max_depth)
{
    unw_cursor_t cursor;
    unw_context_t uc;
    unw_word_t ip;
    int n = 0;

    unw_getcontext(&uc);
    unw_init_local(&cursor, &uc);
    while (n < max_depth) {
        if (unw_get_reg(&cursor, UNW_REG_IP, &ip) < 0) {
            break;
        }
        result[n++] = (void *) ip;
        if (unw_step(&cursor) <= 0) {
            break;
        }
    }
    return n;
}
Given this usage, it is quite convenient for us to block signals (or
prevent signal handlers from re-entering libunwind by other means) at the
"top level", which makes most of the sigprocmask calls performed by
libunwind itself unnecessary.
The second patch in this series adds a configure option which removes most
of the sigprocmask calls.
The attached patch is a preliminary step for it: it consolidates all of the
"sigprocmask; mutex_lock;" sequences into lock_acquire and all of the
"mutex_unlock; sigprocmask;" sequences into lock_release.
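A sketch of what the consolidated helpers could look like, using pthread_sigmask and a pthread mutex; the bodies are illustrative assumptions rather than libunwind's actual macros. Signals are blocked before the mutex is taken, so a handler cannot re-enter and self-deadlock, and the caller's mask is restored only after the mutex is released.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Block all signals, saving the old mask, then take the mutex. */
static void lock_acquire(pthread_mutex_t *m, sigset_t *saved) {
    sigset_t all;
    sigfillset(&all);
    pthread_sigmask(SIG_SETMASK, &all, saved);
    pthread_mutex_lock(m);
}

/* Release in reverse order: drop the mutex, then restore the mask. */
static void lock_release(pthread_mutex_t *m, const sigset_t *saved) {
    pthread_mutex_unlock(m);
    pthread_sigmask(SIG_SETMASK, saved, NULL);
}
```

With both steps behind one pair of helpers, a later configure option only has to change these two functions to drop the mask manipulation.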
Thanks,
--
Paul Pluzhnikov
commit 402d15b123d54a7669db7cf17a76dd315094e472
Author: Paul Pluzhnikov <ppluzhnikov@google.com>
Date: Mon Sep 21 10:18:28 2009 -0700
Replace "sigprocmask + mutex_lock" with a single lock_acquire.
Likewise, replace "mutex_unlock + sigprocmask" with lock_release.
Now that dwarf_find_save_locs() not only finds the save locations but
also updates the cursor state, document this fact (the function really
is misnamed now).