This is rather on the obvious side.
While doing strace on an executable using libunwind, I noticed a
lot of:
msync(0, 1, MS_SYNC) = -1 ENOMEM (Cannot allocate memory)
Since we know that the first page isn't mapped (or at least doesn't
contain the data we are looking for), we can eliminate all such
msync calls.
Tested on Linux/x86_64 with no regressions.
Original code was accessing rs_cache memory without holding a lock
in some cases. If there was sufficient cache pressure, entry being
accessed may be overwritten by another thread, resulting in a data
race.
We now make a thread local copy of the data, before releasing the
lock. If we end up supporting UNW_CACHE_PER_THREAD properly
in the future, this memcpy should be unnecessary.
Greetings,
Attached patch is rather on the obvious side: setting caching policy and
than doing nothing is pointless; we'd better acutally test that it works!
Tested on Linux/x86_64.
Thanks,
--
Paul Pluzhnikov
Greetings,
Attached patch gets rid of additional unnecessary branch (rs_get_cache
can not return NULL unless caching_policy is UNW_CACHE_NONE), gets rid of
goto's, and makes apply_reg_state (major CPU consumer) execute with cache
lock not held (before the patch, apply_reg_state was called with lock held
for newly-inserted entries, but not for found-in-cache entries).
Tested on Linux/x86_64 with no regressions.
Thanks,
--
Paul Pluzhnikov
Greetings,
Attached patch is rather on the obvious side:
- rs1 can't be NULL since it's assigned on previous line
- rs_new never returns NULL, and if it ever did, we'd crash on memcpy that
preceeds the NULL check.
Tested on Linux/x86_64 with no regressions.
Thanks,
--
Paul Pluzhnikov
Currently, libunwind allocates several PATH_MAX entries on stack, while
trying to find a binary via /proc/.../maps.
However stack space may be at premium (especially when sigaltstack is used),
and PATH_MAX on Linux is 4096, while SIGSTKSZ is only 8192 on x86.
Attached patch eliminates multiple PATH_MAX stack allocations, and simplifies
code in maps_next, at the cost of being unable to do anything if we can't
mmap one page. It appears to me that under such low-memory conditions,
libunwind will fail shortly elsewhere anyway.
This patch also disables more of debug_frame-handling code when
CONFIG_DEBUG_FRAME is undefined.
Tested on Linux/x86_64 with and without CONFIG_DEBUG_FRAME, no regressions.
The behavior on wait vs abort unwind depends on the locking primitive
chosen by the user. This makes the API consistent and independent of
the locking primitive.
Greetings,
Here is the second part, actually implementing the configure option.
Thanks,
--
Paul Pluzhnikov
commit cf823ed0d4d2447aa91af0e3cb5fbb6a6cba5068
Author: Paul Pluzhnikov <ppluzhnikov@google.com>
Date: Mon Sep 21 11:37:38 2009 -0700
New configure option to allow caller to block signals.
Greetings,
We use libunwind just for stack traces (I suspect many others do as well).
The use pattern is:
GetStackTrace(void** result, int max_depth)
{
...
unw_getcontext(&uc);
unw_init_local(&cursor, &uc);
while (n < max_depth) {
if (unw_get_reg(&cursor, UNW_REG_IP, (unw_word_t *) &ip) < 0) {
break;
}
result[n++] = ip;
if (unw_step(&cursor) <= 0) {
break;
}
}
Given this usage, it is quite convenient for us to block signals (or
prevent signal handlers from re-entering libunwind by other means) at the
"top level", which makes most of the sigprocmask calls performed by
libunwind itself unneccessary.
The second patch in this series adds a configure option which removes most
of the sigprocmask calls.
Attached patch is a preliminary for it -- consolidating all of the
"sigprocmask; mutex_lock;" sequences into lock_acquire and "mutex_unlock;
sigprocmask;" sequences into lock_release.
Thanks,
--
Paul Pluzhnikov
commit 402d15b123d54a7669db7cf17a76dd315094e472
Author: Paul Pluzhnikov <ppluzhnikov@google.com>
Date: Mon Sep 21 10:18:28 2009 -0700
Replace "sigprocmask + mutext_lock" with a single lock_acquire.
Likewise, replace "mutext_unlock + sigprocmask" with lock_release.
This rule (no IP adjustment on ia64) may be correct for locating the right FDE.
Unfortunately the same adjusted/unadjusted return address is being used also by
__gxx_personality_v0() to locate the right call-site (the try {} block) for
unwinding. And this case is already sensitive for off-by-one PC values.
Unlike the FDE location where the function prologue + epilogue make it immune
against off-by-one PC calculations.
Therefore suggesting to unify it with non-ia64 case.
Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Provide a special implementation for ia64, because the unwind
information is such that an IP adjustment is not necessary before
looking up unwind info.
Bad things happen if libunwind only provides parts of the ABI and
the rest come from libgcc.
Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com>
getcontext in libc.
Also cleanup the namespace (check-name-space passes on x86_64 now).
Replace uses of offsets.h with ucontext_i.h.
Rename _x86_64_setcontext to _Ux86_64_setcontext.
TBD: Add CFI annotations for get/setcontext.
Signed-off-by: Paul Pluzhnikov <ppluzhnikov@google.com>
Signed-off-by: Arun Sharma <arun.sharma@google.com>
bad/missing unwind information, which could result in libunwind
dereferencing bad pointers. This mechanism is based on msync(2) system
call and significantly reduces the chances of a bad pointer
dereference in libunwind.
The original idea was to turn this mechanism on only when necessary
i.e. libunwind didn't find proper unwind information for a IP.
There are a couple of problems in the current implementation.
* The flag is global and is modified without locking
* The flag isn't reset when starting a new unwind
The attached patch makes ->validate a per-thread setting by moving it
into struct cursor from unw_local_addr_space and resets it to false
when starting a new unwind. As a result, cursor->as_arg points to the
cursor itself instead of the ucontext (for the local case).
This was found to reduce the number of msync() system calls from an
application using libunwind significantly.
Signed-off-by: Paul Pluzhnikov <ppluzhnikov@google.com>
Signed-off-by: Arun Sharma <arun.sharma@google.com>
* src/arm/unwind_i (arm_lock, arm_local_resume): Define.
* src/ptrace/_UPT_find_proc_info.c: Handle ARM like X86 etc.
* tests/flush-cache.S (flush_cache): Add (dummy) ARM-version.
ARM does need executable stack, even on Linux...
Signed-off-by: Anderson Lizardo <anderson.lizardo@indt.org.br>
Signed-off-by: Bruna Moreira <bruna.moreira@indt.org.br>
- Gtest-bt: like on x86/-64, the stack size passed to sigaltstack() is
too small for ARM thus causing segmentation fault due to stack
overflow.
- Gtest-dyn1: code size definition of dynamic function (template()) on
testcase is too big for ARM architecture so memcpy() reads invalid
memory causing random crashes (segmentation fault). A better
solution would be to compile the function in a separate binary,
mmap() it and memcpy() from it instead, so maximum size is known for
sure.
- check-name-space.in: fix some "bashisms", it causes the script to
fail to run on N8XX's busybox shell.
Signed-off-by: Anderson Lizardo <anderson.lizardo@indt.org.br>
Signed-off-by: Bruna Moreira <bruna.moreira@indt.org.br>