The crashes were tracked down to f->rpb_cfa_offset being incorrect.
The problem is that {rsp,rbp}_cfa_offset only have 15 bits, but for
SIGRETURN frame they are filled with:
// src/x86_64/Gstash_frame.c
f->cfa_reg_offset = d->cfa - c->sigcontext_addr;
f->rbp_cfa_offset = DWARF_GET_LOC(d->loc[RBP]) - d->cfa;
f->rsp_cfa_offset = DWARF_GET_LOC(d->loc[RSP]) - d->cfa;
The problem is that the delta here can be arbitrarily large when
sigaltstack is used, and can easily overflow the 15 and 30-bit fields.
When signal handler starts running, the stack layout is:
... higher addresses ...
ucontext
CFA->
__restore_rt (== pretcode in rt_sigframe from
linux-2.6/arch/x86/include/asm/sigframe.h)
SP ->
... sighandler runs on this stack.
... lower addresses ...
This makes it very convenient to find ucontext from the CFA.
Attached patch re-tested on Linux/x86_64, no new failures.
Signed-off-by: Paul Pluzhnikov <ppluzhnikov@google.com>
Reviwed-by: Lassi Tuura <lat@cern.ch>