Add call stack diagram & x86_64 conventions

Théophile Bastian 2018-08-04 01:28:14 +02:00
parent 8203502e9a
commit c9773681cf
7 changed files with 11232 additions and 7 deletions

Binary file not shown.


Binary file not shown.


@@ -17,6 +17,7 @@ Under supervision of Francesco Zappa-Nardelli\\
\usepackage[utf8]{inputenc}
\usepackage{makecell}
\usepackage{booktabs}
\usepackage{wrapfig}
%\usepackage[backend=biber,style=alphabetic]{biblatex}
\usepackage[backend=biber]{biblatex}
@@ -55,7 +56,7 @@ Under supervision of Francesco Zappa-Nardelli\\
\section{Stack unwinding data presentation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Stack frames and x86\_64 calling conventions}
On most platforms, programs make use of a \emph{call stack} to store
information about the nested function calls at the current execution point, and
@@ -66,6 +67,43 @@ up to the compiler. Those frames are typically used for storing function
arguments, machine registers that must be restored before returning, the
function's return address and local variables.
On the x86\_64 platform, with which this report is mostly concerned, Unix-like
operating systems (including Linux) follow the calling convention defined in
the System V ABI~\cite{systemVabi}. Under this calling convention, the first six
integer or pointer arguments of a function are passed in the registers
\reg{rdi}, \reg{rsi}, \reg{rdx}, \reg{rcx}, \reg{r8} and \reg{r9}, while
additional arguments are pushed onto the stack. The ABI also defines which
registers may be overwritten by the callee, and which registers must be restored
before returning. Most of the time, this restoration is done by pushing the
register value onto the stack in the function prologue and popping it back just
before returning. Those preserved registers are \reg{rbx}, \reg{rsp},
\reg{rbp}, \reg{r12}, \reg{r13}, \reg{r14} and \reg{r15}.
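The following small C sketch illustrates the convention just described. It is not part of the report's code. It records the argument registers and the preserved registers, and computes where a given argument lives right after the call. The helper names arg_location and is_callee_saved are hypothetical. The sketch assumes every argument is an 8-byte integer or pointer. Floating-point and aggregate arguments are classified differently by the ABI and are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Registers carrying the first six integer/pointer arguments,
 * in order, under the System V AMD64 calling convention. */
static const char *const int_arg_regs[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Registers a callee must restore before returning. */
static const char *const callee_saved_regs[] = {
    "rbx", "rsp", "rbp", "r12", "r13", "r14", "r15"
};

/* Writes to buf where the n-th (0-based) argument lives right after the
 * call instruction. ASSUMPTION: all arguments are 8-byte integers or
 * pointers; floating-point values and structs follow other rules. */
void arg_location(int n, char *buf, size_t len)
{
    assert(n >= 0);
    if (n < 6)
        snprintf(buf, len, "%%%s", int_arg_regs[n]);
    else
        /* (%rsp) holds the return address, so the seventh argument is
         * at 8(%rsp), the eighth at 16(%rsp), and so on. */
        snprintf(buf, len, "%d(%%rsp)", 8 * (n - 5));
}

/* Returns true if reg (e.g. "rbx") must be preserved by the callee. */
bool is_callee_saved(const char *reg)
{
    size_t count = sizeof callee_saved_regs / sizeof callee_saved_regs[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(reg, callee_saved_regs[i]) == 0)
            return true;
    return false;
}
```

For instance, arg_location(6, ...) yields "8(%rsp)". Right after the call, (%rsp) holds the return address, so the seventh integer argument sits just above it.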
\begin{wrapfigure}{r}{0.4\textwidth}
\centering
\includegraphics[width=0.9\linewidth]{imgs/call_stack/call_stack.png}
\caption{Program stack with x86\_64 calling
conventions}\label{fig:call_stack}
\end{wrapfigure}
The register \reg{rsp} points to the top of the stack, that is, to the last
value pushed onto it. Thus, when the process just enters a new function,
\reg{rsp} points to the return address pushed by the \texttt{call} instruction.
Then, the compiler might use \reg{rbp} (``base pointer'') as a fixed reference
point for the frame. It does so by pushing the old value of \reg{rbp} just below
the return address on the stack, then copying \reg{rsp} to \reg{rbp}. The
return address is then stored 8 bytes above the address held in \reg{rbp}.
This makes it easy to find from anywhere within the function, and also allows
for easy addressing of local variables.
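The frame layout above is what makes frame-pointer-based unwinding possible. The following C sketch unwinds one frame using only rbp. The regs_t type and the read_slot and unwind_with_rbp functions are hypothetical. The sketch makes three assumptions: the function was compiled with a frame pointer, its prologue has already run, and the host is 64-bit (8-byte stack slots).

```c
#include <assert.h>
#include <stdint.h>

/* The few registers needed to walk frame-pointer-based frames. */
typedef struct {
    uintptr_t rip, rsp, rbp;
} regs_t;

/* Reads one 8-byte stack slot; the unwound process is assumed to be
 * the current one, so a plain dereference is enough. */
static uintptr_t read_slot(uintptr_t addr)
{
    return *(const uintptr_t *)addr;
}

/* Unwinds one frame, ASSUMING "push %rbp; mov %rsp, %rbp" has run:
 *   0(%rbp) holds the caller's saved rbp,
 *   8(%rbp) holds the return address,
 *   the caller's rsp once the call has returned is rbp + 16. */
regs_t unwind_with_rbp(regs_t cur)
{
    regs_t caller;
    assert(cur.rbp != 0);  /* a null frame pointer ends the chain */
    caller.rbp = read_slot(cur.rbp);
    caller.rip = read_slot(cur.rbp + 8);
    caller.rsp = cur.rbp + 16;
    return caller;
}
```

Applied repeatedly, this follows the chain of saved rbp values up the stack. It breaks down as soon as a function omits the frame pointer, which is one motivation for relying on DWARF unwinding data instead.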
Often, a function will also push onto the stack the values of the
callee-saved registers that it overwrites later, effectively saving them. It
will then subtract some value from \reg{rsp}, allocating space in the stack
frame for its local variables. Before returning, it releases this space and
pops the saved values back into their original registers, which brings
\reg{rsp} back to its value at function entry.
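To make this save/restore discipline concrete, here is a toy C model of the stack. The names machine_t, push, pop and callee are hypothetical. In this model, one slot stands for 8 bytes and slot indices stand for addresses. It follows a typical frame-pointer prologue (push %rbp; mov %rsp, %rbp; push %rbx; sub) and the matching epilogue. Real compilers may order, merge or omit these steps differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 64

/* Toy stack: grows towards lower indices, rsp indexes the last pushed
 * slot. Indices stand in for addresses (one slot = 8 bytes). */
typedef struct {
    uint64_t mem[STACK_SLOTS];
    size_t rsp;         /* index of the top of the stack */
    uint64_t rbp, rbx;  /* two callee-saved registers */
} machine_t;

static void push(machine_t *m, uint64_t v)
{
    assert(m->rsp > 0);
    m->mem[--m->rsp] = v;
}

static uint64_t pop(machine_t *m)
{
    assert(m->rsp < STACK_SLOTS);
    return m->mem[m->rsp++];
}

/* Hypothetical function with a typical prologue and epilogue: saves rbp
 * and rbx, reserves nlocals slots, clobbers rbx, then undoes everything
 * in reverse order. */
void callee(machine_t *m, size_t nlocals)
{
    /* prologue */
    push(m, m->rbp);        /* push %rbp */
    m->rbp = m->rsp;        /* mov %rsp, %rbp (rbp now holds a slot index) */
    push(m, m->rbx);        /* push %rbx */
    assert(m->rsp >= nlocals);
    m->rsp -= nlocals;      /* sub $(8*nlocals), %rsp */

    m->rbx = 0xdeadbeef;    /* body overwrites a callee-saved register */

    /* epilogue */
    m->rsp += nlocals;      /* add $(8*nlocals), %rsp */
    m->rbx = pop(m);        /* pop %rbx */
    m->rbp = pop(m);        /* pop %rbp */
}
```

After a simulated call (pushing a return address) and callee(), rsp is back to its value at function entry, and rbx and rbp hold their original values. This is exactly the guarantee the callee-saved convention gives the caller.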
\subsection{Stack unwinding}
For various reasons, it can be useful, at some point of the execution of
a program, to glance at its program stack and be able to extract information
from it. For instance, when running a debugger such as \prog{gdb}, a frequent
@@ -539,12 +577,32 @@ Section~\ref{sec:semantics} above. This C code is then compiled by GCC,
providing for free all the optimisation passes of a modern compiler. This code
is compiled as a shared library containing a single function, which takes as
arguments an instruction pointer and a memory context (\ie{} the values of the
various machine registers) as defined in Listing~\ref{lst:unw_ctx}. An
optionally enabled parameter can be used to pass a function pointer to a
dereferencing function, which conceptually does what the dereferencing \lstc{*}
operator does on a pointer. This parameter is used to unwind a process other
than the currently running one, which therefore does not share the same address
space. A call to this function returns a fresh memory context, containing the
values the registers hold after unwinding this frame.
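The optional dereferencing parameter can be pictured with the following C sketch. The deref_func_t type and the unwind_rsp48 and fake_remote_deref functions are hypothetical names, not the actual generated interface. The function hard-codes a single case (CFA at rsp + 48, return address just below it), modelled on the generated code included further down in this commit. It routes its one memory read through the callback, and falls back to a plain dereference when the callback is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t flags;
    uintptr_t rip, rsp, rbp, rbx;
} unwind_context_t;

/* Hypothetical callback type: returns the 8-byte word stored at addr
 * in the address space of the process being unwound. */
typedef uintptr_t (*deref_func_t)(uintptr_t addr);

/* Default: the unwound process is the current one, so the
 * dereferencing * operator is enough. */
static uintptr_t deref_local(uintptr_t addr)
{
    assert(addr % sizeof(uintptr_t) == 0);  /* aligned stack slot */
    return *(const uintptr_t *)addr;
}

/* Stand-in for reading another process' memory: a fake address space
 * in which only the word at 0x7ffc0030 is known. */
uintptr_t fake_remote_deref(uintptr_t addr)
{
    return addr == 0x7ffc0030 ? (uintptr_t)0x400abc : 0;
}

/* One hard-coded case modelled on the generated code: for pc in
 * [0x619, 0x658], CFA = rsp + 48 and the return address is at CFA - 8.
 * Every memory read goes through deref (NULL means "local process"). */
unwind_context_t unwind_rsp48(unwind_context_t ctx, uintptr_t pc,
                              deref_func_t deref)
{
    unwind_context_t out = {0};
    if (deref == NULL)
        deref = deref_local;
    if (pc < 0x619 || pc > 0x658) {
        out.flags = 128u;  /* error bit, as in the generated default case */
        return out;
    }
    out.rsp = ctx.rsp + 48;
    out.rip = deref(out.rsp - 8);
    out.flags = 3u;        /* same flags value as the generated code */
    return out;
}
```

In a real remote unwinder, the callback would typically read the traced process' memory, for instance with ptrace(PTRACE_PEEKDATA, ...) on Linux. Here, fake_remote_deref only stands in for that.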
Unlike the \ehframe, and unlike what a release-ready, real-world version of
the \ehelfs{} should do, this prototype was deliberately kept simple: it only
handles the few registers needed to unwind the stack. Thus, the only registers
handled in \ehelfs{} are \reg{rip}, \reg{rbp}, \reg{rsp} and \reg{rbx} (the
latter being quite often used in \prog{libc} to hold the CFA address). This is
enough to unwind the stack, but not to analyze every stack frame as \prog{gdb}
would do after a \lstbash{frame n} command.
\lstinputlisting[language=C, caption={Unwinding context}, label={lst:unw_ctx}]
{src/dwarf_assembly_context/unwind_context.c}
In the unwind context from Listing~\ref{lst:unw_ctx}, the values of type
\lstc{uintptr_t} are the values of the corresponding registers, and
\lstc{flags} is an 8-bit value indicating, for each register, whether it is
present in this context (\ie{} if the \lstc{rbx} bit is not set, the value of
\lstc{rbx} in the structure is not meaningful). It also holds an error bit
indicating whether an error occurred during unwinding.
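Client code would typically test flags before trusting a register value. The sketch below assumes a bit layout that the report does not spell out. The generated code only suggests that the rip and rsp bits together make 3, and that 128 is the error bit. The positions of the rbp and rbx bits, and which of bits 0 and 1 stands for rip, are guesses. The UNW_FLAG_* constants and the ctx_has and ctx_failed helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t flags;
    uintptr_t rip, rsp, rbp, rbx;
} unwind_context_t;

/* ASSUMED bit layout: only "rip|rsp == 3" and "error == 128" are
 * suggested by the generated code; the other positions are guesses. */
enum {
    UNW_FLAG_RIP = 1u << 0,
    UNW_FLAG_RSP = 1u << 1,
    UNW_FLAG_RBP = 1u << 2,
    UNW_FLAG_RBX = 1u << 3,
    UNW_FLAG_ERR = 1u << 7,
};

/* True when unwinding succeeded. */
bool ctx_failed(const unwind_context_t *ctx)
{
    return (ctx->flags & UNW_FLAG_ERR) != 0;
}

/* True when the register behind the single flag `bit` holds a
 * meaningful value in ctx. */
bool ctx_has(const unwind_context_t *ctx, unsigned bit)
{
    assert(bit != 0 && (bit & (bit - 1)) == 0);  /* exactly one flag */
    return !ctx_failed(ctx) && (ctx->flags & bit) != 0;
}
```

Note that ctx_failed returns true when the error bit is set, that is, when unwinding failed; the one-line comment above it states the opposite and is wrong.

ctx_has deliberately reports no register as present once the error bit is set. In that case, the generated code only sets flags and leaves the other fields unset.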
This generated data is stored in separate shared object files, which we call
\ehelfs. It would have been possible to alter the original ELF file to embed

File diff suppressed because it is too large.


@@ -0,0 +1,4 @@
typedef struct {
    uint8_t flags;
    uintptr_t rip, rsp, rbp, rbx;
} unwind_context_t;


@@ -0,0 +1,23 @@
unwind_context_t _eh_elf(unwind_context_t ctx, uintptr_t pc) {
    unwind_context_t out_ctx;
    switch(pc) {
        case 0x615 ... 0x618:
            out_ctx.rsp = ctx.rsp + (8);
            out_ctx.rip = *((uintptr_t*)(out_ctx.rsp + (-8)));
            out_ctx.flags = 3u;
            return out_ctx;
        case 0x619 ... 0x658:
            out_ctx.rsp = ctx.rsp + (48);
            out_ctx.rip = *((uintptr_t*)(out_ctx.rsp + (-8)));
            out_ctx.flags = 3u;
            return out_ctx;
        case 0x659 ... 0x659:
            out_ctx.rsp = ctx.rsp + (8);
            out_ctx.rip = *((uintptr_t*)(out_ctx.rsp + (-8)));
            out_ctx.flags = 3u;
            return out_ctx;
        default:
            out_ctx.flags = 128u;
            return out_ctx;
    }
}
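A caller can unwind several frames by feeding each returned context back into the generated function. The C sketch below contains an equivalent copy of the generated _eh_elf above, with two small changes: out_ctx is zero-initialised, and the two "rsp + 8" cases are merged. It adds a hypothetical walk_stack loop that stops on the error bit. The sketch relies on the GNU C case-range extension used by the generated code, so it assumes GCC or Clang, and a 64-bit target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t flags;
    uintptr_t rip, rsp, rbp, rbx;
} unwind_context_t;

/* Equivalent copy of the generated unwinder (GNU C case ranges). */
unwind_context_t _eh_elf(unwind_context_t ctx, uintptr_t pc)
{
    unwind_context_t out_ctx = {0};  /* zeroed here for cleanliness */
    switch (pc) {
    case 0x615 ... 0x618:            /* CFA = rsp + 8 */
    case 0x659:
        out_ctx.rsp = ctx.rsp + 8;
        out_ctx.rip = *((uintptr_t *)(out_ctx.rsp - 8));
        out_ctx.flags = 3u;
        return out_ctx;
    case 0x619 ... 0x658:            /* CFA = rsp + 48 */
        out_ctx.rsp = ctx.rsp + 48;
        out_ctx.rip = *((uintptr_t *)(out_ctx.rsp - 8));
        out_ctx.flags = 3u;
        return out_ctx;
    default:                         /* pc not covered: error bit */
        out_ctx.flags = 128u;
        return out_ctx;
    }
}

/* Hypothetical driver: unwinds frames until the error bit is set or
 * max_frames return addresses have been stored in pcs. Returns the
 * number of frames unwound. A production unwinder would look up
 * rip - 1 for caller frames, since a return address may point just
 * past the calling function; this sketch ignores that subtlety. */
size_t walk_stack(unwind_context_t ctx, size_t max_frames, uintptr_t *pcs)
{
    size_t n = 0;
    assert(pcs != NULL || max_frames == 0);
    while (n < max_frames) {
        unwind_context_t caller = _eh_elf(ctx, ctx.rip);
        if (caller.flags & 128u)     /* error: stop walking */
            break;
        pcs[n++] = caller.rip;
        ctx = caller;
    }
    return n;
}
```

Consider a hand-built stack where the frame at pc 0x620 returns to 0x616, which itself returns to an address outside this function. On it, walk_stack records those two return addresses and then stops, since the third lookup hits the default case.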


@@ -12,6 +12,13 @@
author = {C11},
}
@manual{systemVabi,
title = {System V Application Binary Interface, AMD64
architecture},
url = {https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf},
}
@online{libunwind,
title = {Libunwind webpage},
url = {http://www.nongnu.org/libunwind/},