From 0a7b8b4e64318024e080c12817d0dfb33edd92f4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Th=C3=A9ophile=20Bastian?=
Date: Wed, 1 Aug 2018 19:55:13 +0200
Subject: [PATCH] Start describing DWARF

---
 report/report.tex | 47 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/report/report.tex b/report/report.tex
index 330eb9e..3d24b28 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -55,14 +55,55 @@ Under supervision of Francesco Zappa-Nardelli\\
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Stack frames and unwinding}
-\todo{}
+
+On most platforms, programs make use of a \emph{call stack} to store
+information about the nested function calls at the current execution point, and
+keep track of their nesting. Each function call has its own \emph{stack frame},
+an entry of the call stack, whose precise contents are often specified in the
+Application Binary Interface (ABI) of the platform, and left, to varying
+extents, up to the compiler. Those frames are typically used for storing
+function arguments, machine registers that must be restored before returning,
+the function's return address and local variables.
+
+For various reasons, it might be interesting, at some point during the
+execution of a program, to glance at its call stack and extract information
+from it. For instance, when running a debugger such as \prog{gdb}, a frequent
+use is to obtain a \emph{backtrace}, that is, the list of all nested function
+calls at this point. This actually reads the stack to find the different stack
+frames, and decodes them to identify the function names, parameter values, etc.
+
+This operation is far from trivial. Often, a stack frame will only make sense
+with correct machine register values, which can be restored from the previous
+stack frame. This requires \emph{walking} the stack, reading the entries one
+after the other, instead of peeking at some frame directly. Moreover, the size
+of one stack frame is often not that easy to determine when looking at some
+instruction other than \texttt{return}, making it hard to extract single frames
+from the whole stack.
+
+Interpreting a frame in order to get the machine state \emph{before} this
+frame, and thus be able to decode the next frame recursively, is called
+\emph{unwinding} a frame. For all the reasons above and more, it is often
+necessary to have additional data to perform stack unwinding. This data is
+often stored among the debugging information of a program, and one common
+format of debugging data is DWARF\@.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{DWARF format}
-\todo{}
+
+The DWARF format was first standardized as the format for debugging
+information of ELF executable binaries. It is now commonly used across a
+wide variety of binary formats to store debugging information. As of now, the
+latest DWARF standard is DWARF 5~\cite{dwarf5std}, which is openly accessible.
+
+The DWARF data commonly includes type information about the variables in the
+original programming language, the correspondence between assembly instructions
+and lines of the original source file, \ldots
+The format also specifies a way to represent unwinding data, as described in
+the previous subsection, in an ELF section originally called
+\lstc{.debug_frame}, most often found as \lstc{.eh_frame}.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{DWARF functioning}
+\subsection{DWARF unwinding data}
 \todo{}
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%