
\title{DWARF debugging data, compilation and optimization}

\author{Théophile Bastian\\
Under supervision of Francesco Zappa-Nardelli\\
{\textsc{parkas}, \'Ecole Normale Supérieure de Paris}}

\date{March -- August 2018\\August 20, 2018}

\documentclass[11pt]{article}

\usepackage[left=2cm,right=2cm,top=2cm,bottom=2cm]{geometry}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{stmaryrd}
\usepackage{mathtools}
\usepackage{indentfirst}
\usepackage[utf8]{inputenc}
\usepackage{makecell}
\usepackage{booktabs}
%\usepackage[backend=biber,style=alphabetic]{biblatex}
\usepackage[backend=biber]{biblatex}

\usepackage{../shared/my_listings}
\usepackage{../shared/my_hyperref}
\usepackage{../shared/specific}
\usepackage{../shared/common}
\usepackage{../shared/todo}

\addbibresource{../shared/report.bib}

\renewcommand\theadalign{c}
\renewcommand\theadfont{\bfseries}
%\renewcommand\theadgape{\Gape[4pt]}
%\renewcommand\cellgape{\Gape[4pt]}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}

%% Main title %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\maketitle

%% Fiche de synthèse %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\input{fiche_synthese}

%% Abstract %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{abstract}
    \todo{Is there a need for an abstract, given the presence above of the
    ``fiche de synthèse''?}
\end{abstract}

%% Table of contents %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% Main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Stack unwinding data presentation}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Stack frames and unwinding}

On most platforms, programs use a \emph{call stack} to store information about
the nested function calls at the current execution point and to keep track of
their nesting. Each function call has its own \emph{stack frame}, an entry of
the call stack whose precise contents are often specified in the Application
Binary Interface (ABI) of the platform and otherwise left, to varying extents,
to the compiler. Those frames are typically used for storing function
arguments, machine registers that must be restored before returning, the
function's return address and local variables.

For various reasons, it can be useful, at some point of the execution of a
program, to inspect its call stack and extract information from it. For
instance, when running a debugger such as \prog{gdb}, a frequent operation is
to obtain a \emph{backtrace}, that is, the list of all nested function calls at
this point. Doing so requires reading the stack to find the different stack
frames, and decoding them to identify the function names, parameter values,
etc.

This operation is far from trivial. Often, a stack frame only makes sense
given correct machine register values, which can only be restored from the
previous stack frame. This forces the unwinder to \emph{walk} the stack,
reading the entries one after the other, instead of peeking at some frame
directly. Moreover, the size of a stack frame is often hard to determine when
looking at an instruction other than \texttt{return}, which makes it hard to
extract single frames from the whole stack.

Interpreting a frame in order to get the machine state \emph{before} this
frame, and thus be able to decode the next frame recursively, is called
\emph{unwinding} a frame. For all the reasons above and more, additional data
is often necessary to perform stack unwinding. This data is often stored among
the debugging information of a program, and one common format of debugging
data is DWARF\@.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Unwinding usage and frequency}

Stack unwinding is a more common operation than one might think at first. The
best-known use case is simply to get a stack trace of a program and provide a
debugger with the information it needs: for instance, when inspecting a stack
trace in \prog{gdb}, it is quite common to jump to a previous frame:

\lstinputlisting{src/segfault/gdb_session}

To be able to do this, \prog{gdb} must be able to restore \lstc{fct_a}'s
context, by unwinding \lstc{fct_b}'s frame.

\medskip

Yet, stack unwinding (and thus debugging data) \emph{is not limited to
debugging}.

Another common usage is profiling. A profiling tool, such as \prog{perf} under
Linux, is used to measure and analyze in which functions a program spends its
time, identify bottlenecks and find out which parts are critical to optimize.
To do so, modern profilers pause the traced program at regular, short
intervals, inspect its stack, and determine which function is currently being
run. They also often unwind the stack to determine the call path to this
function, and thus which functions indirectly take time: \eg, a function
\lstc{fct_a} can call both \lstc{fct_b} and \lstc{fct_c}, which are quite
heavy; the program then spends practically no time directly in \lstc{fct_a},
but a lot of time in the calls to the other two functions made by
\lstc{fct_a}.

Exception handling also requires a stack unwinding mechanism in most languages.
Indeed, an exception is completely different from a \lstc{return}: while the
latter returns to the previous function, the former can be caught by virtually
any function in the call path, at any point of the function. It is thus
necessary to be able to unwind frames, one by one, until a suitable
\lstc{catch} block is found. The C++ language, for one, includes a
stack-unwinding library similar to \prog{libunwind} in its runtime.

In both of these cases, performance \emph{can} be a problem. In the latter,
slow unwinding directly impacts the overall program performance, particularly
if many exceptions are thrown and caught far up their call path. In the
former, profiling \emph{is} performance-heavy and often quite slow when
analyzing large programs anyway.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{DWARF format}

The DWARF format was first standardized as the format for debugging
information of the ELF executable binaries. It is now commonly used across a
wide variety of binary formats to store debugging information. As of now, the
latest DWARF standard is DWARF 5~\cite{dwarf5std}, which is openly accessible.

The DWARF data commonly includes type information about the variables in the
original programming language, the correspondence between assembly
instructions and lines of the original source file, and so on.
The format also specifies a way to represent unwinding data, as described in
the previous subsections, in an ELF section originally called
\lstc{.debug_frame}, most often found as \ehframe.

For any binary, debugging information can easily get quite large if no
attention is paid to keeping it as compact as possible. In this respect, DWARF
does an excellent job, and everything is stored in a very compact way. As we
will see, however, this makes it both difficult to parse correctly (with \eg{}
variable-length integers) and quite slow to interpret.
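
As an illustration of this parsing cost, DWARF encodes many of its integer
operands as unsigned LEB128 values: each byte carries seven payload bits, and
its high bit signals that more bytes follow. A minimal sketch of a decoder,
assuming a well-formed encoding that fits in 64 bits (the name
\lstc{decode_uleb128} is ours and does not refer to any existing library),
could read:

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value starting at buf, and store the
 * number of bytes consumed in *len.
 * Assumption: the encoding is well-formed and fits in 64 bits. */
uint64_t decode_uleb128(const uint8_t *buf, size_t *len)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t pos = 0;
    uint8_t byte;

    do {
        byte = buf[pos++];
        /* Low 7 bits are payload, least significant group first. */
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80); /* High bit set: another byte follows. */

    *len = pos;
    return result;
}
\end{lstlisting}

The length of an operand is only known once it has been decoded, so a stream
of such operands cannot be indexed directly and has to be read sequentially.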

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{DWARF unwinding data}

The unwinding data, which we will call from now on the \ehframe, contains, for
each possible instruction pointer (that is, an instruction address within the
program), a set of ``registers'' that can be unwound, and a rule describing how
to do so.

The DWARF language is completely agnostic of the platform and ABI, and in
particular, is completely agnostic of a particular platform's registers. Thus,
when talking about DWARF, a register is merely a numerical identifier that is
often, but not necessarily, mapped to a real machine register by the ABI\@.

In practice, this data takes the form of a collection of tables, one table per
Frame Description Entry (FDE), which most often corresponds to a function. Each
column of the table is a register (\eg{} \reg{rsp}), with two additional
special registers, CFA (Canonical Frame Address) and RA (Return Address),
containing respectively the base pointer of the current stack frame and the
return address of the current function (\ie{} for x86\_64, the unwound value of
\reg{rip}, the instruction pointer). Each row of the table corresponds to a
particular instruction pointer (IP) within the IP range of the tabulated FDE;
assuming an FDE maps directly to a function, this range is simply the IP range
of the given function in the \lstc{.text} section of the binary. A row is
valid from its start IP to the start IP of the next row, or to the end IP of
the FDE if it is the last row.

\begin{minipage}{0.45\textwidth}
    \lstinputlisting[language=C, firstline=3, lastline=12]
        {src/fib7/fib7.c}
\end{minipage} \hfill \begin{minipage}{0.45\textwidth}
    \lstinputlisting[language=C]{src/fib7/fib7.fde}
\end{minipage}

For instance, the C source code above, when compiled with \lstbash{gcc -O0
-fomit-frame-pointer}, gives the table to its right. During the function
prologue, \ie{} for $\mhex{675} \leq \reg{rip} < \mhex{679}$, the stack frame
only contains the return address, thus the CFA is 8 bytes above \reg{rsp}
(which was the value of \reg{rsp} before the call), and the return address is
precisely at \reg{rsp}. Then, 9 integers of 8 bytes each (8 for \lstc{fibo},
one for \lstc{pos}) are allocated on the stack, which puts the CFA 80 bytes
above \reg{rsp}, and the return address still 8 bytes below the CFA\@.
Finally, by the end of the function, the local variables are discarded and
\reg{rsp} is reset to its value from the first row.
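
The offsets of the second row can be recovered by a short computation that
only uses the figures above: at that point, the frame holds the 8-byte return
address and $9 \times 8$ bytes of local variables, so that
\[
    \reg{CFA} = \reg{rsp} + 9 \times 8 + 8 = \reg{rsp} + 80,
    \qquad
    \text{RA stored at } \reg{CFA} - 8 = \reg{rsp} + 72.
\]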

However, DWARF data isn't actually stored as a table in the binary files. The
first row has the location of the first IP in the FDE, and must define at least
its CFA\@. Then, when all relevant registers are defined, it is possible to
define a new row by providing a location offset (\eg{} here $4$), and the new
row is defined as a clone of the previous one, which can then be altered (\eg{}
here by setting \lstc{CFA} to $\reg{rsp} + 80$). This means that every row is
defined \wrt{} the previous one, and that the IPs of the successive rows cannot
be determined before evaluating all the preceding rows. Thus, unwinding a frame
from an IP close to the end of the FDE requires evaluating pretty much every
DWARF row in the table before reaching the relevant information, which
drastically slows down the unwinding process.
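
To make this cost concrete, the following sketch replays a drastically
simplified instruction stream from the start of an FDE until it reaches the
row covering a given IP, and counts the instructions evaluated on the way. The
types, the function name and the two-instruction set are ours; the address of
the last row is made up, since only the first two rows are located in the text
above.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* Simplified instruction set: just enough for the example table. */
enum cfa_op { OP_DEF_CFA_OFFSET, OP_ADVANCE_LOC };

struct cfa_insn {
    enum cfa_op op;
    int64_t operand; /* new CFA offset from rsp, or location delta */
};

/* Rows of the example. The second delta (0x30) is hypothetical. */
static const struct cfa_insn example_prog[] = {
    { OP_DEF_CFA_OFFSET, 8 },
    { OP_ADVANCE_LOC, 4 },
    { OP_DEF_CFA_OFFSET, 80 },
    { OP_ADVANCE_LOC, 0x30 },
    { OP_DEF_CFA_OFFSET, 8 },
};
static const size_t example_len =
    sizeof(example_prog) / sizeof(example_prog[0]);

/* Return the CFA offset (relative to rsp) valid at ip, replaying the
 * instructions from the first row of the FDE. *steps receives the
 * number of instructions that had to be evaluated. */
int64_t cfa_offset_at(uintptr_t fde_start, const struct cfa_insn *prog,
                      size_t prog_len, uintptr_t ip, size_t *steps)
{
    uintptr_t row_start = fde_start;
    int64_t cfa_offset = 0;

    *steps = 0;
    for (size_t i = 0; i < prog_len; ++i) {
        if (prog[i].op == OP_ADVANCE_LOC) {
            uintptr_t next = row_start + (uintptr_t)prog[i].operand;
            if (next > ip)
                break; /* the next row starts after ip: stop here */
            row_start = next;
        } else {
            cfa_offset = prog[i].operand;
        }
        ++*steps;
    }
    return cfa_offset;
}
\end{lstlisting}

With this stream, an IP in the first row is resolved after a single
instruction, whereas an IP in the last row requires evaluating all five.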

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{How big are FDEs?}
\todo{}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Unwinding state-of-the-art}

The most commonly used library to perform stack unwinding, in the Linux
ecosystem, is \prog{libunwind}~\cite{libunwind}. While it is very robust and
quite efficient, most of its optimization comes from fine-tuned code and good
caching mechanisms. When parsing DWARF, \prog{libunwind} still has to parse
the relevant FDE from its start until it finds the row it was seeking.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{General statistics}
\todo{}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{DWARF semantics}\label{sec:semantics}

We will now define semantics covering most of the operations used for
CFI\footnote{To be defined elsewhere in the report} described in the DWARF
standard~\cite{dwarf5std}, with the exception of DWARF expressions. These are
not treated exhaustively because they are quite rich and would take a lot of
time and space to formalize, while being only seldom used (see the DWARF
statistics regarding this).

These semantics are defined with respect to the well-formalized C language, and
go through an intermediary language. The DWARF language can read the whole
memory, as well as registers, and is always executed for some instruction
pointer. The C function representing it will thus take as parameters an array
of the register values as well as an IP, and will return another array of
register values, which will represent the evaluated DWARF row.

\subsection{Original language: DWARF instructions}

These are the DWARF instructions used for CFI description, that is, the
instructions that encode the stack unwinding table. The following list is an
exhaustive list of instructions from the DWARF5
specification~\cite{dwarf5std} concerning CFI, with reworded descriptions for
brevity and clarity. All these instructions are given up to variants (most
instructions exist in multiple formats to handle various operand encodings,
to optimize space). Since we won't be talking about the underlying file format
here, the variations between \eg{} \dwcfa{advance\_loc1} and
\dwcfa{advance\_loc2} (which differ only in the number of bytes of their
operand) are irrelevant and will be elided.

\begin{itemize}
    \item{} \dwcfa{set\_loc(loc)}:
        start a new table row from address $loc$
    \item{} \dwcfa{advance\_loc(delta)}:
        start a new table row at address $prev\_loc + delta$
    \item{} \dwcfa{def\_cfa(reg, offset)}:
        sets this row's CFA at $(\reg{reg} + \textit{offset})$
    \item{} \dwcfa{def\_cfa\_register(reg)}:
        sets CFA at $(\reg{reg} + \textit{prev\_offset})$
    \item{} \dwcfa{def\_cfa\_offset(offset)}:
        sets CFA at $(\reg{prev\_reg} + \textit{offset})$
    \item{} \dwcfa{def\_cfa\_expression(expr)}:
        sets CFA as the result of $expr$
    \item{} \dwcfa{undefined(reg)}:
        sets the register \reg{reg} as undefined in this row
    \item{} \dwcfa{same\_value(reg)}:
        declares that the register \reg{reg} hasn't been touched, or was
        restored to its previous value, in this row. An unwinding procedure can
        leave it as-is.
    \item{} \dwcfa{offset(reg, offset)}:
        the value of the register \reg{reg} is stored in memory at the address
        $CFA + \textit{offset}$.
    \item{} \dwcfa{val\_offset(reg, offset)}:
        the value of the register \reg{reg} is the value $CFA + \textit{offset}$
    \item{} \dwcfa{register(reg, model)}:
        the register \reg{reg} has, in this row, the value that $\reg{model}$
        had in the previous row
    \item{} \dwcfa{expression(reg, expr)}:
        the value of \reg{reg} is stored in memory at the address defined by
        $expr$
    \item{} \dwcfa{val\_expression(reg, expr)}:
        \reg{reg} has the value of $expr$
    \item{} \dwcfa{restore(reg)}:
        \reg{reg} has the same value as in this FDE's preamble (CIE) in this
        row. This is \emph{not implemented in these semantics} for simplicity
        and brevity (we would have to introduce CIE (preamble) and FDE (body)
        independently). This is also not much used in actual ELF
        files\footnote{TODO: refer to stats}.
    \item{} \dwcfa{remember\_state()}:
        push the state of all the registers of this row on an implicit stack
    \item{} \dwcfa{restore\_state()}:
        pop an entry of the implicit stack, and restore all registers in this
        row to the value held in the stack record.
    \item{} \dwcfa{nop()}:
        do nothing (padding)
\end{itemize}

\subsection{Intermediary language $\intermedlang$}

A first pass will translate DWARF instructions into this intermediary language
$\intermedlang$. It is designed to be more mathematical, representing the same
thing but abstracting away all the data compression of the DWARF format, so
that we can better reason about it and transform it into C code.

Its grammar is as follows:

\begin{align*}
    \FDE &::= {\left(\mathbb{Z} \times \dwrow \right)}^{\ast}
        & \text{FDE (set of rows)} \\
    \dwrow &::= \values ^ \regs
        & \text{A single table row} \\
    \regs &::= \left\{0, 1, \ldots, \operatorname{NB\_REGS - 1} \right\}
        & \text{Machine registers} \\
    \values &::= \bot & \text{Values: undefined,}\\
        &\quad\vert~\valaddr{\spexpr} & \text{at address $x$},\\
        &\quad\vert~\valval{\spexpr} & \text{of value $x$} \\
    \spexpr &::= \regs \times \mathbb{Z}
        & \text{A ``simple'' expression $\reg{reg} + \textit{offset}$} \\
\end{align*}

The entry point of the grammar is a $\FDE$, which is a sequence of rows, each
annotated with a machine address, the address from which it is valid. Note that
the addresses are necessarily increasing within an FDE\@.

Each row then represents, as a function mapping registers to values, a row of
the unwinding table.

We implicitly consider that $\reg{reg}$ maps to a number, and we use here
\texttt{x86\_64} names for convenience, but actually in DWARF registers are
only handled as register identifiers, so we can safely state that $\reg{reg}
\in \regs$.

A value can then be undefined, stored at memory address $x$ or be directly a
value $x$, $x$ being here a simple expression of the form $\reg{reg} +
\textit{offset}$. The CFA is considered a simple register here. For instance, to
define $\reg{rax}$ as the value contained in memory 16 bytes below the CFA, we
would have $\reg{rax} \mapsto \valaddr{\reg{CFA}, -16}$ (for the stack grows
downwards).
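
For readers more comfortable with code than with grammars, the same structure
can be sketched as C data types. This is only an illustration: the type and
constant names are ours, and the register numbering (in particular giving the
CFA an identifier of its own) is an assumption made for this sketch.

\begin{lstlisting}[language=C]
#include <stdint.h>

#define NB_REGS 18 /* assumed register count for this sketch */
enum { REG_RAX = 0, REG_RSP = 7, REG_RA = 16, REG_CFA = 17 };

/* A "simple" expression: reg + offset. */
struct spexpr {
    int reg;
    int64_t offset;
};

/* A value: undefined, stored at an address, or directly a value. */
enum value_kind { VAL_UNDEFINED, VAL_AT_ADDR, VAL_IS_VALUE };
struct value {
    enum value_kind kind;
    struct spexpr expr; /* meaningless when kind is VAL_UNDEFINED */
};

/* A table row maps every register to a value. */
struct dwrow {
    struct value regs[NB_REGS];
};

/* One FDE entry: a row and the address from which it is valid. */
struct fde_entry {
    uintptr_t addr;
    struct dwrow row;
};

/* Build a row in which every register is undefined. */
struct dwrow empty_row(void)
{
    struct dwrow row;
    for (int reg = 0; reg < NB_REGS; ++reg)
        row.regs[reg] = (struct value){ VAL_UNDEFINED, { 0, 0 } };
    return row;
}

/* The example from the text: rax is stored 16 bytes below the CFA. */
struct dwrow example_row(void)
{
    struct dwrow row = empty_row();
    row.regs[REG_RAX] = (struct value){ VAL_AT_ADDR, { REG_CFA, -16 } };
    return row;
}
\end{lstlisting}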

\subsection{Target language: a C function body}

The target language of these semantics is a C function, to be interpreted with
respect to the C11 standard~\cite{c11std}. The function is supposed to be run
in the context of the program being unwound. In particular, it must be able to
dereference some pointer derived from DWARF instructions that will point to the
execution stack, or even the heap.

This function takes as arguments an instruction pointer (supposedly extracted
from $\reg{rip}$) and an array of register values, and returns a fresh array of
register values after unwinding this call frame. The function is
compositional\footnote{up to technicalities: the IP obtained after unwinding
the first frame might be handled in a different dynamically loaded object, and
this would require inspecting the DWARF located in another file}: it can be
called twice in a row to unwind two stack frames.

The function is the following:

\lstinputlisting[language=C]{src/dw_semantics/c_context.c}

The translation of $\intermedlang$, as produced by the function defined later,
is then to be inserted in this context, where the comment states so.

\subsection{From DWARF to $\intermedlang$}

To define the interpretation of $\DWARF$ to $\intermedlang$, we will need to
proceed forward, but, as the language inherently depends on the previous
instructions to give a meaning to the following ones, we will depend on what
was computed before. At a point $h \vert t$ of the interpretation, where $h$ is
what has already been interpreted, $t$ what remains to be interpreted, and $H$
the result of interpreting $h$, it would thus look like
$\llbracket t \rrbracket (H)$.

But we also need to keep track of this implicit stack DWARF uses, which will be
kept in subscript.

\medskip

Thus, we define $\semI{\bullet}{s}(\bullet) : \DWARF \times \FDE \to \FDE$, for
$s$ a stack of $\dwrow$, that is,
\[
    s \in \rowstack := \dwrow^\ast
\]

Implicitly, $\semI{\bullet}{} := \semI{\bullet}{\varepsilon}$

\medskip

For convenience, we define $\insarrow{reg}$, the operator changing the value of
a register for a given value in the last row, as

\[
    \left(f \in \FDE\right) \insarrow{$r \in \regs$} (v \in \values)
    \quad := \quad
    \left( f\left[0 \ldots |f| - 2\right] \right) \cdot \left(
        f[-1].addr,~\left\{
        \begin{array}{r l}
            r' \neq r &\mapsto \left(f[-1].row\right)(r') \\
            r &\mapsto v \\
        \end{array} \right. \right)
\]

In the same way, we define $\extrarrow{reg}$, which \emph{extracts} the rule
currently applied for $\reg{reg}$, \eg{} $F \extrarrow{CFA} \valval{\reg{reg} +
\text{off}}$. If the rule currently applied in such a case is \emph{not} of the
form $\reg{reg} + \text{off}$, then the program is considered erroneous. One
can see this $\extrarrow{reg}$ as a kind of \lstc{match} statement in OCaml,
but with only one case, allowing packed data to be retrieved.

More generally, we define ${\extrarrow{reg}}^{-k}$ as the same operation, but
extracting from the row $k$ rows earlier, \ie{} ${\extrarrow{reg}}^{0}$ is the
same as $\extrarrow{reg}$, and $F {\extrarrow{reg}}^{-1} \bullet$ is the same
as $F\left[0 \ldots |F|-2\right] \extrarrow{reg} \bullet$.

\begin{align*}
    \semI{\varepsilon}{s}(F) &:= F \\
    \semI{\dwcfa{set\_loc(loc)} \cdot d}{s}(F) &:=
        \contsem{F \cdot \left(loc, F[-1].row \right)} \\
    \semI{\dwcfa{advance\_loc(delta)} \cdot d}{s}(F) &:=
        \contsem{F \cdot \left(F[-1].addr + delta, F[-1].row \right)} \\
    \semI{\dwcfa{def\_cfa(reg, offset)} \cdot d}{s}(F) &:=
        \contsem{F \insarrow{CFA} \valval{\reg{reg} + offset}} \\
    \semI{\dwcfa{def\_cfa\_register(reg)} \cdot d}{s}(F) &:=
        \text{let F }\extrarrow{CFA} \valval{\reg{oldreg} + \text{oldoffset}}
        \text{ in} \\
        &\quad \contsem{F \insarrow{CFA} \valval{\reg{reg} + oldoffset}} \\
    \semI{\dwcfa{def\_cfa\_offset(offset)} \cdot d}{s}(F) &:=
        \text{let F }\extrarrow{CFA} \valval{\reg{oldreg} + \text{oldoffset}}
        \text{ in} \\
        &\quad \contsem{F \insarrow{CFA} \valval{\reg{oldreg} + offset}} \\
    \semI{\dwcfa{def\_cfa\_expression(expr)} \cdot d}{s}(F) &:=
        \text{TO BE DEFINED} &\qtodo{CHECK ME?} \\
    \semI{\dwcfa{undefined(reg)} \cdot d}{s}(F) &:=
        \contsem{F \insarrow{reg} \bot} \\
    \semI{\dwcfa{same\_value(reg)} \cdot d}{s}(F) &:=
        \contsem{F \insarrow{reg} \valval{\reg{reg}}} \\
    \semI{\dwcfa{offset(reg, offset)} \cdot d}{s}(F) &:=
        \contsem{F \insarrow{reg} \valaddr{\reg{CFA} + \textit{offset}}} \\
    \semI{\dwcfa{val\_offset(reg, offset)} \cdot d}{s}(F) &:=
        \contsem{F \insarrow{reg} \valval{\reg{CFA} + \textit{offset}}} \\
    \semI{\dwcfa{register(reg, model)} \cdot d}{s}(F) &:=
        \text{let } F {\extrarrow{model}}^{-1} r \text{ in }
        \contsem{F \insarrow{reg} r} \\
    \semI{\dwcfa{expression(reg, expr)} \cdot d}{s}(F) &:=
        \text{TO BE DEFINED} &\qtodo{CHECK ME?}\\
    \semI{\dwcfa{val\_expression(reg, expr)} \cdot d}{s}(F) &:=
        \text{TO BE DEFINED} &\qtodo{CHECK ME?}\\
%    \semI{\dwcfa{restore(reg)} \cdot d}{s}(F) &:= \\  %% NOT IMPLEMENTED
    \semI{\dwcfa{remember\_state()} \cdot d}{s}(F) &:=
        \semI{d}{s \cdot F[-1].row}\left(F\right) \\
    \semI{\dwcfa{restore\_state()} \cdot d}{s \cdot t}(F) &:=
        \semI{d}{s}\left(F\left[0 \ldots |F|-2\right] \cdot
        \left(F[-1].addr, t\right) \right) \\
    \semI{\dwcfa{nop()} \cdot d}{s}(F) &:= \contsem{F}\\
\end{align*}

(The stack is used for \texttt{remember\_state} and \texttt{restore\_state}. If
we omit those two operations, we can plainly remove the stack).
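
As a worked example, consider a sequence of instructions that would produce the
first two rows of the table shown earlier (the actual encoding in the binary
may differ). The semantics above leave the initial state implicit; we assume
here that the interpretation starts from
$F_0 = \left(\mhex{675}, r_\bot\right)$, where $r_\bot$ maps every register to
$\bot$. Showing only the registers that are defined, the last row of $F$ after
each instruction is:

\begin{align*}
    \dwcfa{def\_cfa(rsp, 8)} &:
        \left(\mhex{675},~\left\{\reg{CFA} \mapsto \valval{\reg{rsp} + 8}
        \right\}\right) \\
    \dwcfa{offset(RA, -8)} &:
        \left(\mhex{675},~\left\{\reg{CFA} \mapsto \valval{\reg{rsp} + 8},~
        \reg{RA} \mapsto \valaddr{\reg{CFA} - 8}\right\}\right) \\
    \dwcfa{advance\_loc(4)} &:
        \left(\mhex{679},~\left\{\reg{CFA} \mapsto \valval{\reg{rsp} + 8},~
        \reg{RA} \mapsto \valaddr{\reg{CFA} - 8}\right\}\right) \\
    \dwcfa{def\_cfa\_offset(80)} &:
        \left(\mhex{679},~\left\{\reg{CFA} \mapsto \valval{\reg{rsp} + 80},~
        \reg{RA} \mapsto \valaddr{\reg{CFA} - 8}\right\}\right)
\end{align*}

The third instruction appends a new row cloned from the first one, and the
fourth only alters this new row: the row starting at $\mhex{675}$ is left
untouched.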


\subsection{From $\intermedlang$ to C}

\textit{This only defines the semantics, with respect to standard C, of DWARF
as interpreted by \ehelf\@. The actual DWARF to C compiler is not implemented
this way.}

\medskip

We now define $\semC{\bullet} : \FDE \to C$, in the context presented
earlier. The translation from $\intermedlang$ to C is defined as follows:

\begin{itemize}
    \item $\semC{\varepsilon} =$ \\
        \begin{lstlisting}[language=C, mathescape=true]
            else {
                for(int reg=0; reg < NB_REGS; ++reg)
                    new_ctx[reg] = $\semR{\bot}$;
            }
        \end{lstlisting}

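    \item $\semC{t \cdot (\text{loc}, \text{row})} = C\_code \cdot \semC{t}$,
        where the rows are taken from the last one (highest address) to the
        first, so that the first successful test selects the row covering
        \lstc{ip}, and $C\_code$ is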
        \begin{lstlisting}[language=C, mathescape=true]
            if(ip >= $loc$) {
                for(int reg=0; reg < NB_REGS; ++reg)
                    new_ctx[reg] = $\semR{row[reg]}$;
                goto end_ifs; // Avoid if/else if problems
            }
        \end{lstlisting}
\end{itemize}

and $\semR{\bullet}$ is defined as
\begin{align*}
    \semR{\bot} &= \text{\lstc{ERROR_VALUE}} \\
    \semR{\valaddr{\text{reg}, \textit{offset}}} &=
        \text{\lstc{*(old_ctx[reg] + offset)}} \\
    \semR{\valval{\text{reg}, \textit{offset}}} &=
        \text{\lstc{(old_ctx[reg] + offset)}} \\
\end{align*}
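
To illustrate the result of this translation, here is a hand-written sketch of
the kind of C code one could expect for the first two rows of the example
table. It is not the output of the actual compiler: the function signature, the
register numbering and the value of \lstc{ERROR_VALUE} are assumptions, an
\lstc{else if} chain replaces the \lstc{goto} for readability, and we assume
that an address relative to the CFA refers to the CFA computed for the same
row, on a platform with 8-byte pointers.

\begin{lstlisting}[language=C]
#include <stdint.h>

#define NB_REGS 18                  /* assumed register count */
#define ERROR_VALUE ((uintptr_t)-1) /* assumed "undefined" marker */
enum { REG_RSP = 7, REG_RA = 16, REG_CFA = 17 }; /* assumed numbering */

/* Rows start at 0x675 (CFA = rsp + 8) and 0x679 (CFA = rsp + 80);
 * in both, the return address is stored at CFA - 8. */
void unwind_example(uintptr_t ip, const uintptr_t *old_ctx,
                    uintptr_t *new_ctx)
{
    for (int reg = 0; reg < NB_REGS; ++reg)
        new_ctx[reg] = ERROR_VALUE;

    /* Rows are tested from the highest start address down, so that
     * the first successful test selects the row covering ip. */
    if (ip >= 0x679) {
        new_ctx[REG_CFA] = old_ctx[REG_RSP] + 80;
    } else if (ip >= 0x675) {
        new_ctx[REG_CFA] = old_ctx[REG_RSP] + 8;
    } else {
        return; /* ip lies before the FDE: all registers undefined */
    }

    /* Addr(CFA - 8): read the return address from the stack. */
    new_ctx[REG_RA] = *(const uintptr_t *)(new_ctx[REG_CFA] - 8);
}
\end{lstlisting}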

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Stack unwinding data compilation}

The tentative approach chosen to get better unwinding speeds at a reasonable
cost in space was to compile the \ehframe{} directly into native machine code
on the x86\_64 platform.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Compilation: \ehelfs}

The rough idea of the compilation is to produce, out of the \ehframe{} section
of a binary, C code that resembles the code shown in the DWARF semantics from
Section~\ref{sec:semantics} above. This C code is then compiled by GCC,
providing for free all the optimization passes of a modern compiler. This code
is compiled as a shared library, containing a single function, taking as
argument an instruction pointer and a memory context (\ie{} the value of the
various machine registers). An optional parameter can be used to pass a
function pointer to a dereferencing function, which conceptually does what the
dereferencing \lstc{*} operator does on a pointer, and is used to unwind a
process other than the currently running one, which thus does not share the
same address space. A call to this function returns a fresh memory context,
containing the values the registers hold after unwinding this frame.
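
The role of the optional dereferencing function can be sketched as follows. All
the names below are hypothetical, and a toy array stands in for the memory of
another process; the sketch also assumes 8-byte machine words. In a real
setting, the callback would wrap whatever mechanism gives access to the memory
of the unwound process.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* Reads one machine word at addr in the unwound address space. */
typedef uintptr_t (*deref_func_t)(uintptr_t addr);

/* Default behaviour: the unwound program is the current process. */
uintptr_t deref_local(uintptr_t addr)
{
    return *(const uintptr_t *)addr;
}

/* Toy stand-in for another process: four words "mapped" at 0x1000. */
#define FAKE_BASE ((uintptr_t)0x1000)
static const uintptr_t fake_memory[4] = { 0xa0, 0xa1, 0xa2, 0xa3 };

uintptr_t deref_fake_remote(uintptr_t addr)
{
    return fake_memory[(addr - FAKE_BASE) / sizeof(uintptr_t)];
}

/* Recover the return address stored at CFA - 8, reading memory
 * through deref when it is provided, or directly otherwise. */
uintptr_t read_return_address(uintptr_t cfa, deref_func_t deref)
{
    if (deref == NULL)
        deref = deref_local;
    return deref(cfa - 8);
}
\end{lstlisting}

The unwinding logic itself is unchanged: only the way memory is read differs
between the local and the remote case.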

This generated data is stored in separate shared object files, which we call
\ehelfs. It would have been possible to alter the original ELF file to embed
this data as a new section, but getting it to be executed just as any
portion of the \lstc{.text} section would probably have been painful, and
keeping it separated during the experimental phase is quite convenient. It is
possible to have multiple versions of \ehelfs{} files in parallel, with various
options turned on or off, and it does not require altering the base system by
editing \eg{} \texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is
required, those files can simply be \lstc{dlopen}'d.

\todo{More details here? Is it necessary or just too technical?}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{First results}

Without any particular care for efficiency or compactness, it is already
possible to produce a compiled version very close to the one described in
Section~\ref{sec:semantics}. Although the unwinding speed cannot yet be
actually benchmarked, it is already possible to write, in a few hundred lines
of C, a simple stack walker printing the functions traversed. It already works
without any problem on the easily tested cases, since corner cases are mostly
found in standard and highly optimized libraries, and it is not that easy to
get the program to stop and print a stack trace from within a system library
without using a debugger.

The major drawback of this approach, without any particular care taken, is the
space waste.

\begin{table}[h]
    \begin{tabular}{r r r r r r}
        \toprule
        \thead{Shared object} & \thead{Original \\ program size}
            & \thead{Original \\ \lstc{.eh\_frame}}
            & \thead{Generated \\ \ehelf{} \lstc{.text}}
            & \thead{\% of original \\ program size}
            & \thead{Growth \\ factor} \\
        \midrule
        libc-2.27.so & 1.4 MiB & 130.1 KiB & 914.9 KiB & 63.92 & 7.03 \\
        libpthread-2.27.so & 58.1 KiB & 11.6 KiB & 70.5 KiB & 121.48 & 6.09 \\
        ld-2.27.so & 129.6 KiB & 9.6 KiB & 71.7 KiB & 55.34 & 7.44 \\
        hackbench & 2.9 KiB & 568.0 B & 2.1 KiB & 74.78 & 3.97 \\
        Total & 1.6 MiB & 151.8 KiB & 1.0 MiB & 65.32 & 6.98 \\
        \bottomrule
    \end{tabular}

    \caption{Basic \ehelfs{} space usage}\label{table:basic_eh_elf_space}
\end{table}

The space taken by those tentative \ehelfs{} is analyzed in
Table~\ref{table:basic_eh_elf_space} for \prog{hackbench}, a small program
introduced later in Section~\qtodo{Add a reference}, and the libraries on which
it depends.

The original program size column only includes the sizes of the ELF sections
\lstc{.text} (the program itself) and \lstc{.rodata}, the read-only data (such
as static strings, etc.). Only the size of the \lstc{.text} section of the
generated \ehelfs{} is considered, because it is self-contained (little or no
data is stored in \lstc{.rodata}), and the other sections could be removed if
the \ehelfs{} \lstc{.text} was somehow embedded in the original shared object.

This first tentative version of \ehelfs{} is roughly 7 times larger than the
original \lstc{.eh_frame}, and represents far too significant a proportion of
the original program size.

\todo{more in-depth analysis?}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Space optimization}
\todo{}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Benchmarking}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Requirements}
\todo{}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Presentation of \prog{perf}}
\todo{}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Benchmarking with \prog{perf}}
\todo{}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Other explored methods}
\todo{}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Results}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Measured time performance}
\todo{}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Measured compactness}
\todo{}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Instructions coverage}
\todo{}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Bibliography %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\printbibliography{}

\end{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%