\title{Speeding up stack unwinding by compiling DWARF debugging data}

\author{Théophile Bastian\\
Under supervision of Francesco Zappa Nardelli, March -- August 2018\\
{\textsc{parkas}, \'Ecole Normale Supérieure de Paris}}

%\date{March -- August 2018\\August 20, 2018}
\date{\vspace{-2em}}

\documentclass[11pt]{article}

\usepackage[left=2cm,right=2cm,top=2cm,bottom=2cm]{geometry}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{stmaryrd}
\usepackage{mathtools}
\usepackage{indentfirst}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{makecell}
\usepackage{booktabs}
\usepackage{wrapfig}
\usepackage{pgfplots}
%\usepackage[backend=biber,style=alphabetic]{biblatex}
\usepackage[backend=biber]{biblatex}

\usepackage{../shared/my_listings}
\usepackage{../shared/my_hyperref}
\usepackage{../shared/specific}
\usepackage{../shared/common}
\usepackage{../shared/todo}

\addbibresource{../shared/report.bib}

\renewcommand\theadalign{c}
\renewcommand\theadfont{\bfseries}
%\renewcommand\theadgape{\Gape[4pt]}
%\renewcommand\cellgape{\Gape[4pt]}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}

%% Main title %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\maketitle

%% Fiche de synthèse %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\input{fiche_synthese}

%% Table of contents %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% Main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection*{Source code}\label{ssec:source_code}

Our implementation is available from \url{https://git.tobast.fr/m2-internship}.
See the \texttt{abstract} repository for an introductory \texttt{README}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Stack unwinding data presentation}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Stack frames and x86\_64 calling conventions}

On every common platform, programs make use of a \emph{call stack} to store
information about the nested function calls at the current execution point, and
to keep track of their nesting. This call stack is conventionally a contiguous
memory space mapped close to the top of the address space. Each function
call has its own \emph{stack frame}, an entry of the call stack, whose precise
contents are often specified in the Application Binary Interface (ABI) of the
platform, and left to various extents up to the compiler. Those frames are
typically used for storing function arguments, machine registers that must be
restored before returning, the function's return address and local variables.

On the x86\_64 platform, with which this report is mostly concerned, the
calling convention that is followed is defined in the System V
ABI~\cite{systemVabi} for the Unix-like operating systems --~among which Linux
and macOS\@. Under this calling convention, the first six integer or pointer
arguments of a function are passed in the registers \reg{rdi}, \reg{rsi},
\reg{rdx}, \reg{rcx}, \reg{r8}, \reg{r9}, while additional arguments are pushed
onto the stack. It also defines which registers may be overwritten by the
callee, and which registers must be restored before returning. This
restoration, for most compilers, is done by pushing the register value onto the
stack in the function prologue, and restoring it just before returning. Those
preserved registers are
\reg{rbx}, \reg{rsp}, \reg{rbp}, \reg{r12}, \reg{r13}, \reg{r14}, \reg{r15}.

\begin{wrapfigure}{r}{0.4\textwidth}
    \centering
    \includegraphics[width=0.9\linewidth]{imgs/call_stack/call_stack.png}
    \caption{Program stack with x86\_64 calling
    conventions}\label{fig:call_stack}
\end{wrapfigure}

The register \reg{rsp} is supposed to always point to the last used memory cell
in the stack. Thus, when the process just enters a new function, \reg{rsp}
points right to the location of the return address. Then, the compiler might
use \reg{rbp} (``base pointer'') to keep track of this location, by writing
the old value of \reg{rbp} just below the return address on the stack, then
copying \reg{rsp} to \reg{rbp}. This makes it easy to find the return address
from anywhere within the function, and also allows for easy addressing of local
variables. To some extent, it also allows for hot debugging, such as saving a
useful core dump upon segfault. Yet, using \reg{rbp} as a frame pointer is not
always done, since it takes up a register that could otherwise hold program
data. This decision is, on x86\_64 System V, up to the compiler.

Usually, a function starts by subtracting some value from \reg{rsp}, allocating
some space in the stack frame for its local variables. Then, it pushes on
the stack the values of the callee-saved registers that are overwritten later,
effectively saving them. Before returning, it pops the values of the saved
registers back to their original registers and restores \reg{rsp} to its former
value.

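
When \reg{rbp} is used as described above, the saved values of \reg{rbp} form a
linked list threaded through the stack, which can be followed without any
additional data. The following C sketch illustrates this on a simulated stack.
The names \lstc{frame_record} and \lstc{walk_frame_pointers} are hypothetical,
and the outermost frame is assumed to hold a null saved \reg{rbp}.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of what a frame-pointer-based prologue leaves on the
 * stack: the caller's saved rbp, with the return address right above it. */
struct frame_record {
    const struct frame_record *caller_rbp; /* saved by `push %rbp` */
    uintptr_t return_address;              /* pushed by `call` */
};

/* Follow the chain of saved frame pointers starting at `rbp`, storing at
 * most `max` return addresses into `out`. Returns the number stored.
 * Assumption: the outermost frame has a NULL saved rbp. */
size_t walk_frame_pointers(const struct frame_record *rbp,
                           uintptr_t *out, size_t max)
{
    size_t depth = 0;
    while (rbp != NULL && depth < max) {
        out[depth++] = rbp->return_address;
        rbp = rbp->caller_rbp;
    }
    return depth;
}
\end{lstlisting}

Such a walk only recovers return addresses and frame pointers. It gives no
information on the other callee-saved registers, and does not work at all for
code compiled without a frame pointer.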
\subsection{Stack unwinding}\label{ssec:stack_unwinding}

For various reasons, it is interesting, at some point of the execution of a
program, to glance at its program stack and be able to extract information
from it. For instance, when running a debugger, a frequent usage is to obtain a
\emph{backtrace}, that is, the list of all nested function calls at the current
instruction pointer (IP). This actually observes the stack to find the
different stack frames, and decodes them to identify the function names,
parameter values, etc.

This operation is far from trivial. Often, a stack frame will only make sense
when the correct values are stored in the machine registers. These values,
however, are to be restored from the previous stack frame, where they are
stored. This requires \emph{walking} the stack, reading the entries one after
the other, instead of peeking at some frame directly. Moreover, the size of one
stack frame is often not that easy to determine when looking at some
instruction other than \texttt{return}, making it hard to extract single frames
from the whole stack.

Interpreting a frame in order to get the machine state \emph{before} this
frame, and thus be able to decode the next frame recursively, is called
\emph{unwinding} a frame.

Let us consider a stack with x86\_64 calling conventions, such as shown in
Figure~\ref{fig:call_stack}. Assuming the compiler decided here \emph{not} to
use \reg{rbp}, and assuming the function allocates \eg{} a buffer of 8
integers, the area allocated for local variables should be at least $32$ bytes
long (for 4-byte integers), and \reg{rsp} will be pointing below this area.
Apart from analyzing the assembly code produced, there is no way to find where
the return address is stored, relative to \reg{rsp}, at some arbitrary point
of the function. Even when \reg{rbp} is used, there is no easy way to guess
where each callee-saved register is stored in the stack frame, since the
compiler is free to do as it wishes. Even worse, it is not trivial to know
which callee-saved registers were saved at all, since if the function does not
alter a register, it does not have to save it.

With this example, it seems pretty clear that some additional data is necessary
to perform stack unwinding reliably, without resorting to guesswork. This
data is stored along with the debugging information of a program, and one
common format of debugging data is DWARF\@.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Unwinding usage and frequency}

Stack unwinding is a more common operation than one might think at first. The
use case mostly thought of is simply to get a stack trace of a program, and
provide a debugger with the information it needs. For instance, when inspecting
a stack trace in \prog{gdb}, a common operation is to jump to a previous frame:

\lstinputlisting{src/segfault/gdb_session}

To be able to do this, \prog{gdb} must be able to restore \lstc{fct_a}'s
context, by unwinding \lstc{fct_b}'s frame.

\medskip

Yet, stack unwinding, and thus, debugging data, \emph{is not limited to
debugging}.

Another common usage is profiling. A profiling tool, such as \prog{perf} under
Linux --~see Section~\ref{ssec:perf}~--, is used to measure and analyze in
which functions a program spends its time, identify bottlenecks and find out
which parts are critical to optimize. To do so, modern profilers pause the
traced program at regular, short intervals, inspect its stack, and determine
which function is currently being run. They also perform a stack unwinding to
figure out the call path to this function, in order to determine which function
indirectly takes time. For instance, a function \lstc{fct_a} can call both
\lstc{fct_b} and \lstc{fct_c}, which both take a lot of time: the program then
spends practically no time directly in \lstc{fct_a}, but spends a lot of time
in calls to the other two functions that were made from \lstc{fct_a}. Knowing
that, after all, \lstc{fct_a} is the culprit can be useful to a programmer.

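
The distinction between time spent directly in a function and time spent in
its callees can be sketched as follows in C. This is only an illustration under
stated assumptions, not how \prog{perf} is implemented: functions are
identified by small hypothetical integer ids, and each sample is the already
unwound call path, innermost frame first.

\begin{lstlisting}[language=C]
#include <stddef.h>

/* Hypothetical function ids: 0 = fct_a, 1 = fct_b, 2 = fct_c */
#define NB_FUNCS 3

struct profile {
    unsigned self[NB_FUNCS];      /* samples where the function was running */
    unsigned inclusive[NB_FUNCS]; /* samples where it was on the call path */
};

/* Account for one sample. `stack` lists function ids from the innermost
 * (currently running) frame to the outermost one, as obtained by unwinding.
 * Ids are assumed to be valid, that is, smaller than NB_FUNCS. */
void record_sample(struct profile *p, const int *stack, size_t depth)
{
    int seen[NB_FUNCS] = {0};
    if (depth == 0)
        return;
    p->self[stack[0]]++;
    for (size_t i = 0; i < depth; i++) {
        if (!seen[stack[i]]) { /* count recursive calls once per sample */
            seen[stack[i]] = 1;
            p->inclusive[stack[i]]++;
        }
    }
}
\end{lstlisting}

With two samples taken in \lstc{fct_b} and one in \lstc{fct_c}, all called from
\lstc{fct_a}, the self count of \lstc{fct_a} stays at zero while its inclusive
count reaches three. Without unwinding, only the self counts are available.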
Exception handling also requires a stack unwinding mechanism in some languages.
Indeed, an exception is completely different from a \lstinline{return}: while
the latter returns to the previous function, at a well-defined IP, the former
can be caught by virtually any function in the call path, at any point of the
function. It is thus necessary to be able to unwind frames, one by one, until a
suitable \lstc{catch} block is found. The C++ language, for one, includes a
stack-unwinding library similar to \prog{libunwind} in its runtime.

Technically, exception handling could be implemented without any stack
unwinding, by using \lstc{setjmp} and \lstc{longjmp}
mechanics~\cite{niditoexn}. However, it is not possible to implement this
straight away in C++ (among others), because the stack needs to be
properly unwound in order to trigger the destructors of stack-allocated
objects. Furthermore, this is often undesirable: \lstc{setjmp} introduces an
overhead, which is hit whenever a \lstc{try} block is encountered. Instead, it
is often preferred to have strictly no overhead when no exception happens, at
the cost of a greater overhead when an exception is actually fired --~after
all, they are supposed to be \emph{exceptional}. For more details on C++
exception handling, see~\cite{koening1990exception} (especially Section~16.5).
Possible implementation mechanisms are also presented
in~\cite{dinechin2000exn}.

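
To make the \lstc{setjmp}-based alternative concrete, here is a minimal C
sketch of a try/catch built on \lstc{setjmp} and \lstc{longjmp}. The helper
names (\lstc{throw_exn}, \lstc{checked_div}, \lstc{div_or}) are hypothetical,
and a single global handler stands in for the per-thread handler stack a real
runtime would need. The point is that \lstc{setjmp} runs on every entry into
the protected block, whether or not an exception is later raised.

\begin{lstlisting}[language=C]
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Innermost active handler. A single global is a simplifying assumption. */
static jmp_buf *current_handler = NULL;

/* "throw": jump back to the innermost handler with a non-zero code. */
static void throw_exn(int code)
{
    if (current_handler == NULL)
        abort(); /* uncaught exception */
    longjmp(*current_handler, code);
}

static int checked_div(int a, int b)
{
    if (b == 0)
        throw_exn(1); /* hypothetical code for a division by zero */
    return a / b;
}

/* Equivalent of: try { return a / b; } catch (...) { return fallback; } */
int div_or(int a, int b, int fallback)
{
    jmp_buf handler;
    jmp_buf *saved = current_handler;
    int result;

    if (setjmp(handler) == 0) { /* cost paid on every `try` */
        current_handler = &handler;
        result = checked_div(a, b);
    } else {                    /* reached through longjmp */
        result = fallback;
    }
    current_handler = saved;
    return result;
}
\end{lstlisting}

Note that \lstc{longjmp} skips over the intermediate frames without running
any cleanup code, which is precisely why this scheme does not fit C++
destructors.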
In both of these cases, performance \emph{can} be a problem. In
the latter, a slow unwinding directly impacts the overall program performance,
particularly if a lot of exceptions are thrown and caught far away in their
call path. As for the former, profiling \emph{is} performance-heavy and slow:
for a session analyzing the \prog{tor-browser} for two and a half minutes,
\prog{perf} spends $100\,\mu \text{s}$ analyzing each of the $325679$ samples,
that is, $300\,\text{ms}$ per second of program run with default settings.

One of the motivations for this internship was also Stephen Kell's
\prog{libcrunch}~\cite{kell2016libcrunch}, which makes heavy use of stack
unwinding through \prog{libunwind} and had to force \prog{gcc} to use a
frame pointer (\reg{rbp}) everywhere through \lstbash{-fno-omit-frame-pointer}
in order to mitigate the slowness.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{DWARF format}

The DWARF format was first standardized as the format for debugging information
of the ELF executable binaries, which are standard on most UNIX-like systems,
including Linux and the BSDs --~but not on macOS or Windows. It is now commonly
used across a wide variety of binary formats to store debugging information. As
of now, the latest DWARF standard is DWARF 5~\cite{dwarf5std}, which is openly
accessible.

The DWARF data commonly includes type information about the variables in the
original programming language, the correspondence between assembly instructions
and lines of the original source file, etc.
The format also specifies a way to represent unwinding data, as described in
Section~\ref{ssec:stack_unwinding} above, in an ELF section originally called
\lstc{.debug_frame}, but most often found as \ehframe.

For any binary, debugging information can easily take up space and grow bigger
than the program itself if no attention is paid to keeping it as compact as
possible when designing the file format. On this matter, DWARF does an
excellent job, and everything is stored in a very compact way. This, however,
as we will see, makes it both difficult to parse correctly and relatively slow
to interpret.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{DWARF unwinding data}

The unwinding data, which we will call from now on the \ehframe, contains, for
each possible IP, a set of ``registers'' that can be unwound, and a rule
describing how to do so.

The DWARF language is completely agnostic of the platform and ABI, and in
particular, is completely agnostic of a particular platform's registers. Thus,
as far as DWARF is concerned, a register is merely a numerical identifier that
is often, but not necessarily, mapped to a real machine register by the ABI\@.

In practice, this data takes the form of a collection of tables, one table per
Frame Description Entry (FDE). An FDE, in turn, is a DWARF entry describing such
a table, that has a range of IPs on which it has authority. Most often, but not
necessarily, it corresponds to a single function in the original source code.
Each column of the table is a register (\eg{} \reg{rsp}), along with two
additional special registers, CFA (Canonical Frame Address) and RA (Return
Address), containing respectively the base pointer of the current stack frame
and the return address of the current function. For instance, on an x86\_64
architecture, RA would contain the unwound value of \reg{rip}, the instruction
pointer. Each row has a certain validity interval, on which it describes
accurate unwinding data. This range starts at the instruction pointer it is
associated with, and ends at the start IP of the next table row --~or the end
IP of the current FDE if it was the last row. In particular, there can be no
``IP hole'' within an FDE --~unlike FDEs themselves, which can leave holes
between them.

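
Seen as a data structure, such a table is a sorted array of rows, and finding
the row that has authority on a given IP is a search on the rows' start
addresses. The C sketch below assumes that the table has already been decoded
into memory and keeps only the CFA rule of each row. The type and function
names are hypothetical.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* One row of a hypothetical, already decoded FDE table. Only the CFA rule
 * is kept, in the common form CFA = register + offset. */
struct fde_row {
    uintptr_t start_ip; /* first IP at which this row is valid */
    int cfa_reg;        /* DWARF register number the CFA is based on */
    long cfa_offset;
};

struct fde {
    uintptr_t end_ip;           /* first IP past the range of this FDE */
    size_t nb_rows;
    const struct fde_row *rows; /* sorted by increasing start_ip */
};

/* Binary search for the row valid at `ip`, that is, the last row whose
 * start_ip is not greater than ip. Returns NULL if the FDE has no
 * authority on ip. */
const struct fde_row *find_row(const struct fde *fde, uintptr_t ip)
{
    size_t lo = 0, hi = fde->nb_rows;
    if (fde->nb_rows == 0 || ip < fde->rows[0].start_ip || ip >= fde->end_ip)
        return NULL;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (fde->rows[mid].start_ip <= ip)
            lo = mid;
        else
            hi = mid;
    }
    return &fde->rows[lo];
}
\end{lstlisting}

Since a row is valid up to the start of the next one, no end address needs to
be stored per row: only the FDE itself carries one.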
\begin{figure}[h]
    \begin{minipage}{0.45\textwidth}
        \lstinputlisting[language=C, firstline=3, lastline=12,
            caption={Original C},label={lst:ex1_c}]
            {src/fib7/fib7.c}
    \end{minipage} \hfill \begin{minipage}{0.45\textwidth}
        \lstinputlisting[language={[x86masm]Assembler},
            caption={Generated assembly},label={lst:ex1_asm}]
            {src/fib7/fib7.s}
    \end{minipage}
\end{figure}

\begin{figure}[h]
    \begin{minipage}{0.45\textwidth}
        \lstinputlisting[language=C,caption={Raw DWARF},label={lst:ex1_dwraw}]
            {src/fib7/fib7.raw_fde}
    \end{minipage} \hfill \begin{minipage}{0.45\textwidth}
        \lstinputlisting[language=C,caption={Processed DWARF},
            label={lst:ex1_dw}]
            {src/fib7/fib7.fde}
    \end{minipage}
\end{figure}

\begin{table}[h]
    \centering
    \begin{tabular}{|c|c|c|c|c|c}
        \stackfhead{+ \mhex{30}}
        & \stackfhead{+ \mhex{28}}
        & \stackfhead{+ \mhex{20}}
        & \stackfhead{+ \mhex{1c}}
        & \stackfhead{+ \mhex{4}}
        & \stackfhead{}
        \\
        \hline{}
        Return Address & \textit{Alignment space}
        & \spaced{2ex}{\lstc{fibo[7]}}
        & \spaced{4ex}{\ldots}
        & \spaced{2ex}{\lstc{fibo[0]}}
        & \textit{Next frame}
        \\
        \hline
    \end{tabular}
    \caption{Stack frame schema for fib7 (horizontal layout)}\label{table:ex1_stack_schema}
\end{table}

For instance, the C source code in Listing~\ref{lst:ex1_c}, when compiled
with \lstbash{gcc -O1 -fomit-frame-pointer -fno-stack-protector}, yields the
assembly code in Listing~\ref{lst:ex1_asm}. The memory layout of the stack
frame is presented in Table~\ref{table:ex1_stack_schema}, to help understand
how the stack frame is constructed. When interpreting the generated \ehframe{}
with \lstbash{readelf -wF}, we obtain the (slightly edited)
Listing~\ref{lst:ex1_dw}. During the function prologue, \ie{} for $\mhex{615}
\leq \reg{rip} < \mhex{619}$, the stack frame only contains the return address,
thus the CFA is 8 bytes above \reg{rsp}, and the return address is precisely at
\reg{rsp} --~that is, stored between \reg{rsp} and $\reg{rsp} + 8$. Then, the
contents of \lstc{fibo}, 8 integers of 4 bytes each, are allocated on the
stack, which moves \reg{rsp} at least 32 bytes further away from the CFA; the
return address still being 8 bytes below the CFA\@. The variable \lstc{pos} is
optimized out in the generated assembly code, thus no stack space is allocated
for it. Yet, \prog{gcc} decided to allocate a total space of 48 bytes for the
stack frame for memory alignment reasons, which means subtracting 40 bytes from
\reg{rsp} (address $\mhex{615}$ in the assembly). Then, by the end of the
function, the local variables are discarded and \reg{rsp} is reset to the value
it had in the first row.

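
Once the right row is known, unwinding the frame is simple arithmetic. The
following C sketch applies a row of the form ``CFA is \reg{rsp} plus an offset,
return address stored 8 bytes below the CFA'' to a copy of the stack memory. It
is an illustration only: the names are hypothetical, no bounds checking is
performed, and the caller's \reg{rsp} is taken to be the CFA, which is the
usual convention on x86\_64.

\begin{lstlisting}[language=C]
#include <stdint.h>
#include <string.h>

/* Registers recovered for the caller by unwinding one frame. */
struct unwound {
    uint64_t rsp; /* caller's stack pointer, assumed equal to the CFA */
    uint64_t rip; /* return address, read from CFA - 8 */
};

/* Apply a row "CFA = rsp + cfa_offset, RA stored at CFA - 8".
 * `stack` is a copy of the stack memory whose first byte sits at address
 * `stack_base`; both stand in for reading the memory of the process. */
struct unwound unwind_frame(const uint8_t *stack, uint64_t stack_base,
                            uint64_t rsp, uint64_t cfa_offset)
{
    struct unwound out;
    uint64_t cfa = rsp + cfa_offset;
    memcpy(&out.rip, stack + (cfa - 8 - stack_base), sizeof out.rip);
    out.rsp = cfa;
    return out;
}
\end{lstlisting}

With the 48-byte frame of the example, calling \lstc{unwind_frame} with an
offset of 48 from the body of the function, or with an offset of 8 from its
first instruction, yields the same caller state.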
However, DWARF data isn't actually stored as a table in the binary files, but
is instead stored as in Listing~\ref{lst:ex1_dwraw}. The first row has the
location of the first IP in the FDE, and must define at least its CFA\@. Then,
when all relevant registers are defined, it is possible to define a new row by
providing a location offset (\eg{} here $4$), and the new row is defined as a
clone of the previous one, which can then be altered (\eg{} here by setting
\lstc{CFA} to $\reg{rsp} + 48$). This means that every row is defined \wrt{}
the previous one, and that the IPs of the successive rows cannot be determined
without evaluating every row that comes before in the first place. Thus,
unwinding a frame from an IP close to the end of the FDE will require
evaluating pretty much every DWARF row in the table before reaching the
relevant information, drastically slowing down the unwinding process.

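
The cost of this encoding can be seen on a toy interpreter. The C sketch below
handles only two already decoded instructions, a location advance and a CFA
offset change, and counts how many instructions must be replayed before the
row covering a given IP is known. The names, the reduced instruction set and
the calling interface are assumptions made for the illustration. They do not
describe the binary encoding or any existing library.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* A tiny, hypothetical subset of the CFI instructions, already decoded. */
enum cfi_op { ADVANCE_LOC, DEF_CFA_OFFSET };

struct cfi_insn {
    enum cfi_op op;
    uint64_t arg; /* location delta, or new CFA offset */
};

/* Replay the instructions from the start of the FDE until the row covering
 * `ip` is complete, and return the CFA offset of that row. `steps`, if not
 * NULL, receives the number of instructions that had to be evaluated. */
uint64_t cfa_offset_at(uint64_t fde_start, uint64_t initial_offset,
                       const struct cfi_insn *prog, size_t len,
                       uint64_t ip, size_t *steps)
{
    uint64_t loc = fde_start;
    uint64_t offset = initial_offset;
    size_t i;

    for (i = 0; i < len; i++) {
        if (prog[i].op == ADVANCE_LOC) {
            if (loc + prog[i].arg > ip)
                break;          /* the next row starts after ip: done */
            loc += prog[i].arg; /* new row, cloned from the previous one */
        } else {
            offset = prog[i].arg;
        }
    }
    if (steps != NULL)
        *steps = i;
    return offset;
}
\end{lstlisting}

The number of evaluated instructions grows with the distance between the IP
and the start of the FDE: an IP in the first row requires none, while an IP in
the last row requires replaying the whole program.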
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{How big are FDEs?}

\begin{figure}[h]
    \centering
    \begin{tikzpicture}
        \begin{axis}[
                width=0.9\linewidth, height=4cm,
                grid=major,
                grid style={dashed,gray!30},
                xlabel=FDE row count,
                ylabel=Proportion,
                %legend style={at={(0.5,-0.2)},anchor=north},
                xtick distance=5,
                ybar, %added here
            ]
            \addplot[blue,fill] table[x=lines,y=proportion, col sep=comma]
                {data/fde_line_count.csv};
        \end{axis}
    \end{tikzpicture}
    \caption{FDE line count density}\label{fig:fde_line_density}
\end{figure}

Since the time needed to evaluate an \lstc{.eh_frame} FDE is, as seen in the
previous section, roughly linear in its number of rows, we must wonder how FDE
row counts are distributed. The histogram in
Figure~\ref{fig:fde_line_density} was generated on a random sample of around
2000 ELF files present on an Arch Linux system.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Unwinding state-of-the-art}

The most commonly used library to perform stack unwinding, in the Linux
ecosystem, is \prog{libunwind}~\cite{libunwind}. While it is very robust and
decently efficient, most of its optimization comes from fine-tuned code and
good caching mechanisms. When parsing DWARF, \prog{libunwind} is forced to
parse the relevant FDE from its start, until it finds the row it was seeking.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{DWARF semantics}\label{sec:semantics}

We will now define semantics covering most of the operations used for FDEs
described in the DWARF standard~\cite{dwarf5std}, such as seen in
Listing~\ref{lst:ex1_dwraw}, with the exception of DWARF expressions. These are
not exhaustively treated because they form a rich language and would take a lot
of time and space to formalize, while at the same time being only seldom used
(see the DWARF statistics regarding this).

These semantics are defined with respect to the well-formalized C language, and
go through an intermediary language. The DWARF language can read the
whole memory, as well as registers, and is always executed for some instruction
pointer. The C function representing it will thus take as parameters an array
of the registers' values as well as an IP, and will return another array of
register values, which will represent the evaluated DWARF row.

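
To fix ideas, the shape of such a C function can be sketched by hand on the
first two rows of the earlier example. This is a hypothetical illustration of
the signature described above, not the output of an actual compilation: the
reduced register set, the names \lstc{unwind_fib7} and \lstc{read_mem}, and the
use of host pointers as addresses are all assumptions.

\begin{lstlisting}[language=C]
#include <stdint.h>
#include <string.h>

enum { REG_RSP, REG_RIP, NB_REGS }; /* hypothetical, reduced register set */

struct regs { uint64_t r[NB_REGS]; };

/* Hypothetical stand-in for reading 8 bytes of the unwound process memory.
 * Here, addresses are simply host pointers. */
static uint64_t read_mem(uint64_t addr)
{
    uint64_t value;
    memcpy(&value, (const void *)(uintptr_t)addr, sizeof value);
    return value;
}

/* What an FDE could look like once expressed in C: it takes the register
 * values and the IP, and returns the register values of the caller.
 * Addresses and offsets follow the first two rows of the fib7 example. */
struct regs unwind_fib7(struct regs in, uint64_t ip)
{
    struct regs out = in;
    uint64_t cfa;

    if (ip < 0x619) /* row starting at 0x615 */
        cfa = in.r[REG_RSP] + 8;
    else            /* row starting at 0x619 */
        cfa = in.r[REG_RSP] + 48;
    out.r[REG_RSP] = cfa;
    out.r[REG_RIP] = read_mem(cfa - 8);
    return out;
}
\end{lstlisting}

The selection of the row has become a plain comparison on the IP, and the rule
of each row a constant arithmetic expression.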
\subsection{Original language: DWARF instructions}

These are the DWARF instructions used for Call Frame Information (CFI)
description, that is, the instructions that contain the stack unwinding table
information. The following list is an exhaustive list of instructions from the
DWARF5 specification~\cite{dwarf5std} concerning CFI, with reworded
descriptions for brevity and clarity. All these instructions are given up to
variants: most instructions exist in multiple formats, handling various operand
encodings in order to optimize space. Since we won't be talking about the
underlying file format here, those variations between \eg{}
\dwcfa{advance\_loc1} and \dwcfa{advance\_loc2} --~which differ only on the
number of bytes of their operand~-- are irrelevant and will be elided.

\begin{itemize}
    \item{} \dwcfa{set\_loc(loc)}:
        start a new table row from address $loc$
    \item{} \dwcfa{advance\_loc(delta)}:
        start a new table row at address $prev\_loc + delta$
    \item{} \dwcfa{def\_cfa(reg, offset)}:
        sets this row's CFA at $(\reg{reg} + \textit{offset})$
    \item{} \dwcfa{def\_cfa\_register(reg)}:
        sets CFA at $(\reg{reg} + \textit{prev\_offset})$
    \item{} \dwcfa{def\_cfa\_offset(offset)}:
        sets CFA at $(\reg{prev\_reg} + \textit{offset})$
    \item{} \dwcfa{def\_cfa\_expression(expr)}:
        sets CFA as the result of $expr$
    \item{} \dwcfa{undefined(reg)}:
        sets the register \reg{reg} as undefined in this row
    \item{} \dwcfa{same\_value(reg)}:
        declares that the register \reg{reg} hasn't been touched, or was
        restored to its previous value, in this row. An unwinding procedure can
        leave it as-is.
    \item{} \dwcfa{offset(reg, offset)}:
        the value of the register \reg{reg} is stored in memory at the address
        $CFA + \textit{offset}$.
    \item{} \dwcfa{val\_offset(reg, offset)}:
        the value of the register \reg{reg} is the value $CFA + \textit{offset}$
    \item{} \dwcfa{register(reg, model)}:
        the register \reg{reg} has, in this row, the value that $\reg{model}$
        had in the previous row
    \item{} \dwcfa{expression(reg, expr)}:
        the value of \reg{reg} is stored in memory at the address defined by
        $expr$
    \item{} \dwcfa{val\_expression(reg, expr)}:
        \reg{reg} has the value of $expr$
    \item{} \dwcfa{restore(reg)}:
        \reg{reg} has the same value as in this FDE's preamble (CIE) in this
        row. This is \emph{not implemented in this semantics} for simplicity
        and brevity (we would have to introduce CIE (preamble) and FDE (body)
        independently). This is also not much used in actual ELF
        files: the analysis in Section~\ref{ssec:instr_cov} found no such
        instruction, on a random uniform sample of 4000 ELF files.
    \item{} \dwcfa{remember\_state()}:
        push the state of all the registers of this row on an implicit stack
    \item{} \dwcfa{restore\_state()}:
        pop an entry of the implicit stack, and restore all registers in this
        row to the value held in the stack record.
    \item{} \dwcfa{nop()}:
        do nothing (padding)
\end{itemize}

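
Most of the instructions above only update one cell of the current row. The two
that involve the implicit stack are less obvious, and can be sketched in C as
follows. The row representation, the names and the fixed bound on the stack
depth are assumptions made for the illustration.

\begin{lstlisting}[language=C]
#include <stddef.h>

#define NB_REGS 17  /* hypothetical register count */
#define MAX_DEPTH 8 /* hypothetical bound on nested remember_state */

struct row { long cfa_offset; long reg_offset[NB_REGS]; };

struct interp {
    struct row current;
    struct row saved[MAX_DEPTH]; /* the implicit stack */
    size_t depth;
};

/* remember_state(): push a copy of the current row.
 * Returns 0 on success, -1 if the stack is full. */
int remember_state(struct interp *it)
{
    if (it->depth == MAX_DEPTH)
        return -1;
    it->saved[it->depth++] = it->current;
    return 0;
}

/* restore_state(): pop the latest saved row into the current one.
 * Returns 0 on success, -1 if the stack is empty. */
int restore_state(struct interp *it)
{
    if (it->depth == 0)
        return -1;
    it->current = it->saved[--it->depth];
    return 0;
}
\end{lstlisting}

A typical use is a function with a return in the middle of its body: the row
is remembered before the epilogue alters it, and restored for the code that
follows the return instruction.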
|
|
|
\subsection{Intermediary language $\intermedlang$}
|
|
|
|
|
|
|
|
A first pass will translate DWARF instructions into this intermediary language
|
|
|
|
$\intermedlang$. It is designed to be more mathematical, representing the same
|
|
|
|
thing, but abstracting all the data compression of the DWARF format away, so
|
|
|
|
that we can better reason on it and transform it into C code.
|
|
|
|
|
|
|
|
Its grammar is as follows:
|
|
|
|
|
|
|
|
\begin{align*}
|
|
|
|
\FDE &::= {\left(\mathbb{Z} \times \dwrow \right)}^{\ast}
|
|
|
|
& \text{FDE (sequence of rows)} \\
|
|
|
|
\dwrow &::= \values ^ \regs
|
|
|
|
& \text{A single table row} \\
|
|
|
|
\regs &::= \left\{0, 1, \ldots, \operatorname{NB\_REGS - 1} \right\}
|
|
|
|
& \text{Machine registers} \\
|
|
|
|
\values &::= \bot & \text{Values: undefined,}\\
|
|
|
|
&\quad\vert~\valaddr{\spexpr} & \text{at address $x$},\\
|
|
|
|
&\quad\vert~\valval{\spexpr} & \text{of value $x$} \\
|
|
|
|
\spexpr &::= \regs \times \mathbb{Z}
|
|
|
|
& \text{A ``simple'' expression $\reg{reg} + \textit{offset}$} \\
|
|
|
|
\end{align*}
|
|
|
|
|
|
|
|
The entry point of the grammar is a $\FDE$, which is a sequence of rows, each
|
2018-08-19 13:13:07 +02:00
|
|
|
annotated with a machine address, the address from which it is valid.
|
|
|
|
The addresses are necessarily increasing within a FDE\@.
|
2018-08-02 14:08:14 +02:00
|
|
|
|
|
|
|
Each row then represents, as a function mapping registers to values, a row of
|
|
|
|
the unwinding table.
|
|
|
|
|
|
|
|
We implicitly consider that $\reg{reg}$ maps to a number, and we use here
|
|
|
|
\texttt{x86\_64} names for convenience, but actually in DWARF registers are
|
|
|
|
only handled as register identifiers, so we can safely state that $\reg{reg}
|
|
|
|
\in \regs$.
|
|
|
|
|
|
|
|
A value can then be undefined, stored at memory address $x$ or be directly a
|
|
|
|
value $x$, $x$ being here a simple expression consisting of $\reg{reg} +
|
2018-08-16 00:26:59 +02:00
|
|
|
\textit{offset}$. The CFA is considered a simple register here. For instance,
|
|
|
|
to define $\reg{rax}$ to the value contained in memory 16 bytes below the CFA,
|
|
|
|
we would have $\reg{rax} \mapsto \valaddr{\reg{CFA}, -16}$, since the stack
|
|
|
|
grows downwards.
|
2018-08-02 14:08:14 +02:00
|
|
|
|
2018-08-17 18:15:43 +02:00
|
|
|
\subsection{Target language: a C function body}
|
2018-08-02 14:08:14 +02:00
|
|
|
|
|
|
|
The target language of these semantics is a C function, to be interpreted with
|
|
|
|
respect to the C11 standard~\cite{c11std}. The function is supposed to be run
|
|
|
|
in the context of the program being unwound. In particular, it must be able to
|
|
|
|
dereference some pointer derived from DWARF instructions that will point to the
|
|
|
|
execution stack, or even the heap.
|
|
|
|
|
2018-08-08 14:01:55 +02:00
|
|
|
This function takes as arguments an instruction pointer --~supposedly
|
|
|
|
extracted from $\reg{rip}$~-- and an array of register values; and returns a
|
2018-08-02 14:08:14 +02:00
|
|
|
fresh array of register values after unwinding this call frame. The function is
|
2018-08-16 00:26:59 +02:00
|
|
|
compositional: it can be called twice in a row to unwind two stack frames,
|
|
|
|
unless the IP obtained after the first unwinding comes from another shared
|
|
|
|
object file, for instance a call to \prog{libc}. In this case, unwinding the
|
|
|
|
second frame will require loading the corresponding DWARF information.
|
2018-08-02 14:08:14 +02:00
|
|
|
|
2018-08-17 18:15:43 +02:00
|
|
|
The function is the following:
|
2018-08-02 14:08:14 +02:00
|
|
|
|
2018-08-17 21:48:45 +02:00
|
|
|
\lstinputlisting[language=C]{src/dw_semantics/c_context.c}\label{lst:sem_c_ctx}
|
2018-08-02 14:08:14 +02:00
|
|
|
|
|
|
|
The translation of $\intermedlang$ as produced by the later-defined function
|
|
|
|
is then to be inserted in this context, where the comment states so.
|
|
|
|
|
|
|
|
\subsection{From DWARF to $\intermedlang$}
|
|
|
|
|
|
|
|
To define the interpretation of $\DWARF$ to $\intermedlang$, we will need to
|
|
|
|
proceed forward, but, as the language inherently depends on the previous
|
|
|
|
instructions to give a meaning to the following ones, we will depend on what
|
|
|
|
was computed before. At a point of the interpretation $h \vert t$, where $t$ is
|
|
|
|
what remains to be interpreted, $h$ what has been, and $H$ the result of the
|
|
|
|
interpretation, it would thus look like $\llbracket t \rrbracket (H)$.
|
|
|
|
|
|
|
|
But we also need to keep track of this implicit stack DWARF uses, which will be
|
|
|
|
kept in subscript.
|
|
|
|
|
|
|
|
\medskip
|
|
|
|
|
2018-08-17 18:15:43 +02:00
|
|
|
Thus, we define $\semI{\bullet}{s}(\bullet): \DWARF \times \FDE \to \FDE$, for
|
2018-08-02 14:08:14 +02:00
|
|
|
$s$ a stack of $\dwrow$, that is,
|
|
|
|
\[
|
|
|
|
s \in \rowstack := \dwrow^\ast
|
|
|
|
\]
|
|
|
|
|
|
|
|
Implicitly, $\semI{\bullet}{} := \semI{\bullet}{\varepsilon}$
|
|
|
|
|
|
|
|
\medskip
|
|
|
|
|
|
|
|
For convenience, we define $\insarrow{reg}$, the operator changing the value of
|
|
|
|
a register for a given value in the last row, as
|
|
|
|
|
|
|
|
\[
|
|
|
|
\left(f \in \FDE\right) \insarrow{$r \in \regs$} (v \in \values)
|
|
|
|
\quad := \quad
|
|
|
|
\left( f\left[0 \ldots |f| - 2\right] \right) \cdot \left\{
|
|
|
|
\begin{array}{r l}
|
|
|
|
r' \neq r &\mapsto \left(f[-1]\right)(r') \\
|
|
|
|
r &\mapsto v \\
|
|
|
|
\end{array} \right.
|
|
|
|
\]
|
|
|
|
|
|
|
|
The same way, we define $\extrarrow{reg}$ that \emph{extracts} the rule
|
|
|
|
currently applied for $\reg{reg}$, \eg{} $F \extrarrow{CFA} \valval{\reg{reg} +
|
|
|
|
\text{off}}$. If the rule currently applied in such a case is \emph{not} of the
|
|
|
|
form $\reg{reg} + \text{off}$, then the program is considered erroneous. One
|
|
|
|
can see this $\extrarrow{reg}$ somehow as a \lstc{match} statement in OCaml,
|
|
|
|
but with only one case, which allows retrieving packed data.
|
|
|
|
|
|
|
|
More generally, we define ${\extrarrow{reg}}^{-k}$ as the same operation, but
|
|
|
|
extracting from the row $k$ positions before the last one, \ie{} ${\extrarrow{reg}}^{0}$ is the same as
|
|
|
|
$\extrarrow{reg}$, and $F {\extrarrow{reg}}^{-1} \bullet$ is the same as
|
|
|
|
$F\left[0 \ldots |F|-2\right] \extrarrow{reg} \bullet$.
|
|
|
|
|
|
|
|
\begin{align*}
|
|
|
|
\semI{\varepsilon}{s}(F) &:= F \\
|
|
|
|
\semI{\dwcfa{set\_loc(loc)} \cdot d}{s}(F) &:=
|
|
|
|
\contsem{F \cdot \left(loc, F[-1].row \right)} \\
|
|
|
|
\semI{\dwcfa{adv\_loc(delta)} \cdot d}{s}(F) &:=
|
|
|
|
\contsem{F \cdot \left(F[-1].addr + delta, F[-1].row \right)} \\
|
|
|
|
\semI{\dwcfa{def\_cfa(reg, offset)} \cdot d}{s}(F) &:=
|
|
|
|
\contsem{F \insarrow{CFA} \valval{\reg{reg} + offset}} \\
|
|
|
|
\semI{\dwcfa{def\_cfa\_register(reg)} \cdot d}{s}(F) &:=
|
|
|
|
\text{let F }\extrarrow{CFA} \valval{\reg{oldreg} + \text{oldoffset}}
|
|
|
|
\text{ in} \\
|
|
|
|
&\quad \contsem{F \insarrow{CFA} \valval{\reg{reg} + oldoffset}} \\
|
|
|
|
\semI{\dwcfa{def\_cfa\_offset(offset)} \cdot d}{s}(F) &:=
|
|
|
|
\text{let F }\extrarrow{CFA} \valval{\reg{oldreg} + \text{oldoffset}}
|
|
|
|
\text{ in} \\
|
|
|
|
&\quad \contsem{F \insarrow{CFA} \valval{\reg{oldreg} + offset}} \\
|
|
|
|
\semI{\dwcfa{def\_cfa\_expression(expr)} \cdot d}{s}(F) &:=
|
|
|
|
\text{TO BE DEFINED} &\qtodo{CHECK ME?} \\
|
|
|
|
\semI{\dwcfa{undefined(reg)} \cdot d}{s}(F) &:=
|
|
|
|
\contsem{F \insarrow{reg} \bot} \\
|
|
|
|
\semI{\dwcfa{same\_value(reg)} \cdot d}{s}(F) &:=
|
|
|
|
\contsem{F \insarrow{reg} \valval{\reg{reg} + 0}} \\
|
|
|
|
\semI{\dwcfa{offset(reg, offset)} \cdot d}{s}(F) &:=
|
|
|
|
\contsem{F \insarrow{reg} \valaddr{\reg{CFA} + \textit{offset}}} \\
|
|
|
|
\semI{\dwcfa{val\_offset(reg, offset)} \cdot d}{s}(F) &:=
|
|
|
|
\contsem{F \insarrow{reg} \valval{\reg{CFA} + \textit{offset}}} \\
|
|
|
|
\semI{\dwcfa{register(reg, model)} \cdot d}{s}(F) &:=
|
|
|
|
\text{let } F {\extrarrow{model}}^{-1} r \text{ in }
|
|
|
|
\contsem{F \insarrow{reg} r} \\
|
|
|
|
\semI{\dwcfa{expression(reg, expr)} \cdot d}{s}(F) &:=
|
|
|
|
\text{TO BE DEFINED} &\qtodo{CHECK ME?}\\
|
|
|
|
\semI{\dwcfa{val\_expression(reg, expr)} \cdot d}{s}(F) &:=
|
|
|
|
\text{TO BE DEFINED} &\qtodo{CHECK ME?}\\
|
|
|
|
% \semI{\dwcfa{restore(reg)} \cdot d}{s}(F) &:= \\ %% NOT IMPLEMENTED
|
|
|
|
\semI{\dwcfa{remember\_state()} \cdot d}{s}(F) &:=
|
|
|
|
\semI{d}{s \cdot F[-1].row}\left(F\right) \\
|
|
|
|
\semI{\dwcfa{restore\_state()} \cdot d}{s \cdot t}(F) &:=
|
|
|
|
\semI{d}{s}\left(F\left[0 \ldots |F|-2\right] \cdot
|
|
|
|
\left(F[-1].addr, t\right) \right) \\
|
|
|
|
\semI{\dwcfa{nop()} \cdot d}{s}(F) &:= \contsem{F}\\
|
|
|
|
\end{align*}
|
|
|
|
|
2018-08-16 00:26:59 +02:00
|
|
|
The stack is used for \texttt{remember\_state} and \texttt{restore\_state}. If
|
|
|
|
we omit those two operations, we can plainly remove the stack.
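As a worked illustration of these rules, consider a prologue-like sequence.
The sequence itself, the start address $a$ and the initial row $r_\bot$
(mapping every register to $\bot$) are assumptions made for this example only:
\[
    d = \dwcfa{def\_cfa(rsp, 8)} \cdot \dwcfa{offset(rip, -8)} \cdot
        \dwcfa{adv\_loc(1)} \cdot \dwcfa{remember\_state()} \cdot
        \dwcfa{def\_cfa\_offset(16)} \cdot \dwcfa{adv\_loc(4)} \cdot
        \dwcfa{restore\_state()}
\]
Starting from $F_0 = (a, r_\bot)$ with an empty stack, and writing only the
registers that do not map to $\bot$, let
\begin{align*}
    r_1 &= \left\{ \reg{CFA} \mapsto \valval{\reg{rsp} + 8},\;
                   \reg{rip} \mapsto \valaddr{\reg{CFA} - 8} \right\} \\
    r_2 &= \left\{ \reg{CFA} \mapsto \valval{\reg{rsp} + 16},\;
                   \reg{rip} \mapsto \valaddr{\reg{CFA} - 8} \right\}
\end{align*}
The successive states of $F$ and of the stack $s$ after each instruction are
then:
\begin{align*}
    &\dwcfa{def\_cfa(rsp, 8)},\ \dwcfa{offset(rip, -8)}:
        & F &= (a, r_1), & s &= \varepsilon \\
    &\dwcfa{adv\_loc(1)}:
        & F &= (a, r_1) \cdot (a + 1, r_1), & s &= \varepsilon \\
    &\dwcfa{remember\_state()}:
        & F &= (a, r_1) \cdot (a + 1, r_1), & s &= r_1 \\
    &\dwcfa{def\_cfa\_offset(16)}:
        & F &= (a, r_1) \cdot (a + 1, r_2), & s &= r_1 \\
    &\dwcfa{adv\_loc(4)}:
        & F &= (a, r_1) \cdot (a + 1, r_2) \cdot (a + 5, r_2), & s &= r_1 \\
    &\dwcfa{restore\_state()}:
        & F &= (a, r_1) \cdot (a + 1, r_2) \cdot (a + 5, r_1), & s &= \varepsilon
\end{align*}
The last step only replaces the row part of the last entry: its address
$a + 5$ is kept, while its registers revert to the remembered row $r_1$.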
|
2018-08-02 14:08:14 +02:00
|
|
|
|
|
|
|
|
|
|
|
\subsection{From $\intermedlang$ to C}
|
|
|
|
|
|
|
|
\textit{This only defines the semantics, with respect to standard C, of DWARF
|
|
|
|
as interpreted by \ehelf\@. The actual DWARF to C compiler is not implemented
|
|
|
|
this way.}
|
|
|
|
|
|
|
|
\medskip
|
|
|
|
|
2018-08-17 18:15:43 +02:00
|
|
|
We now define $\semC{\bullet}: \FDE \to C$, in the context presented
|
2018-08-02 14:08:14 +02:00
|
|
|
earlier. The translation from $\intermedlang$ to C is defined as follows:
|
|
|
|
|
|
|
|
\begin{itemize}
|
|
|
|
\item $\semC{\varepsilon} =$ \\
|
|
|
|
\begin{lstlisting}[language=C, mathescape=true]
|
|
|
|
else {
|
|
|
|
for(int reg=0; reg < NB_REGS; ++reg)
|
|
|
|
new_ctx[reg] = $\semR{\bot}$;
|
|
|
|
}
|
|
|
|
\end{lstlisting}
|
|
|
|
|
|
|
|
\item $\semC{(\text{loc}, \text{row}) \cdot t} = C\_code \cdot \semC{t}$,
|
|
|
|
where $C\_code$ is
|
|
|
|
\begin{lstlisting}[language=C, mathescape=true]
|
|
|
|
if(ip >= $loc$) {
|
|
|
|
for(int reg=0; reg < NB_REGS; ++reg)
|
|
|
|
new_ctx[reg] = $\semR{row[reg]}$;
|
2018-08-18 21:12:05 +02:00
|
|
|
goto end_ifs; // Avoid using `else if` (easier for generation)
|
2018-08-02 14:08:14 +02:00
|
|
|
}
|
|
|
|
\end{lstlisting}
|
|
|
|
\end{itemize}
|
|
|
|
|
|
|
|
and $\semR{\bullet}$ is defined as
|
|
|
|
\begin{align*}
|
|
|
|
\semR{\bot} &= \text{\lstc{ERROR_VALUE}} \\
|
|
|
|
\semR{\valaddr{\text{reg}, \textit{offset}}} &=
|
|
|
|
\text{\lstc{*(old_ctx[reg] + offset)}} \\
|
|
|
|
\semR{\valval{\text{reg}, \textit{offset}}} &=
|
|
|
|
\text{\lstc{(old_ctx[reg] + offset)}} \\
|
|
|
|
\end{align*}
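The three cases of $\semR{\bullet}$ can be sketched as a small C evaluator.
This is only an illustration under stated assumptions: the encoding
\lstc{value_t}, the function name \lstc{eval_value} and the concrete
\lstc{ERROR_VALUE} are hypothetical, and the sketch dereferences memory of the
current process, as the semantics does.

\begin{lstlisting}[language=C]
#include <stdint.h>

/* Hypothetical encoding of the values of the intermediary language. */
typedef enum { VAL_UNDEF, VAL_ADDR, VAL_VAL } val_kind_t;

typedef struct {
    val_kind_t kind;   /* bottom, "at address" or "of value" */
    int reg;           /* index into the register array */
    intptr_t offset;   /* signed offset, in bytes */
} value_t;

/* Hypothetical marker for an undefined register. */
#define ERROR_VALUE ((uintptr_t)-1)

/* Evaluate one rule against the registers of the frame being unwound. */
uintptr_t eval_value(value_t v, const uintptr_t *old_ctx)
{
    switch (v.kind) {
    case VAL_ADDR:   /* the value is stored in memory at reg + offset */
        return *(const uintptr_t *)(old_ctx[v.reg] + (uintptr_t)v.offset);
    case VAL_VAL:    /* the value is reg + offset itself */
        return old_ctx[v.reg] + (uintptr_t)v.offset;
    default:         /* undefined */
        return ERROR_VALUE;
    }
}
\end{lstlisting}

Filling a new context then amounts to calling such an evaluator once per
register of the selected row.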
|
|
|
|
|
2018-08-01 18:43:42 +02:00
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
\section{Stack unwinding data compilation}
|
|
|
|
|
2018-08-19 13:13:07 +02:00
|
|
|
In this section, we will study all the design options we explored for the
|
|
|
|
actual C implementation.
|
2018-08-03 01:04:38 +02:00
|
|
|
|
2018-08-17 20:38:20 +02:00
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
\subsection{Code availability}\label{ssec:code_avail}
|
|
|
|
|
|
|
|
All the code produced during this internship is available on the various
|
|
|
|
repositories from \url{https://git.tobast.fr/m2-internship/}. The repositories
|
|
|
|
contain \texttt{README} files describing them; a summary and global description
|
|
|
|
can be found in the \texttt{abstract} repository. This should be detailed
|
|
|
|
enough to run the project. The source code is entirely under free software
|
|
|
|
licenses.
|
|
|
|
|
2018-08-01 18:43:42 +02:00
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2018-08-07 11:09:33 +02:00
|
|
|
\subsection{Compilation: \ehelfs}\label{ssec:ehelfs}
|
2018-08-03 01:04:38 +02:00
|
|
|
|
|
|
|
The rough idea of the compilation is to produce, out of the \ehframe{} section
|
|
|
|
of a binary, C code that resembles the code shown in the DWARF semantics from
|
2018-08-08 14:01:55 +02:00
|
|
|
Section~\ref{sec:semantics} above. This C code is then compiled by GCC in
|
2018-08-18 21:12:05 +02:00
|
|
|
\lstbash{-O2} mode. This saves us the trouble of optimizing the generated C
|
|
|
|
code whenever GCC does that by itself.
|
2018-08-04 02:28:39 +02:00
|
|
|
|
2018-08-08 14:01:55 +02:00
|
|
|
The generated code consists of a single monolithic function, \lstc{_eh_elf},
|
|
|
|
taking as arguments an instruction pointer and a memory context (\ie{} the
|
|
|
|
value of the various machine registers) as defined in
|
|
|
|
Listing~\ref{lst:unw_ctx}. The function will then return a fresh memory
|
|
|
|
context, containing the values the registers hold after unwinding this frame.
|
2018-08-04 02:28:39 +02:00
|
|
|
|
2018-08-18 21:12:05 +02:00
|
|
|
The body of the function itself consists of a single monolithic switch, taking
|
2018-08-18 22:06:55 +02:00
|
|
|
advantage of the non-standard --~yet overwhelmingly implemented in common C
|
|
|
|
compilers~-- syntax for range switches, in which each \lstinline{case} can
|
|
|
|
refer to a range, \eg{} \lstc{case 17 ... 42:}. All the FDEs are merged
|
|
|
|
together into this switch, each row of a FDE being a switch case. Separating
|
|
|
|
the various FDEs in the C code --~other than with comments~-- is, unlike what
|
|
|
|
is done in DWARF, pointless, since accessing a ``row'' has a linear cost, and
|
|
|
|
the C code is not meant to be read, except maybe for debugging purposes. The
|
|
|
|
bodies of the switch cases then fill a context with unwound values before returning it.
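The shape of such a range switch can be sketched as follows. This is not the
generated code: the context type, the \lstc{FLAG_ERROR} bit, the addresses and
the name \lstc{eh_elf_sketch} are assumptions made for illustration, and the
offsets assume \texttt{x86\_64}, where a stack slot is 8 bytes wide. The
\lstc{case a ... b:} syntax is the GNU extension mentioned above.

\begin{lstlisting}[language=C]
#include <stdint.h>

/* Hypothetical context, loosely following the description in the text. */
typedef struct {
    uintptr_t rip, rsp, rbp, rbx;
    uint8_t flags;
} unwind_context_t;

enum { FLAG_ERROR = 1u << 7 };   /* hypothetical error bit */

/* Sketch for one function spanning 0x1000..0x1010 (made-up addresses). */
unwind_context_t eh_elf_sketch(unwind_context_t ctx, uintptr_t pc)
{
    unwind_context_t out = ctx;
    uintptr_t cfa;

    switch (pc) {
    case 0x1000:                 /* before `push %rbp` */
        cfa = ctx.rsp + 8;
        out.rip = *(uintptr_t *)(cfa - 8);
        out.rsp = cfa;
        return out;
    case 0x1001 ... 0x1010:      /* after `push %rbp` */
        cfa = ctx.rsp + 16;
        out.rbp = *(uintptr_t *)(cfa - 16);
        out.rip = *(uintptr_t *)(cfa - 8);
        out.rsp = cfa;
        return out;
    default:                     /* pc is not covered by any row */
        out.flags |= FLAG_ERROR;
        return out;
    }
}
\end{lstlisting}

Each \lstc{case} corresponds to one row of an FDE, and rows of other FDEs
would simply be appended as further cases of the same switch.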
|
2018-08-08 14:01:55 +02:00
|
|
|
|
|
|
|
A setting of the compiler also optionally enables another parameter to the
|
|
|
|
\lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
|
2018-08-16 00:26:59 +02:00
|
|
|
\lstc{deref} function, when present, replaces everywhere the dereferencing
|
2018-08-08 14:01:55 +02:00
|
|
|
\lstc{*} operator, and can be used to generate \ehelfs{} that will work on
|
2018-08-16 00:26:59 +02:00
|
|
|
remote address spaces, that is, whenever the unwinding is not done on the
|
|
|
|
process reading the \ehelf{} itself, but some other process, or even on a stack
|
|
|
|
dump of a long-terminated process.
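To make the role of \lstc{deref} concrete, here is a minimal sketch of a
dereferencing function that reads from a recorded stack dump instead of live
memory. The signature \lstc{deref_func_t}, the dump layout and all function
names are hypothetical; the actual interface may differ.

\begin{lstlisting}[language=C]
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: fetch one machine word of the unwound process. */
typedef uintptr_t (*deref_func_t)(uintptr_t addr);

/* Hypothetical stack dump: `size` bytes recorded from address `base`. */
static struct {
    uintptr_t base;
    size_t size;
    const unsigned char *bytes;
} g_dump;

void set_stack_dump(uintptr_t base, const void *bytes, size_t size)
{
    g_dump.base = base;
    g_dump.bytes = bytes;
    g_dump.size = size;
}

/* Read a word from the dump; returns 0 when the address is not covered
 * (a real implementation would rather report an error). */
uintptr_t deref_in_dump(uintptr_t addr)
{
    uintptr_t word = 0;
    if (addr < g_dump.base
            || addr - g_dump.base + sizeof word > g_dump.size)
        return 0;
    memcpy(&word, g_dump.bytes + (addr - g_dump.base), sizeof word);
    return word;
}

/* The rule `rip = *(CFA - one word)`, with `*` replaced by `deref`. */
uintptr_t unwind_rip(uintptr_t cfa, deref_func_t deref)
{
    return deref(cfa - sizeof(uintptr_t));
}
\end{lstlisting}

The generated row bodies stay the same in both settings; only the function
passed as \lstc{deref} decides where the memory is actually read from.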
|
2018-08-04 01:28:14 +02:00
|
|
|
|
|
|
|
Unlike in the \ehframe, and unlike what should be done in a release,
|
|
|
|
real-world-proof version of the \ehelfs, the choice was made to keep this
|
2018-08-17 18:07:17 +02:00
|
|
|
implementation simple, and only handle the few registers that were needed to
|
|
|
|
simply unwind the stack. Thus, the only registers handled in \ehelfs{} are
|
2018-08-18 21:12:05 +02:00
|
|
|
\reg{rip}, \reg{rbp}, \reg{rsp} and \reg{rbx}, the latter being used a few
|
2018-08-18 22:06:55 +02:00
|
|
|
times in \prog{libc} and other less common libraries to hold the CFA address in
|
|
|
|
common functions. This is enough to unwind the stack reliably, and thus enough
|
|
|
|
for profiling, but is not sufficient to analyze every stack frame as \prog{gdb}
|
|
|
|
would do after a \lstbash{frame n} command. Yet, if one were to enhance the
|
|
|
|
code to handle every register, it would not be much harder and would probably
|
|
|
|
be only a few hours' worth of code refactoring and rewriting.
|
2018-08-04 01:28:14 +02:00
|
|
|
|
|
|
|
\lstinputlisting[language=C, caption={Unwinding context}, label={lst:unw_ctx}]
|
|
|
|
{src/dwarf_assembly_context/unwind_context.c}
|
|
|
|
|
|
|
|
In the unwind context from Listing~\ref{lst:unw_ctx}, the values of type
|
|
|
|
\lstc{uintptr_t} are the values of the corresponding registers, and
|
2018-08-08 14:01:55 +02:00
|
|
|
\lstc{flags} is an 8-bit value, indicating for each register whether it is
|
2018-08-16 00:26:59 +02:00
|
|
|
present or not in this context, plus an error bit, indicating whether an error
|
|
|
|
occurred during unwinding. Such errors can be due \eg{} to an unsupported
|
2018-08-17 21:48:45 +02:00
|
|
|
operation in the original DWARF\@. This context differs from the one presented
|
|
|
|
in Listing~\ref{lst:sem_c_ctx}, since the previous one was only an array of
|
|
|
|
values, and the one from the real implementation is more robust, in particular
|
|
|
|
by including an error flag, for lack of a $\bot$ value.
|
2018-08-03 01:04:38 +02:00
|
|
|
|
|
|
|
This generated data is stored in separate shared object files, which we call
|
|
|
|
\ehelfs. It would have been possible to alter the original ELF file to embed
|
2018-08-18 21:12:05 +02:00
|
|
|
this data as a new section, but getting it to be executed just as any portion
|
|
|
|
of the \lstc{.text} section would probably have been painful, and keeping it
|
|
|
|
separated during the experimental phase is convenient. It is possible to have
|
|
|
|
multiple versions of \ehelfs{} files in parallel, with various options turned
|
|
|
|
on or off, and this does not require altering the base system by editing \eg{}
|
|
|
|
\texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is required, those
|
|
|
|
files can simply be \lstc{dlopen}'d. It is also possible to imagine, in a
|
|
|
|
future production environment, packaging \ehelfs{} files separately, so that
|
2018-08-18 22:06:55 +02:00
|
|
|
people interested in better performance can have the choice to install them.
|
2018-08-03 01:04:38 +02:00
|
|
|
|
2018-08-17 20:38:20 +02:00
|
|
|
This, in particular, means that each ELF file has its unwinding data in a
|
2018-08-18 22:06:55 +02:00
|
|
|
separate \ehelf{} file, implying that the unwinding data for a given program is
|
|
|
|
scattered among various \ehelf{} files, one for each shared object loaded
|
|
|
|
--~just like with DWARF, where each ELF retains its own DWARF data. Thus, an
|
|
|
|
unwinder must first acquire a \emph{memory map}, a table listing the various
|
|
|
|
ELF files loaded and \emph{mapped} in memory, and on which memory segment. This
|
|
|
|
memory map is provided by the operating system --~for instance, on Linux, it is
|
|
|
|
available as a file in \texttt{/proc}. Once this map is acquired, when
|
|
|
|
unwinding from a given IP, the unwinder must identify the memory segment from
|
|
|
|
which it comes, deduce the source ELF file, and deduce the corresponding
|
|
|
|
\ehelf.
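The lookup described above can be sketched as follows, assuming the memory
map is read from Linux's \texttt{/proc/<pid>/maps}. The type
\lstc{map_entry_t} and both function names are hypothetical, and the parser is
deliberately naive: paths containing spaces are truncated and pseudo-paths
such as \texttt{[stack]} are not filtered out.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uintptr_t start, end;   /* the segment covers [start, end) */
    char path[256];         /* file backing the segment */
} map_entry_t;

/* Parse one line of /proc/<pid>/maps, whose fields are
 * `start-end perms offset dev inode pathname`.
 * Returns 1 on success, 0 if the line has no pathname. */
int parse_maps_line(const char *line, map_entry_t *out)
{
    unsigned long start, end;
    if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255s",
               &start, &end, out->path) != 3)
        return 0;
    out->start = start;
    out->end = end;
    return 1;
}

/* Find the segment containing `ip`, or NULL if there is none. */
const map_entry_t *find_segment(const map_entry_t *map, size_t n,
                                uintptr_t ip)
{
    for (size_t i = 0; i < n; ++i)
        if (map[i].start <= ip && ip < map[i].end)
            return &map[i];
    return NULL;
}
\end{lstlisting}

The \lstc{path} of the returned entry names the source ELF file, from which
the unwinder can then deduce the corresponding \ehelf.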
|
2018-08-17 20:38:20 +02:00
|
|
|
|
2018-08-04 02:28:39 +02:00
|
|
|
\medskip
|
2018-08-03 02:15:54 +02:00
|
|
|
|
2018-08-04 02:28:39 +02:00
|
|
|
\lstinputlisting[language=C, caption={\ehelf{} for the previous example},
|
|
|
|
label={lst:fib7_eh_elf_basic}]
|
|
|
|
{src/fib7/fib7.eh_elf_basic.c}
|
|
|
|
|
2018-08-18 22:06:55 +02:00
|
|
|
The C code in Listing~\ref{lst:fib7_eh_elf_basic} is the relevant part of what
|
|
|
|
was generated for the C code in Listing~\ref{lst:ex1_c}.
|
2018-08-01 18:43:42 +02:00
|
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
\subsection{First results}
|
2018-08-03 01:04:38 +02:00
|
|
|
|
|
|
|
Without any particular care to efficiency or compactness, it is already
|
|
|
|
possible to produce a compiled version very close to the one described in
|
|
|
|
Section~\ref{sec:semantics}. Although the unwinding speed cannot yet be
|
2018-08-08 14:01:55 +02:00
|
|
|
actually benchmarked, it is already possible to write in a few hundred lines of
|
|
|
|
C code a simple stack walker printing the functions traversed. It already works
|
2018-08-18 21:12:05 +02:00
|
|
|
well on the standard cases that are easily tested, and can be used to unwind
|
|
|
|
the stack of simple programs.
|
2018-08-03 01:04:38 +02:00
|
|
|
|
|
|
|
The major drawback of this approach, without any particular care taken, is the
|
2018-08-08 14:01:55 +02:00
|
|
|
space waste. The space taken by those tentative \ehelfs{} is analyzed in
|
|
|
|
Table~\ref{table:basic_eh_elf_space} for \prog{hackbench}, a small program
|
|
|
|
introduced later in Section~\ref{ssec:bench_perf}, and the libraries on which
|
|
|
|
it depends.
|
|
|
|
|
2018-08-03 01:04:38 +02:00
|
|
|
|
|
|
|
\begin{table}[h]
|
2018-08-03 13:59:11 +02:00
|
|
|
\centering
|
2018-08-03 01:04:38 +02:00
|
|
|
\begin{tabular}{r r r r r r}
|
|
|
|
\toprule
|
|
|
|
\thead{Shared object} & \thead{Original \\ program size}
|
|
|
|
& \thead{Original \\ \lstc{.eh\_frame}}
|
|
|
|
& \thead{Generated \\ \ehelf{} \lstc{.text}}
|
|
|
|
& \thead{\% of original \\ program size}
|
|
|
|
& \thead{Growth \\ factor} \\
|
|
|
|
\midrule
|
|
|
|
libc-2.27.so & 1.4 MiB & 130.1 KiB & 914.9 KiB & 63.92 & 7.03 \\
|
|
|
|
libpthread-2.27.so & 58.1 KiB & 11.6 KiB & 70.5 KiB & 121.48 & 6.09 \\
|
|
|
|
ld-2.27.so & 129.6 KiB & 9.6 KiB & 71.7 KiB & 55.34 & 7.44 \\
|
|
|
|
hackbench & 2.9 KiB & 568.0 B & 2.1 KiB & 74.78 & 3.97 \\
|
|
|
|
Total & 1.6 MiB & 151.8 KiB & 1.0 MiB & 65.32 & 6.98 \\
|
|
|
|
\bottomrule
|
|
|
|
\end{tabular}
|
|
|
|
|
|
|
|
\caption{Basic \ehelfs{} space usage}\label{table:basic_eh_elf_space}
|
|
|
|
\end{table}
|
|
|
|
|
|
|
|
The first column only includes the sizes of the ELF sections \lstc{.text} (the
|
|
|
|
program itself) and \lstc{.rodata}, the read-only data (such as static strings,
|
|
|
|
etc.). Only the weight of the \lstc{.text} section of the generated \ehelfs{}
|
2018-08-18 22:06:55 +02:00
|
|
|
is considered, because it is self-contained (little or no data is stored in
|
2018-08-03 01:04:38 +02:00
|
|
|
\lstc{.rodata}), and the other sections could be removed if the \ehelfs{}
|
|
|
|
\lstc{.text} was somehow embedded in the original shared object.
|
|
|
|
|
|
|
|
This first tentative version of \ehelfs{} is roughly 7 times heavier than the
|
|
|
|
original \lstc{.eh_frame}, and represents a far too significant proportion of
|
2018-08-18 22:06:55 +02:00
|
|
|
the original program size ($65\,\%$).
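As a quick check on the rounded figures of the \prog{libc} row of
Table~\ref{table:basic_eh_elf_space}, the growth factor is the ratio of the
generated \lstc{.text} size to the original \lstc{.eh_frame} size:
\[
    \frac{914.9\,\text{KiB}}{130.1\,\text{KiB}} \approx 7.03
\]
Recomputing the other rows from the displayed values may differ slightly in
the last digits, since the sizes shown in the table are themselves rounded.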
|
2018-08-03 01:04:38 +02:00
|
|
|
|
2018-08-01 18:43:42 +02:00
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2018-08-04 20:58:11 +02:00
|
|
|
\subsection{Space optimization}\label{ssec:space_optim}
|
2018-08-03 02:15:54 +02:00
|
|
|
|
|
|
|
A lot of small space optimizations, such as filtering out empty FDEs, merging
|
|
|
|
together the rows that are equivalent on all the registers kept, etc.\ were
|
|
|
|
made in order to shrink the \ehelfs.
|
|
|
|
|
|
|
|
\medskip
|
|
|
|
|
2018-08-04 02:28:39 +02:00
|
|
|
The major optimization that most reduced the output size was to use an if/else
|
2018-08-18 21:12:05 +02:00
|
|
|
tree implementing a binary search over the relevant instruction pointer
|
|
|
|
intervals, instead of a single monolithic switch. In the process, we also
|
|
|
|
\emph{outline} code whenever possible, that is, find out identical ``switch
|
2018-08-18 22:06:55 +02:00
|
|
|
cases'' bodies --~which are not switch cases anymore, but \texttt{if}
|
|
|
|
bodies~--, move them outside of the if/else tree, identify them by a label, and
|
|
|
|
jump to them using a \lstc{goto}, which de-duplicates a lot of code and
|
|
|
|
contributes greatly to the shrinking. In the process, we noticed that the vast
|
|
|
|
majority of FDE rows are actually taken among very few ``common'' FDE rows. For
|
|
|
|
instance, in the \prog{libc}, out of a total of $20827$ rows, only $302$
|
|
|
|
($1.5\,\%$) unique rows remain after the outlining.
|
2018-08-03 02:15:54 +02:00
|
|
|
|
2018-08-04 02:28:39 +02:00
|
|
|
This makes this optimization really efficient, as seen later in
|
2018-08-08 14:01:55 +02:00
|
|
|
Section~\ref{ssec:results_size}, but also makes it an interesting question
|
|
|
|
--~not investigated during this internship~-- to find out whether standard
|
|
|
|
DWARF data could be efficiently compressed in this way.
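The combination of an if/else search tree and outlined row bodies can be
sketched as follows. As before, this is an illustration rather than the
generated code: the context type, the \lstc{FLAG_ERROR} bit, the four
intervals and the label names are assumptions, and the offsets assume
\texttt{x86\_64}.

\begin{lstlisting}[language=C]
#include <stdint.h>

/* Hypothetical context, loosely following the description in the text. */
typedef struct {
    uintptr_t rip, rsp, rbp, rbx;
    uint8_t flags;
} unwind_context_t;

enum { FLAG_ERROR = 1u << 7 };   /* hypothetical error bit */

/* Four made-up intervals sharing only two distinct row bodies:
 * [0x1000, 0x1001) and [0x1020, 0x1021) use CFA = rsp + 8,
 * [0x1001, 0x1020) and [0x1021, 0x1040) use CFA = rsp + 16. */
unwind_context_t eh_elf_tree_sketch(unwind_context_t ctx, uintptr_t pc)
{
    unwind_context_t out = ctx;
    uintptr_t cfa;

    if (pc < 0x1000 || pc >= 0x1040)
        goto row_error;
    if (pc < 0x1020) {               /* binary search: lower half */
        if (pc < 0x1001) goto row_cfa8;
        else             goto row_cfa16;
    } else {                         /* binary search: upper half */
        if (pc < 0x1021) goto row_cfa8;
        else             goto row_cfa16;
    }

row_cfa8:                            /* outlined body, used twice */
    cfa = ctx.rsp + 8;
    out.rip = *(uintptr_t *)(cfa - 8);
    out.rsp = cfa;
    return out;

row_cfa16:                           /* outlined body, used twice */
    cfa = ctx.rsp + 16;
    out.rbp = *(uintptr_t *)(cfa - 16);
    out.rip = *(uintptr_t *)(cfa - 8);
    out.rsp = cfa;
    return out;

row_error:                           /* pc is not covered by any row */
    out.flags |= FLAG_ERROR;
    return out;
}
\end{lstlisting}

With a switch, each of the four intervals would carry its own copy of a body;
here every distinct body is emitted once, whatever the number of intervals
that jump to it.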
|
2018-08-01 18:43:42 +02:00
|
|
|
|
2018-08-04 02:28:39 +02:00
|
|
|
\begin{minipage}{0.45\textwidth}
|
|
|
|
\lstinputlisting[language=C, caption={\ehelf{} for the previous example},
|
|
|
|
label={lst:fib7_eh_elf_outline},
|
|
|
|
lastline=18]
|
|
|
|
{src/fib7/fib7.eh_elf_outline.c}
|
|
|
|
\end{minipage} \hfill \begin{minipage}{0.45\textwidth}
|
|
|
|
\lstinputlisting[language=C, firstnumber=last, firstline=19]
|
|
|
|
{src/fib7/fib7.eh_elf_outline.c}
|
|
|
|
\end{minipage}
|
2018-08-01 18:43:42 +02:00
|
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2018-08-18 21:12:05 +02:00
|
|
|
\section{Benchmarking}\label{sec:benchmarking}
|
2018-08-01 18:43:42 +02:00
|
|
|
|
2018-08-04 02:59:15 +02:00
|
|
|
Benchmarking turned out to be, quite surprisingly, the hardest part of the
|
2018-08-18 21:12:05 +02:00
|
|
|
project. It ended up requiring a good deal of investigation to find a working
|
2018-08-04 02:59:15 +02:00
|
|
|
protocol, and afterwards, a good deal of code reading and coding to get the
|
|
|
|
solution working.
|
|
|
|
|
2018-08-01 18:43:42 +02:00
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2018-08-04 14:07:34 +02:00
|
|
|
\subsection{Requirements}\label{ssec:bench_req}
|
2018-08-04 02:59:15 +02:00
|
|
|
|
|
|
|
To provide relevant benchmarks of the \ehelfs{} performance, one must sample at
|
2018-08-18 22:06:55 +02:00
|
|
|
least a few hundreds or thousands of stack unwindings, since a single frame
|
2018-08-04 02:59:15 +02:00
|
|
|
unwinding with regular DWARF takes on the order of $10\,\mu\text{s}$, and
|
|
|
|
\ehelfs{} were expected to have significantly better performance.
|
|
|
|
|
|
|
|
However, unwinding over and over again from the same program point would have
|
|
|
|
been pointless, since \prog{libunwind} would have simply cached the
|
2018-08-18 22:06:55 +02:00
|
|
|
relevant DWARF rows. Conversely, making sure that the various unwindings
|
2018-08-04 02:59:15 +02:00
|
|
|
are made from different locations is somewhat cheating, since it renders
\prog{libunwind}'s caching useless and does not reproduce a ``real-world'' unwinding
|
|
|
|
distribution. All in all, the benchmarking method must have a ``natural''
|
|
|
|
distribution of unwindings.
|
2018-08-04 02:59:15 +02:00
|
|
|
|
2018-08-18 21:12:05 +02:00
|
|
|
Another requirement is to distribute the unwinding points evenly enough
|
|
|
|
across the program to mimic real-world unwinding: we would like to benchmark
|
|
|
|
stack unwindings crossing some standard library functions, starting from inside
|
|
|
|
them, etc.
|
2018-08-04 02:59:15 +02:00
|
|
|
|
2018-08-18 21:12:05 +02:00
|
|
|
Finally, the unwound program must be interesting enough to enter and exit
|
2018-08-18 22:06:55 +02:00
|
|
|
functions often, building a good stack of nested function calls (frequently at
least 5 deep), have FDEs that are not as simple as in Listing~\ref{lst:ex1_dw},
|
2018-08-18 21:12:05 +02:00
|
|
|
etc.
|
2018-08-04 02:59:15 +02:00
|
|
|
|
2018-08-01 18:43:42 +02:00
|
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2018-08-07 20:44:12 +02:00
|
|
|
\subsection{Presentation of \prog{perf}}\label{ssec:perf}
|
2018-08-04 02:59:15 +02:00
|
|
|
|
2018-08-16 00:26:59 +02:00
|
|
|
\prog{Perf} is a \emph{profiler} that comes with the Linux ecosystem, and is
|
|
|
|
even developed within the Linux kernel source tree. A profiler is an important
|
|
|
|
tool from the developer's toolbox that analyzes the performance of programs by
|
|
|
|
recording the time spent in each function, including within nested calls. This
|
|
|
|
analysis often enables programmers to optimize critical paths and functions in
|
|
|
|
their programs, while leaving unoptimized functions that are seldom traversed.
|
2018-08-04 02:59:15 +02:00
|
|
|
|
2018-08-17 18:07:17 +02:00
|
|
|
\prog{Perf} is a \emph{polling} profiler, as opposed to
|
|
|
|
\emph{instrumenting} profilers. This means that with \prog{perf}, the basic
|
|
|
|
idea is to stop the traced program at regular intervals, unwind its stack,
|
|
|
|
write down the current nested function calls, and integrate the sampled data in
|
|
|
|
the end. Instrumenting profilers, on the other hand, do not interrupt the
|
|
|
|
program, but instead inject code in it.
|
2018-08-04 02:59:15 +02:00
|
|
|
|
2018-08-01 18:43:42 +02:00
|
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
2018-08-04 14:59:09 +02:00
|
|
|
\subsection{Benchmarking with \prog{perf}}\label{ssec:bench_perf}
|
2018-08-04 14:07:34 +02:00
|
|
|
|
|
|
|
In the context of this internship, the main advantage of \prog{perf} is that it
|
2018-08-18 21:12:05 +02:00
|
|
|
unwinds the stack on a regular, controllable basis, easily unwinding thousands
|
|
|
|
of times in a few seconds. It also meets all the requirements from
|
2018-08-04 14:07:34 +02:00
|
|
|
Section~\ref{ssec:bench_req} above: since it stops at regular intervals and
|
|
|
|
unwinds, the unwindings are evenly distributed \wrt{} the frequency of
|
|
|
|
execution of the code, which is a natural enough setup for the benchmarks to be
|
|
|
|
meaningful, while still unwinding from diversified locations, preventing
|
2018-08-18 22:06:55 +02:00
|
|
|
caching from being overwhelming --~as can be observed later in
|
|
|
|
Section~\ref{ssec:timeperf}. It also has the ability to unwind from
|
2018-08-04 14:07:34 +02:00
|
|
|
within any function, including functions of linked shared libraries. It can also
|
|
|
|
be applied to virtually any program, which allows unwinding ``interesting''
|
|
|
|
code.
|
|
|
|
|
|
|
|
The program that was chosen for \prog{perf}-benchmarking is
|
|
|
|
\prog{hackbench}~\cite{hackbenchsrc}. This small program is designed to
|
|
|
|
stress-test and benchmark the Linux scheduler by spawning processes or threads
|
|
|
|
that communicate with each other. It has the advantage of generating stack
activity, being linked against \prog{libc} and \prog{pthread}, and being very light.
|
|
|
|
|
|
|
|
\medskip
|
|
|
|
|
|
|
|
Interfacing \ehelfs{} with \prog{perf} first required forking
\prog{libunwind} and implementing \ehelfs{} support in it. In the process, it
turned out to be necessary to slightly modify \prog{libunwind}'s interface to add a
|
2018-08-16 00:26:59 +02:00
|
|
|
parameter to an initialisation function, since \prog{libunwind} is made to be
|
|
|
|
agnostic of the system and process as much as possible, to be able to unwind in
|
2018-08-17 20:38:20 +02:00
|
|
|
any context. This very restricted information lacked a memory map (see
|
2018-08-18 22:06:55 +02:00
|
|
|
Section~\ref{ssec:ehelfs}) in order to use \ehelfs{} --~while, on the other
|
|
|
|
hand, providing information about the original DWARF that is now useless.
|
|
|
|
Apart from this, the modified version of \prog{libunwind} produced is entirely
|
|
|
|
compatible with the vanilla version. This means that the only modifications
|
|
|
|
required to use \ehelfs{} within any project using \prog{libunwind} should be
|
|
|
|
changing one line of code to add one parameter to a function call and linking
|
|
|
|
against the modified version of \prog{libunwind} instead of the system version.
|
2018-08-04 14:07:34 +02:00
|
|
|
|
|
|
|
Once this was done, plugging it into \prog{perf} was a matter of only a few lines of
code, the benchmarking code aside. The major problem encountered was
|
|
|
|
to understand how \prog{perf} works. In order to avoid perturbing the traced
|
|
|
|
program, \prog{perf} does not unwind at runtime, but rather records at regular
|
|
|
|
intervals the program's stack, and all the auxiliary information that is needed
|
2018-08-18 22:06:55 +02:00
|
|
|
to unwind later. This is done when running \lstbash{perf record}. Then, a
|
|
|
|
subsequent call to \lstbash{perf report} unwinds the stack to analyze it; but
|
|
|
|
at this point in time, the traced process is long dead. Thus, any PID-based
|
|
|
|
approach, or any approach using \texttt{/proc} information will fail. However,
|
|
|
|
as this was the easiest method, the first version of \ehelfs{} used those
|
|
|
|
mechanisms; it took some code rewriting to move to a PID- and
|
|
|
|
\texttt{/proc}-agnostic implementation.
|
2018-08-08 15:00:27 +02:00
|
|
|
|
2018-08-01 18:43:42 +02:00
|
|
|
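
For readers unfamiliar with \prog{perf}, the two-phase workflow described above
can be sketched as follows. This is only an illustration based on stock
\prog{perf} options, not the exact command lines used for this report: the
\lstbash{--call-graph dwarf} mode asks \prog{perf} to save a copy of the user
stack with each sample so that it can be unwound offline, and the
\prog{hackbench} invocation and the \texttt{perf.data} file name are
placeholders.

\begin{lstlisting}[language=bash]
# Phase 1: run the program and record stack snapshots (no unwinding yet).
# `./hackbench` and `perf.data` are placeholders for illustration.
perf record --call-graph dwarf -o perf.data ./hackbench

# Phase 2: the traced process is gone; the recorded stacks are unwound offline.
perf report -i perf.data
\end{lstlisting}

Since the second command only has the recorded file to work with, the unwinder
cannot rely on anything tied to the live process.
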
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Other explored methods}

The first benchmarking approach tried was to write some specific C code that
would meet the requirements from Section~\ref{ssec:bench_req}, while itself
calling a benchmarking procedure from time to time. This was quickly
abandoned, because generating C code interesting enough to be unwound turned
out to be hard, and the generated FDEs invariably ended up uninteresting. It
would also never have met the requirement of unwinding from fairly distributed
locations anyway.

Another attempt was made using CSmith~\cite{csmith}, a random C code generator
initially made for random testing of C compilers. The idea was still to craft
an interesting C program that would unwind on its own frequently, but this time
by integrating randomly generated CSmith code within hand-written C snippets
that would generate large enough FDEs and nested calls. This was abandoned as
well, as the call graph of CSmith-generated code is often far too small, and
CSmith code is notoriously hard to understand and edit.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Results}

\subsection{Hardware used}\label{ssec:bench_hw}

All the measurements in this report were made on a computer with an Intel Xeon
E3-1505M v6 CPU, with a base clock frequency of $3.00$\,GHz and 8 hardware
threads (4 physical cores). The computer has 32\,GB of RAM, and care was taken
never to fill it and start swapping.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Measured time performance}\label{ssec:timeperf}

A benchmark of \ehelfs{} against the vanilla \prog{libunwind} was run using
the exact same methodology as in Section~\ref{ssec:bench_perf}, the only
difference being that \prog{perf} was linked against the vanilla
\prog{libunwind} for the baseline measurements. It yields the results in
Table~\ref{table:bench_time}.

\begin{table}[h]
    \centering
    \begin{tabular}{l r r r r r}
        \toprule
        \thead{Unwinding method} & \thead{Frames \\ unwound}
            & \thead{Total time \\ unwinding ($\mu$s)}
            & \thead{Average time \\ per frame (ns)}
            & \thead{Unwinding \\ errors}
            & \thead{Time ratio} \\
        \midrule
        \ehelfs{}
            & 23506 % Frames unwound
            & 14837 % Total time
            & 631 % Avg time
            & 1099 % # Errors
            & 1
            \\
        \prog{libunwind}, cached
            & 27058 % Frames unwound
            & 441601 % Total time
            & 16320 % Avg time
            & 885 % # Errors
            & 25.9
            \\
        \prog{libunwind}, uncached
            & 27058 % Frames unwound
            & 671292 % Total time
            & 24809 % Avg time
            & 885 % # Errors
            & 39.3
            \\
        \bottomrule
    \end{tabular}

    \caption{Time benchmarking on hackbench}\label{table:bench_time}
\end{table}

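
As a consistency check, the two derived columns of
Table~\ref{table:bench_time} can be recomputed from the raw measurements. The
notation below is introduced only for this illustration: $\bar{t}$ is the
average time per frame, that is, the total unwinding time divided by the number
of frames unwound, with subscripts $\mathrm{eh}$, $\mathrm{c}$ and $\mathrm{u}$
standing for \ehelfs{}, cached \prog{libunwind} and uncached \prog{libunwind}
respectively.
\begin{align*}
    \bar{t}_{\mathrm{eh}} &= \frac{14837\,\mu\mathrm{s}}{23506}
        \approx 631\,\mathrm{ns}, \\
    \bar{t}_{\mathrm{c}} &= \frac{441601\,\mu\mathrm{s}}{27058}
        \approx 16320\,\mathrm{ns},
    & \frac{\bar{t}_{\mathrm{c}}}{\bar{t}_{\mathrm{eh}}}
        &= \frac{16320}{631} \approx 25.9, \\
    \bar{t}_{\mathrm{u}} &= \frac{671292\,\mu\mathrm{s}}{27058}
        \approx 24809\,\mathrm{ns},
    & \frac{\bar{t}_{\mathrm{u}}}{\bar{t}_{\mathrm{eh}}}
        &= \frac{24809}{631} \approx 39.3.
\end{align*}
The time ratio column is thus the ratio of average times per frame, not of
total times, which matters here since the methods do not unwind the same number
of frames.
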
The performance of \ehelfs{} measured here is probably overestimated compared
to what a production-ready version would achieve, since \ehelfs{} do not handle
all the registers from the original DWARF file, and thus the vanilla
\prog{libunwind} version must perform more computation. However, this overhead,
although impossible to measure without first implementing support for every
register, would probably not be that big, since most of the time is spent
finding the relevant row. Support for every DWARF instruction, on the other
hand, would not slow down the implementation at all, since every instruction
would simply be compiled to x86\_64 without affecting the already supported
code.

The sharp difference between cached and uncached \prog{libunwind} confirms
that our experimental setup did not unwind at totally different locations every
single time, and thus was not biased in this direction, since caching is still
very efficient.

It is also worth noting that the compilation time of \ehelfs{} is reasonably
short. On the machine described in Section~\ref{ssec:bench_hw}, and without
using multiple cores to compile, the various shared objects needed to run
\prog{hackbench} (that is, \prog{hackbench}, \prog{libc}, \prog{ld} and
\prog{libpthread}) are compiled in an overall time of $25.28$ seconds.

The unwinding errors observed are hard to investigate, but are most probably
due to truncated stack records. Indeed, since \prog{perf} dumps the last $n$
bytes of the call stack (for a given $n$), and only keeps those for later
unwinding, large stacks lead to lost information when analyzing the results.
The difference in error counts between \ehelfs{} and the vanilla library could
be due to unsupported DWARF instructions or registers, to \prog{libdwarfpp}
bugs, or to bugs in the custom \prog{libunwind} implementation that were not
spotted.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Measured compactness}\label{ssec:results_size}

A first measurement of compactness, made on one of the earliest working
versions, was presented earlier in this report in
Table~\ref{table:basic_eh_elf_space}.

The same data, generated for the latest version of \ehelfs, can be seen in
Table~\ref{table:bench_space}.

The effect of the outlining mentioned in Section~\ref{ssec:space_optim} is
particularly visible in this table: \prog{hackbench} has a significantly bigger
growth factor than the other shared objects. This is because \prog{hackbench}
has a much smaller \lstc{.eh_frame}; the outlined data is thus reused only a
few times, compared to \eg{} \prog{libc}, in which the outlined data is reused
a lot.

Just as with time performance, the measured compactness would be impacted by
supporting every register, but probably not by much either, since most columns
concern the four supported registers (see Section~\ref{ssec:instr_cov}).

\begin{table}[h]
    \centering
    \begin{tabular}{r r r r r r}
        \toprule
        \thead{Shared object} & \thead{Original \\ program size}
            & \thead{Original \\ \lstc{.eh\_frame}}
            & \thead{Generated \\ \ehelf{} \lstc{.text}}
            & \thead{\% of original \\ program size}
            & \thead{Growth \\ factor} \\
        \midrule
        libc-2.27.so
            & 1.4 MiB & 130.1 KiB & 313.2 KiB & 21.88 & 2.41 \\
        libpthread-2.27.so
            & 58.1 KiB & 11.6 KiB & 25.4 KiB & 43.71 & 2.19 \\
        ld-2.27.so
            & 129.6 KiB & 9.6 KiB & 28.6 KiB & 22.09 & 2.97 \\
        hackbench
            & 2.9 KiB & 568.0 B & 2.8 KiB & 93.87 & 4.99 \\
        Total
            & 1.6 MiB & 151.8 KiB & 370.0 KiB & 22.81 & 2.44 \\
        \bottomrule
    \end{tabular}

    \caption{\ehelfs{} space usage}\label{table:bench_space}
\end{table}

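
The last two columns of Table~\ref{table:bench_space} are ratios of the
preceding ones. As an illustration, with $S_{\mathrm{prog}}$,
$S_{\mathrm{frame}}$ and $S_{\mathrm{elf}}$ (names introduced only for this
example) denoting the original program size, the original \lstc{.eh_frame} size
and the generated \ehelf{} \lstc{.text} size, the \prog{libc} row gives
\begin{align*}
    \text{growth factor} &= \frac{S_{\mathrm{elf}}}{S_{\mathrm{frame}}}
        = \frac{313.2\,\text{KiB}}{130.1\,\text{KiB}} \approx 2.41, \\
    \text{share of program size} &= \frac{S_{\mathrm{elf}}}{S_{\mathrm{prog}}}
        = \frac{313.2\,\text{KiB}}{1.4 \times 1024\,\text{KiB}}
        \approx 22\,\%.
\end{align*}
The small gap with the $21.88\,\%$ shown in the table is what one would expect
if, as assumed here, the table was computed from exact byte counts while the
displayed sizes are rounded.
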
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Instructions coverage}\label{ssec:instr_cov}

In order to determine which DWARF instructions must be implemented to obtain
meaningful results, as well as to assess the instruction coverage of our
compiler and \ehelfs, we must look at real-world ELF files and inspect the
instructions used.

The method chosen was to take a uniform random sample of 4000 ELFs among those
present on a basic Arch Linux system setup, in the directories \texttt{/bin},
\texttt{/lib}, \texttt{/usr/bin}, \texttt{/usr/lib} and their subdirectories,
making sure those files were ELF64 files, and then to gather statistics on
those files.

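
The sampling step could be sketched as follows. This is not the script used for
this report, only a minimal illustration assuming a \prog{bash} shell with GNU
\prog{find}, \prog{od} and \prog{shuf}; the output file name
\texttt{sample.txt} is hypothetical. A file is kept when its first five bytes
are the ELF magic number followed by the class byte denoting a 64-bit object.

\begin{lstlisting}[language=bash]
# List regular files, keep those whose header is 0x7f 'E' 'L' 'F' followed by
# EI_CLASS = 2 (ELF64), then draw 4000 of them uniformly at random.
# `sample.txt` is a placeholder name for the resulting list.
find /bin /lib /usr/bin /usr/lib -type f -print0 |
while IFS= read -r -d '' f; do
    magic=$(head -c 5 "$f" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c4602" ] && printf '%s\n' "$f"
done | shuf -n 4000 > sample.txt
\end{lstlisting}

Filtering before sampling, rather than after, keeps the sample size at exactly
4000 ELF64 files as long as enough candidates exist.
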
\begin{table}[h]
    \centering
    \begin{tabular}{r r r r r r r}
        \toprule
        \thead{} & \thead{Unsupported \\ register rule}
            & \thead{Register \\ rules seen}
            & \thead{\% \\ supp.}
            & \thead{Unsupported \\ expression}
            & \thead{Expressions \\ seen}
            & \thead{\% \\ supp.}
            \\
        \midrule
        \makecell{Only supp. \\ columns} &
            1603 & 42959683 & 99.996\,\% &
            1114 & 5977 & 81.4\,\%
            \\
        All columns &
            1607 & 67587841 & 99.998\,\% &
            1154 & 13869 & 91.7\,\%
            \\
        \bottomrule
    \end{tabular}

    \caption{Instructions coverage statistics}\label{table:instr_cov}
\end{table}

\begin{table}[h]
    \centering
    \begin{tabular}{r r r r r r}
        \toprule
        \thead{}
            & \thead{\texttt{Undefined}}
            & \thead{\texttt{Same\_value}}
            & \thead{\texttt{Offset}}
            & \thead{\texttt{Val\_offset}}
            & \thead{\texttt{Register}}
            \\
        \midrule
        \makecell{Only supp. \\ columns}
            & 1698 (0.006\,\%)
            & 0
            & 30038255 (99.9\,\%)
            & 0
            & 14 (0\,\%)
            \\
        All columns
            & 1698 (0.003\,\%)
            & 0
            & 54666405 (99.9\,\%)
            & 0
            & 22 (0\,\%)
            \\
        \bottomrule
        \toprule
        \thead{}
            & \thead{\texttt{Expression}}
            & \thead{\texttt{Val\_expression}}
            & \thead{\texttt{Architectural}}
            & & \thead{Total}
            \\
        \midrule
        \makecell{Only supp. \\ columns}
            & 4475 (0.015\,\%)
            & 0
            & 0
            & & 30044442
            \\
        All columns
            & 12367 (0.02\,\%)
            & 0
            & 0
            & & 54680492
            \\
        \bottomrule
    \end{tabular}

    \caption{Instruction type statistics}\label{table:instr_types}
\end{table}

Table~\ref{table:instr_cov} gives statistics about the proportion of
instructions encountered that were not supported by \ehelfs. The first row
only considers the columns CFA, \reg{rip}, \reg{rsp}, \reg{rbp} and \reg{rbx}
(the supported registers; see Section~\ref{ssec:ehelfs}). The second row
analyzes all the columns that were encountered, whether or not they are
supported in \ehelfs.

Table~\ref{table:instr_types} analyzes the proportion of each command (the
formal way a register is set) for non-CFA columns in the sampled data. As a
brief explanation, \texttt{Offset} means that the value is stored at an offset
from the CFA, \texttt{Register} means that the value is taken from a machine
register, \texttt{Expression} means that the value is stored at the address
given by an expression's result, and the \texttt{Val\_} prefix means that the
value must not be dereferenced. Overall, it can be seen that supporting
\texttt{Offset} already means supporting the vast majority of register rules.
The data gathered (not reproduced here) also suggests that supporting a few
common expressions is enough to support most of them. This is further supported
by the fact that we already support more than $80\,\%$ of expressions by
supporting only two basic constructs.

It is also worth noting that among the 4000 analyzed files, all the
unsupported expressions are clustered in only 12 of them, and only 24 contain
unsupported instructions at all.

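
The percentages above follow directly from the counts. With $u$ the number of
unsupported occurrences and $n$ the number of occurrences seen (notation used
only for this illustration), the supported share is $1 - u/n$. For the first
row of Table~\ref{table:instr_cov}, and for the \texttt{Offset} rule in the
first row of Table~\ref{table:instr_types}, this gives
\begin{align*}
    \text{register rules:} \quad
        & 1 - \frac{1603}{42959683} \approx 99.996\,\%, \\
    \text{expressions:} \quad
        & 1 - \frac{1114}{5977} \approx 81.4\,\%, \\
    \text{share of \texttt{Offset}:} \quad
        & \frac{30038255}{30044442} \approx 99.98\,\%.
\end{align*}
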
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% Bibliography %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\printbibliography{}

%% License notice %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vfill
\hfill \begin{minipage}{0.7\textwidth}
    \begin{flushright}
        \itshape{} \small{}
        Unless otherwise explicitly stated, any image from the present document
        is distributed under a Creative Commons BY-SA license, and any code
        snippet is distributed under the 3-clause BSD license.
    \end{flushright}
\end{minipage}

\end{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%