\pagestyle{empty} %
\thispagestyle{empty}

%% Warning: no more than one double-sided page!
% Do not keep the questions

\section*{Internship synthesis}

\subsection*{The general context}
DWARF, the standard debugging data format for ELF binary files, contains a
large amount of information, most of it generated when passing \eg{} the
\lstbash{-g} switch to \prog{gcc}. This information, primarily intended for
debuggers, provides everything needed to connect the generated assembly back
to the original source code, and can also be exploited by sanitizers (\eg{} to
recover the type of each variable in the source language), among other tools.

Even in stripped (non-debug) binaries, a small portion of DWARF data remains.
Among this essential data that is never stripped is the stack unwinding data,
which makes it possible to unwind stack frames, restoring machine registers to
the values they held in the previous frame, for instance within the context of
a debugger or a profiler.
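As a deliberately simplified picture of what unwinding one frame means, the
sketch below (in C, with GCC or Clang) walks a chain of saved frame pointers;
it assumes a program compiled with frame pointers kept (\eg{}
\lstbash{gcc -O0} or \lstbash{-fno-omit-frame-pointer} on x86\_64), an
assumption real unwinders cannot make, which is precisely why the DWARF
unwinding data exists:
\begin{verbatim}
#include <stdint.h>
#include <stdio.h>

static void backtrace_fp(void)
{
    /* With frame pointers, each frame stores: [saved %rbp][return address]. */
    uint64_t *frame = __builtin_frame_address(0);
    /* Stay within our own frames: the C start-up code may not keep the chain. */
    for (int depth = 0; frame != NULL && depth < 3; depth++) {
        printf("return address: %#llx\n", (unsigned long long)frame[1]);
        frame = (uint64_t *)frame[0];   /* step to the caller's frame */
    }
}

static void f(void) { backtrace_fp(); }

int main(void) { f(); return 0; }
\end{verbatim}
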
This unwinding data is structured into tables, each row corresponding to a
program counter (PC) range for which it describes valid unwinding rules, and
each column describing how to unwind a particular machine register (or virtual
register used for various purposes). These rules are mostly simple, consisting
of offsets from addresses stored in registers (such as \reg{rbp} or
\reg{rsp}), but in some cases they can take the form of a stack-machine
expression that can access virtually all of the process's memory and perform
Turing-complete computation~\cite{oakley2011exploiting}.
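As a purely illustrative example, the table for a small function using the
usual \reg{rbp}-based frame could look as follows, in the spirit of the output
of \lstbash{readelf --debug-dump=frames-interp} (addresses and offsets are
made up):
\begin{verbatim}
   PC        CFA       rbp     ra (return address)
   0x1130    rsp+8     u       c-8
   0x1131    rsp+16    c-16    c-8
   0x1134    rbp+16    c-16    c-8
   0x1156    rsp+8     c-16    c-8
\end{verbatim}
Each row is valid from its PC up to the next row's; the CFA (canonical frame
address) column tells how to recompute the base address of the frame,
\texttt{c-8} means that the register is saved in memory at CFA minus 8, and
\texttt{u} marks a register whose rule is still undefined at that point.
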
\subsection*{The research problem}

Since debugging data can easily grow unreasonably large if stored carelessly,
the DWARF standard pays great attention to compactness and compression, and
succeeds particularly well at it. But this, as always, comes at the expense of
efficiency: accessing the stack unwinding data for a particular program point
can be quite costly.

This is often not a major problem, as stack unwinding is mostly thought of as
a debugging procedure: when something behaves unexpectedly, the programmer
might want to explore the stack. Yet stack unwinding can, in some cases, be
performance-critical: for instance, profilers need to perform a very large
number of unwindings. Even worse, exception handling relies on stack unwinding
in order to find a suitable catch block! For such applications, it might be
desirable to find a different time/space trade-off, allowing a slightly
space-heavier, but far more time-efficient unwinding procedure.

This different trade-off is the question I explored during this internship:
which alternative trade-offs become reachable when the stack unwinding data is
stored in a completely different way?

The subject does not seem to have been really explored yet; as of now, the
most widely used stack unwinding library, \prog{libunwind}~\cite{libunwind},
essentially relies on aggressive but fine-tuned caching and optimized code to
mitigate this problem.

% What is the question that you studied?
% Why is it important, what are the applications/consequences?
% Is it a new problem?
% If so, why are you the first researcher in the universe to consider it?
% If not, why did you think that you could bring an original contribution?
\subsection*{Your contribution}

This internship explored the possibility of compiling DWARF's stack unwinding
data directly into native assembly on the x86\_64 architecture. Instead of
parsing and interpreting the debug data at runtime, the stack unwinding data
is then accessed by calling a function of a dynamically loaded shared library.
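To give a rough idea, the sketch below shows what such a compiled unwinding
routine could boil down to at the C level; the structure, names and interface
are hypothetical, and only illustrate that each table row becomes
straight-line arithmetic instead of DWARF interpreted at runtime:
\begin{verbatim}
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register context: just what is needed to keep unwinding. */
typedef struct { uint64_t rip, rsp, rbp; } unwind_regs;

/* Reads one 8-byte word from the unwound process's memory; here, our own. */
static uint64_t deref_local(uint64_t addr)
{
    return *(const uint64_t *)(uintptr_t)addr;
}

/* Hypothetical compiled form of one table row (valid for one PC range):
 *   CFA = rbp + 16;  ra = [CFA - 8];  caller rbp = [CFA - 16];  rsp = CFA.
 * The generated shared object would export an entry point dispatching on the
 * current PC to the right such row. */
unwind_regs unwind_row(const unwind_regs *cur, uint64_t (*deref)(uint64_t))
{
    unwind_regs caller;
    uint64_t cfa = cur->rbp + 16;
    caller.rip = deref(cfa - 8);
    caller.rbp = deref(cfa - 16);
    caller.rsp = cfa;
    return caller;
}

int main(void)
{
    /* A fake frame laid out as on the stack: [saved rbp][return address]. */
    uint64_t fake_frame[2] = { 0xdeadbeef, 0x401234 };
    unwind_regs cur = { 0, 0, (uint64_t)(uintptr_t)fake_frame };
    unwind_regs up  = unwind_row(&cur, deref_local);
    printf("caller rip = %#llx\n", (unsigned long long)up.rip);
    return 0;
}
\end{verbatim}
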
Multiple approaches were tried in order to determine which compilation process
leads to the best time/space trade-off.

Quite unexpectedly, the hardest part of the project was finding a benchmarking
protocol that was both relevant and reliable. Unwinding a single frame is far
too fast (around $10\,\mu s$ per frame) to be benchmarked on a few samples,
and gathering many samples is not straightforward, since one must avoid
unwinding the same frame over and over again, which would only benchmark the
caching mechanism. The other difficulty is to distribute the unwinding
measurements evenly across the various program positions, including inside the
loaded libraries (\eg{} the \prog{libc}).

The solution eventually chosen was to modify \prog{perf}, the standard
profiling program for Linux, so that it gathers statistics and timings of its
unwindings, and to produce an alternative version of \prog{libunwind} backed
by the compiled debugging data, interfaced with \prog{perf}; this makes it
possible to benchmark \prog{perf} with both the standard stack unwinding data
and the experimental compiled format. As a free and welcome side effect, the
experimental unwinding data is fully interfaced with \prog{libunwind}, and
thus usable at practically no cost by any existing project relying on this
\textit{de facto} standard library.
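Concretely, a client that already unwinds through \prog{libunwind}'s
documented API, along the lines of the minimal local-unwinding loop below
(standard \prog{libunwind} usage, to be linked with \lstbash{-lunwind}),
should not need any change to benefit from the compiled unwinding data:
\begin{verbatim}
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

/* Walk the current thread's stack, one unw_step() per frame. */
static void show_backtrace(void)
{
    unw_context_t context;
    unw_cursor_t  cursor;
    unw_word_t    ip, sp;

    unw_getcontext(&context);           /* capture the current registers   */
    unw_init_local(&cursor, &context);  /* start unwinding from that point */
    while (unw_step(&cursor) > 0) {     /* one iteration per stack frame   */
        unw_get_reg(&cursor, UNW_REG_IP, &ip);
        unw_get_reg(&cursor, UNW_REG_SP, &sp);
        printf("ip = %#lx, sp = %#lx\n", (long)ip, (long)sp);
    }
}

int main(void)
{
    show_backtrace();
    return 0;
}
\end{verbatim}
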
% What is your solution to the question described in the last paragraph?
%
% Be careful, do \emph{not} give technical details, only rough ideas!
%
% Pay a special attention to the description of the \emph{scientific} approach.

\subsection*{Arguments supporting its validity}

% What is the evidence that your solution is a good solution?
% Experiments? Proofs?
%
% Comment the robustness of your solution: how does it rely/depend on the working assumptions?
The goal was to obtain a compiled version of the unwinding data that is faster
than DWARF, only reasonably heavier, and reliable. The benchmarks mentioned
above yielded convincing results: on the experimental setup created (detailed
later in this report), the compiled version is up to 25 times faster than the
DWARF version, while remaining only around 2.5 times bigger than the original
data.

Even though the implementation is more a research prototype than a release
version, it is still reasonably robust compared to \prog{libunwind}, which is
built for robustness. Corner cases are frequent when analyzing stack data, and
even more so when analyzing it through a profiler; yet the prototype fails on
only around 200 more cases than \prog{libunwind} on a 27\,000-sample test
(1099 failures, against 885 for \prog{libunwind}).

Unlike \prog{libunwind}, the prototype does not support $100\,\%$ of the DWARF
instructions present in the DWARF5 standard~\cite{dwarf5std}. It is also
limited to the x86\_64 architecture and relies to some extent on the Linux
operating system. Yet none of these limitations is a real problem in practice.
As argued later on, the vast majority of the DWARF instruction set actually
used in the wild is implemented; supporting other processor architectures and
ABIs is only a matter of time and engineering work; and the operating-system
dependency lies only in the libraries developed to interact with the compiled
unwinding data, which could be ported to virtually any operating system.
\subsection*{Summary and future work}

In most everyday situations, slow stack unwinding is not a problem, nor even
an annoyance. Yet a 25-fold speed-up on unwinding-heavy tasks such as
profiling can be very useful when profiling heavy programs, particularly when
profiling repeatedly to analyze the impact of successive changes. It can also
benefit exception-heavy programs. It might therefore be interesting to
implement a more stable version and to interface it cleanly with mainstream
tools such as \prog{perf}.

Another question worth exploring is whether the original DWARF unwinding data
could itself be shrunk further, while keeping a format not too far from the
original standard, by applying techniques close to those used to shrink the
compiled unwinding data.

% What is next? In which respect is your approach general?
% What did your contribution bring to the area?
% What should be done now?
% What is the good \emph{next} question?
\pagestyle{plain}
\newpage