Live changes during meeting
parent
2f44049506
commit
73f016f44c
2 changed files with 82 additions and 78 deletions
|
@ -8,17 +8,17 @@
|
||||||
|
|
||||||
\subsection*{The general context}
|
\subsection*{The general context}
|
||||||
|
|
||||||
The standard debugging data format for ELF binary files, DWARF, contains tables
|
The standard debugging data format, DWARF, contains tables that, for a given
|
||||||
that permit, for a given instruction pointer (IP), to understand how the
|
instruction pointer (IP), make it possible to understand how the assembly instruction
|
||||||
assembly instruction relates to the source code, where variables are currently
|
relates to the source code, where variables are currently allocated in memory
|
||||||
allocated in memory or if they are stored in a register, what are their type
|
or if they are stored in a register, what their types are, and how to unwind the
|
||||||
and how to unwind the current stack frame. This inforation is generated when
|
current stack frame. This information is generated when passing \eg{} the
|
||||||
passing \eg{} the switch \lstbash{-g} to \prog{gcc} or equivalents.
|
switch \lstbash{-g} to \prog{gcc} or equivalents.
|
||||||
|
|
||||||
Even in stripped (non-debug) binaries, a small portion of DWARF data remains:
|
Even in stripped (non-debug) binaries, a small portion of DWARF data remains:
|
||||||
the stack unwinding data. This information is necessary to unwind stack
|
the stack unwinding data. This information is necessary to unwind stack
|
||||||
frames, restoring machine registers to the value they had in the previous
|
frames, restoring machine registers to the value they had in the previous
|
||||||
frame, for instance within the context of a debugger or a profiler.
|
frame.
|
||||||
|
|
||||||
This data is structured into tables, each row corresponding to an IP range for
|
This data is structured into tables, each row corresponding to an IP range for
|
||||||
which it describes valid unwinding data, and each column describing how to
|
which it describes valid unwinding data, and each column describing how to
|
||||||
|
@ -34,28 +34,29 @@ computation~\cite{oakley2011exploiting}.
|
||||||
|
|
||||||
As debugging data can easily take an unreasonable space and grow larger than
|
As debugging data can easily take an unreasonable space and grow larger than
|
||||||
the program itself if stored carelessly, the DWARF standard pays a great
|
the program itself if stored carelessly, the DWARF standard pays a great
|
||||||
attention to data compactness and compression, and succeeds particularly well
|
attention to data compactness and compression. It succeeds particularly well
|
||||||
at it. But this, as always, is at the expense of efficiency: accessing stack
|
at it, but at the expense of efficiency: accessing stack
|
||||||
unwinding data for a particular program point is not a light operation --~in
|
unwinding data for a particular program point is an expensive operation --~the
|
||||||
the order of magnitude of $10\,\mu{}\text{s}$ on a modern computer.
|
order of magnitude is $10\,\mu{}\text{s}$ on a modern computer.
|
||||||
|
|
||||||
This is often not a huge problem, as stack unwinding is often thought of as a
|
This is often not a problem, as stack unwinding is often thought of as a
|
||||||
debugging procedure: when something behaves unexpectedly, the programmer might
|
debugging procedure: when something behaves unexpectedly, the programmer might
|
||||||
be interested in opening their debugger and exploring the stack. Yet, stack
|
be interested in opening their debugger and exploring the stack. Yet, stack
|
||||||
unwinding might, in some cases, be performance-critical: for instance, profiler
|
unwinding might, in some cases, be performance-critical: for instance, polling
|
||||||
programs needs to perform a whole lot of stack unwindings. Even worse,
|
profilers repeatedly perform stack unwindings to observe which functions are
|
||||||
exception handling relies on stack unwinding in order to find a suitable
|
active. Even worse, C++ exception handling relies on stack unwinding in order
|
||||||
catch-block! For such applications, it might be desirable to find a different
|
to find a suitable catch-block! For such applications, it might be desirable to
|
||||||
time/space trade-off, storing a bit more for a faster unwinding.
|
find a different time/space trade-off, storing a bit more for a faster
|
||||||
|
unwinding.
|
||||||
|
|
||||||
This different trade-off is the question that I explored during this
|
This different trade-off is the question that I explored during this
|
||||||
internship: what good alternative trade-off is reachable when storing the stack
|
internship: what good alternative trade-off is reachable when storing the stack
|
||||||
unwinding data completely differently?
|
unwinding data completely differently?
|
||||||
|
|
||||||
It seems that the subject has not really been explored yet, and as of now, the
|
It seems that the subject has not been explored yet, and as of now, the most
|
||||||
most widely used library for stack unwinding,
|
widely used library for stack unwinding, \prog{libunwind}~\cite{libunwind},
|
||||||
\prog{libunwind}~\cite{libunwind}, essentially makes use of aggressive but
|
essentially makes use of aggressive but fine-tuned caching and optimized code
|
||||||
fine-tuned caching and optimized code to mitigate this problem.
|
to mitigate this problem.
|
||||||
|
|
||||||
% What is the question that you studied?
|
% What is the question that you studied?
|
||||||
% Why is it important, what are the applications/consequences?
|
% Why is it important, what are the applications/consequences?
|
||||||
|
@ -73,27 +74,25 @@ of compiled DWARF into existing projects have been made easy by implementing an
|
||||||
alternative version of the \textit{de facto} standard library for this purpose,
|
alternative version of the \textit{de facto} standard library for this purpose,
|
||||||
\prog{libunwind}.
|
\prog{libunwind}.
|
||||||
|
|
||||||
Multiple approaches have been tried, in order to determine which compilation
|
Multiple approaches have been tried and evaluated to determine which
|
||||||
process leads to the best time/space trade-off.
|
compilation process leads to the best time/space trade-off.
|
||||||
|
|
||||||
Unexpectedly, the part that proved hardest of the project was finding and
|
Unexpectedly, the part that proved hardest of the project was finding and
|
||||||
implementing a benchmarking protocol that was both relevant and reliable.
|
implementing a benchmarking protocol that was both relevant and reliable.
|
||||||
Unwinding one single frame is way too fast to provide a reliable benchmarking
|
Unwinding one single frame is too fast to provide a reliable benchmarking on a
|
||||||
on a few samples (around $10\,\mu s$ per frame). Having enough samples for this
|
few samples (around $10\,\mu s$ per frame) to avoid statistical errors. Having
|
||||||
purpose --~at least a few thousands~-- is not easy, since one must avoid
|
enough samples for this purpose --~at least a few thousands~-- is not easy,
|
||||||
unwinding the same frame over and over again, which would only benchmark the
|
since one must avoid unwinding the same frame over and over again, which would
|
||||||
caching mechanism. The other problem is to distribute evenly the unwinding
|
only benchmark the caching mechanism. The other problem is to distribute
|
||||||
measures across the various IPs, including directly into the loaded libraries
|
evenly the unwinding measures across the various IPs, including directly into
|
||||||
(\eg{} the \prog{libc}).
|
the loaded libraries (\eg{} the \prog{libc}).
|
||||||
|
|
||||||
The solution eventually chosen was to modify \prog{perf}, the standard
|
The solution eventually chosen was to modify \prog{perf}, the standard
|
||||||
profiling program for Linux, in order to gather statistics and benchmarks of
|
profiling program for Linux, in order to gather statistics and benchmarks of
|
||||||
its unwindings. Modifying \prog{perf} was an additional challenge that turned
|
its unwindings. Modifying \prog{perf} was an additional challenge that turned
|
||||||
out to be harder than expected, since the source code is pretty opaque to
|
out to be harder than expected, since the source code is hard to read, and
|
||||||
someone who doesn't know the project well, and the optimisations make some
|
optimisations make some parts counter-intuitive. To overcome this, we designed
|
||||||
parts counter-intuitive. This, in particular, required to produce an
|
an alternative version of \prog{libunwind} interfaced with the
|
||||||
alternative version of \prog{libunwind} interfaced with the compiled debugging
|
compiled debugging data.
|
||||||
data.
|
|
||||||
|
|
||||||
% What is your solution to the question described in the last paragraph?
|
% What is your solution to the question described in the last paragraph?
|
||||||
%
|
%
|
||||||
|
@ -108,12 +107,19 @@ data.
|
||||||
%
|
%
|
||||||
% Comment the robustness of your solution: how does it rely/depend on the working assumptions?
|
% Comment the robustness of your solution: how does it rely/depend on the working assumptions?
|
||||||
|
|
||||||
The goal was to obtain a compiled version of unwinding data that was faster
|
The goal of this project was to design a compiled version of unwinding data
|
||||||
than DWARF, reasonably heavier and reliable. The benchmarks mentioned have
|
that is faster than DWARF, while still being reliable and reasonably compact.
|
||||||
yielded convincing results: on the experimental setup created (detailed on
|
The benchmarks mentioned have yielded convincing results: on the experimental
|
||||||
Section~\ref{sec:benchmarking} below), the compiled version is around 26 times
|
setup created --~detailed in Section~\ref{sec:benchmarking} below~--,
|
||||||
faster than the DWARF version, while it remains only around 2.5 times bigger
|
the compiled version is around 26 times faster than the DWARF version, while it
|
||||||
than the original data.
|
remains only around 2.5 times bigger than the original data.
|
||||||
|
|
||||||
|
We support the vast majority --~more than $99.9\,\%$~-- of the instructions
|
||||||
|
actually used in binaries, although we do not support all of the DWARF5 instruction
|
||||||
|
set. We are almost as robust as libunwind: on a $27000$ samples test, 885
|
||||||
|
failures were observed for \prog{libunwind}, against $1099$ for the compiled
|
||||||
|
DWARF version (failures are due to signal handlers, unusual instructions,
|
||||||
|
\ldots) --~see Section~\ref{ssec:timeperf}.
|
||||||
|
|
||||||
The implementation is not yet release-ready, as it does not support 100\ \% of
|
The implementation is not yet release-ready, as it does not support 100\ \% of
|
||||||
the DWARF5 specification~\cite{dwarf5std} --~see Section~\ref{ssec:ehelfs}
|
the DWARF5 specification~\cite{dwarf5std} --~see Section~\ref{ssec:ehelfs}
|
||||||
|
@ -123,13 +129,13 @@ the reference implementation. Indeed, corner cases occur often, and on a 27000
|
||||||
samples test, 885 failures were observed for \prog{libunwind}, against 1099 for
|
samples test, 885 failures were observed for \prog{libunwind}, against 1099 for
|
||||||
the compiled DWARF version (see Section~\ref{ssec:timeperf}).
|
the compiled DWARF version (see Section~\ref{ssec:timeperf}).
|
||||||
|
|
||||||
The implementation, however, as a few other limitations. It only supports the
|
The implementation, however, is not production-ready: it only supports the
|
||||||
x86\_64 architecture, and relies to some extent on the Linux operating system.
|
x86\_64 architecture, and relies to some extent on the Linux operating system.
|
||||||
But none of those are real problems in practice. Other processor architectures
|
None of those are real problems in practice. Supporting other processor
|
||||||
and ABIs are only a matter of time spent and engineering work; and the
|
architectures and ABIs is only a matter of engineering. The operating system
|
||||||
operating system dependency is only present in the libraries developed in order
|
dependency is only present in the libraries developed in order to interact with
|
||||||
to interact with the compiled unwinding data, which can be developed for
|
the compiled unwinding data, which can be developed for virtually any operating
|
||||||
virtually any operating system.
|
system.
|
||||||
|
|
||||||
\subsection*{Summary and future work}
|
\subsection*{Summary and future work}
|
||||||
|
|
||||||
|
@ -137,14 +143,13 @@ In most cases of everyday's life, a slow stack unwinding is not a problem, left
|
||||||
apart an annoyance. Yet, having a 26 times speed-up on stack unwinding-heavy
|
apart an annoyance. Yet, having a 26 times speed-up on stack unwinding-heavy
|
||||||
tasks can be really useful to \eg{} profile large programs, particularly if one
|
tasks can be really useful to \eg{} profile large programs, particularly if one
|
||||||
wants to profile many times in order to analyze the impact of multiple changes.
|
wants to profile many times in order to analyze the impact of multiple changes.
|
||||||
It can also be useful for exception-heavy programs. Thus, it might be
|
It can also be useful for exception-heavy programs. Thus, we plan to address
|
||||||
interesting to implement a more stable version, and try to interface it cleanly
|
the limitations and integrate it cleanly with mainstream tools, such as
|
||||||
with mainstream tools, such as \prog{perf}.
|
\prog{perf}.
|
||||||
|
|
||||||
Another question worth exploring might be whether it is possible to shrink even
|
Another research direction is to investigate how to compress even more the
|
||||||
more the original DWARF unwinding data, which would be stored in a format not
|
original DWARF unwinding data using outlining techniques, as we already do for
|
||||||
too far from the original standard, by applying techniques close to those
|
the compiled data successfully.
|
||||||
used to shrink the compiled unwinding data.
|
|
||||||
|
|
||||||
% What is next? In which respect is your approach general?
|
% What is next? In which respect is your approach general?
|
||||||
% What did your contribution bring to the area?
|
% What did your contribution bring to the area?
|
||||||
|
|
|
@ -1,10 +1,11 @@
|
||||||
\title{DWARF debugging data, compilation and optimization}
|
\title{DWARF debugging data, compilation and optimization}
|
||||||
|
|
||||||
\author{Théophile Bastian\\
|
\author{Théophile Bastian\\
|
||||||
Under supervision of Francesco Zappa Nardelli\\
|
Under supervision of Francesco Zappa Nardelli, March -- August 2018\\
|
||||||
{\textsc{parkas}, \'Ecole Normale Supérieure de Paris}}
|
{\textsc{parkas}, \'Ecole Normale Supérieure de Paris}}
|
||||||
|
|
||||||
\date{March -- August 2018\\August 20, 2018}
|
%\date{March -- August 2018\\August 20, 2018}
|
||||||
|
\date{\vspace{-2em}}
|
||||||
|
|
||||||
\documentclass[11pt]{article}
|
\documentclass[11pt]{article}
|
||||||
|
|
||||||
|
@ -54,8 +55,8 @@ Under supervision of Francesco Zappa Nardelli\\
|
||||||
|
|
||||||
\subsection*{Source code}\label{ssec:source_code}
|
\subsection*{Source code}\label{ssec:source_code}
|
||||||
|
|
||||||
All the source code produced during this internship is available openly. See
|
Our implementation is available from \url{https://git.tobast.fr/m2-internship}.
|
||||||
Section~\ref{ssec:code_avail} for details.
|
See the \texttt{abstract} repository for an introductory \texttt{README}.
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
@ -102,25 +103,24 @@ copying \reg{rsp} to \reg{rbp}. This makes it easy to find the return address
|
||||||
from anywhere within the function, and also allows for easy addressing of local
|
from anywhere within the function, and also allows for easy addressing of local
|
||||||
variables. To some extent, it also allows for hot debugging, such as saving a
|
variables. To some extent, it also allows for hot debugging, such as saving a
|
||||||
useful core dump upon segfault. Yet, using \reg{rbp} to save \reg{rip} is not
|
useful core dump upon segfault. Yet, using \reg{rbp} to save \reg{rip} is not
|
||||||
always done, since it somehow ``wastes'' a register. This decision is, on
|
always done, since it wastes a register. This decision is, on x86\_64 System V,
|
||||||
x86\_64 System V, up to the compiler.
|
up to the compiler.
|
||||||
|
|
||||||
Often, a function will start by subtracting some value to \reg{rsp}, allocating
|
Usually, a function starts by subtracting some value from \reg{rsp}, allocating
|
||||||
some space in the stack frame for its local variables. Then, it will push on
|
some space in the stack frame for its local variables. Then, it pushes on
|
||||||
the stack the values of the callee-saved registers that are overwritten later,
|
the stack the values of the callee-saved registers that are overwritten later,
|
||||||
effectively saving them. Before returning, it will pop the values of the saved
|
effectively saving them. Before returning, it pops the values of the saved
|
||||||
registers back to their original registers and restores \reg{rsp} to its former
|
registers back to their original registers and restores \reg{rsp} to its former
|
||||||
value.
|
value.
|
||||||
|
|
||||||
\subsection{Stack unwinding}\label{ssec:stack_unwinding}
|
\subsection{Stack unwinding}\label{ssec:stack_unwinding}
|
||||||
|
|
||||||
For various reasons, it might be interesting, at some point of the execution of
|
For various reasons, it is interesting, at some point of the execution of a
|
||||||
a program, to glance at its program stack and be able to extract informations
|
program, to glance at its program stack and be able to extract information
|
||||||
from it. For instance, when running a debugger such as \prog{gdb}, a frequent
|
from it. For instance, when running a debugger, a frequent usage is to obtain a
|
||||||
usage is to obtain a \emph{backtrace}, that is, the list of all nested function
|
\emph{backtrace}, that is, the list of all nested function calls at the current
|
||||||
calls at the current IP\@. This actually reads the stack to find the different
|
IP\@. This actually observes the stack to find the different stack frames, and
|
||||||
stack frames, and decode them to identify the function names, parameter values,
|
decodes them to identify the function names, parameter values, etc.
|
||||||
etc.
|
|
||||||
|
|
||||||
This operation is far from trivial. Often, a stack frame will only make sense
|
This operation is far from trivial. Often, a stack frame will only make sense
|
||||||
when the correct values are stored in the machine registers. These values,
|
when the correct values are stored in the machine registers. These values,
|
||||||
|
@ -184,7 +184,7 @@ no time directly in \lstc{fct_a}, but spend a lot of time in calls to the other
|
||||||
two functions that were made from \lstc{fct_a}. Knowing that after all,
|
two functions that were made from \lstc{fct_a}. Knowing that after all,
|
||||||
\lstc{fct_a} is the culprit can be useful to a programmer.
|
\lstc{fct_a} is the culprit can be useful to a programmer.
|
||||||
|
|
||||||
Exception handling also requires a stack unwinding mechanism in most languages.
|
Exception handling also requires a stack unwinding mechanism in some languages.
|
||||||
Indeed, an exception is completely different from a \lstinline{return}: while
|
Indeed, an exception is completely different from a \lstinline{return}: while
|
||||||
the latter returns to the previous function, at a well-defined IP, the former
|
the latter returns to the previous function, at a well-defined IP, the former
|
||||||
can be caught by virtually any function in the call path, at any point of the
|
can be caught by virtually any function in the call path, at any point of the
|
||||||
|
@ -313,7 +313,7 @@ between them.
|
||||||
\\
|
\\
|
||||||
\hline
|
\hline
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\caption{Stack frame schema}\label{table:ex1_stack_schema}
|
\caption{Stack frame schema for fib7 (horizontal layout)}\label{table:ex1_stack_schema}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
For instance, the C source code in Listing~\ref{lst:ex1_c}, when compiled
|
For instance, the C source code in Listing~\ref{lst:ex1_c}, when compiled
|
||||||
|
@ -492,8 +492,8 @@ Its grammar is as follows:
|
||||||
\end{align*}
|
\end{align*}
|
||||||
|
|
||||||
The entry point of the grammar is a $\FDE$, which is a set of rows, each
|
The entry point of the grammar is a $\FDE$, which is a set of rows, each
|
||||||
annotated with a machine address, the address from which it is valid. Note that
|
annotated with a machine address, the address from which it is valid.
|
||||||
the addresses are necessarily increasing within a FDE\@.
|
The addresses are necessarily increasing within a FDE\@.
|
||||||
|
|
||||||
Each row then represents, as a function mapping registers to values, a row of
|
Each row then represents, as a function mapping registers to values, a row of
|
||||||
the unwinding table.
|
the unwinding table.
|
||||||
|
@ -672,9 +672,8 @@ and $\semR{\bullet}$ is defined as
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\section{Stack unwinding data compilation}
|
\section{Stack unwinding data compilation}
|
||||||
|
|
||||||
The tentative approach that was chosen to try to get better unwinding speeds at
|
In this section, we will study all the design options we explored for the
|
||||||
a reasonable space loss was to compile directly the \ehframe{} into native
|
actual C implementation.
|
||||||
machine code on the x86\_64 platform.
|
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Code availability}\label{ssec:code_avail}
|
\subsection{Code availability}\label{ssec:code_avail}
|
||||||
|
|