Few fixes
parent 57a6cb9e8b
commit b761f360cc
2 changed files with 44 additions and 40 deletions
@@ -40,13 +40,13 @@ can be quite costly.
 
 This is often not a huge problem, as stack unwinding is mostly thought of as a
 debugging procedure: when something behaves unexpectedly, the programmer might
-be interested in exploring the stack. Yet, stack unwinding might, in some
-cases, be performance-critical: for instance, profiler programs needs to
-perform a whole lot of stack unwindings. Even worse, exception handling relies
-on stack unwinding in order to find a suitable catch-block! For such
-applications, it might be desirable to find a different time/space trade-off,
-allowing a slightly space-heavier, but far more time-efficient unwinding
-procedure.
+be interested in opening their debugger and exploring the stack. Yet, stack
+unwinding might, in some cases, be performance-critical: for instance, profiler
+programs needs to perform a whole lot of stack unwindings. Even worse,
+exception handling relies on stack unwinding in order to find a suitable
+catch-block! For such applications, it might be desirable to find a different
+time/space trade-off, allowing a slightly space-heavier, but far more
+time-efficient unwinding procedure.
 
 This different trade-off is the question that I explored during this
 internship: what good alternative trade-off is reachable when storing the stack
@@ -108,7 +108,7 @@ existing project using the \textit{de facto} standard library \prog{libunwind}.
 The goal was to obtain a compiled version of unwinding data that was faster
 than DWARF, reasonably heavier and reliable. The benchmarks mentioned have
 yielded convincing results: on the experimental setup created (detailed later
-in this report), the compiled version is up to 25 times faster than the DWARF
+in this report), the compiled version is around 26 times faster than the DWARF
 version, while it remains only around 2.5 times bigger than the original data.
 
 Even though the implementation is more a research prototype than a release
@@ -132,7 +132,7 @@ system.
 \subsection*{Summary and future work}
 
 In most cases of everyday's life, a slow stack unwinding is not a problem, or
-even an annoyance. Yet, having a 25 times speed-up on stack unwinding-heavy
+even an annoyance. Yet, having a 26 times speed-up on stack unwinding-heavy
 tasks, such as profiling, can be really useful to profile heavy programs,
 particularly if one wants to profile many times in order to analyze the impact
 of multiple changes. It can also be useful for exception-heavy programs. Thus,
@@ -61,12 +61,13 @@ Under supervision of Francesco Zappa-Nardelli\\
 
 On most platforms, programs make use of a \emph{call stack} to store
 information about the nested function calls at the current execution point, and
-keep track of their nesting. Each function call has its own \emph{stack frame},
-an entry of the call stack, whose precise contents are often specified in the
-Application Binary Interface (ABI) of the platform, and left to various extents
-up to the compiler. Those frames are typically used for storing function
-arguments, machine registers that must be restored before returning, the
-function's return address and local variables.
+keep track of their nesting. This call stack is conventionally a contiguous
+memory space mapped close to the top of the addressing space. Each function
+call has its own \emph{stack frame}, an entry of the call stack, whose precise
+contents are often specified in the Application Binary Interface (ABI) of the
+platform, and left to various extents up to the compiler. Those frames are
+typically used for storing function arguments, machine registers that must be
+restored before returning, the function's return address and local variables.
 
 On the x86\_64 platform, with which this report is mostly concerned, the
 calling convention that is followed is defined in the System V
@@ -94,14 +95,16 @@ compiler might use \reg{rbp} (``base pointer'') to save this value of
 \reg{rip}, by writing the old value of \reg{rbp} just below the return address
 on the stack, then copying \reg{rsp} to \reg{rbp}. This makes it easy to find
 the return address from anywhere within the function, and also allows for easy
-addressing of local variables.
+addressing of local variables. Yet, using \reg{rbp} to save \reg{rip} is not
+always done, since it somehow ``wastes'' a register. This decision is, on
+x86\_64 System V, up to the compiler.
 
 Often, a function will start by subtracting some value to \reg{rsp}, allocating
 some space in the stack frame for its local variables. Then, it will push on
 the stack the values of the callee-saved registers that are overwritten later,
 effectively saving them. Before returning, it will pop the values of the saved
-registers back to their original registers, then restoring \reg{rsp} to its
-former value.
+registers back to their original registers and restore \reg{rsp} to its former
+value.
 
 \subsection{Stack unwinding}
 
@@ -128,7 +131,7 @@ Let us consider a stack with x86\_64 calling conventions, such as shown in
 Figure~\ref{fig:call_stack}. Assuming the compiler decided here \emph{not} to
 use \reg{rbp}, and assuming the function \eg{} allocates a buffer of 8
 integers, the area allocated for local variables should be at least $32$ bytes
-long (for 4-bytes integers), and \reg{rip} will be pointing below this area.
+long (for 4-bytes integers), and \reg{rsp} will be pointing below this area.
 Left apart analyzing the assembly code produced, there is no way to find where
 the return address is stored, relatively to \reg{rsp}, at some arbitrary point
 of the function. Even when \reg{rbp} is used, there is no easy way to guess
@@ -160,18 +163,19 @@ Yet, stack unwinding (and thus debugging data) \emph{is not limited to
 debugging}.
 
 Another common usage is profiling. A profiling tool, such as \prog{perf} under
-Linux, is used to measure and analyze in which functions a program spends its
-time, identify bottlenecks and find out which parts are critical to optimize.
-To do so, modern profilers pause the traced program at regular, short
-intervals, inspect their stack, and determine which function is currently being
-run. They also often perform a stack unwinding to determine the call path to
-this function, to determine which function indirectly takes time: \eg, a
-function \lstc{fct_a} can call both \lstc{fct_b} and \lstc{fct_c}, which are
-quite heavy; spend practically no time directly in \lstc{fct_a}, but spend a
-lot of time in calls to the other two functions that were made by \lstc{fct_a}.
+Linux -- see Section~\ref{ssec:perf} --, is used to measure and analyze in
+which functions a program spends its time, identify bottlenecks and find out
+which parts are critical to optimize. To do so, modern profilers pause the
+traced program at regular, short intervals, inspect their stack, and determine
+which function is currently being run. They also often perform a stack
+unwinding to determine the call path to this function, to determine which
+function indirectly takes time: \eg, a function \lstc{fct_a} can call both
+\lstc{fct_b} and \lstc{fct_c}, which are quite heavy; spend practically no time
+directly in \lstc{fct_a}, but spend a lot of time in calls to the other two
+functions that were made from \lstc{fct_a}.
 
 Exception handling also requires a stack unwinding mechanism in most languages.
-Indeed, an exception is completely different from a \lstc{return}: while the
+Indeed, an exception is completely different from a \lstinline{return}: while the
 latter returns to the previous function, the former can be caught by virtually
 any function in the call path, at any point of the function. It is thus
 necessary to be able to unwind frames, one by one, until a suitable
@@ -180,16 +184,16 @@ stack-unwinding library similar to \prog{libunwind} in its runtime.
 
 Technically, exception handling could be implemented without any stack
 unwinding, by using \lstc{setjmp}/\lstc{longjmp} mechanics~\cite{niditoexn}.
-However, this is not possible in C++ (and some other languages), because the
-stack needs to be properly unwound in order to trigger the destructors of
-stack-allocated objects. Furthermore, this is often undesirable: \lstc{setjmp}
-has a quite big overhead, which is introduced whenever a \lstc{try} block is
-encountered. Instead, it is often preferred to have strictly no overhead when
-no exception happens, at the cost of a greater overhead when an exception is
-actually fired (after all, they are supposed to be \emph{exceptional}). For
-more details on C++ exception handling, see~\cite{koening1990exception}
-(especially Section~16.5). Possible implementation mechanisms are also
-presented in~\cite{dinechin2000exn}.
+However, this is not possible to implement it straight away in C++ (and some
+other languages), because the stack needs to be properly unwound in order to
+trigger the destructors of stack-allocated objects. Furthermore, this is often
+undesirable: \lstc{setjmp} has a quite big overhead, which is introduced
+whenever a \lstc{try} block is encountered. Instead, it is often preferred to
+have strictly no overhead when no exception happens, at the cost of a greater
+overhead when an exception is actually fired (after all, they are supposed to
+be \emph{exceptional}). For more details on C++ exception handling,
+see~\cite{koening1990exception} (especially Section~16.5). Possible
+implementation mechanisms are also presented in~\cite{dinechin2000exn}.
 
 In both of these two previous cases, performance \emph{can} be a problem. In
 the latter, a slow unwinding directly impacts the overall program performance,
@@ -815,7 +819,7 @@ Listing~\ref{lst:ex1_dw}, etc.
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{Presentation of \prog{perf}}
+\subsection{Presentation of \prog{perf}}\label{ssec:perf}
 
 \prog{Perf} is a \emph{profiler} that comes with the Linux ecosystem (actually,
 \prog{perf} is developed within the Linux kernel source tree). A profiler is an