Few fixes
parent 57a6cb9e8b
commit b761f360cc
2 changed files with 44 additions and 40 deletions
@@ -40,13 +40,13 @@ can be quite costly.
 
 This is often not a huge problem, as stack unwinding is mostly thought of as a
 debugging procedure: when something behaves unexpectedly, the programmer might
-be interested in exploring the stack. Yet, stack unwinding might, in some
-cases, be performance-critical: for instance, profiler programs needs to
-perform a whole lot of stack unwindings. Even worse, exception handling relies
-on stack unwinding in order to find a suitable catch-block! For such
-applications, it might be desirable to find a different time/space trade-off,
-allowing a slightly space-heavier, but far more time-efficient unwinding
-procedure.
+be interested in opening their debugger and exploring the stack. Yet, stack
+unwinding might, in some cases, be performance-critical: for instance, profiler
+programs needs to perform a whole lot of stack unwindings. Even worse,
+exception handling relies on stack unwinding in order to find a suitable
+catch-block! For such applications, it might be desirable to find a different
+time/space trade-off, allowing a slightly space-heavier, but far more
+time-efficient unwinding procedure.
 
 This different trade-off is the question that I explored during this
 internship: what good alternative trade-off is reachable when storing the stack
@@ -108,7 +108,7 @@ existing project using the \textit{de facto} standard library \prog{libunwind}.
 The goal was to obtain a compiled version of unwinding data that was faster
 than DWARF, reasonably heavier and reliable. The benchmarks mentioned have
 yielded convincing results: on the experimental setup created (detailed later
-in this report), the compiled version is up to 25 times faster than the DWARF
+in this report), the compiled version is around 26 times faster than the DWARF
 version, while it remains only around 2.5 times bigger than the original data.
 
 Even though the implementation is more a research prototype than a release
@@ -132,7 +132,7 @@ system.
 \subsection*{Summary and future work}
 
 In most cases of everyday's life, a slow stack unwinding is not a problem, or
-even an annoyance. Yet, having a 25 times speed-up on stack unwinding-heavy
+even an annoyance. Yet, having a 26 times speed-up on stack unwinding-heavy
 tasks, such as profiling, can be really useful to profile heavy programs,
 particularly if one wants to profile many times in order to analyze the impact
 of multiple changes. It can also be useful for exception-heavy programs. Thus,
@@ -61,12 +61,13 @@ Under supervision of Francesco Zappa-Nardelli\\
 
 On most platforms, programs make use of a \emph{call stack} to store
 information about the nested function calls at the current execution point, and
-keep track of their nesting. Each function call has its own \emph{stack frame},
-an entry of the call stack, whose precise contents are often specified in the
-Application Binary Interface (ABI) of the platform, and left to various extents
-up to the compiler. Those frames are typically used for storing function
-arguments, machine registers that must be restored before returning, the
-function's return address and local variables.
+keep track of their nesting. This call stack is conventionally a contiguous
+memory space mapped close to the top of the addressing space. Each function
+call has its own \emph{stack frame}, an entry of the call stack, whose precise
+contents are often specified in the Application Binary Interface (ABI) of the
+platform, and left to various extents up to the compiler. Those frames are
+typically used for storing function arguments, machine registers that must be
+restored before returning, the function's return address and local variables.
 
 On the x86\_64 platform, with which this report is mostly concerned, the
 calling convention that is followed is defined in the System V
@@ -94,14 +95,16 @@ compiler might use \reg{rbp} (``base pointer'') to save this value of
 \reg{rip}, by writing the old value of \reg{rbp} just below the return address
 on the stack, then copying \reg{rsp} to \reg{rbp}. This makes it easy to find
 the return address from anywhere within the function, and also allows for easy
-addressing of local variables.
+addressing of local variables. Yet, using \reg{rbp} to save \reg{rip} is not
+always done, since it somehow ``wastes'' a register. This decision is, on
+x86\_64 System V, up to the compiler.
 
 Often, a function will start by subtracting some value to \reg{rsp}, allocating
 some space in the stack frame for its local variables. Then, it will push on
 the stack the values of the callee-saved registers that are overwritten later,
 effectively saving them. Before returning, it will pop the values of the saved
-registers back to their original registers, then restoring \reg{rsp} to its
-former value.
+registers back to their original registers and restore \reg{rsp} to its former
+value.
 
 \subsection{Stack unwinding}
 
@@ -128,7 +131,7 @@ Let us consider a stack with x86\_64 calling conventions, such as shown in
 Figure~\ref{fig:call_stack}. Assuming the compiler decided here \emph{not} to
 use \reg{rbp}, and assuming the function \eg{} allocates a buffer of 8
 integers, the area allocated for local variables should be at least $32$ bytes
-long (for 4-bytes integers), and \reg{rip} will be pointing below this area.
+long (for 4-bytes integers), and \reg{rsp} will be pointing below this area.
 Left apart analyzing the assembly code produced, there is no way to find where
 the return address is stored, relatively to \reg{rsp}, at some arbitrary point
 of the function. Even when \reg{rbp} is used, there is no easy way to guess
@@ -160,18 +163,19 @@ Yet, stack unwinding (and thus debugging data) \emph{is not limited to
 debugging}.
 
 Another common usage is profiling. A profiling tool, such as \prog{perf} under
-Linux, is used to measure and analyze in which functions a program spends its
-time, identify bottlenecks and find out which parts are critical to optimize.
-To do so, modern profilers pause the traced program at regular, short
-intervals, inspect their stack, and determine which function is currently being
-run. They also often perform a stack unwinding to determine the call path to
-this function, to determine which function indirectly takes time: \eg, a
-function \lstc{fct_a} can call both \lstc{fct_b} and \lstc{fct_c}, which are
-quite heavy; spend practically no time directly in \lstc{fct_a}, but spend a
-lot of time in calls to the other two functions that were made by \lstc{fct_a}.
+Linux -- see Section~\ref{ssec:perf} --, is used to measure and analyze in
+which functions a program spends its time, identify bottlenecks and find out
+which parts are critical to optimize. To do so, modern profilers pause the
+traced program at regular, short intervals, inspect their stack, and determine
+which function is currently being run. They also often perform a stack
+unwinding to determine the call path to this function, to determine which
+function indirectly takes time: \eg, a function \lstc{fct_a} can call both
+\lstc{fct_b} and \lstc{fct_c}, which are quite heavy; spend practically no time
+directly in \lstc{fct_a}, but spend a lot of time in calls to the other two
+functions that were made from \lstc{fct_a}.
 
 Exception handling also requires a stack unwinding mechanism in most languages.
-Indeed, an exception is completely different from a \lstc{return}: while the
+Indeed, an exception is completely different from a \lstinline{return}: while the
 latter returns to the previous function, the former can be caught by virtually
 any function in the call path, at any point of the function. It is thus
 necessary to be able to unwind frames, one by one, until a suitable
@@ -180,16 +184,16 @@ stack-unwinding library similar to \prog{libunwind} in its runtime.
 
 Technically, exception handling could be implemented without any stack
 unwinding, by using \lstc{setjmp}/\lstc{longjmp} mechanics~\cite{niditoexn}.
-However, this is not possible in C++ (and some other languages), because the
-stack needs to be properly unwound in order to trigger the destructors of
-stack-allocated objects. Furthermore, this is often undesirable: \lstc{setjmp}
-has a quite big overhead, which is introduced whenever a \lstc{try} block is
-encountered. Instead, it is often preferred to have strictly no overhead when
-no exception happens, at the cost of a greater overhead when an exception is
-actually fired (after all, they are supposed to be \emph{exceptional}). For
-more details on C++ exception handling, see~\cite{koening1990exception}
-(especially Section~16.5). Possible implementation mechanisms are also
-presented in~\cite{dinechin2000exn}.
+However, this is not possible to implement it straight away in C++ (and some
+other languages), because the stack needs to be properly unwound in order to
+trigger the destructors of stack-allocated objects. Furthermore, this is often
+undesirable: \lstc{setjmp} has a quite big overhead, which is introduced
+whenever a \lstc{try} block is encountered. Instead, it is often preferred to
+have strictly no overhead when no exception happens, at the cost of a greater
+overhead when an exception is actually fired (after all, they are supposed to
+be \emph{exceptional}). For more details on C++ exception handling,
+see~\cite{koening1990exception} (especially Section~16.5). Possible
+implementation mechanisms are also presented in~\cite{dinechin2000exn}.
 
 In both of these two previous cases, performance \emph{can} be a problem. In
 the latter, a slow unwinding directly impacts the overall program performance,
@@ -815,7 +819,7 @@ Listing~\ref{lst:ex1_dw}, etc.
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{Presentation of \prog{perf}}
+\subsection{Presentation of \prog{perf}}\label{ssec:perf}
 
 \prog{Perf} is a \emph{profiler} that comes with the Linux ecosystem (actually,
 \prog{perf} is developed within the Linux kernel source tree). A profiler is an