Few fixes

Théophile Bastian 2018-08-07 20:44:12 +02:00
parent 57a6cb9e8b
commit b761f360cc
2 changed files with 44 additions and 40 deletions

(first changed file)

@@ -40,13 +40,13 @@ can be quite costly.
 This is often not a huge problem, as stack unwinding is mostly thought of as a
 debugging procedure: when something behaves unexpectedly, the programmer might
-be interested in exploring the stack. Yet, stack unwinding might, in some
-cases, be performance-critical: for instance, profiler programs needs to
-perform a whole lot of stack unwindings. Even worse, exception handling relies
-on stack unwinding in order to find a suitable catch-block! For such
-applications, it might be desirable to find a different time/space trade-off,
-allowing a slightly space-heavier, but far more time-efficient unwinding
-procedure.
+be interested in opening their debugger and exploring the stack. Yet, stack
+unwinding might, in some cases, be performance-critical: for instance, profiler
+programs need to perform a whole lot of stack unwindings. Even worse,
+exception handling relies on stack unwinding in order to find a suitable
+catch-block! For such applications, it might be desirable to find a different
+time/space trade-off, allowing a slightly space-heavier, but far more
+time-efficient unwinding procedure.

 This different trade-off is the question that I explored during this
 internship: what good alternative trade-off is reachable when storing the stack
@@ -108,7 +108,7 @@ existing project using the \textit{de facto} standard library \prog{libunwind}.
 The goal was to obtain a compiled version of unwinding data that was faster
 than DWARF, reasonably heavier and reliable. The benchmarks mentioned have
 yielded convincing results: on the experimental setup created (detailed later
-in this report), the compiled version is up to 25 times faster than the DWARF
+in this report), the compiled version is around 26 times faster than the DWARF
 version, while it remains only around 2.5 times bigger than the original data.

 Even though the implementation is more a research prototype than a release
@@ -132,7 +132,7 @@ system.
 \subsection*{Summary and future work}

 In most cases of everyday's life, a slow stack unwinding is not a problem, or
-even an annoyance. Yet, having a 25 times speed-up on stack unwinding-heavy
+even an annoyance. Yet, having a 26 times speed-up on stack unwinding-heavy
 tasks, such as profiling, can be really useful to profile heavy programs,
 particularly if one wants to profile many times in order to analyze the impact
 of multiple changes. It can also be useful for exception-heavy programs. Thus,

(second changed file)

@@ -61,12 +61,13 @@ Under supervision of Francesco Zappa-Nardelli\\
 On most platforms, programs make use of a \emph{call stack} to store
 information about the nested function calls at the current execution point, and
-keep track of their nesting. Each function call has its own \emph{stack frame},
-an entry of the call stack, whose precise contents are often specified in the
-Application Binary Interface (ABI) of the platform, and left to various extents
-up to the compiler. Those frames are typically used for storing function
-arguments, machine registers that must be restored before returning, the
-function's return address and local variables.
+keep track of their nesting. This call stack is conventionally a contiguous
+memory space mapped close to the top of the addressing space. Each function
+call has its own \emph{stack frame}, an entry of the call stack, whose precise
+contents are often specified in the Application Binary Interface (ABI) of the
+platform, and left to various extents up to the compiler. Those frames are
+typically used for storing function arguments, machine registers that must be
+restored before returning, the function's return address and local variables.

 On the x86\_64 platform, with which this report is mostly concerned, the
 calling convention that is followed is defined in the System V
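As an illustration of the layout described in the hunk above, here is a minimal C sketch (the functions are invented for this example, they are not taken from the report): while leaf() runs, the call stack holds one frame per active call, each storing that call's return address, any saved registers and its local variables.

    #include <stdio.h>

    /* Each active call below owns one stack frame. On x86_64 System V, a frame
     * typically holds (from high to low addresses): the caller's return address,
     * possibly the saved %rbp, callee-saved registers, then local variables. */
    static int leaf(int x) {
        int local = x * 2;          /* lives in leaf()'s frame */
        return local + 1;
    }

    static int middle(int x) {
        int buf[4] = {0};           /* locals allocated by moving %rsp downwards */
        buf[0] = leaf(x);           /* the call pushes a return address and opens a new frame */
        return buf[0];
    }

    int main(void) {
        /* While leaf() executes, the stack holds frames for main, middle and leaf. */
        printf("%d\n", middle(20));
        return 0;
    }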
@@ -94,14 +95,16 @@ compiler might use \reg{rbp} (``base pointer'') to save this value of
 \reg{rip}, by writing the old value of \reg{rbp} just below the return address
 on the stack, then copying \reg{rsp} to \reg{rbp}. This makes it easy to find
 the return address from anywhere within the function, and also allows for easy
-addressing of local variables.
+addressing of local variables. Yet, using \reg{rbp} to save \reg{rip} is not
+always done, since it somehow ``wastes'' a register. This decision is, on
+x86\_64 System V, up to the compiler.

 Often, a function will start by subtracting some value to \reg{rsp}, allocating
 some space in the stack frame for its local variables. Then, it will push on
 the stack the values of the callee-saved registers that are overwritten later,
 effectively saving them. Before returning, it will pop the values of the saved
-registers back to their original registers, then restoring \reg{rsp} to its
-former value.
+registers back to their original registers and restore \reg{rsp} to its former
+value.

 \subsection{Stack unwinding}
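A rough sketch of the prologue and epilogue described above, written as a small C function with the corresponding assembly only indicated in comments; the exact instructions depend on the compiler, its version and the optimization level, so the commented listing is indicative rather than authoritative.

    /* For a function like this one... */
    long sum3(long a, long b, long c) {
        long acc = a;               /* locals may live in the frame or in registers */
        acc += b;
        acc += c;
        return acc;
    }

    /* ...a compiler might emit a prologue/epilogue along these lines
     * (indicative only, assuming no optimization and frame pointers enabled):
     *
     *   sum3:
     *     push %rbp              ; save the caller's frame pointer
     *     mov  %rsp, %rbp        ; establish this frame's base
     *     sub  $0x10, %rsp       ; reserve space for local variables
     *     ...                    ; function body
     *     leave                  ; mov %rbp,%rsp ; pop %rbp
     *     ret                    ; pop the return address into %rip
     *
     * With frame-pointer omission, the push/mov of %rbp disappears and locals
     * are addressed relative to %rsp instead. */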
@@ -128,7 +131,7 @@ Let us consider a stack with x86\_64 calling conventions, such as shown in
 Figure~\ref{fig:call_stack}. Assuming the compiler decided here \emph{not} to
 use \reg{rbp}, and assuming the function \eg{} allocates a buffer of 8
 integers, the area allocated for local variables should be at least $32$ bytes
-long (for 4-bytes integers), and \reg{rip} will be pointing below this area.
+long (for 4-bytes integers), and \reg{rsp} will be pointing below this area.
 Left apart analyzing the assembly code produced, there is no way to find where
 the return address is stored, relatively to \reg{rsp}, at some arbitrary point
 of the function. Even when \reg{rbp} is used, there is no easy way to guess
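A small C sketch of the situation this hunk describes (the function below is hypothetical, chosen only to match the "buffer of 8 integers" example): with no frame pointer, the offset between \reg{rsp} and the stored return address depends on how far the prologue has progressed, and nothing in the running code records it; that is exactly the information unwinding tables must supply.

    #include <string.h>

    /* With frame-pointer omission, the return address sits somewhere above %rsp,
     * at an offset that changes as the prologue subtracts from %rsp and pushes
     * callee-saved registers. Recovering that offset at an arbitrary program
     * point is the job of the unwinding data. */
    int checksum(const int *data) {
        int buf[8];                         /* at least 32 bytes of locals (4-byte ints) */
        memcpy(buf, data, sizeof(buf));     /* %rsp now points below this buffer */

        int acc = 0;
        for (int i = 0; i < 8; i++)
            acc += buf[i];
        return acc;                         /* epilogue restores %rsp, then ret */
    }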
@@ -160,18 +163,19 @@ Yet, stack unwinding (and thus debugging data) \emph{is not limited to
 debugging}.

 Another common usage is profiling. A profiling tool, such as \prog{perf} under
-Linux, is used to measure and analyze in which functions a program spends its
-time, identify bottlenecks and find out which parts are critical to optimize.
-To do so, modern profilers pause the traced program at regular, short
-intervals, inspect their stack, and determine which function is currently being
-run. They also often perform a stack unwinding to determine the call path to
-this function, to determine which function indirectly takes time: \eg, a
-function \lstc{fct_a} can call both \lstc{fct_b} and \lstc{fct_c}, which are
-quite heavy; spend practically no time directly in \lstc{fct_a}, but spend a
-lot of time in calls to the other two functions that were made by \lstc{fct_a}.
+Linux -- see Section~\ref{ssec:perf} --, is used to measure and analyze in
+which functions a program spends its time, identify bottlenecks and find out
+which parts are critical to optimize. To do so, modern profilers pause the
+traced program at regular, short intervals, inspect their stack, and determine
+which function is currently being run. They also often perform a stack
+unwinding to determine the call path to this function, to determine which
+function indirectly takes time: \eg, a function \lstc{fct_a} can call both
+\lstc{fct_b} and \lstc{fct_c}, which are quite heavy; spend practically no time
+directly in \lstc{fct_a}, but spend a lot of time in calls to the other two
+functions that were made from \lstc{fct_a}.

 Exception handling also requires a stack unwinding mechanism in most languages.
-Indeed, an exception is completely different from a \lstc{return}: while the
+Indeed, an exception is completely different from a \lstinline{return}: while the
 latter returns to the previous function, the former can be caught by virtually
 any function in the call path, at any point of the function. It is thus
 necessary to be able to unwind frames, one by one, until a suitable
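The fct_a / fct_b / fct_c scenario named in the hunk above could look like the following C sketch (the bodies are invented for illustration): a profiler that only samples the current instruction pointer would attribute nearly all the time to fct_b and fct_c, whereas one that also unwinds the stack can report that this time is spent in calls made from fct_a.

    #include <stdint.h>

    /* Heavy leaf functions: almost every profiling sample lands here. */
    static uint64_t fct_b(uint64_t n) {
        uint64_t acc = 0;
        for (uint64_t i = 0; i < n; i++) acc += i * i;
        return acc;
    }

    static uint64_t fct_c(uint64_t n) {
        uint64_t acc = 1;
        for (uint64_t i = 1; i < n; i++) acc ^= acc * 31 + i;
        return acc;
    }

    /* fct_a itself does almost nothing, yet the time spent in fct_b and fct_c
     * is incurred on its behalf: only unwinding the stack reveals that call path. */
    static uint64_t fct_a(uint64_t n) {
        return fct_b(n) + fct_c(n);
    }

    int main(void) {
        return (int)(fct_a(100000000u) & 0xff);
    }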
@@ -180,16 +184,16 @@ stack-unwinding library similar to \prog{libunwind} in its runtime.
 Technically, exception handling could be implemented without any stack
 unwinding, by using \lstc{setjmp}/\lstc{longjmp} mechanics~\cite{niditoexn}.
-However, this is not possible in C++ (and some other languages), because the
-stack needs to be properly unwound in order to trigger the destructors of
-stack-allocated objects. Furthermore, this is often undesirable: \lstc{setjmp}
-has a quite big overhead, which is introduced whenever a \lstc{try} block is
-encountered. Instead, it is often preferred to have strictly no overhead when
-no exception happens, at the cost of a greater overhead when an exception is
-actually fired (after all, they are supposed to be \emph{exceptional}). For
-more details on C++ exception handling, see~\cite{koening1990exception}
-(especially Section~16.5). Possible implementation mechanisms are also
-presented in~\cite{dinechin2000exn}.
+However, this cannot be implemented straight away in C++ (and some other
+languages), because the stack needs to be properly unwound in order to
+trigger the destructors of stack-allocated objects. Furthermore, this is often
+undesirable: \lstc{setjmp} has a quite big overhead, which is introduced
+whenever a \lstc{try} block is encountered. Instead, it is often preferred to
+have strictly no overhead when no exception happens, at the cost of a greater
+overhead when an exception is actually fired (after all, they are supposed to
+be \emph{exceptional}). For more details on C++ exception handling,
+see~\cite{koening1990exception} (especially Section~16.5). Possible
+implementation mechanisms are also presented in~\cite{dinechin2000exn}.

 In both of these two previous cases, performance \emph{can} be a problem. In
 the latter, a slow unwinding directly impacts the overall program performance,
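For reference, a minimal C sketch of the setjmp/longjmp mechanism mentioned above, deliberately simplified to a single global handler (a real runtime would keep a chain of jump buffers, one per active try block): the setjmp call is the overhead paid whenever a "try" is entered, and the frames skipped over by longjmp are discarded without any destructor-like cleanup.

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf handler;          /* simplified: one global "catch" point */

    static void might_fail(int fail) {
        if (fail)
            longjmp(handler, 42);    /* "throw": jump over every frame in between */
        puts("no exception");
    }

    int main(void) {
        switch (setjmp(handler)) {   /* "try": returns 0 now, the longjmp value later */
        case 0:
            might_fail(1);           /* protected code */
            break;
        default:
            puts("caught an exception");  /* "catch" block */
            break;
        }
        return 0;
    }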
@@ -815,7 +819,7 @@ Listing~\ref{lst:ex1_dw}, etc.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{Presentation of \prog{perf}}
+\subsection{Presentation of \prog{perf}}\label{ssec:perf}

 \prog{Perf} is a \emph{profiler} that comes with the Linux ecosystem (actually,
 \prog{perf} is developed within the Linux kernel source tree). A profiler is an