Few fixes

2018-08-07 20:44:12 +02:00
commit b761f360cc
parent 57a6cb9e8b
2 changed files with 44 additions and 40 deletions
--- a/report/fiche_synthese.tex
+++ b/report/fiche_synthese.tex
@@ -40,13 +40,13 @@ can be quite costly.

 This is often not a huge problem, as stack unwinding is mostly thought of as a
 debugging procedure: when something behaves unexpectedly, the programmer might
-be interested in exploring the stack.  Yet, stack unwinding might, in some
-cases, be performance-critical: for instance, profiler programs needs to
-perform a whole lot of stack unwindings. Even worse, exception handling relies
-on stack unwinding in order to find a suitable catch-block! For such
-applications, it might be desirable to find a different time/space trade-off,
-allowing a slightly space-heavier, but far more time-efficient unwinding
-procedure.
+be interested in opening their debugger and exploring the stack.  Yet, stack
+unwinding might, in some cases, be performance-critical: for instance, profiler
+programs need to perform a whole lot of stack unwindings. Even worse,
+exception handling relies on stack unwinding in order to find a suitable
+catch-block! For such applications, it might be desirable to find a different
+time/space trade-off, allowing a slightly space-heavier, but far more
+time-efficient unwinding procedure.

 This different trade-off is the question that I explored during this
 internship: what good alternative trade-off is reachable when storing the stack
@@ -108,7 +108,7 @@ existing project using the \textit{de facto} standard library \prog{libunwind}.
 The goal was to obtain a compiled version of unwinding data that was faster
 than DWARF, reasonably heavier and reliable. The benchmarks mentioned have
 yielded convincing results: on the experimental setup created (detailed later
-in this report), the compiled version is up to 25 times faster than the DWARF
+in this report), the compiled version is around 26 times faster than the DWARF
 version, while it remains only around 2.5 times bigger than the original data.

 Even though the implementation is more a research prototype than a release
@@ -132,7 +132,7 @@ system.
 \subsection*{Summary and future work}

 In most everyday situations, slow stack unwinding is not a problem, or
-even an annoyance. Yet, having a 25 times speed-up on stack unwinding-heavy
+even an annoyance. Yet, having a 26 times speed-up on stack unwinding-heavy
 tasks, such as profiling, can be really useful to profile heavy programs,
 particularly if one wants to profile many times in order to analyze the impact
 of multiple changes. It can also be useful for exception-heavy programs. Thus,
--- a/report/report.tex
+++ b/report/report.tex
@@ -61,12 +61,13 @@ Under supervision of Francesco Zappa-Nardelli\\

 On most platforms, programs make use of a \emph{call stack} to store
 information about the nested function calls at the current execution point, and
-keep track of their nesting. Each function call has its own \emph{stack frame},
-an entry of the call stack, whose precise contents are often specified in the
-Application Binary Interface (ABI) of the platform, and left to various extents
-up to the compiler. Those frames are typically used for storing function
-arguments, machine registers that must be restored before returning, the
-function's return address and local variables.
+keep track of their nesting. This call stack is conventionally a contiguous
+memory region mapped close to the top of the address space. Each function
+call has its own \emph{stack frame}, an entry of the call stack, whose precise
+contents are often specified in the Application Binary Interface (ABI) of the
+platform, and left to various extents up to the compiler. Those frames are
+typically used for storing function arguments, machine registers that must be
+restored before returning, the function's return address and local variables.

 On the x86\_64 platform, with which this report is mostly concerned, the
 calling convention that is followed is defined in the System V
@@ -94,14 +95,16 @@ compiler might use \reg{rbp} (``base pointer'') to save this value of
 \reg{rip}, by writing the old value of \reg{rbp} just below the return address
 on the stack, then copying \reg{rsp} to \reg{rbp}. This makes it easy to find
 the return address from anywhere within the function, and also allows for easy
-addressing of local variables.
+addressing of local variables. Yet, using \reg{rbp} to save \reg{rip} is not
+always done, since it somehow ``wastes'' a register. This decision is, on
+x86\_64 System V, up to the compiler.

 Often, a function will start by subtracting some value from \reg{rsp}, allocating
 some space in the stack frame for its local variables. Then, it will push on
 the stack the values of the callee-saved registers that are overwritten later,
 effectively saving them. Before returning, it will pop the values of the saved
-registers back to their original registers, then restoring \reg{rsp} to its
-former value.
+registers back to their original registers and restore \reg{rsp} to its former
+value.

 \subsection{Stack unwinding}

@@ -128,7 +131,7 @@ Let us consider a stack with x86\_64 calling conventions, such as shown in
 Figure~\ref{fig:call_stack}. Assuming the compiler decided here \emph{not} to
 use \reg{rbp}, and assuming the function \eg{} allocates a buffer of 8
 integers, the area allocated for local variables should be at least $32$ bytes
-long (for 4-bytes integers), and \reg{rip} will be pointing below this area.
+long (for 4-byte integers), and \reg{rsp} will be pointing below this area.
 Apart from analyzing the assembly code produced, there is no way to find where
 the return address is stored, relative to \reg{rsp}, at some arbitrary point
 of the function. Even when \reg{rbp} is used, there is no easy way to guess
@@ -160,18 +163,19 @@ Yet, stack unwinding (and thus debugging data) \emph{is not limited to
 debugging}.

 Another common usage is profiling. A profiling tool, such as \prog{perf} under
-Linux, is used to measure and analyze in which functions a program spends its
-time, identify bottlenecks and find out which parts are critical to optimize.
-To do so, modern profilers pause the traced program at regular, short
-intervals, inspect their stack, and determine which function is currently being
-run. They also often perform a stack unwinding to determine the call path to
-this function, to determine which function indirectly takes time: \eg, a
-function \lstc{fct_a} can call both \lstc{fct_b} and \lstc{fct_c}, which are
-quite heavy; spend practically no time directly in \lstc{fct_a}, but spend a
-lot of time in calls to the other two functions that were made by \lstc{fct_a}.
+Linux (see Section~\ref{ssec:perf}), is used to measure and analyze in
+which functions a program spends its time, identify bottlenecks and find out
+which parts are critical to optimize.  To do so, modern profilers pause the
+traced program at regular, short intervals, inspect its stack, and determine
+which function is currently being run. They also often perform a stack
+unwinding to determine the call path to this function, and thus find which
+functions indirectly take time: \eg, a function \lstc{fct_a} can call both
+\lstc{fct_b} and \lstc{fct_c}, which are quite heavy; the program then spends
+practically no time directly in \lstc{fct_a}, but a lot of time in calls to
+the other two functions that were made from \lstc{fct_a}.

 Exception handling also requires a stack unwinding mechanism in most languages.
-Indeed, an exception is completely different from a \lstc{return}: while the
+Indeed, an exception is completely different from a \lstinline{return}: while the
 latter returns to the previous function, the former can be caught by virtually
 any function in the call path, at any point of the function. It is thus
 necessary to be able to unwind frames, one by one, until a suitable
@@ -180,16 +184,16 @@ stack-unwinding library similar to \prog{libunwind} in its runtime.

 Technically, exception handling could be implemented without any stack
 unwinding, by using \lstc{setjmp}/\lstc{longjmp} mechanics~\cite{niditoexn}.
-However, this is not possible in C++ (and some other languages), because the
-stack needs to be properly unwound in order to trigger the destructors of
-stack-allocated objects. Furthermore, this is often undesirable: \lstc{setjmp}
-has a quite big overhead, which is introduced whenever a \lstc{try} block is
-encountered. Instead, it is often preferred to have strictly no overhead when
-no exception happens, at the cost of a greater overhead when an exception is
-actually fired (after all, they are supposed to be \emph{exceptional}). For
-more details on C++ exception handling, see~\cite{koening1990exception}
-(especially Section~16.5). Possible implementation mechanisms are also
-presented in~\cite{dinechin2000exn}.
+However, it is not possible to implement this straight away in C++ (and some
+other languages), because the stack needs to be properly unwound in order to
+trigger the destructors of stack-allocated objects. Furthermore, this is often
+undesirable: \lstc{setjmp} has quite a large overhead, which is introduced
+whenever a \lstc{try} block is encountered. Instead, it is often preferred to
+have strictly no overhead when no exception happens, at the cost of a greater
+overhead when an exception is actually fired (after all, they are supposed to
+be \emph{exceptional}). For more details on C++ exception handling,
+see~\cite{koening1990exception} (especially Section~16.5). Possible
+implementation mechanisms are also presented in~\cite{dinechin2000exn}.

 In both of these two previous cases, performance \emph{can} be a problem. In
 the latter, a slow unwinding directly impacts the overall program performance,
@@ -815,7 +819,7 @@ Listing~\ref{lst:ex1_dw}, etc.


 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{Presentation of \prog{perf}}
+\subsection{Presentation of \prog{perf}}\label{ssec:perf}

 \prog{Perf} is a \emph{profiler} that comes with the Linux ecosystem (actually,
 \prog{perf} is developed within the Linux kernel source tree). A profiler is an