From 7c2dbd228d8e131cf11f3eaad82889614a1d3b67 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Th=C3=A9ophile=20Bastian?=
Date: Wed, 1 Aug 2018 17:34:32 +0200
Subject: [PATCH] =?UTF-8?q?Tentative=20fiche=20de=20synth=C3=A8se?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 report/fiche_synthese.tex | 126 +++++++++++++++++++++++++++++++-------
 report/report.tex         |   2 +-
 shared/todo.sty           |   8 +--
 3 files changed, 108 insertions(+), 28 deletions(-)

diff --git a/report/fiche_synthese.tex b/report/fiche_synthese.tex
index c77981a..b8fff10 100644
--- a/report/fiche_synthese.tex
+++ b/report/fiche_synthese.tex
@@ -12,47 +12,127 @@ unwind stack frames, restoring machine registers to their proper values, for
 instance within the context of a debugger.
 
 As debugging data can easily get heavy beyond reasonable if stored carelessly,
-the DWARF standard pays a great attention to data compactness and compression.
-This, as always, is at the expense of efficiency: accessing stack unwinding
-data for a particular program point can be quite costly.
+the DWARF standard pays great attention to data compactness and compression,
+and succeeds particularly well at it. But this, as always, comes at the
+expense of efficiency: accessing stack unwinding data for a particular
+program point can be quite costly.
+
+This is often not a huge problem, as stack unwinding is mostly thought of as
+a debugging procedure: when something behaves unexpectedly, the programmer
+might want to explore the stack, move around between stack frames, trace the
+program path leading to some bug\ldots{} Yet stack unwinding can, in some
+cases, be performance-critical: for instance, profilers need to perform a
+great number of stack unwindings. Even worse, exception handling relies on
+stack unwinding to find a suitable catch block!
 
 The most widely used library used for stack unwinding,
-\texttt{libunwind}~\cite{libunwind},
+\texttt{libunwind}~\cite{libunwind}, essentially relies on aggressive,
+fine-tuned caching and optimized code to mitigate this problem.
 
 \subsection*{The research problem}
 
-This internship explored the possibility to compile the standard ELF debugging
-information format, DWARF, into x86\_64 assembly.
+\todo{Split the previous paragraph into two paragraphs, fitting this section
+as well}
+
+\note{I have trouble figuring out what is expected here, and what is expected
+in the previous section…}
 
-\qtodo{Delete question} \textit{
-What is the question that you studied?
-Why is it important, what are the applications/consequences?
-Is it a new problem?
-If so, why are you the first researcher in the universe who consider it?
-If not, why did you think that you could bring an original contribution?
-}
+% What is the question that you studied?
+% Why is it important, what are the applications/consequences?
+% Is it a new problem?
+% If so, why are you the first researcher in the universe who consider it?
+% If not, why did you think that you could bring an original contribution?
 
 \subsection*{Your contribution}
 
-What is your solution to the question described in the last paragraph?
+This internship explored the possibility of compiling the standard ELF
+debugging information format, DWARF, directly into native assembly on the
+x86\_64 architecture. Instead of parsing and interpreting the debug data at
+runtime, the stack unwinding data is accessed by calling a function of a
+dynamically-loaded shared library.
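+
+As a rough illustration only (the scheme actually implemented is described
+later in this report), the generated assembly can be pictured as the
+equivalent of one C function per object file, branching on the program
+counter and rebuilding the caller's registers from the callee's. The
+function name, context layout and offsets below are purely hypothetical:
+
+\begin{verbatim}
+/* Hypothetical sketch of compiled unwinding code; names, layout and
+ * offsets are illustrative, not the actual generated output. */
+#include <stdint.h>
+
+typedef struct {
+    uintptr_t rip, rsp, rbp;   /* registers needed to unwind one frame */
+} unwind_ctx_t;
+
+unwind_ctx_t unwind_frame(uintptr_t pc, unwind_ctx_t ctx) {
+    unwind_ctx_t caller = ctx;
+    if (pc >= 0x1130 && pc < 0x1156) {         /* one PC range */
+        uintptr_t cfa = ctx.rsp + 16;          /* canonical frame address */
+        caller.rip = *(uintptr_t *)(cfa - 8);  /* saved return address */
+        caller.rbp = *(uintptr_t *)(cfa - 16); /* saved %rbp */
+        caller.rsp = cfa;                      /* caller's stack pointer */
+    }
+    /* ... one branch per program counter range of the object file ... */
+    return caller;
+}
+\end{verbatim}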
 
-Be careful, do \emph{not} give technical details, only rough ideas!
+Multiple approaches were tried in order to determine which compilation
+process leads to the best time/space trade-off.
 
-Pay a special attention to the description of the \emph{scientific} approach.
+Quite unexpectedly, the hardest part of the project was finding a
+benchmarking protocol that was both relevant and reliable. Unwinding a single
+frame is far too fast to be benchmarked on only a few samples (around
+$10\,\mu s$ per frame), and gathering a large number of samples is not
+straightforward, since one must avoid unwinding the same frame over and over
+again, which would only benchmark the caching mechanism. The other problem is
+to distribute the unwinding measurements evenly across the various program
+positions, including inside the loaded libraries (\eg{} the \texttt{libc}).
+
+The solution eventually chosen was to modify \texttt{perf}, the standard
+profiling program for Linux, so that it gathers statistics and timings of its
+unwindings, and to produce an alternative version of \texttt{libunwind}
+backed by the compiled debugging data, interfaced with \texttt{perf}. This
+makes it possible to benchmark \texttt{perf} with both the standard stack
+unwinding data and the experimental compiled format. As a free and enjoyable
+side effect, the experimental unwinding data is fully interfaced with
+\texttt{libunwind}, and thus usable at practically no cost by any existing
+project relying on this common library.
+
+% What is your solution to the question described in the last paragraph?
+%
+% Be careful, do \emph{not} give technical details, only rough ideas!
+%
+% Pay a special attention to the description of the \emph{scientific} approach.
 
 \subsection*{Arguments supporting its validity}
 
-What is the evidence that your solution is a good solution?
-Experiments? Proofs?
+% What is the evidence that your solution is a good solution?
+% Experiments? Proofs?
+%
+% Comment the robustness of your solution: how does it rely/depend on the working assumptions?
 
-Comment the robustness of your solution: how does it rely/depend on the working assumptions?
+The goal was to obtain a compiled version of the unwinding data that is
+faster than DWARF, only reasonably heavier, and reliable. The benchmarks
+mentioned above yielded convincing results: on the experimental setup created
+(detailed later in this report), the compiled version is up to 25 times
+faster than the DWARF version, while remaining only around 2.5 times bigger
+than the original data.
+
+Even though the implementation is more a research prototype than a release
+version, it is still reasonably robust compared to \texttt{libunwind}, which
+is built for robustness. Corner cases are frequent when analyzing stack data,
+and even more so when analyzing it through a profiler; yet the prototype
+fails on only around 200 more cases than \texttt{libunwind} on a
+27\,000-sample test (1099 failures, against 885 for \texttt{libunwind}).
+
+The prototype, unlike \texttt{libunwind}, does not support $100\,\%$ of the
+DWARF instructions present in the DWARF5 standard~\cite{dwarf5std}. It is
+also limited to the x86\_64 architecture and relies to some extent on the
+Linux operating system. None of these limitations, however, is a real problem
+in practice. As argued later on, the vast majority of the DWARF instructions
+actually used in the wild are implemented; supporting other processor
+architectures and ABIs is only a matter of engineering time; and the
+operating system dependency is confined to the libraries developed to
+interact with the compiled unwinding data, which could be written for
+virtually any operating system.
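+
+As an aside on the $10\,\mu s$ figure quoted above, the kind of per-frame
+timing underlying these benchmarks can be sketched with the standard
+\texttt{libunwind} local API as follows; the function name and this
+standalone loop are only illustrative, as the actual measurements of this
+work were collected inside a modified \texttt{perf}:
+
+\begin{verbatim}
+/* Minimal sketch: time a full local stack walk with libunwind and
+ * report the average cost per unwound frame. Not the protocol used
+ * for the measurements reported here. */
+#define UNW_LOCAL_ONLY
+#include <libunwind.h>
+#include <stdio.h>
+#include <time.h>
+
+void bench_unwind(void) {
+    unw_context_t uc;
+    unw_cursor_t cursor;
+    struct timespec t0, t1;
+    int frames = 0;
+
+    unw_getcontext(&uc);
+    unw_init_local(&cursor, &uc);
+    clock_gettime(CLOCK_MONOTONIC, &t0);
+    while (unw_step(&cursor) > 0)   /* walk up the call stack */
+        frames++;
+    clock_gettime(CLOCK_MONOTONIC, &t1);
+
+    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
+    printf("%d frames, %.0f ns/frame\n", frames, frames ? ns / frames : 0.0);
+}
+\end{verbatim}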
 
 \subsection*{Summary and future work}
 
-What is next? In which respect is your approach general?
-What did your contribution bring to the area?
-What should be done now?
-What is the good \emph{next} question?
+In most everyday situations, the slowness of stack unwinding is neither a
+problem nor even an annoyance. Yet a 25-fold speed-up on
+stack-unwinding-heavy tasks, such as profiling, can be really useful when
+analyzing heavy programs, particularly if one wants to profile repeatedly in
+order to assess the impact of successive changes. It can also be useful for
+exception-heavy programs~\qtodo{cite Stephen's software?}. It might therefore
+be interesting to implement a more stable version and to interface it cleanly
+with mainstream tools such as \texttt{perf}.
+
+It might also be interesting to investigate whether even greater speed could
+be reached with a more sophisticated compilation process, which remains to be
+determined.
+
+Another question worth exploring is whether the original DWARF unwinding
+data, kept in a format not too far from the original standard, could itself
+be shrunk further by applying techniques close to those used to shrink the
+compiled unwinding data.
+
+% What is next? In which respect is your approach general?
+% What did your contribution bring to the area?
+% What should be done now?
+% What is the good \emph{next} question?
 
 \pagestyle{plain}
diff --git a/report/report.tex b/report/report.tex
index a2cc555..e6c9176 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -1,4 +1,4 @@
-\title{DWARF debugging data, compilation and verification}
+\title{DWARF debugging data, compilation and optimization}
 
 \author{Théophile Bastian\\
 Under supervision of Francesco Zappa-Nardelli\\
diff --git a/shared/todo.sty b/shared/todo.sty
index c1a3c3a..86ae375 100644
--- a/shared/todo.sty
+++ b/shared/todo.sty
@@ -3,9 +3,9 @@
 \definecolor{todobg}{HTML}{FF5F00}
 \definecolor{todofg}{HTML}{3700DA}
 \definecolor{notebg}{HTML}{87C23C}
-\definecolor{notefg}{HTML}{DF4431}
+\definecolor{notefg}{HTML}{BC3423}
 
 \newcommand{\qtodo}[1]{\colorbox{todobg}{\textcolor{todofg}{#1}}}
-\newcommand{\todo}[1]{\qtodo{\textbf{TODO:}\.#1}}
-\newcommand{\qnote}[1]{\colorbox{notebg}{\textcolor{notefg}{[#1]}}}
-\newcommand{\note}[1]{\qnote{\textbf{NOTE:}\.#1}}
+\newcommand{\todo}[1]{\qtodo{\textbf{TODO:}\,#1}}
+\newcommand{\qnote}[1]{\colorbox{notebg}{\textcolor{notefg}{#1}}}
+\newcommand{\note}[1]{\qnote{\textbf{NOTE:}\,#1}}