From 7c2dbd228d8e131cf11f3eaad82889614a1d3b67 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Th=C3=A9ophile=20Bastian?=
Date: Wed, 1 Aug 2018 17:34:32 +0200
Subject: [PATCH] =?UTF-8?q?Tentative=20fiche=20de=20synth=C3=A8se?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 report/fiche_synthese.tex | 126 +++++++++++++++++++++++++++++++-------
 report/report.tex         |   2 +-
 shared/todo.sty           |   8 +--
 3 files changed, 108 insertions(+), 28 deletions(-)

diff --git a/report/fiche_synthese.tex b/report/fiche_synthese.tex
index c77981a..b8fff10 100644
--- a/report/fiche_synthese.tex
+++ b/report/fiche_synthese.tex
@@ -12,47 +12,127 @@ unwind stack frames, restoring machine registers to their proper values, for
 instance within the context of a debugger.
 
 As debugging data can easily get heavy beyond reasonable if stored carelessly,
-the DWARF standard pays a great attention to data compactness and compression.
-This, as always, is at the expense of efficiency: accessing stack unwinding
-data for a particular program point can be quite costly.
+the DWARF standard pays great attention to data compactness and compression,
+and succeeds particularly well at it. But this, as always, comes at the
+expense of efficiency: accessing stack unwinding data for a particular
+program point can be quite costly.
+
+This is often not a huge problem, as stack unwinding is mostly thought of as
+a debugging procedure: when something behaves unexpectedly, the programmer
+might want to explore the stack, move around between stack frames, trace the
+program path leading to some bug\ldots{} Yet stack unwinding can, in some
+cases, be performance-critical: for instance, profilers need to perform a
+great number of stack unwindings. Even worse, exception handling relies on
+stack unwinding to find a suitable catch block!
 
 The most widely used library used for stack unwinding,
-\texttt{libunwind}~\cite{libunwind},
+\texttt{libunwind}~\cite{libunwind}, essentially relies on aggressive,
+fine-tuned caching and optimized code to mitigate this problem.
 
 \subsection*{The research problem}
 
-This internship explored the possibility to compile the standard ELF debugging
-information format, DWARF, into x86\_64 assembly.
+\todo{Split the previous paragraph into two paragraphs, fitting this section
+as well}
+
+\note{I have trouble figuring out what is expected here, and what is expected
+in the previous section…}
 
-\qtodo{Delete question} \textit{
-What is the question that you studied?
-Why is it important, what are the applications/consequences?
-Is it a new problem?
-If so, why are you the first researcher in the universe who consider it?
-If not, why did you think that you could bring an original contribution?
-}
+% What is the question that you studied?
+% Why is it important, what are the applications/consequences?
+% Is it a new problem?
+% If so, why are you the first researcher in the universe who consider it?
+% If not, why did you think that you could bring an original contribution?
 
 \subsection*{Your contribution}
 
-What is your solution to the question described in the last paragraph?
+This internship explored the possibility of compiling the standard ELF
+debugging information format, DWARF, directly into native assembly on the
+x86\_64 architecture. Instead of parsing and interpreting the debug data at
+runtime, the stack unwinding data is accessed by calling a function of a
+dynamically-loaded shared library.
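+
+As a rough illustration only (the scheme actually implemented is described
+later in this report), the generated assembly can be pictured as the
+equivalent of one C function per object file, branching on the program
+counter and rebuilding the caller's registers from the callee's. The
+function name, context layout and offsets below are purely hypothetical:
+
+\begin{verbatim}
+/* Hypothetical sketch of compiled unwinding code; names, layout and
+ * offsets are illustrative, not the actual generated output. */
+#include <stdint.h>
+
+typedef struct {
+    uintptr_t rip, rsp, rbp;   /* registers needed to unwind one frame */
+} unwind_ctx_t;
+
+unwind_ctx_t unwind_frame(uintptr_t pc, unwind_ctx_t ctx) {
+    unwind_ctx_t caller = ctx;
+    if (pc >= 0x1130 && pc < 0x1156) {         /* one PC range */
+        uintptr_t cfa = ctx.rsp + 16;          /* canonical frame address */
+        caller.rip = *(uintptr_t *)(cfa - 8);  /* saved return address */
+        caller.rbp = *(uintptr_t *)(cfa - 16); /* saved %rbp */
+        caller.rsp = cfa;                      /* caller's stack pointer */
+    }
+    /* ... one branch per program counter range of the object file ... */
+    return caller;
+}
+\end{verbatim}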
 
-Be careful, do \emph{not} give technical details, only rough ideas!
+Multiple approaches were tried in order to determine which compilation
+process leads to the best time/space trade-off.
 
-Pay a special attention to the description of the \emph{scientific} approach.
+Quite unexpectedly, the hardest part of the project was finding a
+benchmarking protocol that was both relevant and reliable. Unwinding a single
+frame is far too fast to be benchmarked on only a few samples (around
+$10\,\mu s$ per frame), and gathering a large number of samples is not
+straightforward, since one must avoid unwinding the same frame over and over
+again, which would only benchmark the caching mechanism. The other problem is
+to distribute the unwinding measurements evenly across the various program
+positions, including inside the loaded libraries (\eg{} the \texttt{libc}).
+
+The solution eventually chosen was to modify \texttt{perf}, the standard
+profiling program for Linux, so that it gathers statistics and timings of its
+unwindings, and to produce an alternative version of \texttt{libunwind}
+backed by the compiled debugging data, interfaced with \texttt{perf}. This
+makes it possible to benchmark \texttt{perf} with both the standard stack
+unwinding data and the experimental compiled format. As a free and enjoyable
+side effect, the experimental unwinding data is fully interfaced with
+\texttt{libunwind}, and thus usable at practically no cost by any existing
+project relying on this common library.
+
+% What is your solution to the question described in the last paragraph?
+%
+% Be careful, do \emph{not} give technical details, only rough ideas!
+%
+% Pay a special attention to the description of the \emph{scientific} approach.
 
 \subsection*{Arguments supporting its validity}
 
-What is the evidence that your solution is a good solution?
-Experiments? Proofs?
+% What is the evidence that your solution is a good solution?
+% Experiments? Proofs?
+%
+% Comment the robustness of your solution: how does it rely/depend on the working assumptions?
 
-Comment the robustness of your solution: how does it rely/depend on the working assumptions?
+The goal was to obtain a compiled version of the unwinding data that is
+faster than DWARF, only reasonably heavier, and reliable. The benchmarks
+mentioned above yielded convincing results: on the experimental setup created
+(detailed later in this report), the compiled version is up to 25 times
+faster than the DWARF version, while remaining only around 2.5 times bigger
+than the original data.
+
+Even though the implementation is more a research prototype than a release
+version, it is still reasonably robust compared to \texttt{libunwind}, which
+is built for robustness. Corner cases are frequent when analyzing stack data,
+and even more so when analyzing it through a profiler; yet the prototype
+fails on only around 200 more cases than \texttt{libunwind} on a
+27\,000-sample test (1099 failures, against 885 for \texttt{libunwind}).
+
+The prototype, unlike \texttt{libunwind}, does not support $100\,\%$ of the
+DWARF instructions present in the DWARF5 standard~\cite{dwarf5std}. It is
+also limited to the x86\_64 architecture and relies to some extent on the
+Linux operating system. None of these limitations, however, is a real problem
+in practice. As argued later on, the vast majority of the DWARF instructions
+actually used in the wild are implemented; supporting other processor
+architectures and ABIs is only a matter of engineering time; and the
+operating system dependency is confined to the libraries developed to
+interact with the compiled unwinding data, which could be written for
+virtually any operating system.
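+
+As an aside on the $10\,\mu s$ figure quoted above, the kind of per-frame
+timing underlying these benchmarks can be sketched with the standard
+\texttt{libunwind} local API as follows; the function name and this
+standalone loop are only illustrative, as the actual measurements of this
+work were collected inside a modified \texttt{perf}:
+
+\begin{verbatim}
+/* Minimal sketch: time a full local stack walk with libunwind and
+ * report the average cost per unwound frame. Not the protocol used
+ * for the measurements reported here. */
+#define UNW_LOCAL_ONLY
+#include <libunwind.h>
+#include <stdio.h>
+#include <time.h>
+
+void bench_unwind(void) {
+    unw_context_t uc;
+    unw_cursor_t cursor;
+    struct timespec t0, t1;
+    int frames = 0;
+
+    unw_getcontext(&uc);
+    unw_init_local(&cursor, &uc);
+    clock_gettime(CLOCK_MONOTONIC, &t0);
+    while (unw_step(&cursor) > 0)   /* walk up the call stack */
+        frames++;
+    clock_gettime(CLOCK_MONOTONIC, &t1);
+
+    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
+    printf("%d frames, %.0f ns/frame\n", frames, frames ? ns / frames : 0.0);
+}
+\end{verbatim}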
 
 \subsection*{Summary and future work}
 
-What is next? In which respect is your approach general?
-What did your contribution bring to the area?
-What should be done now?
-What is the good \emph{next} question?
+In most everyday situations, the slowness of stack unwinding is neither a
+problem nor even an annoyance. Yet a 25-fold speed-up on
+stack-unwinding-heavy tasks, such as profiling, can be really useful when
+analyzing heavy programs, particularly if one wants to profile repeatedly in
+order to assess the impact of successive changes. It can also be useful for
+exception-heavy programs~\qtodo{cite Stephen's software?}. It might therefore
+be interesting to implement a more stable version and to interface it cleanly
+with mainstream tools such as \texttt{perf}.
+
+It might also be interesting to investigate whether even greater speed could
+be reached with a more sophisticated compilation process, which remains to be
+determined.
+
+Another question worth exploring is whether the original DWARF unwinding
+data, kept in a format not too far from the original standard, could itself
+be shrunk further by applying techniques close to those used to shrink the
+compiled unwinding data.
+
+% What is next? In which respect is your approach general?
+% What did your contribution bring to the area?
+% What should be done now?
+% What is the good \emph{next} question?
 
 \pagestyle{plain}
diff --git a/report/report.tex b/report/report.tex
index a2cc555..e6c9176 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -1,4 +1,4 @@
-\title{DWARF debugging data, compilation and verification}
+\title{DWARF debugging data, compilation and optimization}
 
 \author{Théophile Bastian\\
 Under supervision of Francesco Zappa-Nardelli\\
diff --git a/shared/todo.sty b/shared/todo.sty
index c1a3c3a..86ae375 100644
--- a/shared/todo.sty
+++ b/shared/todo.sty
@@ -3,9 +3,9 @@
 \definecolor{todobg}{HTML}{FF5F00}
 \definecolor{todofg}{HTML}{3700DA}
 \definecolor{notebg}{HTML}{87C23C}
-\definecolor{notefg}{HTML}{DF4431}
+\definecolor{notefg}{HTML}{BC3423}
 
 \newcommand{\qtodo}[1]{\colorbox{todobg}{\textcolor{todofg}{#1}}}
-\newcommand{\todo}[1]{\qtodo{\textbf{TODO:}\.#1}}
-\newcommand{\qnote}[1]{\colorbox{notebg}{\textcolor{notefg}{[#1]}}}
-\newcommand{\note}[1]{\qnote{\textbf{NOTE:}\.#1}}
+\newcommand{\todo}[1]{\qtodo{\textbf{TODO:}\,#1}}
+\newcommand{\qnote}[1]{\colorbox{notebg}{\textcolor{notefg}{#1}}}
+\newcommand{\note}[1]{\qnote{\textbf{NOTE:}\,#1}}