\pagestyle{empty}

\section*{Internship synthesis}

\subsection*{The general context}

The standard debugging data format, DWARF (Debugging With Attributed Record Formats), contains tables that make it possible, for a given instruction pointer (IP), to understand how the assembly instructions relate to the original source code, where variables are currently allocated in memory or whether they are stored in a register, what their types are, and how to unwind the current stack frame. This information is generated when passing \eg{} the switch \lstbash{-g} to \prog{gcc} or equivalent compilers.

Even in stripped (non-debug) binaries, a small portion of DWARF data remains: the stack unwinding data. This information is necessary to unwind stack frames, restoring machine registers to the values they had in the previous frame. This data is structured into tables, each row corresponding to an IP range for which it describes valid unwinding data, and each column describing how to unwind a particular machine register (or a virtual register used for various purposes). The vast majority of the rules actually used are basic --~see Section~\ref{ssec:instr_cov}~--, consisting of offsets from memory addresses stored in registers (such as \reg{rbp} or \reg{rsp}). Yet, the standard defines rules that take the form of stack-machine expressions, which can access virtually all of the process's memory and perform Turing-complete computations~\cite{oakley2011exploiting}.

\subsection*{The research problem}

As debugging data can easily grow larger than the program itself if stored carelessly, the DWARF standard pays great attention to data compactness and compression. It succeeds particularly well at this, but at the expense of efficiency: accessing stack unwinding data for a particular program point is an expensive operation --~the order of magnitude is $10\,\mu{}\text{s}$ on a modern computer.
This is often not a problem, as stack unwinding is mostly thought of as a debugging procedure: when something behaves unexpectedly, the programmer might open their debugger and explore the stack. Yet, stack unwinding may, in some cases, be performance-critical: for instance, polling profilers repeatedly unwind the stack to observe which functions are active. Even worse, C++ exception handling relies on stack unwinding in order to find a suitable catch block!

For such applications, it might be desirable to find a different time/space trade-off, storing a bit more data in exchange for faster unwinding. This trade-off is the question I explored during this internship: what alternative trade-off becomes reachable when the stack unwinding data is stored completely differently? The subject does not seem to have been explored yet, and as of now, the most widely used library for stack unwinding, \prog{libunwind}~\cite{libunwind}, essentially relies on aggressive but fine-tuned caching and optimized code to mitigate this problem.

\subsection*{Your contribution}

This internship explored the possibility of compiling DWARF's stack unwinding data directly into native assembly on the x86\_64 architecture, in order to provide fast access to the data at the assembly level. This compilation process was fully implemented and tested on complex, real-world examples. The integration of compiled DWARF into existing projects has been made easy by implementing an alternative version of the \textit{de facto} standard library for this purpose, \prog{libunwind}. We explored and evaluated multiple approaches to determine which compilation process leads to the best time/space trade-off.
Unexpectedly, the hardest part of the project was finding and implementing a benchmarking protocol that was both relevant and reliable. Unwinding a single frame --~around $10\,\mu{}\text{s}$~-- is too fast to benchmark reliably from only a few samples without incurring large statistical errors. Gathering enough samples for this purpose --~at least a few thousand~-- is not easy, since one must avoid unwinding the same frame over and over again, which would only benchmark the caching mechanism. The other problem is to distribute the unwinding measurements evenly across the various IPs, including those located in the loaded libraries (\eg{} the \prog{libc}). The solution eventually chosen was to modify \prog{perf}, the standard profiling program for Linux, in order to gather statistics and benchmarks of its unwindings. Modifying \prog{perf} was an additional challenge that turned out to be harder than expected, since its source code is hard to read and optimisations make some parts counter-intuitive. To overcome this, we designed an alternative version of \prog{libunwind}, interfaced with the compiled debugging data.

\subsection*{Arguments supporting its validity}

The goal of this project was to design a compiled version of unwinding data that is faster than DWARF, while remaining reliable and reasonably compact.
Benchmarking has yielded convincing results: on the experimental setup created --~detailed in Section~\ref{sec:benchmarking} below~--, the compiled version is around 26 times faster than the DWARF version, while remaining only around 2.5 times bigger than the original data.

The implementation is not yet release-ready, as it does not support 100\,\% of the DWARF5 specification~\cite{dwarf5std} --~see Section~\ref{ssec:ehelfs} below. Yet, it supports the vast majority --~more than $99.9\,\%$~-- of the instructions actually used in binaries, and is decently robust compared to \prog{libunwind}, the reference implementation. Corner cases do occur: on a $27000$-sample test, 885 failures were observed for \prog{libunwind}, against $1099$ for the compiled DWARF version (failures are due to signal handlers, unusual instructions, \ldots) --~see Section~\ref{ssec:timeperf}.

The implementation is, moreover, not yet production-ready: it only supports the x86\_64 architecture, and relies to some extent on the Linux operating system. Neither poses a fundamental problem. Supporting other processor architectures and ABIs is only a matter of engineering, and the operating system dependency is confined to the libraries developed to interact with the compiled unwinding data, which could be developed for virtually any operating system.

\subsection*{Summary and future work}

In most everyday cases, slow stack unwinding is not a problem, merely an annoyance.
Yet, a 26-fold speed-up on stack-unwinding-heavy tasks can be really useful, \eg{} to profile large programs, particularly when profiling many times in order to analyze the impact of multiple changes. It can also be useful for exception-heavy programs. Thus, we plan to address the remaining limitations and to integrate this work cleanly with mainstream tools, such as \prog{perf}. Another research direction is to investigate how to compress the original DWARF unwinding data even further using outlining techniques, as we already successfully do for the compiled data.

\pagestyle{plain}

\newpage