Rework summary sheet (fiche de synthèse)

2018-08-03 17:36:57 +02:00
commit 8203502e9a
parent a06b1a915a
2 changed files with 64 additions and 42 deletions
--- a/report/fiche_synthese.tex
+++ b/report/fiche_synthese.tex
@@ -9,9 +9,28 @@
 \subsection*{The general context}
 The standard debugging data format for ELF binary files, DWARF, contains a lot
-of information. Among those are the stack unwinding data, which allows to
+of information, which is generated mostly when passing \eg{} the switch
-unwind stack frames, restoring machine registers to their proper values, for
+\lstbash{-g} to \prog{gcc}. This information, essentially provided for
-instance within the context of a debugger.
+debuggers, contains all that is needed to connect the generated assembly with
 the original code, information that can be used by sanitizers (\eg{} the type
 of each variable in the source language), etc.
 Even in stripped (non-debug) binaries, a small portion of DWARF data remains.
 Among this essential data that is never stripped is the stack unwinding data,
 which makes it possible to unwind stack frames, restoring machine registers to the value
 they had in the previous frame, for instance within the context of a debugger
 or a profiler.
 This data is structured into tables, each row corresponding to a program
 counter (PC) range for which it describes valid unwinding data, and each column
 describing how to unwind a particular machine register (or virtual register
 used for various purposes). These rules are mostly basic, consisting of offsets
 from memory addresses stored in registers (such as \reg{rbp} or \reg{rsp}), but
 in some cases, they can take the form of a stack-machine expression that can
 access virtually all the process's memory and perform Turing-complete
 computation~\cite{oakley2011exploiting}.
 \subsection*{The research problem}
 As debugging data can easily grow unreasonably large if stored carelessly,
 the DWARF standard pays great attention to data compactness and compression,
@@ -21,25 +40,23 @@ can be quite costly.
 This is often not a huge problem, as stack unwinding is mostly thought of as a
 debugging procedure: when something behaves unexpectedly, the programmer might
-be interested in exploring the stack, moving around between stack frames,
+be interested in exploring the stack.  Yet, stack unwinding might, in some
-tracing the program path leading to some bug, \ldots{} Yet, stack unwinding
+cases, be performance-critical: for instance, profiler programs need to
-might, in some cases, be performance-critical: for instance, profiler programs
+perform a whole lot of stack unwindings. Even worse, exception handling relies
-needs to perform a whole lot of stack unwindings. Even worse, exception
+on stack unwinding in order to find a suitable catch-block! For such
-handling relies on stack unwinding in order to find a suitable catch-block!
+applications, it might be desirable to find a different time/space trade-off,
 allowing a slightly space-heavier, but far more time-efficient unwinding
 procedure.
-The most widely used library used for stack unwinding,
+This different trade-off is the question that I explored during this
 internship: what good alternative trade-off is reachable when storing the stack
 unwinding data completely differently?
 It seems that the subject has not really been explored yet, and as of now, the
 most widely used library for stack unwinding,
 \prog{libunwind}~\cite{libunwind}, essentially makes use of aggressive but
 fine-tuned caching and optimized code to mitigate this problem.
 \subsection*{The research problem}
 \todo{Split the previous paragraph into two paragraphs, fitting this section as
 well}
 \note{I have trouble figuring out what is expected here, and what is expected
 in the previous section…}
 % What is the question that you studied?
 % Why is it important, what are the applications/consequences?
 % Is it a new problem?
@@ -48,11 +65,10 @@ in the previous section…}
 \subsection*{Your contribution}
-This internship explored the possibility to compile the standard ELF debugging
+This internship explored the possibility to compile DWARF's stack unwinding
-information format, DWARF, directly into native assembly on the x86\_64
+data directly into native assembly on the x86\_64 architecture. Instead of
-architecture. Instead of parsing and interpreting at runtime the debug data,
+parsing and interpreting the debug data at runtime, the stack unwinding data is
-the stack unwinding data is accessed as a function of a dynamically-loaded
+accessed as a function of a dynamically-loaded shared library.
 shared library.
 Multiple approaches have been tried, in order to determine which compilation
 process leads to the best time/space trade-off.
@@ -74,7 +90,7 @@ allowing to benchmark \prog{perf} with both the standard stack unwinding data
 and the alternative experimental compiled format. As a free and enjoyable
 side-effect, the experimental unwinding data is perfectly interfaced with
 \prog{libunwind}, and thus interfaceable at practically no cost with any
-existing project using the common library \prog{libunwind}.
+existing project using the \textit{de facto} standard library \prog{libunwind}.
 % What is your solution to the question described in the last paragraph?
 %
@@ -103,30 +119,27 @@ on around 200 cases more than \prog{libunwind} on a 27000 samples test (1099
 failures, against 885 for \prog{libunwind}).
 The prototype, unlike \prog{libunwind}, does not support $100\,\%$ of the DWARF
-instruction present in the DWARF5 standard~\cite{dwarf5std}. It is also limited
+instructions present in the DWARF5 standard~\cite{dwarf5std}. It is also
-to the x86\_64 architecture, and relies to some extent on the Linux operating
+limited to the x86\_64 architecture, and relies to some extent on the Linux
-system. But none of those limitations are real problems in practice. As argued
+operating system. But none of those limitations are real problems in practice.
-later on, the vast majority of the DWARF instructions actually used in the wild
+As argued later on, the vast majority of the DWARF instruction set actually
-are implemented; other processor architectures and ABIs are only a matter of
+used in the wild is implemented; other processor architectures and ABIs are
-time spent and engineering work; and the operating system dependency is only
+only a matter of time spent and engineering work; and the operating system
-present in the libraries developed in order to interact with the compiled
+dependency is only present in the libraries developed in order to interact with
-unwinding data, which can be developed for virtually any operating system.
+the compiled unwinding data, which can be developed for virtually any operating
 system.
 \subsection*{Summary and future work}
-In most cases of everyday's life, the slowness of stack unwinding is not a
+In most everyday cases, slow stack unwinding is not a problem, or
-problem, or even an annoyance. Yet, having a 25 times speed-up on stack
+even an annoyance. Yet, having a 25 times speed-up on stack unwinding-heavy
-unwinding-heavy tasks, such as profiling, can be really useful to analyse heavy
+tasks, such as profiling, can be really useful to profile heavy programs,
-programs, particularly if one wants to profile many times in order to analyze
+particularly if one wants to profile many times in order to analyze the impact
-the impact of multiple changes. It can also be useful for exception-heavy
+of multiple changes. It can also be useful for exception-heavy
 programs~\qtodo{cite Stephen's software?}. Thus, it might be interesting to
 implement a more stable version, and try to interface it cleanly with
 mainstream tools, such as \prog{perf}.
 It might also be interesting to investigate whether it is possible to reach
 even greater speeds by using some more complex compilation process that would
 have yet to be determined.
 Another question worth exploring might be whether it is possible to shrink the
 original DWARF unwinding data even further, which would be stored in a format not
 too far from the original standard, by applying techniques close to those
--- a/shared/report.bib
+++ b/shared/report.bib
@@ -16,3 +16,12 @@
    title           = {Libunwind webpage},
    url             = {http://www.nongnu.org/libunwind/},
 }
@inproceedings{oakley2011exploiting,
  title={Exploiting the Hard-Working DWARF: Trojan and Exploit Techniques with No Native Executable Code},
  author={Oakley, James and Bratus, Sergey},
  booktitle={WOOT},
  pages={91--102},
  year={2011}
 }