Rephrase everything but section 2

2018-08-18 22:06:55 +02:00
commit 2f44049506
parent f0809dbf1c
1 changed files with 72 additions and 61 deletions
--- a/report/report.tex
+++ b/report/report.tex
@ -702,14 +702,14 @@ Listing~\ref{lst:unw_ctx}. The function will then return a fresh memory
 context, containing the values the registers hold after unwinding this frame.
 The body of the function itself consists of a single monolithic switch, taking
-advantage of the non-standard --~yet widely implemented in C compilers~--
+advantage of the non-standard --~yet overwhelmingly implemented in common C
-syntax for range switches, in which each \lstinline{case} can refer to a range.
+compilers~-- syntax for range switches, in which each \lstinline{case} can
-All the FDEs are merged together into this switch, each row of a FDE being a
+refer to a range, \eg{} \lstc{case 17 ... 42:}.  All the FDEs are merged
-switch case.  Separating the various FDEs in the C code --~other than with
+together into this switch, each row of an FDE being a switch case.  Separating
-comments~-- is, unlike what is done in DWARF, pointless, since accessing a
+the various FDEs in the C code --~other than with comments~-- is, unlike what
-``row'' has a linear cost, and the C code is not meant to be read, except maybe
+is done in DWARF, pointless, since accessing a ``row'' has a linear cost, and
-for debugging purposes. The switch cases bodies then fill a context with
+the C code is not meant to be read, except maybe for debugging purposes. The
-unwound values, then return it.
+switch case bodies then fill a context with unwound values before returning it.
 A setting of the compiler also optionally enables another parameter to the
 \lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
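As an illustration of the range-switch layout described in this hunk, here is a minimal sketch in C. It is not the generated code: the type `sketch_ctx_t`, the functions `sketch_eh_elf` and `sketch_read_mem`, the address ranges and the row contents are all hypothetical, and the offsets assume a 64-bit x86\_64-like frame layout. It relies on the GNU case-range extension supported by GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified context; the real one is in Listing lst:unw_ctx. */
typedef struct {
    uintptr_t rip, rsp, rbp;
} sketch_ctx_t;

/* Hypothetical callback type, in the spirit of the optional deref parameter. */
typedef uintptr_t (*sketch_deref_t)(uintptr_t addr);

/* Simplest possible deref: read a word from the current address space. */
uintptr_t sketch_read_mem(uintptr_t addr)
{
    return *(const uintptr_t *)addr;
}

/* One switch case per (made-up) FDE row, keyed on the program counter.
 * For brevity the cases share a common tail instead of each returning. */
sketch_ctx_t sketch_eh_elf(sketch_ctx_t ctx, uintptr_t pc, sketch_deref_t deref)
{
    sketch_ctx_t out = ctx;
    uintptr_t cfa;

    assert(deref != NULL);
    switch (pc) {
    case 0x0:             /* before "push %rbp": CFA = rsp + 8 */
        cfa = ctx.rsp + 8;
        break;
    case 0x1 ... 0x3:     /* after the push: CFA = rsp + 16, rbp saved at CFA - 16 */
        cfa = ctx.rsp + 16;
        out.rbp = deref(cfa - 16);
        break;
    case 0x4 ... 0x2f:    /* after "mov %rsp, %rbp": CFA = rbp + 16 */
        cfa = ctx.rbp + 16;
        out.rbp = deref(cfa - 16);
        break;
    default:              /* no row covers this pc: signal failure with zeros */
        out.rip = 0;
        out.rsp = 0;
        out.rbp = 0;
        return out;
    }
    out.rip = deref(cfa - 8);  /* return address sits just below the CFA */
    out.rsp = cfa;             /* the caller's rsp is the CFA itself */
    return out;
}
```

Passing the memory reader as a parameter, as in this sketch, is what would let the same lookup run either inside the unwound process or on a recorded copy of its stack.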
@ -724,12 +724,12 @@ real-world-proof version of the \ehelfs, the choice was made to keep this
 implementation simple, and only handle the few registers that were needed to
 simply unwind the stack. Thus, the only registers handled in \ehelfs{} are
 \reg{rip}, \reg{rbp}, \reg{rsp} and \reg{rbx}, the latter being used a few
-times in \prog{libc} to hold the CFA address in common functions. This is
+times in \prog{libc} and other less common libraries to hold the CFA address in
-enough to unwind the stack reliably, and thus enough for profiling, but is not
+common functions. This is enough to unwind the stack reliably, and thus enough
-sufficient to analyze every stack frame as \prog{gdb} would do after a
+for profiling, but is not sufficient to analyze every stack frame as \prog{gdb}
-\lstbash{frame n} command.  Yet, if one was to enhance the code to handle every
+would do after a \lstbash{frame n} command.  Yet, if one were to enhance the
-register, it would not be much harder and would probably be only a few hours of
+code to handle every register, it would not be much harder and would probably
-code refactoring and rewriting.
+be only a few hours' worth of code refactoring and rewriting.
 \lstinputlisting[language=C, caption={Unwinding context}, label={lst:unw_ctx}]
    {src/dwarf_assembly_context/unwind_context.c}
@ -754,17 +754,19 @@ on or off, and it doesn't require to alter the base system by editing \eg{}
 \texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is required, those
 files can simply be \lstc{dlopen}'d. It is also possible to imagine, in a
 future production environment, packaging \ehelfs{} files separately, so that
-people interested in heavy computation can have the choice to install them.
+people interested in better performance can have the choice to install them.
 This, in particular, means that each ELF file has its unwinding data in a
-separate \ehelf{} file --~just like with DWARF, where each ELF retains its own
+separate \ehelf{} file, implying that the unwinding data for a given program is
-DWARF data. Thus, an unwinder must first acquire a \emph{memory map}, a table
+scattered among various \ehelf{} files, one for each shared object loaded
-listing the various ELF files loaded and \emph{mapped} in memory, and on which
+--~just like with DWARF, where each ELF retains its own DWARF data. Thus, an
-memory segment. This memory map is provided by the operating system --~for
+unwinder must first acquire a \emph{memory map}, a table listing the various
-instance, on Linux, it is available as a file in \texttt{/proc}. Once this map
+ELF files loaded and \emph{mapped} in memory, and on which memory segment. This
-is acquired, when unwinding from a given IP, the unwinder must identify the
+memory map is provided by the operating system --~for instance, on Linux, it is
-memory segment from which it comes, deduce the source ELF file, and deduce the
+available as a file in \texttt{/proc}. Once this map is acquired, when
-corresponding \ehelf.
+unwinding from a given IP, the unwinder must identify the memory segment from
 which it comes, deduce the source ELF file, and deduce the corresponding
 \ehelf.
 \medskip
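The memory-map lookup described in this hunk can be sketched as follows, in C. This is only an illustration under stated assumptions: the names `sketch_segment_t`, `sketch_parse_maps_line`, `sketch_find_segment` and `sketch_object_offset` are hypothetical, and the parser only handles the usual `start-end perms offset dev inode path` line layout of the Linux `/proc/<pid>/maps` file. How the \ehelf{} file name is derived from the ELF path is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one memory-map entry. */
typedef struct {
    unsigned long start, end, offset;  /* [start, end) and file offset */
    char path[256];                    /* backing ELF file, "" if anonymous */
} sketch_segment_t;

/* Parse one line laid out as "start-end perms offset dev inode [path]".
 * Returns 0 on success, -1 if the line does not match. */
int sketch_parse_maps_line(const char *line, sketch_segment_t *seg)
{
    char perms[5];
    int n;

    assert(line != NULL && seg != NULL);
    seg->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %*s %*s %255s",
               &seg->start, &seg->end, perms, &seg->offset, seg->path);
    return n >= 4 ? 0 : -1;
}

/* Find the segment containing ip, or NULL if ip is not mapped. */
const sketch_segment_t *sketch_find_segment(const sketch_segment_t *map,
                                            size_t count, unsigned long ip)
{
    size_t i;

    for (i = 0; i < count; ++i)
        if (map[i].start <= ip && ip < map[i].end)
            return &map[i];
    return NULL;
}

/* Translate a runtime ip into an offset inside the mapped file. */
unsigned long sketch_object_offset(const sketch_segment_t *seg, unsigned long ip)
{
    assert(seg != NULL && seg->start <= ip && ip < seg->end);
    return ip - seg->start + seg->offset;
}
```

Once the segment is known, `seg->path` names the source ELF file, and the translated offset is the kind of value a per-object lookup function would take as its program counter.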
@ -772,8 +774,8 @@ corresponding \ehelf.
                 label={lst:fib7_eh_elf_basic}]
                 {src/fib7/fib7.eh_elf_basic.c}
-The C code in Listing~\ref{lst:fib7_eh_elf_basic} is a part of what was
+The C code in Listing~\ref{lst:fib7_eh_elf_basic} is the relevant part of what
-generated for the C code in Listing~\ref{lst:ex1_c}.
+was generated for the C code in Listing~\ref{lst:ex1_c}.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{First results}
@ -817,13 +819,13 @@ it depends.
 The first column only includes the sizes of the ELF sections \lstc{.text} (the
 program itself) and \lstc{.rodata}, the read-only data (such as static strings,
 etc.). Only the weight of the \lstc{.text} section of the generated \ehelfs{}
-is considered, because it is self-consistent (few data or none is stored in
+is considered, because it is self-contained (little or no data is stored in
 \lstc{.rodata}), and the other sections could be removed if the \ehelfs{}
 \lstc{.text} was somehow embedded in the original shared object.
 This first tentative version of \ehelfs{} is roughly 7 times heavier than the
 original \lstc{.eh_frame}, and represents a far too significant proportion of
-the original program size.
+the original program size ($65\,\%$).
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Space optimization}\label{ssec:space_optim}
@ -838,13 +840,13 @@ The major optimization that most reduced the output size was to use an if/else
 tree implementing a binary search over the relevant instruction pointer
 intervals, instead of a single monolithic switch. In the process, we also
 \emph{outline} code whenever possible, that is, find out identical ``switch
-cases'' bodies --~which are not switch cases anymore, but if bodies~--, move
+cases'' bodies --~which are not switch cases anymore, but \texttt{if}
-them outside of the if/else tree, identify them by a label, and jump to them
+bodies~--, move them outside of the if/else tree, identify them by a label, and
-using a \lstc{goto}, which de-duplicates a lot of code and contributes greatly
+jump to them using a \lstc{goto}, which de-duplicates a lot of code and
-to the shrinking. In the process, we noticed that the vast majority of FDE rows
+contributes greatly to the shrinking. In the process, we noticed that the vast
-are actually taken among very few ``common'' FDE rows. For instance, in the
+majority of FDE rows are actually taken among very few ``common'' FDE rows. For
-\prog{libc}, out of a total of $20827$ rows, only $302$ ($1.5\,\%$) remain
+instance, in the \prog{libc}, out of a total of $20827$ rows, only $302$
-after the outlining.
+($1.5\,\%$) unique rows remain after the outlining.
 This makes this optimization really efficient, as seen later in
 Section~\ref{ssec:results_size}, but also makes it an interesting question
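To make the if/else tree and the outlining described in this hunk concrete, here is a minimal sketch in C. The intervals, labels and the names `sketch_ctx_t` and `sketch_cfa` are hypothetical and much simpler than real generated code: six made-up intervals share only three distinct row bodies, each of which is written once and reached with a `goto`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified context holding only what this sketch needs. */
typedef struct {
    uintptr_t rsp, rbp;
} sketch_ctx_t;

/* Compute the CFA for pc using a binary search over interval bounds.
 * Made-up intervals: [0x0,0x1) [0x1,0x4) [0x4,0x30)
 *                    [0x30,0x31) [0x31,0x34) [0x34,0x60)
 * Returns 0 when no interval covers pc. */
uintptr_t sketch_cfa(const sketch_ctx_t *ctx, uintptr_t pc)
{
    assert(ctx != NULL);

    if (pc < 0x30) {              /* first half of the intervals */
        if (pc < 0x1)
            goto row_rsp_8;
        else if (pc < 0x4)
            goto row_rsp_16;
        else
            goto row_rbp_16;
    } else {                      /* second half of the intervals */
        if (pc < 0x31)
            goto row_rsp_8;
        else if (pc < 0x34)
            goto row_rsp_16;
        else if (pc < 0x60)
            goto row_rbp_16;
        else
            goto row_unknown;
    }

    /* Outlined bodies: each distinct row appears exactly once. */
row_rsp_8:
    return ctx->rsp + 8;
row_rsp_16:
    return ctx->rsp + 16;
row_rbp_16:
    return ctx->rbp + 16;
row_unknown:
    return 0;
}
```

In this toy case the six intervals collapse to three bodies; the \prog{libc} figures quoted in this hunk ($302$ unique rows out of $20827$) show the same effect at a much larger scale.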
@ -874,13 +876,13 @@ solution working.
 \subsection{Requirements}\label{ssec:bench_req}
 To provide relevant benchmarks of the \ehelfs{} performance, one must sample at
-least a few hundreds or thousands of stack unwinding, since a single frame
+least a few hundreds or thousands of stack unwindings, since a single frame
 unwinding with regular DWARF takes on the order of $10\,\mu s$, and
 \ehelfs{} were expected to have significantly better performance.
 However, unwinding over and over again from the same program point would have
 been of no interest at all, since \prog{libunwind} would have simply cached the
-relevant DWARF row. In the mean time, making sure that the various unwinding
+relevant DWARF rows. At the same time, making sure that the various unwindings
 are made from different locations is somewhat cheating, since it defeats
 \prog{libunwind}'s caching and does not reproduce ``real-world'' unwinding
 distribution. All in all, the benchmarking method must have a ``natural''
@ -892,8 +894,8 @@ stack unwindings crossing some standard library functions, starting from inside
 them, etc.
 Finally, the unwound program must be interesting enough to enter and exit
-functions often, building a good stack of nested function calls (at least 5
+functions often, building a good stack of nested function calls (frequently
-frequently), have FDEs that are not as simple as in Listing~\ref{lst:ex1_dw},
+at least 5 deep), have FDEs that are not as simple as in Listing~\ref{lst:ex1_dw},
 etc.
@ -925,7 +927,8 @@ Section~\ref{ssec:bench_req} above: since it stops at regular intervals and
 unwinds, the unwindings are evenly distributed \wrt{} the frequency of
 execution of the code, which is a natural enough setup for the benchmarks to be
 meaningful, while still unwinding from diversified locations, preventing
-caching from being be overwhelming. It also has the ability to unwind from
+caching from being overwhelming --~as can be observed later in
 Section~\ref{ssec:timeperf}. It also has the ability to unwind from
 within any function, including functions of linked shared libraries. It can also
 be applied to virtually any program, which allows unwinding ``interesting''
 code.
@ -944,27 +947,26 @@ turned out necessary to slightly modify \prog{libunwind}'s interface to add a
 parameter to an initialisation function, since \prog{libunwind} is made to be
 agnostic of the system and process as much as possible, to be able to unwind in
 any context.  This very restricted information lacked a memory map (see
-Section~\ref{ssec:ehelfs}) in order to use \ehelfs. Apart from this, the
+Section~\ref{ssec:ehelfs}) in order to use \ehelfs{} --~while, on the other
-modified version of \prog{libunwind} produced is entirely compatible with the
+hand, providing information about the original DWARF that is now useless.
-vanilla version. This means that the only modifications required to use
+Apart from this, the modified version of \prog{libunwind} produced is entirely
-\ehelfs{} within any project using \prog{libunwind} should be changing one line
+compatible with the vanilla version. This means that the only modifications
-of code to add one parameter to a function call and linking against the
+required to use \ehelfs{} within any project using \prog{libunwind} should be
-modified version of \prog{libunwind} instead of the system version.
+changing one line of code to add one parameter to a function call and linking
 against the modified version of \prog{libunwind} instead of the system version.
 Once this was done, plugging it in \prog{perf} was the matter of a few lines of
 code only, apart from the benchmarking code. The major problem encountered was
 to understand how \prog{perf} works. In order to avoid perturbing the traced
 program, \prog{perf} does not unwind at runtime, but rather records at regular
 intervals the program's stack, and all the auxiliary information that is needed
-to unwind later. This is done when running \lstbash{perf record}. Then,
+to unwind later. This is done when running \lstbash{perf record}. Then, a
-\lstbash{perf report} unwinds the stack to analyze it; but at this point of
+subsequent call to \lstbash{perf report} unwinds the stack to analyze it; but
-time, the traced process is long dead, thus any PID-based approach, or any
+at this point in time, the traced process is long dead. Thus, any PID-based
-approach using \texttt{/proc} information will fail. However, as this was the
+approach, or any approach using \texttt{/proc} information will fail. However,
-easiest method, the first version of \ehelfs{} used those mechanisms; thus
+as this was the easiest method, the first version of \ehelfs{} used those
-requiring some code rewriting.
+mechanisms; it took some code rewriting to move to a PID- and
-
+\texttt{/proc}-agnostic implementation.
 The modified versions of both \prog{perf} and \prog{libunwind} are present in
 the repositories \prog{perf-eh\_elf} and \prog{libunwind-eh\_elf}.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Other explored methods}
@ -1052,6 +1054,11 @@ instruction, however, would not slow down at all the implementation, since
 every instruction would simply be compiled to x86\_64 without affecting the
 already supported code.
 The fact that there is a sharp difference between cached and uncached
 \prog{libunwind} confirms that our experimental setup did not unwind at totally
 different locations every single time, and thus was not biased in this
 direction, since caching is still very efficient.
 It is worth noting that the compilation time of \ehelfs{} is also
 reasonably short. On the machine described in Section~\ref{ssec:bench_hw}, and
 without using multiple cores to compile, the various shared objects needed to
@ -1117,8 +1124,10 @@ Section~\ref{ssec:instr_cov}).
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Instructions coverage}\label{ssec:instr_cov}
-In order to determine which proportion of real-world ELF instructions are
+In order to determine which DWARF instructions are necessary to implement to
-covered by our compiler and \ehelfs.
+have meaningful results, as well as to assess the instruction coverage of our
 compiler and \ehelfs, we must look at real-world ELF files and inspect the
 instructions used.
 The method chosen was to take a random uniform sample of 4000 ELFs among those
 present on a basic ArchLinux system setup, in the directories \texttt{/bin},
@ -1211,7 +1220,7 @@ instructions encountered that were not supported by \ehelfs. The first row is
 only concerned about the columns CFA, \reg{rip}, \reg{rsp}, \reg{rbp} and
 \reg{rbx} (the supported registers --~see Section~\ref{ssec:ehelfs}). The
 second row analyzes all the columns that were encountered, no matter whether
-supported or not.
+supported or not in \ehelfs.
 Table~\ref{table:instr_types} analyzes the proportion of each command
 --~the formal way a register is set~-- for non-CFA columns in the sampled data. For
@ -1221,11 +1230,13 @@ means stored at the address of an expression's result, and the \texttt{Val\_}
 prefix means that the value must not be dereferenced. Overall, it can be seen
 that supporting \texttt{Offset} already means supporting the vast majority of
 registers. The data gathered (not reproduced here) also suggests that
-supporting a few common expressions is enough to support most of them.
+supporting a few common expressions is enough to support most of them. This is
 further supported by the fact that we already support more than $80\,\%$ of
 expressions only by supporting two basic constructs.
-It is also worth noting that of all the 4000 analyzed files, there are only 12
+It is also worth noting that among all of the 4000 analyzed files, all the
-that contained all the unsupported expressions seen, and only 24 that contained
+unsupported expressions are clustered in only 12 of them, and only 24 contained
-some unsupported instruction at all.
+any unsupported instructions at all.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%