Rephrase everything but section 2

Théophile Bastian 2018-08-18 22:06:55 +02:00
parent f0809dbf1c
commit 2f44049506


@ -702,14 +702,14 @@ Listing~\ref{lst:unw_ctx}. The function will then return a fresh memory
context, containing the values the registers hold after unwinding this frame. context, containing the values the registers hold after unwinding this frame.
The body of the function itself consists in a single monolithic switch, taking The body of the function itself consists in a single monolithic switch, taking
advantage of the non-standard --~yet widely implemented in C compilers~-- advantage of the non-standard --~yet overwhelmingly implemented in common C
syntax for range switches, in which each \lstinline{case} can refer to a range. compilers~-- syntax for range switches, in which each \lstinline{case} can
All the FDEs are merged together into this switch, each row of a FDE being a refer to a range, \eg{} \lstc{case 17 ... 42:}. All the FDEs are merged
switch case. Separating the various FDEs in the C code --~other than with together into this switch, each row of a FDE being a switch case. Separating
comments~-- is, unlike what is done in DWARF, pointless, since accessing a the various FDEs in the C code --~other than with comments~-- is, unlike what
``row'' has a linear cost, and the C code is not meant to be read, except maybe is done in DWARF, pointless, since accessing a ``row'' has a linear cost, and
for debugging purposes. The switch cases bodies then fill a context with the C code is not meant to be read, except maybe for debugging purposes. The
unwound values, then return it. switch cases bodies then fill a context with unwound values before returning it.
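The range-switch layout described above can be sketched as follows. This is a minimal illustration, not the generated code: the type \lstc{sketch_ctx_t}, the function \lstc{sketch_eh_elf}, the \lstc{deref} signature, the address ranges and the fake stack are all hypothetical, and an x86\_64 target with 8-byte words is assumed. The \lstc{case lo ... hi:} syntax is the GNU extension accepted by GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified context holding the four handled registers. */
typedef struct {
    uintptr_t rip, rsp, rbp, rbx;
} sketch_ctx_t;

/* Assumed shape of the optional memory-reading callback. */
typedef uintptr_t (*sketch_deref_t)(uintptr_t addr);

/* One monolithic switch: each case is one FDE row, keyed by an IP range.
 * The rows model a made-up function located at 0x10..0x2f whose prologue
 * is `push %rbp` (1 byte) followed by `mov %rsp,%rbp` (3 bytes). */
sketch_ctx_t sketch_eh_elf(sketch_ctx_t ctx, uintptr_t pc, sketch_deref_t deref)
{
    sketch_ctx_t out = ctx;
    uintptr_t cfa;

    assert(deref != NULL);
    switch (pc) {
    case 0x10:              /* nothing pushed yet: CFA = rsp + 8 */
        cfa = ctx.rsp + 8;
        break;
    case 0x11 ... 0x13:     /* after push %rbp: CFA = rsp + 16 */
        cfa = ctx.rsp + 16;
        out.rbp = deref(cfa - 16);
        break;
    case 0x14 ... 0x2f:     /* after mov %rsp,%rbp: CFA = rbp + 16 */
        cfa = ctx.rbp + 16;
        out.rbp = deref(cfa - 16);
        break;
    default:                /* pc is not covered by any row */
        out.rip = 0;
        return out;
    }
    out.rip = deref(cfa - 8);   /* return address is stored just below the CFA */
    out.rsp = cfa;
    return out;
}

/* Fake stack for experimenting: saved rbp, then the return address. */
uintptr_t sketch_stack[4] = { 0xbbbb, 0x401234, 0, 0 };

/* Callback reading the current process's own memory. */
uintptr_t sketch_local_deref(uintptr_t addr)
{
    return *(const uintptr_t *)addr;
}
```

Calling \lstc{sketch_eh_elf} with \lstc{rbp} pointing at \lstc{sketch_stack} and a \lstc{pc} inside the third range yields the saved \lstc{rbp} and the return address stored in the fake stack.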
A setting of the compiler also optionally enables another parameter to the A setting of the compiler also optionally enables another parameter to the
\lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This \lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
@ -724,12 +724,12 @@ real-world-proof version of the \ehelfs, the choice was made to keep this
implementation simple, and only handle the few registers that were needed to implementation simple, and only handle the few registers that were needed to
simply unwind the stack. Thus, the only registers handled in \ehelfs{} are simply unwind the stack. Thus, the only registers handled in \ehelfs{} are
\reg{rip}, \reg{rbp}, \reg{rsp} and \reg{rbx}, the latter being used a few \reg{rip}, \reg{rbp}, \reg{rsp} and \reg{rbx}, the latter being used a few
times in \prog{libc} to hold the CFA address in common functions. This is times in \prog{libc} and other less common libraries to hold the CFA address in
enough to unwind the stack reliably, and thus enough for profiling, but is not common functions. This is enough to unwind the stack reliably, and thus enough
sufficient to analyze every stack frame as \prog{gdb} would do after a for profiling, but is not sufficient to analyze every stack frame as \prog{gdb}
\lstbash{frame n} command. Yet, if one was to enhance the code to handle every would do after a \lstbash{frame n} command. Yet, if one was to enhance the
register, it would not be much harder and would probably be only a few hours of code to handle every register, it would not be much harder and would probably
code refactoring and rewriting. be only a few hours' worth of code refactoring and rewriting.
\lstinputlisting[language=C, caption={Unwinding context}, label={lst:unw_ctx}] \lstinputlisting[language=C, caption={Unwinding context}, label={lst:unw_ctx}]
{src/dwarf_assembly_context/unwind_context.c} {src/dwarf_assembly_context/unwind_context.c}
@ -754,17 +754,19 @@ on or off, and it doesn't require to alter the base system by editing \eg{}
\texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is required, those \texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is required, those
files can simply be \lstc{dlopen}'d. It is also possible to imagine, in a files can simply be \lstc{dlopen}'d. It is also possible to imagine, in a
future production environment, packaging \ehelfs{} files separately, so that future production environment, packaging \ehelfs{} files separately, so that
people interested in heavy computation can have the choice to install them. people interested in better performance can have the choice to install them.
This, in particular, means that each ELF file has its unwinding data in a This, in particular, means that each ELF file has its unwinding data in a
separate \ehelf{} file --~just like with DWARF, where each ELF retains its own separate \ehelf{} file, implying that the unwinding data for a given program is
DWARF data. Thus, an unwinder must first acquire a \emph{memory map}, a table scattered among various \ehelf{} files, one for each shared object loaded
listing the various ELF files loaded and \emph{mapped} in memory, and on which --~just like with DWARF, where each ELF retains its own DWARF data. Thus, an
memory segment. This memory map is provided by the operating system --~for unwinder must first acquire a \emph{memory map}, a table listing the various
instance, on Linux, it is available as a file in \texttt{/proc}. Once this map ELF files loaded and \emph{mapped} in memory, and on which memory segment. This
is acquired, when unwinding from a given IP, the unwinder must identify the memory map is provided by the operating system --~for instance, on Linux, it is
memory segment from which it comes, deduce the source ELF file, and deduce the available as a file in \texttt{/proc}. Once this map is acquired, when
corresponding \ehelf. unwinding from a given IP, the unwinder must identify the memory segment from
which it comes, deduce the source ELF file, and deduce the corresponding
\ehelf.
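The lookup from an IP to its memory segment can be sketched as below, assuming the Linux \texttt{/proc/<pid>/maps} line format (\texttt{start-end perms offset dev inode path}). The names \lstc{sketch_segment_t}, \lstc{sketch_parse_maps_line} and \lstc{sketch_find_segment} are hypothetical and not taken from the actual implementation; how the \ehelf{} file is then derived from the ELF path is deliberately left out, since it depends on a naming convention not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One entry of the memory map. */
typedef struct {
    uintptr_t start, end;   /* virtual address range [start, end) */
    uintptr_t offset;       /* offset of this segment in the backing file */
    int executable;         /* non-zero if the segment is mapped executable */
    char path[256];         /* backing ELF file, empty for anonymous maps */
} sketch_segment_t;

/* Parse one line of /proc/<pid>/maps. Returns 1 on success, 0 otherwise.
 * Paths containing spaces are truncated at the first space. */
int sketch_parse_maps_line(const char *line, sketch_segment_t *seg)
{
    unsigned long start, end, offset;
    char perms[5];
    int n;

    assert(line != NULL && seg != NULL);
    seg->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %*s %*s %255s",
               &start, &end, perms, &offset, seg->path);
    if (n < 4)
        return 0;
    seg->start = start;
    seg->end = end;
    seg->offset = offset;
    seg->executable = (perms[2] == 'x');
    return 1;
}

/* Find the segment containing ip, or NULL if ip is not mapped. */
const sketch_segment_t *sketch_find_segment(const sketch_segment_t *segs,
                                            size_t count, uintptr_t ip)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (ip >= segs[i].start && ip < segs[i].end)
            return &segs[i];
    return NULL;
}
```

A linear scan is enough for a sketch; since a process usually has few mappings, a sorted array with a binary search would be the natural next step if this lookup ever showed up in profiles.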
\medskip \medskip
@ -772,8 +774,8 @@ corresponding \ehelf.
label={lst:fib7_eh_elf_basic}] label={lst:fib7_eh_elf_basic}]
{src/fib7/fib7.eh_elf_basic.c} {src/fib7/fib7.eh_elf_basic.c}
The C code in Listing~\ref{lst:fib7_eh_elf_basic} is a part of what was The C code in Listing~\ref{lst:fib7_eh_elf_basic} is the relevant part of what
generated for the C code in Listing~\ref{lst:ex1_c}. was generated for the C code in Listing~\ref{lst:ex1_c}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{First results} \subsection{First results}
@ -817,13 +819,13 @@ it depends.
The first column only includes the sizes of the ELF sections \lstc{.text} (the The first column only includes the sizes of the ELF sections \lstc{.text} (the
program itself) and \lstc{.rodata}, the read-only data (such as static strings, program itself) and \lstc{.rodata}, the read-only data (such as static strings,
etc.). Only the weight of the \lstc{.text} section of the generated \ehelfs{} etc.). Only the weight of the \lstc{.text} section of the generated \ehelfs{}
is considered, because it is self-consistent (little or no data is stored in is considered, because it is self-contained (little or no data is stored in
\lstc{.rodata}), and the other sections could be removed if the \ehelfs{} \lstc{.rodata}), and the other sections could be removed if the \ehelfs{}
\lstc{.text} was somehow embedded in the original shared object. \lstc{.text} was somehow embedded in the original shared object.
This first tentative version of \ehelfs{} is roughly 7 times heavier than the This first tentative version of \ehelfs{} is roughly 7 times heavier than the
original \lstc{.eh_frame}, and represents a far too significant proportion of original \lstc{.eh_frame}, and represents a far too significant proportion of
the original program size. the original program size ($65\,\%$).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Space optimization}\label{ssec:space_optim} \subsection{Space optimization}\label{ssec:space_optim}
@ -838,13 +840,13 @@ The major optimization that most reduced the output size was to use an if/else
tree implementing a binary search on the instruction pointer relevant tree implementing a binary search on the instruction pointer relevant
intervals, instead of a single monolithic switch. In the process, we also intervals, instead of a single monolithic switch. In the process, we also
\emph{outline} code whenever possible, that is, find out identical ``switch \emph{outline} code whenever possible, that is, find out identical ``switch
cases'' bodies --~which are not switch cases anymore, but if bodies~--, move cases'' bodies --~which are not switch cases anymore, but \texttt{if}
them outside of the if/else tree, identify them by a label, and jump to them bodies~--, move them outside of the if/else tree, identify them by a label, and
using a \lstc{goto}, which de-duplicates a lot of code and contributes greatly jump to them using a \lstc{goto}, which de-duplicates a lot of code and
to the shrinking. In the process, we noticed that the vast majority of FDE rows contributes greatly to the shrinking. In the process, we noticed that the vast
are actually taken among very few ``common'' FDE rows. For instance, in the majority of FDE rows are actually taken among very few ``common'' FDE rows. For
\prog{libc}, out of a total of $20827$ rows, only $302$ ($1.5\,\%$) remain instance, in the \prog{libc}, out of a total of $20827$ rows, only $302$
after the outlining. ($1.5\,\%$) unique rows remain after the outlining.
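The combination of a binary-search if/else tree and outlined bodies can be sketched as follows. This is an illustration under assumptions, not the compiler's actual output: the function \lstc{sketch_cfa}, the two made-up functions at \texttt{0x10..0x2f} and \texttt{0x40..0x5f}, and the choice to compute only the CFA are all hypothetical. Six rows share three unique bodies, each emitted once behind a label.

```c
#include <stdint.h>

/* Returns the CFA for the row covering pc, or 0 if no row covers it.
 * Interval bounds: 0x10, 0x11, 0x14, 0x30, 0x40, 0x41, 0x44, 0x60. */
uintptr_t sketch_cfa(uintptr_t pc, uintptr_t rsp, uintptr_t rbp)
{
    /* if/else tree: a binary search over the interval bounds */
    if (pc < 0x30) {
        if (pc < 0x11) {
            if (pc < 0x10)
                goto not_found;
            goto cfa_rsp_8;         /* first function, row 1 */
        } else {
            if (pc < 0x14)
                goto cfa_rsp_16;    /* first function, row 2 */
            goto cfa_rbp_16;        /* first function, row 3 */
        }
    } else {
        if (pc < 0x41) {
            if (pc < 0x40)
                goto not_found;     /* gap between the two functions */
            goto cfa_rsp_8;         /* second function, row 1 */
        } else {
            if (pc < 0x44)
                goto cfa_rsp_16;    /* second function, row 2 */
            if (pc < 0x60)
                goto cfa_rbp_16;    /* second function, row 3 */
            goto not_found;
        }
    }

    /* Outlined bodies: identical rows are emitted once and shared. */
cfa_rsp_8:
    return rsp + 8;
cfa_rsp_16:
    return rsp + 16;
cfa_rbp_16:
    return rbp + 16;
not_found:
    return 0;
}
```

With the figures quoted above, the same sharing leaves $302$ bodies for $20827$ rows in \prog{libc}, so almost all of the generated code is comparisons and jumps rather than duplicated register assignments.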
This makes this optimization really efficient, as seen later in This makes this optimization really efficient, as seen later in
Section~\ref{ssec:results_size}, but also makes it an interesting question Section~\ref{ssec:results_size}, but also makes it an interesting question
@ -874,13 +876,13 @@ solution working.
\subsection{Requirements}\label{ssec:bench_req} \subsection{Requirements}\label{ssec:bench_req}
To provide relevant benchmarks of the \ehelfs{} performance, one must sample at To provide relevant benchmarks of the \ehelfs{} performance, one must sample at
least a few hundreds or thousands of stack unwinding, since a single frame least a few hundreds or thousands of stack unwindings, since a single frame
unwinding with regular DWARF takes on the order of $10\,\mu s$, and unwinding with regular DWARF takes on the order of $10\,\mu s$, and
\ehelfs{} were expected to have significantly better performance. \ehelfs{} were expected to have significantly better performance.
However, unwinding over and over again from the same program point would have However, unwinding over and over again from the same program point would have
had no interest at all, since \prog{libunwind} would have simply cached the had no interest at all, since \prog{libunwind} would have simply cached the
relevant DWARF row. At the same time, making sure that the various unwinding relevant DWARF rows. At the same time, making sure that the various unwindings
are made from different locations is somehow cheating, since it makes useless are made from different locations is somehow cheating, since it makes useless
\prog{libunwind}'s caching and does not reproduce ``real-world'' unwinding \prog{libunwind}'s caching and does not reproduce ``real-world'' unwinding
distribution. All in all, the benchmarking method must have a ``natural'' distribution. All in all, the benchmarking method must have a ``natural''
@ -892,8 +894,8 @@ stack unwindings crossing some standard library functions, starting from inside
them, etc. them, etc.
Finally, the unwound program must be interesting enough to enter and exit Finally, the unwound program must be interesting enough to enter and exit
functions often, building a good stack of nested function calls (at least 5 functions often, building a good stack of nested function calls (at least
frequently), have FDEs that are not as simple as in Listing~\ref{lst:ex1_dw}, frequently 5), have FDEs that are not as simple as in Listing~\ref{lst:ex1_dw},
etc. etc.
@ -925,7 +927,8 @@ Section~\ref{ssec:bench_req} above: since it stops at regular intervals and
unwinds, the unwindings are evenly distributed \wrt{} the frequency of unwinds, the unwindings are evenly distributed \wrt{} the frequency of
execution of the code, which is a natural enough setup for the benchmarks to be execution of the code, which is a natural enough setup for the benchmarks to be
meaningful, while still unwinding from diversified locations, preventing meaningful, while still unwinding from diversified locations, preventing
caching from being overwhelming. It also has the ability to unwind from caching from being overwhelming --~as can be observed later in
Section~\ref{ssec:timeperf}. It also has the ability to unwind from
within any function, including functions of linked shared libraries. It can also within any function, including functions of linked shared libraries. It can also
be applied to virtually any program, which allows unwinding ``interesting'' be applied to virtually any program, which allows unwinding ``interesting''
code. code.
@ -944,27 +947,26 @@ turned out necessary to slightly modify \prog{libunwind}'s interface to add a
parameter to an initialisation function, since \prog{libunwind} is made to be parameter to an initialisation function, since \prog{libunwind} is made to be
agnostic of the system and process as much as possible, to be able to unwind in agnostic of the system and process as much as possible, to be able to unwind in
any context. This very restricted information lacked a memory map (see any context. This very restricted information lacked a memory map (see
Section~\ref{ssec:ehelfs}) in order to use \ehelfs. Apart from this, the Section~\ref{ssec:ehelfs}) in order to use \ehelfs{} --~while, on the other
modified version of \prog{libunwind} produced is entirely compatible with the hand, providing information about the original DWARF that is now useless.
vanilla version. This means that the only modifications required to use Apart from this, the modified version of \prog{libunwind} produced is entirely
\ehelfs{} within any project using \prog{libunwind} should be changing one line compatible with the vanilla version. This means that the only modifications
of code to add one parameter to a function call and linking against the required to use \ehelfs{} within any project using \prog{libunwind} should be
modified version of \prog{libunwind} instead of the system version. changing one line of code to add one parameter to a function call and linking
against the modified version of \prog{libunwind} instead of the system version.
Once this was done, plugging it in \prog{perf} was the matter of a few lines of Once this was done, plugging it in \prog{perf} was the matter of a few lines of
code only, apart from the benchmarking code. The major problem encountered was code only, apart from the benchmarking code. The major problem encountered was
to understand how \prog{perf} works. In order to avoid perturbing the traced to understand how \prog{perf} works. In order to avoid perturbing the traced
program, \prog{perf} does not unwind at runtime, but rather records at regular program, \prog{perf} does not unwind at runtime, but rather records at regular
intervals the program's stack, and all the auxiliary information that is needed intervals the program's stack, and all the auxiliary information that is needed
to unwind later. This is done when running \lstbash{perf record}. Then, to unwind later. This is done when running \lstbash{perf record}. Then, a
\lstbash{perf report} unwinds the stack to analyze it; but at this point of subsequent call to \lstbash{perf report} unwinds the stack to analyze it; but
time, the traced process is long dead, thus any PID-based approach, or any at this point of time, the traced process is long dead. Thus, any PID-based
approach using \texttt{/proc} information will fail. However, as this was the approach, or any approach using \texttt{/proc} information will fail. However,
easiest method, the first version of \ehelfs{} used those mechanisms; thus as this was the easiest method, the first version of \ehelfs{} used those
requiring some code rewriting. mechanisms; it took some code rewriting to move to a PID- and
\texttt{/proc}-agnostic implementation.
The modified versions of both \prog{perf} and \prog{libunwind} are present in
the repositories \prog{perf-eh\_elf} and \prog{libunwind-eh\_elf}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Other explored methods} \subsection{Other explored methods}
@ -1052,6 +1054,11 @@ instruction, however, would not slow down at all the implementation, since
every instruction would simply be compiled to x86\_64 without affecting the every instruction would simply be compiled to x86\_64 without affecting the
already supported code. already supported code.
The fact that there is a sharp difference between cached and uncached
\prog{libunwind} confirms that our experimental setup did not unwind at totally
different locations every single time, and thus was not biased in this
direction, since caching is still very efficient.
It is also worth noting that the compilation time of \ehelfs{} is also It is also worth noting that the compilation time of \ehelfs{} is also
reasonably short. On the machine described in Section~\ref{ssec:bench_hw}, and reasonably short. On the machine described in Section~\ref{ssec:bench_hw}, and
without using multiple cores to compile, the various shared objects needed to without using multiple cores to compile, the various shared objects needed to
@ -1117,8 +1124,10 @@ Section~\ref{ssec:instr_cov}).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Instructions coverage}\label{ssec:instr_cov} \subsection{Instructions coverage}\label{ssec:instr_cov}
In order to determine which proportion of real-world ELF instructions are In order to determine which DWARF instructions are necessary to implement to
covered by our compiler and \ehelfs. have meaningful results, as well as to assess the instruction coverage of our
compiler and \ehelfs, we must look at real-world ELF files and inspect the
instructions used.
The method chosen was to take a random uniform sample of 4000 ELFs among those The method chosen was to take a random uniform sample of 4000 ELFs among those
present on a basic ArchLinux system setup, in the directories \texttt{/bin}, present on a basic ArchLinux system setup, in the directories \texttt{/bin},
@ -1211,7 +1220,7 @@ instructions encountered that were not supported by \ehelfs. The first row is
only concerned about the columns CFA, \reg{rip}, \reg{rsp}, \reg{rbp} and only concerned about the columns CFA, \reg{rip}, \reg{rsp}, \reg{rbp} and
\reg{rbx} (the supported registers --~see Section~\ref{ssec:ehelfs}). The \reg{rbx} (the supported registers --~see Section~\ref{ssec:ehelfs}). The
second row analyzes all the columns that were encountered, no matter whether second row analyzes all the columns that were encountered, no matter whether
supported or not. supported or not in \ehelfs.
The Table~\ref{table:instr_types} analyzes the proportion of each command The Table~\ref{table:instr_types} analyzes the proportion of each command
--~the formal way a register is set~-- for non-CFA columns in the sampled data. For --~the formal way a register is set~-- for non-CFA columns in the sampled data. For
@ -1221,11 +1230,13 @@ means stored at the address of an expression's result, and the \texttt{Val\_}
prefix means that the value must not be dereferenced. Overall, it can be seen prefix means that the value must not be dereferenced. Overall, it can be seen
that supporting \texttt{Offset} already means supporting the vast majority of that supporting \texttt{Offset} already means supporting the vast majority of
registers. The data gathered (not reproduced here) also suggests that registers. The data gathered (not reproduced here) also suggests that
supporting a few common expressions is enough to support most of them. supporting a few common expressions is enough to support most of them. This is
further supported by the fact that we already support more than $80\,\%$ of
expressions only by supporting two basic constructs.
It is also worth noting that of all the 4000 analyzed files, there are only 12 It is also worth noting that among all of the 4000 analyzed files, all the
that contained all the unsupported expressions seen, and only 24 that contained unsupported expressions are clustered in only 12 of them, and only 24 contained
some unsupported instruction at all. unsupported instructions at all.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%