Rephrase everything but section 2
This commit is contained in:
parent
f0809dbf1c
commit
2f44049506
1 changed files with 72 additions and 61 deletions
|
@ -702,14 +702,14 @@ Listing~\ref{lst:unw_ctx}. The function will then return a fresh memory
|
||||||
context, containing the values the registers hold after unwinding this frame.
|
context, containing the values the registers hold after unwinding this frame.
|
||||||
|
|
||||||
The body of the function itself consists of a single monolithic switch, taking
|
The body of the function itself consists of a single monolithic switch, taking
|
||||||
advantage of the non-standard --~yet widely implemented in C compilers~--
|
advantage of the non-standard --~yet overwhelmingly implemented in common C
|
||||||
syntax for range switches, in which each \lstinline{case} can refer to a range.
|
compilers~-- syntax for range switches, in which each \lstinline{case} can
|
||||||
All the FDEs are merged together into this switch, each row of a FDE being a
|
refer to a range, \eg{} \lstc{case 17 ... 42:}. All the FDEs are merged
|
||||||
switch case. Separating the various FDEs in the C code --~other than with
|
together into this switch, each row of a FDE being a switch case. Separating
|
||||||
comments~-- is, unlike what is done in DWARF, pointless, since accessing a
|
the various FDEs in the C code --~other than with comments~-- is, unlike what
|
||||||
``row'' has a linear cost, and the C code is not meant to be read, except maybe
|
is done in DWARF, pointless, since accessing a ``row'' has a linear cost, and
|
||||||
for debugging purposes. The switch cases bodies then fill a context with
|
the C code is not meant to be read, except maybe for debugging purposes. The
|
||||||
unwound values, then return it.
|
switch case bodies then fill a context with unwound values before returning it.
|
||||||
|
|
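The range-switch layout described above can be sketched as follows. This is only an illustrative sketch, not the code actually generated by the compiler: the type \lstc{sketch_ctx_t}, the functions \lstc{sketch_eh_elf} and \lstc{sketch_load}, the \lstc{ok} field and the three rows (a typical \lstc{push %rbp; mov %rsp, %rbp} prologue) are all hypothetical. It relies on the GNU case-range extension, so it assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical, simplified context; the real structure is the one given in
 * the "Unwinding context" listing of the document. */
typedef struct {
    uintptr_t rip, rsp, rbp;
    int ok; /* 1 if a row matched, 0 otherwise (assumption of this sketch) */
} sketch_ctx_t;

/* Read one machine word at the given address. */
static uintptr_t sketch_load(uintptr_t addr) {
    return *(const uintptr_t *)addr;
}

/* Sketch of a generated unwinding function: one switch case per FDE row,
 * each case covering a range of program counters (GNU extension). */
sketch_ctx_t sketch_eh_elf(sketch_ctx_t ctx, uintptr_t pc) {
    const uintptr_t word = sizeof(uintptr_t);
    sketch_ctx_t out = {0, 0, 0, 1};
    uintptr_t cfa;

    switch (pc) {
        case 0x0: /* function entry: CFA = rsp + 1 word, rbp untouched */
            cfa = ctx.rsp + word;
            out.rbp = ctx.rbp;
            break;
        case 0x1 ... 0x3: /* after push %rbp: CFA = rsp + 2 words */
            cfa = ctx.rsp + 2 * word;
            out.rbp = sketch_load(cfa - 2 * word);
            break;
        case 0x4 ... 0x1f: /* after mov %rsp, %rbp: CFA = rbp + 2 words */
            cfa = ctx.rbp + 2 * word;
            out.rbp = sketch_load(cfa - 2 * word);
            break;
        default: /* pc is not covered by any row */
            out.ok = 0;
            return out;
    }
    out.rip = sketch_load(cfa - word); /* return address sits just below the CFA */
    out.rsp = cfa;                     /* caller's rsp is the CFA */
    return out;
}
```

In this sketch the three cases share a common tail for brevity, whereas the text describes each case body filling and returning the context itself; the lookup principle is the same.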
||||||
A setting of the compiler also optionally enables another parameter to the
|
A setting of the compiler also optionally enables another parameter to the
|
||||||
\lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
|
\lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
|
||||||
|
@ -724,12 +724,12 @@ real-world-proof version of the \ehelfs, the choice was made to keep this
|
||||||
implementation simple, and only handle the few registers that were needed to
|
implementation simple, and only handle the few registers that were needed to
|
||||||
simply unwind the stack. Thus, the only registers handled in \ehelfs{} are
|
simply unwind the stack. Thus, the only registers handled in \ehelfs{} are
|
||||||
\reg{rip}, \reg{rbp}, \reg{rsp} and \reg{rbx}, the latter being used a few
|
\reg{rip}, \reg{rbp}, \reg{rsp} and \reg{rbx}, the latter being used a few
|
||||||
times in \prog{libc} to hold the CFA address in common functions. This is
|
times in \prog{libc} and other less common libraries to hold the CFA address in
|
||||||
enough to unwind the stack reliably, and thus enough for profiling, but is not
|
common functions. This is enough to unwind the stack reliably, and thus enough
|
||||||
sufficient to analyze every stack frame as \prog{gdb} would do after a
|
for profiling, but is not sufficient to analyze every stack frame as \prog{gdb}
|
||||||
\lstbash{frame n} command. Yet, if one was to enhance the code to handle every
|
would do after a \lstbash{frame n} command. Yet, if one was to enhance the
|
||||||
register, it would not be much harder and would probably be only a few hours of
|
code to handle every register, it would not be much harder and would probably
|
||||||
code refactoring and rewriting.
|
be only a few hours worth of code refactoring and rewriting.
|
||||||
|
|
||||||
\lstinputlisting[language=C, caption={Unwinding context}, label={lst:unw_ctx}]
|
\lstinputlisting[language=C, caption={Unwinding context}, label={lst:unw_ctx}]
|
||||||
{src/dwarf_assembly_context/unwind_context.c}
|
{src/dwarf_assembly_context/unwind_context.c}
|
||||||
|
@ -754,17 +754,19 @@ on or off, and it doesn't require to alter the base system by editing \eg{}
|
||||||
\texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is required, those
|
\texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is required, those
|
||||||
files can simply be \lstc{dlopen}'d. It is also possible to imagine, in a
|
files can simply be \lstc{dlopen}'d. It is also possible to imagine, in a
|
||||||
future production environment, packaging \ehelfs{} files separately, so that
|
future production environment, packaging \ehelfs{} files separately, so that
|
||||||
people interested in heavy computation can have the choice to install them.
|
people interested in better performance can have the choice to install them.
|
||||||
|
|
||||||
This, in particular, means that each ELF file has its unwinding data in a
|
This, in particular, means that each ELF file has its unwinding data in a
|
||||||
separate \ehelf{} file --~just like with DWARF, where each ELF retains its own
|
separate \ehelf{} file, implying that the unwinding data for a given program is
|
||||||
DWARF data. Thus, an unwinder must first acquire a \emph{memory map}, a table
|
scattered among various \ehelf{} files, one for each shared object loaded
|
||||||
listing the various ELF files loaded and \emph{mapped} in memory, and on which
|
--~just like with DWARF, where each ELF retains its own DWARF data. Thus, an
|
||||||
memory segment. This memory map is provided by the operating system --~for
|
unwinder must first acquire a \emph{memory map}, a table listing the various
|
||||||
instance, on Linux, it is available as a file in \texttt{/proc}. Once this map
|
ELF files loaded and \emph{mapped} in memory, and on which memory segment. This
|
||||||
is acquired, when unwinding from a given IP, the unwinder must identify the
|
memory map is provided by the operating system --~for instance, on Linux, it is
|
||||||
memory segment from which it comes, deduce the source ELF file, and deduce the
|
available as a file in \texttt{/proc}. Once this map is acquired, when
|
||||||
corresponding \ehelf.
|
unwinding from a given IP, the unwinder must identify the memory segment from
|
||||||
|
which it comes, deduce the source ELF file, and deduce the corresponding
|
||||||
|
\ehelf.
|
||||||
|
|
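The lookup described above (find the memory segment containing the IP, deduce the ELF file, then the \ehelf{}) can be sketched as follows. This is a minimal sketch under stated assumptions: the memory map is taken as an already-parsed table, the names \lstc{sketch_mapping_t}, \lstc{sketch_find_mapping}, \lstc{sketch_file_offset} and \lstc{sketch_eh_elf_name} are hypothetical, and the \texttt{.eh\_elf.so} suffix is an invented naming convention used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One entry of an already-parsed memory map (hypothetical layout). */
typedef struct {
    uintptr_t start;  /* first mapped address */
    uintptr_t end;    /* one past the last mapped address */
    uintptr_t offset; /* offset of this segment within the ELF file */
    const char *path; /* path of the mapped ELF file */
} sketch_mapping_t;

/* Return the mapping containing ip, or NULL if ip is not mapped. */
const sketch_mapping_t *sketch_find_mapping(const sketch_mapping_t *map,
                                            size_t count, uintptr_t ip) {
    for (size_t i = 0; i < count; ++i) {
        if (map[i].start <= ip && ip < map[i].end)
            return &map[i];
    }
    return NULL;
}

/* Translate a runtime ip into an offset within the mapped ELF file. */
uintptr_t sketch_file_offset(const sketch_mapping_t *m, uintptr_t ip) {
    return ip - m->start + m->offset;
}

/* Build the name of the matching eh_elf from the ELF base name.
 * The ".eh_elf.so" suffix is an assumption of this sketch.
 * Returns 0 on success, -1 if buf is too small. */
int sketch_eh_elf_name(const sketch_mapping_t *m, char *buf, size_t len) {
    const char *slash = strrchr(m->path, '/');
    const char *base = slash ? slash + 1 : m->path;
    int n = snprintf(buf, len, "%s.eh_elf.so", base);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

An unwinder would then typically \lstc{dlopen} the resulting file and call its unwinding function with the file-relative offset.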
||||||
\medskip
|
\medskip
|
||||||
|
|
||||||
|
@ -772,8 +774,8 @@ corresponding \ehelf.
|
||||||
label={lst:fib7_eh_elf_basic}]
|
label={lst:fib7_eh_elf_basic}]
|
||||||
{src/fib7/fib7.eh_elf_basic.c}
|
{src/fib7/fib7.eh_elf_basic.c}
|
||||||
|
|
||||||
The C code in Listing~\ref{lst:fib7_eh_elf_basic} is a part of what was
|
The C code in Listing~\ref{lst:fib7_eh_elf_basic} is the relevant part of what
|
||||||
generated for the C code in Listing~\ref{lst:ex1_c}.
|
was generated for the C code in Listing~\ref{lst:ex1_c}.
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{First results}
|
\subsection{First results}
|
||||||
|
@ -817,13 +819,13 @@ it depends.
|
||||||
The first column only includes the sizes of the ELF sections \lstc{.text} (the
|
The first column only includes the sizes of the ELF sections \lstc{.text} (the
|
||||||
program itself) and \lstc{.rodata}, the read-only data (such as static strings,
|
program itself) and \lstc{.rodata}, the read-only data (such as static strings,
|
||||||
etc.). Only the weight of the \lstc{.text} section of the generated \ehelfs{}
|
etc.). Only the weight of the \lstc{.text} section of the generated \ehelfs{}
|
||||||
is considered, because it is self-consistent (little or no data is stored in
|
is considered, because it is self-contained (little or no data is stored in
|
||||||
\lstc{.rodata}), and the other sections could be removed if the \ehelfs{}
|
\lstc{.rodata}), and the other sections could be removed if the \ehelfs{}
|
||||||
\lstc{.text} was somehow embedded in the original shared object.
|
\lstc{.text} was somehow embedded in the original shared object.
|
||||||
|
|
||||||
This first tentative version of \ehelfs{} is roughly 7 times heavier than the
|
This first tentative version of \ehelfs{} is roughly 7 times heavier than the
|
||||||
original \lstc{.eh_frame}, and represents a far too significant proportion of
|
original \lstc{.eh_frame}, and represents a far too significant proportion of
|
||||||
the original program size.
|
the original program size ($65\,\%$).
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Space optimization}\label{ssec:space_optim}
|
\subsection{Space optimization}\label{ssec:space_optim}
|
||||||
|
@ -838,13 +840,13 @@ The major optimization that most reduced the output size was to use an if/else
|
||||||
tree implementing a binary search on the instruction pointer relevant
|
tree implementing a binary search on the instruction pointer relevant
|
||||||
intervals, instead of a single monolithic switch. In the process, we also
|
intervals, instead of a single monolithic switch. In the process, we also
|
||||||
\emph{outline} code whenever possible, that is, find out identical ``switch
|
\emph{outline} code whenever possible, that is, find out identical ``switch
|
||||||
cases'' bodies --~which are not switch cases anymore, but if bodies~--, move
|
cases'' bodies --~which are not switch cases anymore, but \texttt{if}
|
||||||
them outside of the if/else tree, identify them by a label, and jump to them
|
bodies~--, move them outside of the if/else tree, identify them by a label, and
|
||||||
using a \lstc{goto}, which de-duplicates a lot of code and contributes greatly
|
jump to them using a \lstc{goto}, which de-duplicates a lot of code and
|
||||||
to the shrinking. In the process, we noticed that the vast majority of FDE rows
|
contributes greatly to the shrinking. In the process, we noticed that the vast
|
||||||
are actually taken among very few ``common'' FDE rows. For instance, in the
|
majority of FDE rows are actually taken among very few ``common'' FDE rows. For
|
||||||
\prog{libc}, out of a total of $20827$ rows, only $302$ ($1.5\,\%$) remain
|
instance, in the \prog{libc}, out of a total of $20827$ rows, only $302$
|
||||||
after the outlining.
|
($1.5\,\%$) unique rows remain after the outlining.
|
||||||
|
|
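The if/else binary search combined with outlining can be sketched as follows. This is an illustrative sketch only, not the compiler's actual output: the function \lstc{sketch_cfa}, its signature, the labels and the address intervals are hypothetical. Two imaginary functions share the same three rows, so six intervals map to only three outlined bodies reached through \lstc{goto}.

```c
#include <stdint.h>

/* Compute the CFA for a given pc by binary search over pc intervals.
 * Identical row bodies are outlined once and reached with goto.
 * *ok is set to 1 if pc is covered by a row, 0 otherwise. */
uintptr_t sketch_cfa(uintptr_t pc, uintptr_t rsp, uintptr_t rbp, int *ok) {
    *ok = 1;
    if (pc < 0x20) {                 /* first function: [0x0, 0x20) */
        if (pc < 0x4) {
            if (pc < 0x1) goto row_rsp8;
            else goto row_rsp16;
        } else {
            goto row_rbp16;
        }
    } else {                         /* second function: [0x20, 0x40) */
        if (pc < 0x24) {
            if (pc < 0x21) goto row_rsp8;
            else goto row_rsp16;
        } else if (pc < 0x40) {
            goto row_rbp16;
        } else {
            goto not_found;
        }
    }

    /* Outlined row bodies, each emitted only once. */
row_rsp8:
    return rsp + 8;
row_rsp16:
    return rsp + 16;
row_rbp16:
    return rbp + 16;
not_found:
    *ok = 0;
    return 0;
}
```

The lookup cost becomes logarithmic in the number of intervals, and each distinct row body appears a single time regardless of how many intervals use it.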
||||||
This makes this optimization really efficient, as seen later in
|
This makes this optimization really efficient, as seen later in
|
||||||
Section~\ref{ssec:results_size}, but also makes it an interesting question
|
Section~\ref{ssec:results_size}, but also makes it an interesting question
|
||||||
|
@ -874,13 +876,13 @@ solution working.
|
||||||
\subsection{Requirements}\label{ssec:bench_req}
|
\subsection{Requirements}\label{ssec:bench_req}
|
||||||
|
|
||||||
To provide relevant benchmarks of the \ehelfs{} performance, one must sample at
|
To provide relevant benchmarks of the \ehelfs{} performance, one must sample at
|
||||||
least a few hundreds or thousands of stack unwinding, since a single frame
|
least a few hundreds or thousands of stack unwindings, since a single frame
|
||||||
unwinding with regular DWARF takes the order of magnitude of $10\,\mu s$, and
|
unwinding with regular DWARF takes the order of magnitude of $10\,\mu s$, and
|
||||||
\ehelfs{} were expected to have significantly better performance.
|
\ehelfs{} were expected to have significantly better performance.
|
||||||
|
|
||||||
However, unwinding over and over again from the same program point would have
|
However, unwinding over and over again from the same program point would have
|
||||||
had no interest at all, since \prog{libunwind} would have simply cached the
|
had no interest at all, since \prog{libunwind} would have simply cached the
|
||||||
relevant DWARF row. At the same time, making sure that the various unwinding
|
relevant DWARF rows. At the same time, making sure that the various unwindings
|
||||||
are made from different locations is somewhat cheating, since it makes useless
|
are made from different locations is somewhat cheating, since it makes useless
|
||||||
\prog{libunwind}'s caching and does not reproduce ``real-world'' unwinding
|
\prog{libunwind}'s caching and does not reproduce ``real-world'' unwinding
|
||||||
distribution. All in all, the benchmarking method must have a ``natural''
|
distribution. All in all, the benchmarking method must have a ``natural''
|
||||||
|
@ -892,8 +894,8 @@ stack unwindings crossing some standard library functions, starting from inside
|
||||||
them, etc.
|
them, etc.
|
||||||
|
|
||||||
Finally, the unwound program must be interesting enough to enter and exit
|
Finally, the unwound program must be interesting enough to enter and exit
|
||||||
functions often, building a good stack of nested function calls (at least 5
|
functions often, building a good stack of nested function calls (at least
|
||||||
frequently), have FDEs that are not as simple as in Listing~\ref{lst:ex1_dw},
|
frequently 5), have FDEs that are not as simple as in Listing~\ref{lst:ex1_dw},
|
||||||
etc.
|
etc.
|
||||||
|
|
||||||
|
|
||||||
|
@ -925,7 +927,8 @@ Section~\ref{ssec:bench_req} above: since it stops at regular intervals and
|
||||||
unwinds, the unwindings are evenly distributed \wrt{} the frequency of
|
unwinds, the unwindings are evenly distributed \wrt{} the frequency of
|
||||||
execution of the code, which is a natural enough setup for the benchmarks to be
|
execution of the code, which is a natural enough setup for the benchmarks to be
|
||||||
meaningful, while still unwinding from diversified locations, preventing
|
meaningful, while still unwinding from diversified locations, preventing
|
||||||
caching from being overwhelming. It also has the ability to unwind from
|
caching from being overwhelming --~as can be observed later in
|
||||||
|
Section~\ref{ssec:timeperf}. It also has the ability to unwind from
|
||||||
within any function, including functions of linked shared libraries. It can also
|
within any function, including functions of linked shared libraries. It can also
|
||||||
be applied to virtually any program, which allows unwinding ``interesting''
|
be applied to virtually any program, which allows unwinding ``interesting''
|
||||||
code.
|
code.
|
||||||
|
@ -944,27 +947,26 @@ turned out necessary to slightly modify \prog{libunwind}'s interface to add a
|
||||||
parameter to an initialisation function, since \prog{libunwind} is made to be
|
parameter to an initialisation function, since \prog{libunwind} is made to be
|
||||||
agnostic of the system and process as much as possible, to be able to unwind in
|
agnostic of the system and process as much as possible, to be able to unwind in
|
||||||
any context. This very restricted information lacked a memory map (see
|
any context. This very restricted information lacked a memory map (see
|
||||||
Section~\ref{ssec:ehelfs}) in order to use \ehelfs. Apart from this, the
|
Section~\ref{ssec:ehelfs}) in order to use \ehelfs{} --~while, on the other
|
||||||
modified version of \prog{libunwind} produced is entirely compatible with the
|
hand, providing information about the original DWARF that is now useless.
|
||||||
vanilla version. This means that the only modifications required to use
|
Apart from this, the modified version of \prog{libunwind} produced is entirely
|
||||||
\ehelfs{} within any project using \prog{libunwind} should be changing one line
|
compatible with the vanilla version. This means that the only modifications
|
||||||
of code to add one parameter to a function call and linking against the
|
required to use \ehelfs{} within any project using \prog{libunwind} should be
|
||||||
modified version of \prog{libunwind} instead of the system version.
|
changing one line of code to add one parameter to a function call and linking
|
||||||
|
against the modified version of \prog{libunwind} instead of the system version.
|
||||||
|
|
||||||
Once this was done, plugging it in \prog{perf} was the matter of a few lines of
|
Once this was done, plugging it in \prog{perf} was the matter of a few lines of
|
||||||
code only, apart from the benchmarking code. The major problem encountered was
|
code only, apart from the benchmarking code. The major problem encountered was
|
||||||
to understand how \prog{perf} works. In order to avoid perturbing the traced
|
to understand how \prog{perf} works. In order to avoid perturbing the traced
|
||||||
program, \prog{perf} does not unwind at runtime, but rather records at regular
|
program, \prog{perf} does not unwind at runtime, but rather records at regular
|
||||||
intervals the program's stack, and all the auxiliary information that is needed
|
intervals the program's stack, and all the auxiliary information that is needed
|
||||||
to unwind later. This is done when running \lstbash{perf record}. Then,
|
to unwind later. This is done when running \lstbash{perf record}. Then, a
|
||||||
\lstbash{perf report} unwinds the stack to analyze it; but at this point of
|
subsequent call to \lstbash{perf report} unwinds the stack to analyze it; but
|
||||||
time, the traced process is long dead, thus any PID-based approach, or any
|
at this point in time, the traced process is long dead. Thus, any PID-based
|
||||||
approach using \texttt{/proc} information will fail. However, as this was the
|
approach, or any approach using \texttt{/proc} information will fail. However,
|
||||||
easiest method, the first version of \ehelfs{} used those mechanisms; thus
|
as this was the easiest method, the first version of \ehelfs{} used those
|
||||||
requiring some code rewriting.
|
mechanisms; it took some code rewriting to move to a PID- and
|
||||||
|
\texttt{/proc}-agnostic implementation.
|
||||||
The modified versions of both \prog{perf} and \prog{libunwind} are present in
|
|
||||||
the repositories \prog{perf-eh\_elf} and \prog{libunwind-eh\_elf}.
|
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Other explored methods}
|
\subsection{Other explored methods}
|
||||||
|
@ -1052,6 +1054,11 @@ instruction, however, would not slow down at all the implementation, since
|
||||||
every instruction would simply be compiled to x86\_64 without affecting the
|
every instruction would simply be compiled to x86\_64 without affecting the
|
||||||
already supported code.
|
already supported code.
|
||||||
|
|
||||||
|
The fact that there is a sharp difference between cached and uncached
|
||||||
|
\prog{libunwind} confirms that our experimental setup did not unwind at totally
|
||||||
|
different locations every single time, and thus was not biased in this
|
||||||
|
direction, since caching is still very efficient.
|
||||||
|
|
||||||
It is also worth noting that the compilation time of \ehelfs{} is also
|
It is also worth noting that the compilation time of \ehelfs{} is also
|
||||||
reasonably short. On the machine described in Section~\ref{ssec:bench_hw}, and
|
reasonably short. On the machine described in Section~\ref{ssec:bench_hw}, and
|
||||||
without using multiple cores to compile, the various shared objects needed to
|
without using multiple cores to compile, the various shared objects needed to
|
||||||
|
@ -1117,8 +1124,10 @@ Section~\ref{ssec:instr_cov}).
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Instructions coverage}\label{ssec:instr_cov}
|
\subsection{Instructions coverage}\label{ssec:instr_cov}
|
||||||
|
|
||||||
In order to determine which proportion of real-world ELF instructions are
|
In order to determine which DWARF instructions are necessary to implement to
|
||||||
covered by our compiler and \ehelfs.
|
have meaningful results, as well as to assess the instruction coverage of our
|
||||||
|
compiler and \ehelfs, we must look at real-world ELF files and inspect the
|
||||||
|
instructions used.
|
||||||
|
|
||||||
The method chosen was to take a random uniform sample of 4000 ELFs among those
|
The method chosen was to take a random uniform sample of 4000 ELFs among those
|
||||||
present on a basic ArchLinux system setup, in the directories \texttt{/bin},
|
present on a basic ArchLinux system setup, in the directories \texttt{/bin},
|
||||||
|
@ -1211,7 +1220,7 @@ instructions encountered that were not supported by \ehelfs. The first row is
|
||||||
only concerned about the columns CFA, \reg{rip}, \reg{rsp}, \reg{rbp} and
|
only concerned about the columns CFA, \reg{rip}, \reg{rsp}, \reg{rbp} and
|
||||||
\reg{rbx} (the supported registers --~see Section~\ref{ssec:ehelfs}). The
|
\reg{rbx} (the supported registers --~see Section~\ref{ssec:ehelfs}). The
|
||||||
second row analyzes all the columns that were encountered, no matter whether
|
second row analyzes all the columns that were encountered, no matter whether
|
||||||
supported or not.
|
supported or not in \ehelfs.
|
||||||
|
|
||||||
Table~\ref{table:instr_types} analyzes the proportion of each command
|
Table~\ref{table:instr_types} analyzes the proportion of each command
|
||||||
--~the formal way a register is set~-- for non-CFA columns in the sampled data. For
|
--~the formal way a register is set~-- for non-CFA columns in the sampled data. For
|
||||||
|
@ -1221,11 +1230,13 @@ means stored at the address of an expression's result, and the \texttt{Val\_}
|
||||||
prefix means that the value must not be dereferenced. Overall, it can be seen
|
prefix means that the value must not be dereferenced. Overall, it can be seen
|
||||||
that supporting \texttt{Offset} already means supporting the vast majority of
|
that supporting \texttt{Offset} already means supporting the vast majority of
|
||||||
registers. The data gathered (not reproduced here) also suggests that
|
registers. The data gathered (not reproduced here) also suggests that
|
||||||
supporting a few common expressions is enough to support most of them.
|
supporting a few common expressions is enough to support most of them. This is
|
||||||
|
further supported by the fact that we already support more than $80\,\%$ of
|
||||||
|
expressions only by supporting two basic constructs.
|
||||||
|
|
||||||
It is also worth noting that of all the 4000 analyzed files, there are only 12
|
It is also worth noting that among all of the 4000 analyzed files, all the
|
||||||
that contained all the unsupported expressions seen, and only 24 that contained
|
unsupported expressions are clustered in only 12 of them, and only 24 contained
|
||||||
some unsupported instruction at all.
|
unsupported instructions at all.
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
|