Factor out irrelevant footnotes and parentheses

Théophile Bastian 2018-08-16 00:26:59 +02:00
parent c5f1f8615b
commit 67b25ca038


@ -80,15 +80,15 @@ restored before returning, the function's return address and local variables.
On the x86\_64 platform, with which this report is mostly concerned, the On the x86\_64 platform, with which this report is mostly concerned, the
calling convention that is followed is defined in the System V calling convention that is followed is defined in the System V
ABI~\cite{systemVabi} for the Unix-like operating systems (among which Linux). ABI~\cite{systemVabi} for the Unix-like operating systems --~among which Linux.
Under this calling convention, the first six arguments of a function are passed Under this calling convention, the first six arguments of a function are passed
in the registers \reg{rdi}, \reg{rsi}, \reg{rdx}, \reg{rcx}, \reg{r8}, in the registers \reg{rdi}, \reg{rsi}, \reg{rdx}, \reg{rcx}, \reg{r8},
\reg{r9}, while additional arguments are pushed onto the stack. It also defines \reg{r9}, while additional arguments are pushed onto the stack. It also defines
which registers may be overwritten by the callee, and which registers must be which registers may be overwritten by the callee, and which registers must be
restored before returning (which most of the time is done by pushing the restored before returning. This restoration, most of the time, is done by
register value onto the stack in the function prelude, and restoring it just pushing the register value onto the stack in the function prelude, and
before returning). Those preserved registers are \reg{rbx}, \reg{rsp}, restoring it just before returning. Those preserved registers are \reg{rbx},
\reg{rbp}, \reg{r12}, \reg{r13}, \reg{r14}, \reg{r15}. \reg{rsp}, \reg{rbp}, \reg{r12}, \reg{r13}, \reg{r14}, \reg{r15}.
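The register assignment described above can be summarized as a small lookup. The following is only an illustrative sketch: the helper names \lstc{arg_location} and \lstc{is_callee_saved} are hypothetical, and the sketch only considers integer and pointer arguments, since floating-point arguments are passed in other registers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Registers holding the first six integer/pointer arguments, in order. */
static const char *const ARG_REGS[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

/* Registers that a callee must restore before returning. */
static const char *const CALLEE_SAVED[] = {"rbx", "rsp", "rbp", "r12",
                                           "r13", "r14", "r15"};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Where the index-th (0-based) integer argument is passed: a register name
 * for the first six arguments, "stack" for every additional one. */
const char *arg_location(size_t index) {
    return index < ARRAY_LEN(ARG_REGS) ? ARG_REGS[index] : "stack";
}

/* Returns 1 if reg is preserved across calls, 0 otherwise. */
int is_callee_saved(const char *reg) {
    assert(reg != NULL);
    for (size_t i = 0; i < ARRAY_LEN(CALLEE_SAVED); ++i)
        if (strcmp(reg, CALLEE_SAVED[i]) == 0)
            return 1;
    return 0;
}
```

For a function taking eight integer arguments, \lstc{arg_location(7)} yields \lstc{"stack"}, while \lstc{arg_location(2)} yields \lstc{"rdx"}.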
\begin{wrapfigure}{r}{0.4\textwidth} \begin{wrapfigure}{r}{0.4\textwidth}
\centering \centering
@ -98,11 +98,8 @@ before returning). Those preserved registers are \reg{rbx}, \reg{rsp},
\end{wrapfigure} \end{wrapfigure}
The register \reg{rsp} is supposed to always point to the last used memory cell The register \reg{rsp} is supposed to always point to the last used memory cell
in the stack, thus, when the process just enters a new function, \reg{rsp} in the stack. Thus, when the process just enters a new function, \reg{rsp}
points right to the location of the return address\footnote{Remember that since points right to the location of the return address. Then, the compiler might
the stack grows \emph{downwards} in memory, the arrow of \reg{rsp} points
\emph{below} the RA cell in the figure, and yet the memory cell indexed is the
one \emph{above} in the drawing, that is, the RA.}. Then, the compiler might
use \reg{rbp} (``base pointer'') to save this value of \reg{rip}, by writing use \reg{rbp} (``base pointer'') to save this value of \reg{rip}, by writing
the old value of \reg{rbp} just below the return address on the stack, then the old value of \reg{rbp} just below the return address on the stack, then
copying \reg{rsp} to \reg{rbp}. This makes it easy to find the return address copying \reg{rsp} to \reg{rbp}. This makes it easy to find the return address
@ -148,8 +145,8 @@ Left apart analyzing the assembly code produced, there is no way to find where
the return address is stored, relatively to \reg{rsp}, at some arbitrary point the return address is stored, relatively to \reg{rsp}, at some arbitrary point
of the function. Even when \reg{rbp} is used, there is no easy way to guess of the function. Even when \reg{rbp} is used, there is no easy way to guess
where each callee-saved register is stored in the stack frame, and worse, which where each callee-saved register is stored in the stack frame, and worse, which
callee-saved registers were saved (since it is not necessary to save a register callee-saved registers were saved, since it is optional to save a register
that the function never touches). that the function never touches.
With this example, it seems pretty clear that it is often necessary to have With this example, it seems pretty clear that it is often necessary to have
additional data to perform stack unwinding. This data is often stored among the additional data to perform stack unwinding. This data is often stored among the
@ -171,11 +168,11 @@ context, by unwinding \lstc{fct_b}'s frame.
\medskip \medskip
Yet, stack unwinding (and thus debugging data) \emph{is not limited to Yet, stack unwinding, and thus, debugging data, \emph{is not limited to
debugging}. debugging}.
Another common usage is profiling. A profiling tool, such as \prog{perf} under Another common usage is profiling. A profiling tool, such as \prog{perf} under
Linux -- see Section~\ref{ssec:perf} --, is used to measure and analyze in Linux --~see Section~\ref{ssec:perf} --, is used to measure and analyze in
which functions a program spends its time, identify bottlenecks and find out which functions a program spends its time, identify bottlenecks and find out
which parts are critical to optimize. To do so, modern profilers pause the which parts are critical to optimize. To do so, modern profilers pause the
traced program at regular, short intervals, inspect its stack, and determine traced program at regular, short intervals, inspect its stack, and determine
@ -202,8 +199,8 @@ trigger the destructors of stack-allocated objects. Furthermore, this is often
undesirable: \lstc{setjmp} has quite a large overhead, which is introduced undesirable: \lstc{setjmp} has quite a large overhead, which is introduced
whenever a \lstc{try} block is encountered. Instead, it is often preferred to whenever a \lstc{try} block is encountered. Instead, it is often preferred to
have strictly no overhead when no exception happens, at the cost of a greater have strictly no overhead when no exception happens, at the cost of a greater
overhead when an exception is actually fired (after all, they are supposed to overhead when an exception is actually fired --~after all, they are supposed to
be \emph{exceptional}). For more details on C++ exception handling, be \emph{exceptional}. For more details on C++ exception handling,
see~\cite{koening1990exception} (especially Section~16.5). Possible see~\cite{koening1990exception} (especially Section~16.5). Possible
implementation mechanisms are also presented in~\cite{dinechin2000exn}. implementation mechanisms are also presented in~\cite{dinechin2000exn}.
@ -237,8 +234,8 @@ the previous paragraph, in an ELF section originally called
For any binary, debugging information can easily get quite large if no For any binary, debugging information can easily get quite large if no
attention is paid to keeping it as compact as possible. In this matter, DWARF attention is paid to keeping it as compact as possible. In this matter, DWARF
does an excellent job, and everything is stored in a very compact way. This, does an excellent job, and everything is stored in a very compact way. This,
however, as we will see, makes it both difficult to parse correctly (with \eg{} however, as we will see, makes it both difficult to parse correctly and quite
variable-length integers) and quite slow to interpret. slow to interpret.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{DWARF unwinding data} \subsection{DWARF unwinding data}
@ -259,19 +256,15 @@ a table, that has a range of IPs on which it has authority. Most often, but not
necessarily, it corresponds to a single function in the original source code. necessarily, it corresponds to a single function in the original source code.
Each column of the table is a register (\eg{} \reg{rsp}), with two additional Each column of the table is a register (\eg{} \reg{rsp}), with two additional
special registers, CFA (Canonical Frame Address) and RA (Return Address), special registers, CFA (Canonical Frame Address) and RA (Return Address),
containing respectively the base pointer of the current stack containing respectively the base pointer of the current stack frame and the
frame\footnote{The CFA is most commonly thought of as the base pointer of the return address of the current function. For instance, on an x86\_64
frame, yet this is not enforced by DWARF\@. The CFA is used as an address from architecture, RA would contain the unwound value of \reg{rip}, the instruction
which other registers will be deduced as offsets, and although it is supposed pointer. Each row has a certain validity interval, on which it describes
to be the actual base pointer, it can be anything as long as it is close enough accurate unwinding data. This range starts at the instruction pointer it is
to the addresses that will be deduced from it.} and the return address of the associated with, and ends at the start IP of the next table row (or the end IP
current function (\ie{} for x86\_64, the unwound value of \reg{rip}, the of the current FDE if it was the last row). In particular, there can be no ``IP
instruction pointer). Each row has a certain validity interval, on which it hole'' within an FDE --~unlike FDEs themselves, which can leave holes between
describes accurate unwinding data. This range starts at the instruction pointer them.
it is associated with, and ends at the start IP of the next table row (or the
end IP of the current FDE if it was the last row). In particular, there can be
no ``IP hole'' within an FDE --~unlike FDEs themselves, which can leave holes
between them.
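Since a row is valid from its own start IP up to the start IP of the next row, finding the row that applies to a given IP is a search over sorted start addresses. The following is a minimal sketch under assumed, hypothetical types (\lstc{fde_row_t}, \lstc{fde_t}) that do not reflect how DWARF is actually encoded; it only illustrates the validity intervals described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory row: only its start IP is stored, its validity
 * interval ends where the next row begins. */
typedef struct {
    uintptr_t start_ip;
    /* the per-register unwinding rules of the row would be stored here */
} fde_row_t;

/* Hypothetical in-memory FDE covering the IP range [begin_ip, end_ip). */
typedef struct {
    uintptr_t begin_ip;
    uintptr_t end_ip;
    const fde_row_t *rows; /* sorted by start_ip, rows[0].start_ip == begin_ip */
    size_t n_rows;
} fde_t;

/* Returns the index of the row valid at ip, or -1 if ip is not covered by
 * this FDE. Within the FDE there is no hole, so a row is always found. */
long fde_find_row(const fde_t *fde, uintptr_t ip) {
    assert(fde != NULL);
    if (fde->n_rows == 0 || ip < fde->begin_ip || ip >= fde->end_ip)
        return -1;
    /* Invariant: rows[lo].start_ip <= ip, and the answer lies in [lo, hi). */
    size_t lo = 0, hi = fde->n_rows;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (fde->rows[mid].start_ip <= ip)
            lo = mid;
        else
            hi = mid;
    }
    return (long)lo;
}
```

An IP that falls between two FDEs is simply rejected by every FDE, which corresponds to the holes that FDEs may leave between them.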
\begin{figure}[h] \begin{figure}[h]
\begin{minipage}{0.45\textwidth} \begin{minipage}{0.45\textwidth}
@ -329,17 +322,17 @@ how the stack frame is constructed. When interpreting the generated \ehframe{}
with \lstbash{readelf -wF}, we obtain the (slightly edited) with \lstbash{readelf -wF}, we obtain the (slightly edited)
Listing~\ref{lst:ex1_dw}. During the function prelude, \ie{} for $\mhex{615} Listing~\ref{lst:ex1_dw}. During the function prelude, \ie{} for $\mhex{615}
\leq \reg{rip} < \mhex{619}$, the stack frame only contains the return address, \leq \reg{rip} < \mhex{619}$, the stack frame only contains the return address,
thus the CFA is 8 bytes above \reg{rsp} (which was the value of \reg{rsp} thus the CFA is 8 bytes above \reg{rsp}, and the return address is precisely at
before the call, and is the topmost value of used space for this stack frame), \reg{rsp} --~that is, stored between \reg{rsp} and $\reg{rsp} + 8$. Then, the
and the return address is precisely at \reg{rsp} --~that is, stored between contents of \lstc{fibo}, 8 integers of 4 bytes each, are allocated on the
\reg{rsp} and $\reg{rsp} + 8$. Then, 8 integers of 4 bytes each (for stack, which puts the CFA 32 bytes above \reg{rsp}; the return address still
\lstc{fibo}, \lstc{pos} being optimized out) are allocated on the stack, which being 8 bytes below the CFA\@. The variable \lstc{pos} is optimized out in the
puts the CFA 32 bytes above \reg{rsp}, and the return address still 8 bytes generated assembly code, thus no stack space is allocated for it. Yet,
below the CFA\@. Yet, \prog{gcc} decided to allocate a total space of 48 bytes \prog{gcc} decided to allocate a total space of 48 bytes for the stack frame
for the stack frame for memory alignment reasons, which means subtracting 40 for memory alignment reasons, which means subtracting 40 bytes from \reg{rsp}
bytes from \reg{rsp} (address $\mhex{615}$ in the assembly). Then, by the end of (address $\mhex{615}$ in the assembly). Then, by the end of the function, the
the function, the local variables are discarded and \reg{rsp} is reset to its local variables are discarded and \reg{rsp} is reset to its value from the
value from the first row. first row.
However, DWARF data isn't actually stored as a table in the binary files, but However, DWARF data isn't actually stored as a table in the binary files, but
is instead stored as in Listing~\ref{lst:ex1_dwraw}. The first row has the is instead stored as in Listing~\ref{lst:ex1_dwraw}. The first row has the
@ -425,9 +418,9 @@ These are the DWARF instructions used for CFI description, that is, the
instructions that contain the stack unwinding table information. The following instructions that contain the stack unwinding table information. The following
list is an exhaustive list of instructions from the DWARF5 list is an exhaustive list of instructions from the DWARF5
specification~\cite{dwarf5std} concerning CFI, with reworded descriptions for specification~\cite{dwarf5std} concerning CFI, with reworded descriptions for
brevity and clarity. All these instructions are up to variants (most brevity and clarity. All these instructions are up to variants --~most
instructions exist in multiple formats to handle various operand formats, instructions exist in multiple formats to handle various operand formats,
to optimize space). Since we won't be talking about the underlying file format to optimize space. Since we won't be talking about the underlying file format
here, those variations between \eg{} \dwcfa{advance\_loc1} and here, those variations between \eg{} \dwcfa{advance\_loc1} and
\dwcfa{advance\_loc2} --~which differ only on the number of bytes of their \dwcfa{advance\_loc2} --~which differ only on the number of bytes of their
operand~-- are irrelevant and will be elided. operand~-- are irrelevant and will be elided.
@ -517,10 +510,10 @@ only handled as register identifiers, so we can safely state that $\reg{reg}
A value can then be undefined, stored at memory address $x$ or be directly a A value can then be undefined, stored at memory address $x$ or be directly a
value $x$, $x$ being here a simple expression consisting of $\reg{reg} + value $x$, $x$ being here a simple expression consisting of $\reg{reg} +
\textit{offset}$. The CFA is considered a simple register here. For instance, to \textit{offset}$. The CFA is considered a simple register here. For instance,
define $\reg{rax}$ to the value contained in memory 16 bytes below the CFA, we to define $\reg{rax}$ to the value contained in memory 16 bytes below the CFA,
would have $\reg{rax} \mapsto \valaddr{\reg{CFA}, -16}$ (for the stack grows we would have $\reg{rax} \mapsto \valaddr{\reg{CFA}, -16}$, since the stack
downwards). grows downwards.
\subsection{Target language: a C function body} \subsection{Target language: a C function body}
@ -533,10 +526,10 @@ execution stack, or even the heap.
This function takes as arguments an instruction pointer --~supposedly This function takes as arguments an instruction pointer --~supposedly
extracted from $\reg{rip}$~-- and an array of register values; and returns a extracted from $\reg{rip}$~-- and an array of register values; and returns a
fresh array of register values after unwinding this call frame. The function is fresh array of register values after unwinding this call frame. The function is
compositional\footnote{up to technicities: the IP obtained after unwinding the compositional: it can be called twice in a row to unwind two stack frames,
first frame might be handled in a different dynamically loaded object, and this unless the IP obtained after the first unwinding comes from another shared
would require inspecting the DWARF located in another file}: it can be called object file, for instance a call to \prog{libc}. In this case, unwinding the
twice in a row to unwind two stack frames. second frame will require loading the corresponding DWARF information.
The function is the following: The function is the following:
@ -636,8 +629,8 @@ $F\left[0 \ldots |F|-2\right] \extrarrow{reg} \bullet$.
\semI{\dwcfa{nop()} \cdot d}{s}(F) &:= \contsem{F}\\ \semI{\dwcfa{nop()} \cdot d}{s}(F) &:= \contsem{F}\\
\end{align*} \end{align*}
(The stack is used for \texttt{remember\_state} and \texttt{restore\_state}. If The stack is used for \texttt{remember\_state} and \texttt{restore\_state}. If
we omit those two operations, we can plainly remove the stack). we omit those two operations, we can plainly remove the stack.
\subsection{From $\intermedlang$ to C} \subsection{From $\intermedlang$ to C}
@ -694,8 +687,9 @@ machine code on the x86\_64 platform.
The rough idea of the compilation is to produce, out of the \ehframe{} section The rough idea of the compilation is to produce, out of the \ehframe{} section
of a binary, C code that resembles the code shown in the DWARF semantics from of a binary, C code that resembles the code shown in the DWARF semantics from
Section~\ref{sec:semantics} above. This C code is then compiled by GCC in Section~\ref{sec:semantics} above. This C code is then compiled by GCC in
\lstbash{-O2} mode\footnote{Compiling in \lstbash{-O3} takes way too much \lstbash{-O2} mode, since it already provides a good level of optimization and
time.}, providing for free all the optimization passes of a modern compiler. compiling in \lstbash{-O3} takes way too much time. This saves us the trouble
of optimizing the generated C code whenever GCC does that by itself.
The generated code consists in a single monolithic function, \lstc{_eh_elf}, The generated code consists in a single monolithic function, \lstc{_eh_elf},
taking as arguments an instruction pointer and a memory context (\ie{} the taking as arguments an instruction pointer and a memory context (\ie{} the
@ -715,18 +709,18 @@ return it.
A setting of the compiler also optionally enables another parameter to the A setting of the compiler also optionally enables another parameter to the
\lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This \lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
\lstc{deref} function, when enabled, replaces everywhere the dereferencing \lstc{deref} function, when present, replaces everywhere the dereferencing
\lstc{*} operator, and can be used to generate \ehelfs{} that will work on \lstc{*} operator, and can be used to generate \ehelfs{} that will work on
remote address spaces (\ie{} whenever the unwinding is not done on the process remote address spaces, that is, whenever the unwinding is not done on the
reading the \ehelf{} itself, but some other process, or even on a stack dump of process reading the \ehelf{} itself, but some other process, or even on a stack
a long-terminated process). dump of a long-terminated process.
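To illustrate why such a function pointer is enough to unwind a remote address space, here is a minimal sketch. The signature of \lstc{deref_func_t}, the \lstc{stack_dump_t} structure and the global \lstc{g_dump} are assumptions made for this example and are not taken from the actual implementation; the global state is only a simplification to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed shape of the dereferencing callback: address in, word out. */
typedef uintptr_t (*deref_func_t)(uintptr_t addr);

/* Local unwinding: the address is valid in the current process. */
static uintptr_t deref_local(uintptr_t addr) {
    return *(const uintptr_t *)addr;
}

/* Hypothetical stack dump of another process: `size` bytes that were
 * originally mapped at address `base` in that process. */
typedef struct {
    uintptr_t base;
    const unsigned char *bytes;
    size_t size;
} stack_dump_t;

static stack_dump_t g_dump;

/* Remote unwinding: translate the remote address into the dump.
 * Returns 0 for addresses outside the dump (simplistic error handling). */
static uintptr_t deref_dump(uintptr_t addr) {
    uintptr_t value = 0;
    if (addr >= g_dump.base
        && addr - g_dump.base + sizeof(value) <= g_dump.size)
        memcpy(&value, g_dump.bytes + (addr - g_dump.base), sizeof(value));
    return value;
}

/* Reads the return address stored one machine word below the CFA, using
 * deref wherever generated code would otherwise use the `*` operator. */
uintptr_t read_return_address(uintptr_t cfa, deref_func_t deref) {
    assert(deref != NULL);
    return deref(cfa - sizeof(uintptr_t));
}
```

The same \lstc{read_return_address} works unchanged with \lstc{deref_local} on the current process and with \lstc{deref_dump} on a saved stack.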
Unlike in the \ehframe, and unlike what should be done in a release, Unlike in the \ehframe, and unlike what should be done in a release,
real-world-proof version of the \ehelfs, the choice was made to keep this real-world-proof version of the \ehelfs, the choice was made to keep this
prototype simple, and only handle the few registers that were needed to simply prototype simple, and only handle the few registers that were needed to simply
unwind the stack. Thus, the only registers handled in \ehelfs{} are \reg{rip}, unwind the stack. Thus, the only registers handled in \ehelfs{} are \reg{rip},
\reg{rbp}, \reg{rsp} and \reg{rbx} (the latter being used quite often in \reg{rbp}, \reg{rsp} and \reg{rbx}, the latter being used quite often in
\prog{libc} to hold the CFA address). This is enough to unwind the stack, but \prog{libc} to hold the CFA address. This is enough to unwind the stack, but
is not sufficient to analyze every stack frame as \prog{gdb} would do after a is not sufficient to analyze every stack frame as \prog{gdb} would do after a
\lstbash{frame n} command. \lstbash{frame n} command.
@ -736,10 +730,9 @@ is not sufficient to analyze every stack frame as \prog{gdb} would do after a
In the unwind context from Listing~\ref{lst:unw_ctx}, the values of type In the unwind context from Listing~\ref{lst:unw_ctx}, the values of type
\lstc{uintptr_t} are the values of the corresponding registers, and \lstc{uintptr_t} are the values of the corresponding registers, and
\lstc{flags} is an 8-bit value, indicating for each register whether it is \lstc{flags} is an 8-bit value, indicating for each register whether it is
present or not in this context (\ie{} if the \lstc{rbx} bit is not set, the present or not in this context, plus an error bit, indicating whether an error
value of \lstc{rbx} in the structure isn't meaningful), plus an error bit, occurred during unwinding. Such errors can be due \eg{} to an unsupported
indicating whether an error occurred during unwinding (which can be due \eg{} operation in the original DWARF\@.
to an unsupported operation in the original DWARF, thus compiled to an error).
This generated data is stored in separate shared object files, which we call This generated data is stored in separate shared object files, which we call
\ehelfs. It would have been possible to alter the original ELF file to embed \ehelfs. It would have been possible to alter the original ELF file to embed
@ -827,12 +820,12 @@ made in order to shrink the \ehelfs.
The major optimization that most reduced the output size was to use an if/else The major optimization that most reduced the output size was to use an if/else
tree implementing a binary search over the relevant program counter intervals, tree implementing a binary search over the relevant program counter intervals,
instead of a huge switch. In the process, we also \emph{outline} a lot of code, instead of a huge switch. In the process, we also \emph{outline} a lot of code,
that is, find out identical ``switch cases'' bodies (which are not switch cases that is, find out identical ``switch cases'' bodies --~which are not switch
anymore, but if bodies), move them outside of the if/else tree, identify them cases anymore, but if bodies~--, move them outside of the if/else tree,
by a label, and jump to them using a \lstc{goto}, which de-duplicates a lot of identify them by a label, and jump to them using a \lstc{goto}, which
code and contributes greatly to the shrinking. In the process, we noticed that de-duplicates a lot of code and contributes greatly to the shrinking. In the
the vast majority of FDE rows are actually taken among very few ``common'' FDE process, we noticed that the vast majority of FDE rows are actually taken among
rows. very few ``common'' FDE rows.
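The shape of such generated code can be sketched as follows. This is a hand-written illustration, not actual compiler output: the context type \lstc{ctx_t}, the function name and all addresses other than $\mhex{615}$ and $\mhex{619}$ are hypothetical, and only \reg{rip} and \reg{rsp} are unwound, on a 64-bit machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified unwinding context. */
typedef struct {
    uintptr_t rip, rsp;
    uint8_t error;
} ctx_t;

ctx_t unwind_sketch(ctx_t in, uintptr_t pc) {
    ctx_t out = in;
    assert(sizeof(uintptr_t) == 8); /* offsets below assume 64-bit words */
    out.error = 0;

    /* If/else tree performing a binary search on the PC intervals. */
    if (pc < 0x619) {
        if (pc < 0x615)
            goto fail;
        goto row_cfa_rsp8;      /* 0x615 <= pc < 0x619: function prelude */
    } else {
        if (pc < 0x66f)
            goto row_cfa_rsp48; /* 0x619 <= pc < 0x66f: function body */
        if (pc < 0x670)
            goto row_cfa_rsp8;  /* same row as the prelude, body shared */
        goto fail;
    }

    /* Outlined row bodies: each distinct row is emitted only once. */
row_cfa_rsp8:  /* CFA = rsp + 8, RA stored at CFA - 8 */
    out.rip = *(const uintptr_t *)in.rsp;
    out.rsp = in.rsp + 8;
    return out;
row_cfa_rsp48: /* CFA = rsp + 48, RA stored at CFA - 8 */
    out.rip = *(const uintptr_t *)(in.rsp + 40);
    out.rsp = in.rsp + 48;
    return out;
fail:          /* PC not covered by any row */
    out.error = 1;
    return out;
}
```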
This makes this optimization really efficient, as seen later in This makes this optimization really efficient, as seen later in
Section~\ref{ssec:results_size}, but also makes it an interesting question Section~\ref{ssec:results_size}, but also makes it an interesting question
@ -886,13 +879,12 @@ Listing~\ref{lst:ex1_dw}, etc.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Presentation of \prog{perf}}\label{ssec:perf} \subsection{Presentation of \prog{perf}}\label{ssec:perf}
\prog{Perf} is a \emph{profiler} that comes with the Linux ecosystem (actually, \prog{Perf} is a \emph{profiler} that comes with the Linux ecosystem, and is
\prog{perf} is developed within the Linux kernel source tree). A profiler is an even developed within the Linux kernel source tree. A profiler is an important
important tool from the developer's toolbox that analyzes the performance of tool from the developer's toolbox that analyzes the performance of programs by
programs by recording the time spent in each function, including within nested recording the time spent in each function, including within nested calls. This
calls. This analysis often enables programmers to optimize critical paths and analysis often enables programmers to optimize critical paths and functions in
functions in their programs, while leaving unoptimized functions that are their programs, while leaving unoptimized functions that are seldom traversed.
seldom traversed.
For this purpose, the basic idea is to stop the traced program at regular For this purpose, the basic idea is to stop the traced program at regular
intervals, unwind its stack, write down the current nested function calls, and intervals, unwind its stack, write down the current nested function calls, and
@ -924,16 +916,16 @@ activity, be linked against \prog{libc} and \prog{pthread}, and be very light.
Interfacing \ehelfs{} with \prog{perf} first required us to fork Interfacing \ehelfs{} with \prog{perf} first required us to fork
\prog{libunwind} and implement \ehelfs{} support for it. In the process, it \prog{libunwind} and implement \ehelfs{} support for it. In the process, it
turned out necessary to slightly modify \prog{libunwind}'s interface to add a turned out necessary to slightly modify \prog{libunwind}'s interface to add a
parameter to a function, since \prog{libunwind} is made to be agnostic of the parameter to an initialisation function, since \prog{libunwind} is made to be
system and process as much as possible, to be able to unwind in any context. agnostic of the system and process as much as possible, to be able to unwind in
This very restricted information lacked a memory map (a table indicating which any context. This very restricted information lacked a \emph{memory map}, a
shared object is mapped at which address in memory) in order to use \ehelfs. table indicating which shared object is mapped at which address in memory, in
Apart from this, the modified version of \prog{libunwind} produced is entirely order to use \ehelfs. Apart from this, the modified version of \prog{libunwind}
compatible with the vanilla version, meaning that the only modifications produced is entirely compatible with the vanilla version. This means that the
required to use \ehelfs{} within any project using \prog{libunwind} should be only modifications required to use \ehelfs{} within any project using
modifying one line of code (this function call, which is a setup function) and \prog{libunwind} should be changing one line of code to add one parameter to a
linking against the modified version of \prog{libunwind} instead of the system function call and linking against the modified version of \prog{libunwind}
version. instead of the system version.
Once this was done, plugging it in \prog{perf} was the matter of a few lines of Once this was done, plugging it in \prog{perf} was the matter of a few lines of
code only, apart from the benchmarking code. The major problem encountered was code only, apart from the benchmarking code. The major problem encountered was
@ -984,9 +976,9 @@ swapping.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Measured time performance} \subsection{Measured time performance}
The benchmarking, as described in Section~\ref{ssec:bench_perf}, of \ehelfs{} A benchmarking of \ehelfs{} against the vanilla \prog{libunwind} was made using
against the vanilla \prog{libunwind} (using the same methodology, only linking the exact same methodology as in Section~\ref{ssec:bench_perf}, only linking
\prog{perf} against the vanilla \prog{libunwind}), gives the results in \prog{perf} against the vanilla \prog{libunwind}. It yields the results in
Table~\ref{table:bench_time}. Table~\ref{table:bench_time}.
\begin{table}[h] \begin{table}[h]
@ -1036,11 +1028,11 @@ instruction, however, would not slow down at all the implementation, since
every instruction would simply be compiled to x86\_64 without affecting the every instruction would simply be compiled to x86\_64 without affecting the
already supported code. already supported code.
It is also worth noting that on the machine described in It is also worth noting that the compilation time of \ehelfs{} is also
Section~\ref{ssec:bench_hw}, the compilation of the \ehelfs{} at a level of reasonably short. On the machine described in Section~\ref{ssec:bench_hw}, and
\lstc{-O2} needed to run \prog{hackbench}, that is, \prog{hackbench}, without using multiple cores to compile, the various shared objects needed to
\prog{libc}, \prog{ld}, and \prog{libpthread} takes an overall time of $25.28$ run \prog{hackbench} --~that is, \prog{hackbench}, \prog{libc}, \prog{ld} and
seconds (using only a single core). \prog{libpthread}~-- are compiled in an overall time of $25.28$ seconds.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Measured compactness}\label{ssec:results_size} \subsection{Measured compactness}\label{ssec:results_size}
@ -1189,8 +1181,8 @@ only concerned about the columns CFA, \reg{rip}, \reg{rsp}, \reg{rbp} and
second row analyzes all the columns that were encountered, no matter whether second row analyzes all the columns that were encountered, no matter whether
supported or not. supported or not.
Table~\ref{table:instr_types} analyzes the proportion of each command (\ie\ Table~\ref{table:instr_types} analyzes the proportion of each command
the formal way a register is set) for non-CFA columns in the sampled data. For --~the formal way a register is set~-- for non-CFA columns in the sampled data. For
a brief explanation, \texttt{Offset} means stored at offset from CFA, a brief explanation, \texttt{Offset} means stored at offset from CFA,
\texttt{Register} means the value from a machine register, \texttt{Expression} \texttt{Register} means the value from a machine register, \texttt{Expression}
means stored at the address of an expression's result, and the \texttt{Val\_} means stored at the address of an expression's result, and the \texttt{Val\_}