Eliminate widow lines
parent
c847d71d28
commit
d4f417017e
2 changed files with 106 additions and 116 deletions
|
@ -32,22 +32,21 @@ computation~\cite{oakley2011exploiting}.
|
||||||
|
|
||||||
\subsection*{The research problem}
|
\subsection*{The research problem}
|
||||||
|
|
||||||
As debugging data can easily take an unreasonable space and grow larger than
|
As debugging data can easily grow larger than the program itself if stored
|
||||||
the program itself if stored carelessly, the DWARF standard pays a great
|
carelessly, the DWARF standard pays a great attention to data compactness and
|
||||||
attention to data compactness and compression. It succeeds particularly well
|
compression. It succeeds particularly well at it, but at the expense of
|
||||||
at it, but at the expense of efficiency: accessing stack
|
efficiency: accessing stack unwinding data for a particular program point is an
|
||||||
unwinding data for a particular program point is an expensive operation --~the
|
expensive operation --~the order of magnitude is $10\,\mu{}\text{s}$ on a
|
||||||
order of magnitude is $10\,\mu{}\text{s}$ on a modern computer.
|
modern computer.
|
||||||
|
|
||||||
This is often not a problem, as stack unwinding is often thought of as a
|
This is often not a problem, as stack unwinding is often thought of as a
|
||||||
debugging procedure: when something behaves unexpectedly, the programmer might
|
debugging procedure: when something behaves unexpectedly, the programmer might
|
||||||
be interested in opening their debugger and exploring the stack. Yet, stack
|
open their debugger and explore the stack. Yet, stack unwinding might, in some
|
||||||
unwinding might, in some cases, be performance-critical: for instance, polling
|
cases, be performance-critical: for instance, polling profilers repeatedly
|
||||||
profilers repeatedly perform stack unwindings to observe which functions are
|
perform stack unwindings to observe which functions are active. Even worse, C++
|
||||||
active. Even worse, C++ exception handling relies on stack unwinding in order
|
exception handling relies on stack unwinding in order to find a suitable
|
||||||
to find a suitable catch-block! For such applications, it might be desirable to
|
catch-block! For such applications, it might be desirable to find a different
|
||||||
find a different time/space trade-off, storing a bit more for a faster
|
time/space trade-off, storing a bit more for a faster unwinding.
|
||||||
unwinding.
|
|
||||||
|
|
||||||
This different trade-off is the question that I explored during this
|
This different trade-off is the question that I explored during this
|
||||||
internship: what good alternative trade-off is reachable when storing the stack
|
internship: what good alternative trade-off is reachable when storing the stack
|
||||||
|
@ -109,10 +108,10 @@ compiled debugging data.
|
||||||
|
|
||||||
The goal of this project was to design a compiled version of unwinding data
|
The goal of this project was to design a compiled version of unwinding data
|
||||||
that is faster than DWARF, while still being reliable and reasonably compact.
|
that is faster than DWARF, while still being reliable and reasonably compact.
|
||||||
The benchmarks mentioned have yielded convincing results: on the experimental
|
Benchmarking has yielded convincing results: on the experimental setup created
|
||||||
setup created --~detailed on Section~\ref{sec:benchmarking} below~\textendash,
|
--~detailed on Section~\ref{sec:benchmarking} below~\textendash, the compiled
|
||||||
the compiled version is around 26 times faster than the DWARF version, while it
|
version is around 26 times faster than the DWARF version, while it remains only
|
||||||
remains only around 2.5 times bigger than the original data.
|
around 2.5 times bigger than the original data.
|
||||||
|
|
||||||
We support the vast majority --~more than $99.9\,\%$~-- of the instructions
|
We support the vast majority --~more than $99.9\,\%$~-- of the instructions
|
||||||
actually used in binaries, although we do not support all of DWARF5 instruction
|
actually used in binaries, although we do not support all of DWARF5 instruction
|
||||||
|
|
|
@ -79,16 +79,16 @@ typically used for storing function arguments, machine registers that must be
|
||||||
restored before returning, the function's return address and local variables.
|
restored before returning, the function's return address and local variables.
|
||||||
|
|
||||||
On the x86\_64 platform, with which this report is mostly concerned, the
|
On the x86\_64 platform, with which this report is mostly concerned, the
|
||||||
calling convention that is followed is defined in the System V
|
calling convention followed on UNIX-like operating systems --~among which Linux
|
||||||
ABI~\cite{systemVabi} for the Unix-like operating systems --~among which Linux
|
and MacOS~-- is defined by the System V ABI~\cite{systemVabi}. Under this
|
||||||
and MacOS\@. Under this calling convention, the first six arguments of a
|
calling convention, the first six arguments of a function are passed in the
|
||||||
function are passed in the registers \reg{rdi}, \reg{rsi}, \reg{rdx},
|
registers \reg{rdi}, \reg{rsi}, \reg{rdx}, \reg{rcx}, \reg{r8}, \reg{r9}, while
|
||||||
\reg{rcx}, \reg{r8}, \reg{r9}, while additional arguments are pushed onto the
|
additional arguments are pushed onto the stack. It also defines which registers
|
||||||
stack. It also defines which registers may be overwritten by the callee, and
|
may be overwritten by the callee, and which registers must be restored by the
|
||||||
which registers must be restored before returning. This restoration, for most
|
callee before returning. This restoration, for most compilers, is done by
|
||||||
compilers, is done by pushing the register value onto the stack in the function
|
pushing the register value onto the stack during the function prelude, and
|
||||||
prelude, and restoring it just before returning. Those preserved registers are
|
restoring it just before returning. Those preserved registers are \reg{rbx},
|
||||||
\reg{rbx}, \reg{rsp}, \reg{rbp}, \reg{r12}, \reg{r13}, \reg{r14}, \reg{r15}.
|
\reg{rsp}, \reg{rbp}, \reg{r12}, \reg{r13}, \reg{r14}, \reg{r15}.
|
||||||
|
|
||||||
\begin{wrapfigure}{r}{0.4\textwidth}
|
\begin{wrapfigure}{r}{0.4\textwidth}
|
||||||
\centering
|
\centering
|
||||||
|
@ -97,24 +97,22 @@ prelude, and restoring it just before returning. Those preserved registers are
|
||||||
conventions}\label{fig:call_stack}
|
conventions}\label{fig:call_stack}
|
||||||
\end{wrapfigure}
|
\end{wrapfigure}
|
||||||
|
|
||||||
The register \reg{rsp} is supposed to always point to the last used memory cell
|
The register \reg{rsp} is supposed to always point to the last used address in
|
||||||
in the stack. Thus, when the process just enters a new function, \reg{rsp}
|
the stack. Thus, when the process enters a new function, \reg{rsp} points to
|
||||||
points right to the location of the return address. Then, the compiler might
|
the location of the return address. Then, the compiler might use \reg{rbp}
|
||||||
use \reg{rbp} (``base pointer'') to save this value of \reg{rip}, by writing
|
(``base pointer'') to save this value of \reg{rsp}, writing the old value of
|
||||||
the old value of \reg{rbp} just below the return address on the stack, then
|
\reg{rbp} below the return address on the stack and copying \reg{rsp} to
|
||||||
copying \reg{rsp} to \reg{rbp}. This makes it easy to find the return address
|
\reg{rbp}. This makes it easy to find the return address from anywhere within
|
||||||
from anywhere within the function, and also allows for easy addressing of local
|
the function, and allows for easy addressing of local variables. To some
|
||||||
variables. To some extents, it also allows for hot debugging, such as saving a
|
extents, it also allows for hot debugging, such as saving a useful core dump
|
||||||
useful core dump upon segfault. Yet, using \reg{rbp} to save \reg{rip} is not
|
upon segfault. Yet, using \reg{rbp} to save \reg{rip} wastes a register, and
|
||||||
always done, since it wastes a register. This decision is, on x86\_64 System V,
|
the decision of using it is, on x86\_64 System V, up to the compiler.
|
||||||
up to the compiler.
|
|
||||||
|
|
||||||
Usually, a function starts by subtracting some value to \reg{rsp}, allocating
|
Usually, a function starts by subtracting some value to \reg{rsp}, allocating
|
||||||
some space in the stack frame for its local variables. Then, it pushes on
|
some space in the stack frame for its local variables. Then, it saves on the
|
||||||
the stack the values of the callee-saved registers that are overwritten later,
|
stack the values of the callee-saved registers that are overwritten later.
|
||||||
effectively saving them. Before returning, it pops the values of the saved
|
Before returning, it pops the values of the saved registers back to their
|
||||||
registers back to their original registers and restore \reg{rsp} to its former
|
original registers and restore \reg{rsp} to its former value.
|
||||||
value.
|
|
||||||
|
|
||||||
\subsection{Stack unwinding}\label{ssec:stack_unwinding}
|
\subsection{Stack unwinding}\label{ssec:stack_unwinding}
|
||||||
|
|
||||||
|
@ -126,13 +124,12 @@ IP\@. This actually observes the stack to find the different stack frames, and
|
||||||
decode them to identify the function names, parameter values, etc.
|
decode them to identify the function names, parameter values, etc.
|
||||||
|
|
||||||
This operation is far from trivial. Often, a stack frame will only make sense
|
This operation is far from trivial. Often, a stack frame will only make sense
|
||||||
when the correct values are stored in the machine registers. These values,
|
when the machine registers hold the right values. These values,
|
||||||
however, are to be restored from the previous stack frame, where they are
|
however, are to be restored from the previous stack frame, where they are
|
||||||
stored. This imposes to \emph{walk} the stack, reading the entries one after
|
stored. This imposes to \emph{walk} the stack, reading the frames one after
|
||||||
the other, instead of peeking at some frame directly. Moreover, the size of one
|
the other, instead of peeking at some frame directly. Moreover, it is often not
|
||||||
stack frame is often not that easy to determine when looking at some
|
even easy to determine the boundaries of each stack frame alone, making it
|
||||||
instruction other than \texttt{return}, making it hard to extract single frames
|
impossible to just peek at a single frame.
|
||||||
from the whole stack.
|
|
||||||
|
|
||||||
Interpreting a frame in order to get the machine state \emph{before} this
|
Interpreting a frame in order to get the machine state \emph{before} this
|
||||||
frame, and thus be able to decode the next frame recursively, is called
|
frame, and thus be able to decode the next frame recursively, is called
|
||||||
|
@ -159,10 +156,10 @@ common format of debugging data is DWARF\@.
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Unwinding usage and frequency}
|
\subsection{Unwinding usage and frequency}
|
||||||
|
|
||||||
Stack unwinding is a more common operation that one might think at first. The
|
Stack unwinding is more frequent that one might think at first. The use case
|
||||||
use case mostly thought of is simply to get a stack trace of a program, and
|
mostly thought of is simply to get a stack trace of a program, and provide a
|
||||||
provide a debugger with the information it needs. For instance, when inspecting
|
debugger with the information it needs. For instance, when inspecting a stack
|
||||||
a stack trace in \prog{gdb}, a common operation is to jump to a previous frame:
|
trace in \prog{gdb}, a common operation is to jump to a previous frame:
|
||||||
|
|
||||||
\lstinputlisting{src/segfault/gdb_session}
|
\lstinputlisting{src/segfault/gdb_session}
|
||||||
|
|
||||||
|
@ -174,18 +171,18 @@ context, by unwinding \lstc{fct_b}'s frame.
|
||||||
Yet, stack unwinding, and thus, debugging data, \emph{is not limited to
|
Yet, stack unwinding, and thus, debugging data, \emph{is not limited to
|
||||||
debugging}.
|
debugging}.
|
||||||
|
|
||||||
Another common usage is profiling. A profiling tool, such as \prog{perf} under
|
Another common usage is profiling. A profiler, such as \prog{perf} under Linux
|
||||||
Linux --~see Section~\ref{ssec:perf} --, is used to measure and analyze in
|
--~see Section~\ref{ssec:perf}~--, is used to measure and analyze in which
|
||||||
which functions a program spends its time, identify bottlenecks and find out
|
functions a program spends its time, and find out which parts are critical to
|
||||||
which parts are critical to optimize. To do so, modern profilers pause the
|
optimize. To do so, modern profilers pause the traced program at regular,
|
||||||
traced program at regular, short intervals, inspect their stack, and determine
|
short intervals, inspect their stack, and determine which function is currently
|
||||||
which function is currently being run. They also perform a stack unwinding to
|
being run. They also perform a stack unwinding to figure out the call path to
|
||||||
figure out the call path to this function, in order to determine which function
|
this function, in order to determine which function indirectly takes time: for
|
||||||
indirectly takes time: for instance, a function \lstc{fct_a} can call both
|
instance, a function \lstc{fct_a} can call both \lstc{fct_b} and \lstc{fct_c},
|
||||||
\lstc{fct_b} and \lstc{fct_c}, which both take a lot of time; spend practically
|
which both take a lot of time; spend practically no time directly in
|
||||||
no time directly in \lstc{fct_a}, but spend a lot of time in calls to the other
|
\lstc{fct_a}, but spend a lot of time in calls to the other two functions that
|
||||||
two functions that were made from \lstc{fct_a}. Knowing that after all,
|
were made from \lstc{fct_a}. Knowing that after all, \lstc{fct_a} is the
|
||||||
\lstc{fct_a} is the culprit can be useful to a programmer.
|
culprit can be useful to a programmer.
|
||||||
|
|
||||||
Exception handling also requires a stack unwinding mechanism in some languages.
|
Exception handling also requires a stack unwinding mechanism in some languages.
|
||||||
Indeed, an exception is completely different from a \lstinline{return}: while
|
Indeed, an exception is completely different from a \lstinline{return}: while
|
||||||
|
@ -413,10 +410,10 @@ registers values, which will represent the evaluated DWARF row.
|
||||||
\subsection{Concerning correctness}\label{ssec:sem_correctness}
|
\subsection{Concerning correctness}\label{ssec:sem_correctness}
|
||||||
|
|
||||||
The semantics described in this section are designed in a concern of
|
The semantics described in this section are designed in a concern of
|
||||||
\emph{formalization} of the original DWARF standard. This standard, sadly, only
|
\emph{formalization} of the original standard. This standard, sadly, only
|
||||||
devises a plain English description of each instruction's action and result,
|
describes in plain English each instruction's action and result. This basis
|
||||||
which cannot be used as a basis to \emph{prove} anything correct without
|
cannot be used to \emph{prove} anything correct without relying on informal
|
||||||
relying on informal interpretations.
|
interpretations.
|
||||||
|
|
||||||
\subsection{Original language: DWARF instructions}
|
\subsection{Original language: DWARF instructions}
|
||||||
|
|
||||||
|
@ -732,16 +729,15 @@ licenses.
|
||||||
\subsection{Compilation: \ehelfs}\label{ssec:ehelfs}
|
\subsection{Compilation: \ehelfs}\label{ssec:ehelfs}
|
||||||
|
|
||||||
The rough idea of the compilation is to produce, out of the \ehframe{} section
|
The rough idea of the compilation is to produce, out of the \ehframe{} section
|
||||||
of a binary, C code that resembles the code shown in the DWARF semantics from
|
of a binary, C code close to that of Section~\ref{sec:semantics} above. This C
|
||||||
Section~\ref{sec:semantics} above. This C code is then compiled by GCC in
|
code is then compiled by GCC in \lstbash{-O2} mode. This saves us the trouble
|
||||||
\lstbash{-O2} mode. This saves us the trouble of optimizing the generated C
|
of optimizing the generated C code whenever GCC does that by itself.
|
||||||
code whenever GCC does that by itself.
|
|
||||||
|
|
||||||
The generated code consists in a single monolithic function, \lstc{_eh_elf},
|
The generated code consists in a single function, \lstc{_eh_elf}, taking as
|
||||||
taking as arguments an instruction pointer and a memory context (\ie{} the
|
arguments an instruction pointer and a memory context (\ie{} the value of the
|
||||||
value of the various machine registers) as defined in
|
various machine registers) as defined in Listing~\ref{lst:unw_ctx}. The
|
||||||
Listing~\ref{lst:unw_ctx}. The function will then return a fresh memory
|
function then returns a fresh memory context loaded with the values the
|
||||||
context, containing the values the registers hold after unwinding this frame.
|
registers after unwinding this frame.
|
||||||
|
|
||||||
The body of the function itself consists in a single monolithic switch, taking
|
The body of the function itself consists in a single monolithic switch, taking
|
||||||
advantage of the non-standard --~yet overwhelmingly implemented in common C
|
advantage of the non-standard --~yet overwhelmingly implemented in common C
|
||||||
|
@ -953,10 +949,9 @@ across the program to mimic real-world unwinding: we would like to benchmark
|
||||||
stack unwindings crossing some standard library functions, starting from inside
|
stack unwindings crossing some standard library functions, starting from inside
|
||||||
them, etc.
|
them, etc.
|
||||||
|
|
||||||
Finally, the unwound program must be interesting enough to enter and exit
|
Finally, the unwound program must be interesting enough to call functions
|
||||||
functions often, building a good stack of nested function calls (at least
|
often, building a stack of nested function calls (at least frequently 5), have
|
||||||
frequently 5), have FDEs that are not as simple as in Listing~\ref{lst:ex1_dw},
|
FDEs that are not as simple as in Listing~\ref{lst:ex1_dw}, etc.
|
||||||
etc.
|
|
||||||
|
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
@ -1016,17 +1011,17 @@ changing one line of code to add one parameter to a function call and linking
|
||||||
against the modified version of \prog{libunwind} instead of the system version.
|
against the modified version of \prog{libunwind} instead of the system version.
|
||||||
|
|
||||||
Once this was done, plugging it in \prog{perf} was the matter of a few lines of
|
Once this was done, plugging it in \prog{perf} was the matter of a few lines of
|
||||||
code only, left apart the benchmarking code. The major problem encountered was
|
code only, left apart the benchmarking code. The major difficulty was to
|
||||||
to understand how \prog{perf} works. In order to avoid perturbing the traced
|
understand how \prog{perf} works. To avoid perturbing the traced program,
|
||||||
program, \prog{perf} does not unwind at runtime, but rather records at regular
|
\prog{perf} does not unwind at runtime, but rather records at regular intervals
|
||||||
intervals the program's stack, and all the auxiliary information that is needed
|
the program's stack, and all the auxiliary information that is needed to unwind
|
||||||
to unwind later. This is done when running \lstbash{perf record}. Then, a
|
later. This is done when running \lstbash{perf record}. Then, a subsequent call
|
||||||
subsequent call to \lstbash{perf report} unwinds the stack to analyze it; but
|
to \lstbash{perf report} unwinds the stack to analyze it; but at this point of
|
||||||
at this point of time, the traced process is long dead. Thus, any PID-based
|
time, the traced process is long dead. Thus, any PID-based approach, or any
|
||||||
approach, or any approach using \texttt{/proc} information will fail. However,
|
approach using \texttt{/proc} information will fail. However, as this was the
|
||||||
as this was the easiest method, the first version of \ehelfs{} used those
|
easiest method, the first version of \ehelfs{} used those mechanisms; it took
|
||||||
mechanisms; it took some code rewriting to move to a PID- and
|
some code rewriting to move to a PID- and \texttt{/proc}-agnostic
|
||||||
\texttt{/proc}-agnostic implementation.
|
implementation.
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Other explored methods}
|
\subsection{Other explored methods}
|
||||||
|
@ -1040,8 +1035,8 @@ also never have met the requirement of unwinding from fairly distributed
|
||||||
locations anyway.
|
locations anyway.
|
||||||
|
|
||||||
Another attempt was made using CSmith~\cite{csmith}, a random C code generator
|
Another attempt was made using CSmith~\cite{csmith}, a random C code generator
|
||||||
initially made for C compilers random testing. The idea was still to craft an
|
designed for random testing on C compilers. The idea was still to craft a
|
||||||
interesting C program that would unwind on its own frequently, but to integrate
|
C program that would unwind on its own frequently, but to integrate
|
||||||
CSmith-randomly generated C code within hand-written C snippets that
|
CSmith-randomly generated C code within hand-written C snippets that
|
||||||
would generate large enough FDEs and nested calls. This was abandoned as well
|
would generate large enough FDEs and nested calls. This was abandoned as well
|
||||||
as the call graph of a CSmith-generated code is often far too small, and the
|
as the call graph of a CSmith-generated code is often far too small, and the
|
||||||
|
@ -1105,25 +1100,24 @@ Table~\ref{table:bench_time}.
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
The performance of \ehelfs{} is probably overestimated for a production-ready
|
The performance of \ehelfs{} is probably overestimated for a production-ready
|
||||||
version, since \ehelfs{} do not handle all registers from the original DWARF
|
version, since \ehelfs{} do not handle all the registers from the original
|
||||||
file, and thus the \prog{libunwind} version must perform more computation.
|
DWARF, lightening the computation. However, this overhead, although impossible
|
||||||
However, this overhead, although impossible to measure without first
|
to measure without first implementing supports for every register, would
|
||||||
implementing supports for every register, would probably not be that big, since
|
probably not be that big, since most of the time is spent finding the relevant
|
||||||
most of the time is spent finding the relevant row. Support for every DWARF
|
row. Support for every DWARF instruction, however, would not slow down at all
|
||||||
instruction, however, would not slow down at all the implementation, since
|
the implementation, since every instruction would simply be compiled to x86\_64
|
||||||
every instruction would simply be compiled to x86\_64 without affecting the
|
without affecting the already supported code.
|
||||||
already supported code.
|
|
||||||
|
|
||||||
The fact that there is a sharp difference between cached and uncached
|
The fact that there is a sharp difference between cached and uncached
|
||||||
\prog{libunwind} confirm that our experimental setup did not unwind at totally
|
\prog{libunwind} confirm that our experimental setup did not unwind at totally
|
||||||
different locations every single time, and thus was not biased in this
|
different locations every single time, and thus was not biased in this
|
||||||
direction, since caching is still very efficient.
|
direction, since caching is still very efficient.
|
||||||
|
|
||||||
It is also worth noting that the compilation time of \ehelfs{} is also
|
The compilation time of \ehelfs{} is also reasonable. On the machine
|
||||||
reasonably short. On the machine described in Section~\ref{ssec:bench_hw}, and
|
described in Section~\ref{ssec:bench_hw}, and without using multiple cores to
|
||||||
without using multiple cores to compile, the various shared objects needed to
|
compile, the various shared objects needed to run \prog{hackbench} --~that is,
|
||||||
run \prog{hackbench} --~that is, \prog{hackbench}, \prog{libc}, \prog{ld} and
|
\prog{hackbench}, \prog{libc}, \prog{ld} and \prog{libpthread}~-- are compiled
|
||||||
\prog{libpthread}~-- are compiled in an overall time of $25.28$ seconds.
|
in an overall time of $25.28$ seconds.
|
||||||
|
|
||||||
The unwinding errors observed are hard to investigate, but are most probably
|
The unwinding errors observed are hard to investigate, but are most probably
|
||||||
due to truncated stack records. Indeed, since \prog{perf} dumps the last $n$
|
due to truncated stack records. Indeed, since \prog{perf} dumps the last $n$
|
||||||
|
@ -1136,11 +1130,9 @@ the custom \prog{libunwind} implementation that were not spotted.
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Measured compactness}\label{ssec:results_size}
|
\subsection{Measured compactness}\label{ssec:results_size}
|
||||||
|
|
||||||
A first measure of compactness was made in this report for one of the earliest
|
A first measure of compactness was made for one of the earliest working
|
||||||
working versions in Table~\ref{table:basic_eh_elf_space}.
|
versions in Table~\ref{table:basic_eh_elf_space}. The same data, generated for
|
||||||
|
the latest version of \ehelfs, can be seen in Table~\ref{table:bench_space}.
|
||||||
The same data, generated for the latest version of \ehelfs, can be seen in
|
|
||||||
Table~\ref{table:bench_space}.
|
|
||||||
|
|
||||||
The effect of the outlining mentioned in Section~\ref{ssec:space_optim} is
|
The effect of the outlining mentioned in Section~\ref{ssec:space_optim} is
|
||||||
particularly visible in this table: \prog{hackbench} has a significantly bigger
|
particularly visible in this table: \prog{hackbench} has a significantly bigger
|
||||||
|
@ -1150,9 +1142,8 @@ times, compared to \eg{} \prog{libc}, in which the outlined data is reused a
|
||||||
lot.
|
lot.
|
||||||
|
|
||||||
Just as with time performance, the measured compactness would be impacted by
|
Just as with time performance, the measured compactness would be impacted by
|
||||||
supporting every register, but probably not that much either, since most
|
supporting every register, but probably lightly, since the four supported
|
||||||
columns are concerned with the four supported registers (see
|
registers represent most columns --~see Section~\ref{ssec:instr_cov}.
|
||||||
Section~\ref{ssec:instr_cov}).
|
|
||||||
|
|
||||||
\begin{table}[h]
|
\begin{table}[h]
|
||||||
\centering
|
\centering
|
||||||
|
|