Review and reword end of §1, §3 and §4
parent
b761f360cc
commit
b128ddd571
3 changed files with 176 additions and 110 deletions
|
@ -88,16 +88,19 @@ before returning). Those preserved registers are \reg{rbx}, \reg{rsp},
|
|||
conventions}\label{fig:call_stack}
|
||||
\end{wrapfigure}
|
||||
|
||||
The register \reg{rsp} is supposed to always point just past the last used
|
||||
memory cell in the stack, thus, when the process just enters a new function,
|
||||
\reg{rsp} points 8 bytes after the location of the return address. Then, the
|
||||
compiler might use \reg{rbp} (``base pointer'') to save this value of
|
||||
\reg{rip}, by writing the old value of \reg{rbp} just below the return address
|
||||
on the stack, then copying \reg{rsp} to \reg{rbp}. This makes it easy to find
|
||||
the return address from anywhere within the function, and also allows for easy
|
||||
addressing of local variables. Yet, using \reg{rbp} to save \reg{rip} is not
|
||||
always done, since it somehow ``wastes'' a register. This decision is, on
|
||||
x86\_64 System V, up to the compiler.
|
||||
The register \reg{rsp} is supposed to always point to the last used memory cell
|
||||
in the stack, thus, when the process just enters a new function, \reg{rsp}
|
||||
points right to the location of the return address\footnote{Remember that since
|
||||
the stack grows \emph{downwards} in memory, the arrow of \reg{rsp} points
|
||||
\emph{below} the RA cell in the figure, and yet the memory cell indexed is the
|
||||
one \emph{above} in the drawing, that is, the RA.}. Then, the compiler might
|
||||
use \reg{rbp} (``base pointer'') to save this value of \reg{rip}, by writing
|
||||
the old value of \reg{rbp} just below the return address on the stack, then
|
||||
copying \reg{rsp} to \reg{rbp}. This makes it easy to find the return address
|
||||
from anywhere within the function, and also allows for easy addressing of local
|
||||
variables. Yet, using \reg{rbp} to save \reg{rip} is not always done, since it
|
||||
somehow ``wastes'' a register. This decision is, on x86\_64 System V, up to the
|
||||
compiler.
|
||||
|
||||
Often, a function will start by subtracting some value from \reg{rsp}, allocating
|
||||
some space in the stack frame for its local variables. Then, it will push on
|
||||
|
@ -242,52 +245,92 @@ when talking about DWARF, a register is merely a numerical identifier that is
|
|||
often, but not necessarily, mapped to a real machine register by the ABI\@.
|
||||
|
||||
In practice, this data takes the form of a collection of tables, one table per
|
||||
Frame Description Entry (FDE), which most often corresponds to a function. Each
|
||||
column of the table is a register (\eg{} \reg{rsp}), with two additional
|
||||
Frame Description Entry (FDE). An FDE, in turn, is a DWARF entry describing such
|
||||
a table, that has a range of IPs on which it has authority. Most often, but not
|
||||
necessarily, it corresponds to a single function in the original source code.
|
||||
Each column of the table is a register (\eg{} \reg{rsp}), with two additional
|
||||
special registers, CFA (Canonical Frame Address) and RA (Return Address),
|
||||
containing respectively the base pointer of the current stack frame and the
|
||||
return address of the current function (\ie{} for x86\_64, the unwound value of
|
||||
\reg{rip}, the instruction pointer). Each row of the table is a particular
|
||||
instruction pointer, within the instruction pointer range of the tabulated FDE
|
||||
(assuming a FDE maps directly to a function, this range is simply the IP range
|
||||
of the given function in the \lstc{.text} section of the binary), a row being
|
||||
valid from its start IP to the start IP of the next row, or the end IP of the
|
||||
FDE if it is the last row.
|
||||
containing respectively the base pointer of the current stack
|
||||
frame\footnote{The CFA is most commonly thought of as the base pointer of the
|
||||
frame, yet this is not enforced by DWARF\@. The CFA is used as an address from
|
||||
which other registers will be deduced as offsets, and although it is supposed
|
||||
to be the actual base pointer, it can be anything as long as it is close enough
|
||||
to the addresses that will be deduced from it.} and the return address of the
|
||||
current function (\ie{} for x86\_64, the unwound value of \reg{rip}, the
|
||||
instruction pointer). Each row has a certain validity interval, on which it
|
||||
describes accurate unwinding data. This range starts at the instruction pointer
|
||||
it is associated with, and ends at the start IP of the next table row (or the
|
||||
end IP of the current FDE if it was the last row). In particular, there can be
|
||||
no ``IP hole'' within an FDE --~unlike FDEs themselves, which can leave holes
|
||||
between them.
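
The validity intervals described above can be illustrated by a short C sketch
that looks up the row covering a given IP in an already decoded table. The
types \lstc{fde_row_t} and \lstc{fde_t} and the function \lstc{fde_find_row}
are hypothetical names introduced only for this example; they belong neither to
DWARF nor to any existing library.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded row: only its start IP matters for the lookup. */
typedef struct {
    uintptr_t start_ip;    /* first IP on which this row is valid */
    int cfa_offset;        /* e.g. CFA = rsp + cfa_offset */
} fde_row_t;

/* Hypothetical decoded FDE: rows sorted by increasing start_ip. */
typedef struct {
    uintptr_t end_ip;      /* first IP past the range of the FDE */
    size_t nb_rows;
    const fde_row_t *rows;
} fde_t;

/* Returns the row valid at `ip`, or NULL if `ip` is outside the FDE. */
const fde_row_t *fde_find_row(const fde_t *fde, uintptr_t ip) {
    if (fde->nb_rows == 0 || ip < fde->rows[0].start_ip
            || ip >= fde->end_ip)
        return NULL;
    size_t i = 0;
    /* A row stays valid until the start IP of the next row. */
    while (i + 1 < fde->nb_rows && fde->rows[i + 1].start_ip <= ip)
        i++;
    return &fde->rows[i];
}
\end{lstlisting}

Since a row ends exactly where the next one starts, the lookup can only fail
when the IP lies outside the range of the FDE itself.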
|
||||
|
||||
\begin{minipage}{0.45\textwidth}
|
||||
\lstinputlisting[language=C, firstline=3, lastline=12,
|
||||
caption={Original C},label={lst:ex1_c}]
|
||||
{src/fib7/fib7.c}
|
||||
\end{minipage} \hfill \begin{minipage}{0.45\textwidth}
|
||||
\lstinputlisting[language=C,caption={Processed DWARF},label={lst:ex1_dw}]
|
||||
{src/fib7/fib7.fde}
|
||||
\lstinputlisting[language=C,caption={Raw DWARF},label={lst:ex1_dwraw}]
|
||||
{src/fib7/fib7.raw_fde}
|
||||
\end{minipage}
|
||||
\begin{figure}[h]
|
||||
\begin{minipage}{0.45\textwidth}
|
||||
\lstinputlisting[language=C, firstline=3, lastline=12,
|
||||
caption={Original C},label={lst:ex1_c}]
|
||||
{src/fib7/fib7.c}
|
||||
\end{minipage} \hfill \begin{minipage}{0.45\textwidth}
|
||||
\lstinputlisting[language=C,caption={Processed DWARF},
|
||||
label={lst:ex1_dw}]
|
||||
{src/fib7/fib7.fde}
|
||||
\lstinputlisting[language=C,caption={Raw DWARF},label={lst:ex1_dwraw}]
|
||||
{src/fib7/fib7.raw_fde}
|
||||
\end{minipage}
|
||||
\end{figure}
|
||||
|
||||
\begin{minipage}{0.45\textwidth}
|
||||
\lstinputlisting[language={[x86masm]Assembler},lastline=11,
|
||||
caption={Generated assembly},label={lst:ex1_asm}]
|
||||
{src/fib7/fib7.s}
|
||||
\end{minipage} \hfill \begin{minipage}{0.45\textwidth}
|
||||
\lstinputlisting[language={[x86masm]Assembler},firstline=12,
|
||||
firstnumber=last]
|
||||
{src/fib7/fib7.s}
|
||||
\end{minipage}
|
||||
\begin{figure}[h]
|
||||
\begin{minipage}{0.45\textwidth}
|
||||
\lstinputlisting[language={[x86masm]Assembler},lastline=11,
|
||||
caption={Generated assembly},label={lst:ex1_asm}]
|
||||
{src/fib7/fib7.s}
|
||||
\end{minipage} \hfill \begin{minipage}{0.45\textwidth}
|
||||
\lstinputlisting[language={[x86masm]Assembler},firstline=12,
|
||||
firstnumber=last]
|
||||
{src/fib7/fib7.s}
|
||||
\end{minipage}
|
||||
\end{figure}
|
||||
|
||||
\begin{table}[h]
|
||||
\centering
|
||||
\begin{tabular}{|c|c|c|c|c|c}
|
||||
\stackfhead{+ \mhex{30}}
|
||||
& \stackfhead{+ \mhex{28}}
|
||||
& \stackfhead{+ \mhex{20}}
|
||||
& \stackfhead{+ \mhex{1c}}
|
||||
& \stackfhead{+ \mhex{4}}
|
||||
& \stackfhead{}
|
||||
\\
|
||||
\hline{}
|
||||
Return Address & \textit{Alignment space}
|
||||
& \spaced{2ex}{\lstc{fibo[7]}}
|
||||
& \spaced{4ex}{\ldots}
|
||||
& \spaced{2ex}{\lstc{fibo[0]}}
|
||||
& \textit{Next frame}
|
||||
\\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\caption{Stack frame schema}\label{table:ex1_stack_schema}
|
||||
\end{table}
|
||||
|
||||
For instance, the C source code in Listing~\ref{lst:ex1_c} above, when compiled
|
||||
with \lstbash{gcc -O1 -fomit-frame-pointer -fno-stack-protector}, yields the
|
||||
assembly code in Listing~\ref{lst:ex1_asm}. When interpreting the generated
|
||||
\ehframe{} with \lstbash{readelf -wF}, we obtain the (slightly edited)
|
||||
assembly code in Listing~\ref{lst:ex1_asm}. The memory layout of the stack
|
||||
frame is presented in Table~\ref{table:ex1_stack_schema}, to help understand
|
||||
how the stack frame is constructed. When interpreting the generated \ehframe{}
|
||||
with \lstbash{readelf -wF}, we obtain the (slightly edited)
|
||||
Listing~\ref{lst:ex1_dw}. During the function prelude, \ie{} for $\mhex{615}
|
||||
\leq \reg{rip} < \mhex{619}$, the stack frame only contains the return address,
|
||||
thus the CFA is 8 bytes above \reg{rsp} (which was the value of \reg{rsp}
|
||||
before the call), and the return address is precisely at \reg{rsp}. Then, 9
|
||||
integers of 8 bytes each (8 for \lstc{fibo}, one for \lstc{pos}) are allocated
|
||||
on the stack, which puts the CFA 80 bytes above \reg{rsp}, and the return
|
||||
address still 8 bytes below the CFA\@. Then, by the end of the function, the
|
||||
local variables are discarded and \reg{rsp} is reset to its value from the
|
||||
first row.
|
||||
before the call, and is the topmost value of used space for this stack frame),
|
||||
and the return address is precisely at \reg{rsp} --~that is, stored between
|
||||
\reg{rsp} and $\reg{rsp} + 8$. Then, 8 integers of 4 bytes each (for
|
||||
\lstc{fibo}, \lstc{pos} being optimized out) are allocated on the stack, which
|
||||
puts the CFA 32 bytes above \reg{rsp}, and the return address still 8 bytes
|
||||
below the CFA\@. Yet, \prog{gcc} decided to allocate a total space of 48 bytes
|
||||
for the stack frame for memory alignment reasons, which means subtracting 40
|
||||
bytes from \reg{rsp} (address $\mhex{615}$ in the assembly). Then, by the end of
|
||||
the function, the local variables are discarded and \reg{rsp} is reset to its
|
||||
value from the first row.
|
||||
|
||||
However, DWARF data isn't actually stored as a table in the binary files, but
|
||||
is instead stored as in Listing~\ref{lst:ex1_dwraw}. The first row has the
|
||||
|
@ -295,12 +338,12 @@ location of the first IP in the FDE, and must define at least its CFA\@. Then,
|
|||
when all relevant registers are defined, it is possible to define a new row by
|
||||
providing a location offset (\eg{} here $4$), and the new row is defined as a
|
||||
clone of the previous one, which can then be altered (\eg{} here by setting
|
||||
\lstc{CFA} to $\reg{rsp} + 80$). This means that every line is defined \wrt{}
|
||||
\lstc{CFA} to $\reg{rsp} + 48$). This means that every line is defined \wrt{}
|
||||
the previous one, and that the IPs of the successive rows cannot be determined
|
||||
before evaluating every row before. Thus, unwinding a frame from an IP close to
|
||||
the end of the frame will require evaluating pretty much every DWARF row in the
|
||||
table before reaching the relevant information, slowing down drastically the
|
||||
unwinding process.
|
||||
without evaluating every row that comes before in the first place. Thus,
|
||||
unwinding a frame from an IP close to the end of the frame will require
|
||||
evaluating pretty much every DWARF row in the table before reaching the
|
||||
relevant information, drastically slowing down the unwinding process.
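
To make this linear cost concrete, here is a minimal C sketch of such a
sequential evaluation, restricted to two instructions. The encoding
(\lstc{cfa_insn_t}) and the function \lstc{cfa_offset_at} are assumptions made
for this example, not the actual DWARF binary format, and registers other than
the CFA are ignored.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical encoding of two call frame instructions. */
typedef enum { OP_ADVANCE_LOC, OP_DEF_CFA_OFFSET } cfa_op_t;

typedef struct {
    cfa_op_t op;
    uint64_t operand;
} cfa_insn_t;

/* Replays the instructions from the first IP of the FDE until the row
 * covering `ip` is complete. Returns the CFA offset of that row and
 * stores in `*steps` the number of instructions evaluated. */
uint64_t cfa_offset_at(const cfa_insn_t *insns, size_t nb_insns,
                       uintptr_t start_ip, uintptr_t ip, size_t *steps) {
    uintptr_t row_ip = start_ip;
    uint64_t cfa_offset = 0;
    size_t i;
    for (i = 0; i < nb_insns; i++) {
        if (insns[i].op == OP_ADVANCE_LOC) {
            /* The next row would start at row_ip + operand. */
            if (row_ip + insns[i].operand > ip)
                break;  /* the current row is the one covering ip */
            row_ip += insns[i].operand;
        } else {
            cfa_offset = insns[i].operand;
        }
    }
    *steps = i;
    return cfa_offset;
}
\end{lstlisting}

In this sketch, the value written to \lstc{*steps} grows with the position of
the requested IP within the FDE, which is the cost discussed above.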
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{How big are FDEs?}
|
||||
|
@ -377,8 +420,8 @@ brevity and clarity. All these instructions are up to variants (most
|
|||
instructions exist in multiple formats to handle various operands formatting,
|
||||
to optimize space). Since we won't be talking about the underlying file format
|
||||
here, those variations between \eg{} \dwcfa{advance\_loc1} and
|
||||
\dwcfa{advance\_loc2} ---~which differ only on the number of bytes of their
|
||||
operand~--- are irrelevant and will be eluded.
|
||||
\dwcfa{advance\_loc2} --~which differ only on the number of bytes of their
|
||||
operand~-- are irrelevant and will be elided.
|
||||
|
||||
\begin{itemize}
|
||||
\item{} \dwcfa{set\_loc(loc)}~:
|
||||
|
@ -478,8 +521,8 @@ in the context of the program being unwound. In particular, it must be able to
|
|||
dereference some pointer derived from DWARF instructions that will point to the
|
||||
execution stack, or even the heap.
|
||||
|
||||
This function takes as arguments an instruction pointer ---~supposedly
|
||||
extracted from $\reg{rip}$~--- and an array of register values; and returns a
|
||||
This function takes as arguments an instruction pointer --~supposedly
|
||||
extracted from $\reg{rip}$~-- and an array of register values; and returns a
|
||||
fresh array of register values after unwinding this call frame. The function is
|
||||
compositional\footnote{up to technicalities: the IP obtained after unwinding the
|
||||
first frame might be handled in a different dynamically loaded object, and this
|
||||
|
@ -641,25 +684,33 @@ machine code on the x86\_64 platform.
|
|||
|
||||
The rough idea of the compilation is to produce, out of the \ehframe{} section
|
||||
of a binary, C code that resembles the code shown in the DWARF semantics from
|
||||
Section~\ref{sec:semantics} above. This C code is then compiled by GCC,
|
||||
providing for free all the optimization passes of a modern compiler.
|
||||
Section~\ref{sec:semantics} above. This C code is then compiled by GCC in
|
||||
\lstbash{-O2} mode\footnote{Compiling in \lstbash{-O3} takes way too much
|
||||
time.}, providing for free all the optimization passes of a modern compiler.
|
||||
|
||||
The generated code consists in a single monolithic function, taking as
|
||||
arguments an instruction pointer and a memory context (\ie{} the value of the
|
||||
various machine registers) as defined in Listing~\ref{lst:unw_ctx}. The
|
||||
function will then return a fresh memory context, containing the values the
|
||||
registers hold after unwinding this frame.
|
||||
The generated code consists of a single monolithic function, \lstc{_eh_elf},
|
||||
taking as arguments an instruction pointer and a memory context (\ie{} the
|
||||
value of the various machine registers) as defined in
|
||||
Listing~\ref{lst:unw_ctx}. The function will then return a fresh memory
|
||||
context, containing the values the registers hold after unwinding this frame.
|
||||
|
||||
The body of the function itself is mostly a huge switch, taking advantage of
|
||||
the non-standard ---~yet widely implemented in C compilers~--- syntax for range
|
||||
switches, in which each \lstc{case} can refer to a range. All the FDEs are
|
||||
merged together into this switch, each row of a FDE being a switch case. The
|
||||
cases then fill a context with unwound values, then return it.
|
||||
the non-standard --~yet widely implemented in C compilers~-- syntax for range
|
||||
switches, in which each \lstinline{case} can refer to a range. All the FDEs are
|
||||
merged together into this switch, each row of an FDE being a switch case.
|
||||
Separating the various FDEs in the C code --~other than with comments~-- is,
|
||||
unlike what is done in DWARF, pointless, since accessing a ``row'' has a linear
|
||||
cost, and the C code is not meant to be read, except maybe for debugging
|
||||
purposes. The bodies of the switch cases then fill a context with unwound values, then
|
||||
return it.
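
As a rough illustration of the shape of this switch, the sketch below handles
the two rows of the running example. It is not the actual generated code: the
reduced context \lstc{unwind_context_t} stands in for the one of
Listing~\ref{lst:unw_ctx}, the flag names and the function name
\lstc{eh_elf_sketch} are hypothetical, and the upper bound \lstc{0x658} of the
second range is made up. The \lstc{case a ... b:} syntax is the range
extension accepted by GCC and Clang.

\begin{lstlisting}[language=C]
#include <stdint.h>

/* Hypothetical reduced context: two registers and a flags byte. */
typedef struct {
    uint8_t flags;
    uintptr_t rip, rsp;
} unwind_context_t;

#define FLAG_RIP   (1u << 0)  /* rip field is meaningful */
#define FLAG_RSP   (1u << 1)  /* rsp field is meaningful */
#define FLAG_ERROR (1u << 7)  /* unwinding failed */

unwind_context_t eh_elf_sketch(unwind_context_t ctx, uintptr_t pc) {
    unwind_context_t out = {0, 0, 0};
    switch (pc) {
    /* FDE of the example function */
    case 0x615 ... 0x618:   /* CFA = rsp + 8 */
        out.rsp = ctx.rsp + 8;
        out.rip = *(const uintptr_t *)(out.rsp - 8);  /* RA at CFA - 8 */
        out.flags = FLAG_RIP | FLAG_RSP;
        return out;
    case 0x619 ... 0x658:   /* CFA = rsp + 48; end of range made up */
        out.rsp = ctx.rsp + 48;
        out.rip = *(const uintptr_t *)(out.rsp - 8);
        out.flags = FLAG_RIP | FLAG_RSP;
        return out;
    default:                /* no row covers this pc */
        out.flags = FLAG_ERROR;
        return out;
    }
}
\end{lstlisting}

Both case bodies are identical up to the CFA offset, which hints at how much
duplicated code a naive generation produces.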
|
||||
|
||||
An optionally enabled parameter can be used to pass a function pointer to a
|
||||
dereferencing function, that conceptually does what the dereferencing \lstc{*}
|
||||
operator does on a pointer, and is used to unwind a process that is not the
|
||||
currently running process, and thus not sharing the same address space. A call
|
||||
A setting of the compiler also optionally enables another parameter to the
|
||||
\lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
|
||||
\lstc{deref} function, when enabled, replaces everywhere the dereferencing
|
||||
\lstc{*} operator, and can be used to generate \ehelfs{} that will work on
|
||||
remote address spaces (\ie{} whenever the unwinding is not done on the process
|
||||
reading the \ehelf{} itself, but some other process, or even on a stack dump of
|
||||
a long-terminated process).
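
The role of \lstc{deref} can be sketched as follows. This is only an
illustration under assumptions: the type name \lstc{deref_func_t}, the two
implementations and the helper \lstc{read_return_address} are hypothetical, and
the stack dump is kept in global variables for brevity.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type of the dereferencing function. */
typedef uintptr_t (*deref_func_t)(uintptr_t addr);

/* Local unwinding: a plain pointer dereference. */
uintptr_t deref_local(uintptr_t addr) {
    return *(const uintptr_t *)addr;
}

/* Remote unwinding: reads from a saved copy of the remote stack. */
static const unsigned char *dump_bytes; /* copy of the remote stack */
static uintptr_t dump_base;             /* remote address of byte 0 */
static size_t dump_len;                 /* size of the copy, in bytes */

uintptr_t deref_dump(uintptr_t addr) {
    uintptr_t value = 0;  /* returned as-is when addr is out of range */
    if (addr >= dump_base
            && addr - dump_base + sizeof(value) <= dump_len)
        memcpy(&value, dump_bytes + (addr - dump_base), sizeof(value));
    return value;
}

/* Reads the return address stored just below the CFA, via `deref`. */
uintptr_t read_return_address(uintptr_t cfa, deref_func_t deref) {
    return deref(cfa - sizeof(uintptr_t));
}
\end{lstlisting}

The unwinding code itself is the same in both cases; only the function passed
as \lstc{deref} changes.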
|
||||
|
||||
Unlike in the \ehframe, and unlike what should be done in a release,
|
||||
real-world-proof version of the \ehelfs, the choice was made to keep this
|
||||
|
@ -675,20 +726,24 @@ is not sufficient to analyze every stack frame as \prog{gdb} would do after a
|
|||
|
||||
In the unwind context from Listing~\ref{lst:unw_ctx}, the values of type
|
||||
\lstc{uintptr_t} are the values of the corresponding registers, and
|
||||
\lstc{flags} is a 8-bytes value, indicating for each register whether it is
|
||||
\lstc{flags} is an 8-bit value, indicating for each register whether it is
|
||||
present or not in this context (\ie{} if the \lstc{rbx} bit is not set, the
|
||||
value of \lstc{rbx} in the structure isn't meaningful), plus an error bit,
|
||||
indicating whether an error occurred during unwinding.
|
||||
indicating whether an error occurred during unwinding (which can be due \eg{}
|
||||
to an unsupported operation in the original DWARF, thus compiled to an error).
|
||||
|
||||
This generated data is stored in separate shared object files, which we call
|
||||
\ehelfs. It would have been possible to alter the original ELF file to embed
|
||||
this data as a new section, but it getting it to be executed just as any
|
||||
this data as a new section, but getting it to be executed just as any
|
||||
portion of the \lstc{.text} section would probably have been painful, and
|
||||
keeping it separated during the experimental phase is quite convenient. It is
|
||||
possible to have multiple versions of \ehelfs{} files in parallel, with various
|
||||
options turned on or off, and this doesn't require altering the base system by
|
||||
editing \eg{} \texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is
|
||||
required, those files can simply be \lstc{dlopen}'d.
|
||||
required, those files can simply be \lstc{dlopen}'d. It is also possible to
|
||||
imagine, in a future production environment, packaging \ehelfs{} files
|
||||
separately, so that people interested in heavy computation can have the choice
|
||||
to install them.
|
||||
|
||||
\medskip
|
||||
|
||||
|
@ -705,15 +760,19 @@ generated for the C code in Listing~\ref{lst:ex1_c}.
|
|||
Without any particular care to efficiency or compactness, it is already
|
||||
possible to produce a compiled version very close to the one described in
|
||||
Section~\ref{sec:semantics}. Although the unwinding speed cannot yet be
|
||||
actually benchmarked, it is already possible to write in a few hundreds of line
|
||||
of C a simple stack walker printing the functions traversed. It already works
|
||||
actually benchmarked, it is already possible to write in a few hundred lines of
|
||||
C code a simple stack walker printing the functions traversed. It already works
|
||||
without any problem on the easily tested cases, since corner cases are mostly
|
||||
found in standard and highly optimal libraries, and it is not that easy to get
|
||||
found in standard and highly optimized libraries, and it is not that easy to get
|
||||
the program to stop and print a stack trace from within a system library
|
||||
without using a debugger.
|
||||
|
||||
The major drawback of this approach, without any particular care taken, is the
|
||||
space waste.
|
||||
space waste. The space taken by those tentative \ehelfs{} is analyzed in
|
||||
Table~\ref{table:basic_eh_elf_space} for \prog{hackbench}, a small program
|
||||
introduced later in Section~\ref{ssec:bench_perf}, and the libraries on which
|
||||
it depends.
|
||||
|
||||
|
||||
\begin{table}[h]
|
||||
\centering
|
||||
|
@ -736,11 +795,6 @@ space waste.
|
|||
\caption{Basic \ehelfs{} space usage}\label{table:basic_eh_elf_space}
|
||||
\end{table}
|
||||
|
||||
The space taken by those tentative \ehelfs{} is analyzed in
|
||||
Table~\ref{table:basic_eh_elf_space} for \prog{hackbench}, a small program
|
||||
introduced later in Section~\ref{ssec:bench_perf}, and the libraries on which
|
||||
it depends.
|
||||
|
||||
The first column only includes the sizes of the ELF sections \lstc{.text} (the
|
||||
program itself) and \lstc{.rodata}, the read-only data (such as static strings,
|
||||
etc.). Only the weight of the \lstc{.text} section of the generated \ehelfs{}
|
||||
|
@ -764,16 +818,17 @@ made in order to shrink the \ehelfs.
|
|||
The major optimization that most reduced the output size was to use an if/else
|
||||
tree implementing a binary search on the relevant program counter intervals,
|
||||
instead of a huge switch. In the process, we also \emph{outline} a lot of code,
|
||||
that is, find out identical code blocks, move them outside of the if/else tree,
|
||||
identify them by a label, and jump to them using a \lstc{goto}, which
|
||||
de-duplicates a lot of code and contributes greatly to the shrinking. In the
|
||||
process, we noticed that the vast majority of FDE rows are actually taken among
|
||||
very few ``common'' FDE rows.
|
||||
that is, find identical ``switch case'' bodies (which are not switch cases
|
||||
anymore, but if bodies), move them outside of the if/else tree, identify them
|
||||
by a label, and jump to them using a \lstc{goto}, which de-duplicates a lot of
|
||||
code and contributes greatly to the shrinking. In the process, we noticed that
|
||||
the vast majority of FDE rows are actually taken among very few ``common'' FDE
|
||||
rows.
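
The combination of the if/else tree and of outlining can be sketched as below.
All the addresses other than \lstc{0x615} and \lstc{0x619} are made up, the
function name \lstc{cfa_offset_for} is hypothetical, and the outlined bodies
are reduced to returning a CFA offset instead of filling a whole context.

\begin{lstlisting}[language=C]
#include <stdint.h>

/* Returns the offset such that CFA = rsp + offset at `pc`, or -1 if
 * no row covers `pc`. The if/else tree is a binary search on the
 * interval bounds; row bodies are outlined behind labels. */
int cfa_offset_for(uintptr_t pc) {
    if (pc < 0x659) {
        if (pc < 0x619) {
            if (pc < 0x615)
                goto no_row;
            goto cfa_rsp_8;       /* [0x615, 0x619) */
        }
        goto cfa_rsp_48;          /* [0x619, 0x659) */
    } else {
        if (pc < 0x65a)
            goto cfa_rsp_8;       /* [0x659, 0x65a) */
        if (pc < 0x660)
            goto no_row;          /* hole between two FDEs */
        if (pc < 0x664)
            goto cfa_rsp_8;       /* [0x660, 0x664), another FDE */
        goto no_row;
    }

cfa_rsp_8:   /* shared by every row with CFA = rsp + 8 */
    return 8;
cfa_rsp_48:
    return 48;
no_row:
    return -1;
}
\end{lstlisting}

Here three distinct intervals jump to the same \lstc{cfa_rsp_8} label, so their
common body is emitted only once.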
|
||||
|
||||
This makes this optimization really efficient, as seen later in
|
||||
Section~\ref{ssec:results_size}, but also makes it an interesting question ---
|
||||
not investigated during this internship --- to find out whether standard DWARF
|
||||
data could be efficiently compressed in this way.
|
||||
Section~\ref{ssec:results_size}, but also makes it an interesting question
|
||||
--~not investigated during this internship~-- to find out whether standard
|
||||
DWARF data could be efficiently compressed in this way.
|
||||
|
||||
\begin{minipage}{0.45\textwidth}
|
||||
\lstinputlisting[language=C, caption={\ehelf{} for the previous example},
|
||||
|
@ -806,15 +861,16 @@ However, unwinding over and over again from the same program point would have
|
|||
had no interest at all, since \prog{libunwind} would have simply cached the
|
||||
relevant DWARF row. At the same time, making sure that the various unwindings
|
||||
are made from different locations is somewhat cheating, since it defeats
|
||||
\prog{libunwind}'s caching. All in all, the benchmarking method must have a
|
||||
``natural'' distribution of unwindings.
|
||||
\prog{libunwind}'s caching and does not reproduce ``real-world'' unwinding
|
||||
distribution. All in all, the benchmarking method must have a ``natural''
|
||||
distribution of unwindings.
|
||||
|
||||
Another requirement is to also distribute quite evenly the unwinding points
|
||||
across the program: we would like to benchmark stack unwindings crossing some
|
||||
standard library functions, starting from inside them, etc.
|
||||
|
||||
Finally, the unwound program must be interesting enough to enter and exit a lot
|
||||
of function, nest function calls, have FDEs that are not as simple as in
|
||||
of functions, nest function calls, have FDEs that are not as simple as in
|
||||
Listing~\ref{lst:ex1_dw}, etc.
|
||||
|
||||
|
||||
|
@ -864,19 +920,23 @@ system and process as much as possible, to be able to unwind in any context.
|
|||
This very restricted information lacked a memory map (a table indicating which
|
||||
shared object is mapped at which address in memory) in order to use \ehelfs.
|
||||
Apart from this, the modified version of \prog{libunwind} produced is entirely
|
||||
compatible with the vanilla version.
|
||||
compatible with the vanilla version, meaning that the only modifications
|
||||
required to use \ehelfs{} within any project using \prog{libunwind} should be
|
||||
modifying one line of code (this function call, which is a setup function) and
|
||||
linking against the modified version of \prog{libunwind} instead of the system
|
||||
version.
|
||||
|
||||
Once this was done, plugging it into \prog{perf} was a matter of a few lines of
|
||||
code only. The major problem encountered was to understand how \prog{perf}
|
||||
works. In order to avoid perturbing the traced program, \prog{perf} does not
|
||||
unwind at runtime, but rather records at regular interval the program's stack,
|
||||
and all the auxiliary information that is needed to unwind later. This is done
|
||||
when running \lstbash{perf record}. Then, \lstbash{perf report} unwinds the
|
||||
stack to analyze it; but at this point of time, the traced process is long
|
||||
dead, thus any PID-based approach, or any approach using \texttt{/proc}
|
||||
information will fail. However, as this was the easiest method, this approach
|
||||
was chosen when implementing the first version of \ehelfs; thus requiring some
|
||||
code rewriting.
|
||||
code only, benchmarking code aside. The major problem encountered was
|
||||
to understand how \prog{perf} works. In order to avoid perturbing the traced
|
||||
program, \prog{perf} does not unwind at runtime, but rather records at regular
|
||||
intervals the program's stack, and all the auxiliary information that is needed
|
||||
to unwind later. This is done when running \lstbash{perf record}. Then,
|
||||
\lstbash{perf report} unwinds the stack to analyze it; but at this point in
|
||||
time, the traced process is long dead, thus any PID-based approach, or any
|
||||
approach using \texttt{/proc} information will fail. However, as this was the
|
||||
easiest method, the first version of \ehelfs{} used those mechanisms; thus
|
||||
requiring some code rewriting.
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Other explored methods}
|
||||
|
@ -884,15 +944,15 @@ code rewriting.
|
|||
The first benchmarking approach tried was to create some specific C code
|
||||
that would meet the requirements from Section~\ref{ssec:bench_req}, while
|
||||
itself calling a benchmarking procedure from time to time. This was abandoned
|
||||
quite fast, because generating C code interesting enough to be unwound turned
|
||||
out hard, and the generated FDEs invariably ended out uninteresting. It would
|
||||
also never have met the requirement of unwinding from fairly distributed
|
||||
quite quickly, because generating C code interesting enough to be unwound
|
||||
turned out to be hard, and the generated FDEs invariably ended up uninteresting. It
|
||||
would also never have met the requirement of unwinding from fairly distributed
|
||||
locations anyway.
|
||||
|
||||
Another attempt was made using CSmith~\cite{csmith}, a random C code generator
|
||||
initially made for random testing of C compilers. The idea was still to craft an
|
||||
interesting C program that would unwind on its own frequently, but to integrate
|
||||
randomly generated C code with CSmith to integrate interesting C snippets that
|
||||
CSmith-randomly generated C code within hand-written C snippets that
|
||||
would generate large enough FDEs and nested calls. This was abandoned as well
|
||||
as the call graph of a CSmith-generated code is often far too small, and the
|
||||
CSmith code is notoriously hard to understand and edit.
|
||||
|
|
|
@ -7,3 +7,6 @@
|
|||
\newcommand{\set}[1]{\left\{ #1 \right\}}
|
||||
\newcommand{\card}[1]{\left\vert{} #1 \right\vert}
|
||||
\newcommand{\abs}[1]{\left\vert{} #1 \right\vert}
|
||||
|
||||
\newcommand{\tnhead}[2]{\multicolumn{1}{#1}{#2}} % Table neutral head
|
||||
\newcommand{\spaced}[2]{\hspace{#1} #2 \hspace{#1}}
|
||||
|
|
|
@ -1,5 +1,8 @@
|
|||
%% Specific commands for this project
|
||||
|
||||
\newcommand{\stackfhead}[1]
|
||||
{\tnhead{l}{\hspace{-5ex}$\reg{rsp} #1$ \hspace{2em}}}
|
||||
|
||||
\newcommand{\prog}[1]{\texttt{#1}}
|
||||
\newcommand{\ehelf}{\texttt{eh\_elf}}
|
||||
\newcommand{\ehelfs}{\texttt{eh\_elfs}}
|
||||
|
|