Review and reword end of §1, §3 and §4
parent
b761f360cc
commit
b128ddd571
3 changed files with 176 additions and 110 deletions
|
@ -88,16 +88,19 @@ before returning). Those preserved registers are \reg{rbx}, \reg{rsp},
|
||||||
conventions}\label{fig:call_stack}
|
conventions}\label{fig:call_stack}
|
||||||
\end{wrapfigure}
|
\end{wrapfigure}
|
||||||
|
|
||||||
The register \reg{rsp} is supposed to always point just past the last used
|
The register \reg{rsp} is supposed to always point to the last used memory cell
|
||||||
memory cell in the stack, thus, when the process just enters a new function,
|
in the stack, thus, when the process just enters a new function, \reg{rsp}
|
||||||
\reg{rsp} points 8 bytes after the location of the return address. Then, the
|
points right to the location of the return address\footnote{Remember that since
|
||||||
compiler might use \reg{rbp} (``base pointer'') to save this value of
|
the stack grows \emph{downwards} in memory, the arrow of \reg{rsp} points
|
||||||
\reg{rip}, by writing the old value of \reg{rbp} just below the return address
|
\emph{below} the RA cell in the figure, and yet the memory cell indexed is the
|
||||||
on the stack, then copying \reg{rsp} to \reg{rbp}. This makes it easy to find
|
one \emph{above} in the drawing, that is, the RA.}. Then, the compiler might
|
||||||
the return address from anywhere within the function, and also allows for easy
|
use \reg{rbp} (``base pointer'') to save this value of \reg{rip}, by writing
|
||||||
addressing of local variables. Yet, using \reg{rbp} to save \reg{rip} is not
|
the old value of \reg{rbp} just below the return address on the stack, then
|
||||||
always done, since it somehow ``wastes'' a register. This decision is, on
|
copying \reg{rsp} to \reg{rbp}. This makes it easy to find the return address
|
||||||
x86\_64 System V, up to the compiler.
|
from anywhere within the function, and also allows for easy addressing of local
|
||||||
|
variables. Yet, using \reg{rbp} to save \reg{rip} is not always done, since it
|
||||||
|
somehow ``wastes'' a register. This decision is, on x86\_64 System V, up to the
|
||||||
|
compiler.
|
||||||
|
|
||||||
Often, a function will start by subtracting some value from \reg{rsp}, allocating
|
Often, a function will start by subtracting some value from \reg{rsp}, allocating
|
||||||
some space in the stack frame for its local variables. Then, it will push on
|
some space in the stack frame for its local variables. Then, it will push on
|
||||||
|
@ -242,29 +245,40 @@ when talking about DWARF, a register is merely a numerical identifier that is
|
||||||
often, but not necessarily, mapped to a real machine register by the ABI\@.
|
often, but not necessarily, mapped to a real machine register by the ABI\@.
|
||||||
|
|
||||||
In practice, this data takes the form of a collection of tables, one table per
|
In practice, this data takes the form of a collection of tables, one table per
|
||||||
Frame Description Entry (FDE), which most often corresponds to a function. Each
|
Frame Description Entry (FDE). A FDE, in turn, is a DWARF entry describing such
|
||||||
column of the table is a register (\eg{} \reg{rsp}), with two additional
|
a table, that has a range of IPs on which it has authority. Most often, but not
|
||||||
|
necessarily, it corresponds to a single function in the original source code.
|
||||||
|
Each column of the table is a register (\eg{} \reg{rsp}), with two additional
|
||||||
special registers, CFA (Canonical Frame Address) and RA (Return Address),
|
special registers, CFA (Canonical Frame Address) and RA (Return Address),
|
||||||
containing respectively the base pointer of the current stack frame and the
|
containing respectively the base pointer of the current stack
|
||||||
return address of the current function (\ie{} for x86\_64, the unwound value of
|
frame\footnote{The CFA is most commonly thought of as the base pointer of the
|
||||||
\reg{rip}, the instruction pointer). Each row of the table is a particular
|
frame, yet this is not enforced by DWARF\@. The CFA is used as an address from
|
||||||
instruction pointer, within the instruction pointer range of the tabulated FDE
|
which other registers will be deduced as offsets, and although it is supposed
|
||||||
(assuming a FDE maps directly to a function, this range is simply the IP range
|
to be the actual base pointer, it can be anything as long as it is close enough
|
||||||
of the given function in the \lstc{.text} section of the binary), a row being
|
to the addresses that will be deduced from it.} and the return address of the
|
||||||
valid from its start IP to the start IP of the next row, or the end IP of the
|
current function (\ie{} for x86\_64, the unwound value of \reg{rip}, the
|
||||||
FDE if it is the last row.
|
instruction pointer). Each row has a certain validity interval, on which it
|
||||||
|
describes accurate unwinding data. This range starts at the instruction pointer
|
||||||
|
it is associated with, and ends at the start IP of the next table row (or the
|
||||||
|
end IP of the current FDE if it was the last row). In particular, there can be
|
||||||
|
no ``IP hole'' within a FDE --~unlike FDEs themselves, which can leave holes
|
||||||
|
between them.
|
||||||
|
|
||||||
|
\begin{figure}[h]
|
||||||
\begin{minipage}{0.45\textwidth}
|
\begin{minipage}{0.45\textwidth}
|
||||||
\lstinputlisting[language=C, firstline=3, lastline=12,
|
\lstinputlisting[language=C, firstline=3, lastline=12,
|
||||||
caption={Original C},label={lst:ex1_c}]
|
caption={Original C},label={lst:ex1_c}]
|
||||||
{src/fib7/fib7.c}
|
{src/fib7/fib7.c}
|
||||||
\end{minipage} \hfill \begin{minipage}{0.45\textwidth}
|
\end{minipage} \hfill \begin{minipage}{0.45\textwidth}
|
||||||
\lstinputlisting[language=C,caption={Processed DWARF},label={lst:ex1_dw}]
|
\lstinputlisting[language=C,caption={Processed DWARF},
|
||||||
|
label={lst:ex1_dw}]
|
||||||
{src/fib7/fib7.fde}
|
{src/fib7/fib7.fde}
|
||||||
\lstinputlisting[language=C,caption={Raw DWARF},label={lst:ex1_dwraw}]
|
\lstinputlisting[language=C,caption={Raw DWARF},label={lst:ex1_dwraw}]
|
||||||
{src/fib7/fib7.raw_fde}
|
{src/fib7/fib7.raw_fde}
|
||||||
\end{minipage}
|
\end{minipage}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
|
\begin{figure}[h]
|
||||||
\begin{minipage}{0.45\textwidth}
|
\begin{minipage}{0.45\textwidth}
|
||||||
\lstinputlisting[language={[x86masm]Assembler},lastline=11,
|
\lstinputlisting[language={[x86masm]Assembler},lastline=11,
|
||||||
caption={Generated assembly},label={lst:ex1_asm}]
|
caption={Generated assembly},label={lst:ex1_asm}]
|
||||||
|
@ -274,20 +288,49 @@ FDE if it is the last row.
|
||||||
firstnumber=last]
|
firstnumber=last]
|
||||||
{src/fib7/fib7.s}
|
{src/fib7/fib7.s}
|
||||||
\end{minipage}
|
\end{minipage}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
|
\begin{table}[h]
|
||||||
|
\centering
|
||||||
|
\begin{tabular}{|c|c|c|c|c|c}
|
||||||
|
\stackfhead{+ \mhex{30}}
|
||||||
|
& \stackfhead{+ \mhex{28}}
|
||||||
|
& \stackfhead{+ \mhex{20}}
|
||||||
|
& \stackfhead{+ \mhex{1c}}
|
||||||
|
& \stackfhead{+ \mhex{4}}
|
||||||
|
& \stackfhead{}
|
||||||
|
\\
|
||||||
|
\hline{}
|
||||||
|
Return Address & \textit{Alignment space}
|
||||||
|
& \spaced{2ex}{\lstc{fibo[7]}}
|
||||||
|
& \spaced{4ex}{\ldots}
|
||||||
|
& \spaced{2ex}{\lstc{fibo[0]}}
|
||||||
|
& \textit{Next frame}
|
||||||
|
\\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\caption{Stack frame schema}\label{table:ex1_stack_schema}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
For instance, the C source code in Listing~\ref{lst:ex1_c} above, when compiled
|
For instance, the C source code in Listing~\ref{lst:ex1_c} above, when compiled
|
||||||
with \lstbash{gcc -O1 -fomit-frame-pointer -fno-stack-protector}, yields the
|
with \lstbash{gcc -O1 -fomit-frame-pointer -fno-stack-protector}, yields the
|
||||||
assembly code in Listing~\ref{lst:ex1_asm}. When interpreting the generated
|
assembly code in Listing~\ref{lst:ex1_asm}. The memory layout of the stack
|
||||||
\ehframe{} with \lstbash{readelf -wF}, we obtain the (slightly edited)
|
frame is presented in Table~\ref{table:ex1_stack_schema}, to help understand
|
||||||
|
how the stack frame is constructed. When interpreting the generated \ehframe{}
|
||||||
|
with \lstbash{readelf -wF}, we obtain the (slightly edited)
|
||||||
Listing~\ref{lst:ex1_dw}. During the function prelude, \ie{} for $\mhex{615}
|
Listing~\ref{lst:ex1_dw}. During the function prelude, \ie{} for $\mhex{615}
|
||||||
\leq \reg{rip} < \mhex{619}$, the stack frame only contains the return address,
|
\leq \reg{rip} < \mhex{619}$, the stack frame only contains the return address,
|
||||||
thus the CFA is 8 bytes above \reg{rsp} (which was the value of \reg{rsp}
|
thus the CFA is 8 bytes above \reg{rsp} (which was the value of \reg{rsp}
|
||||||
before the call), and the return address is precisely at \reg{rsp}. Then, 9
|
before the call, and is the topmost value of used space for this stack frame),
|
||||||
integers of 8 bytes each (8 for \lstc{fibo}, one for \lstc{pos}) are allocated
|
and the return address is precisely at \reg{rsp} --~that is, stored between
|
||||||
on the stack, which puts the CFA 80 bytes above \reg{rsp}, and the return
|
\reg{rsp} and $\reg{rsp} + 8$. Then, 8 integers of 4 bytes each (for
|
||||||
address still 8 bytes below the CFA\@. Then, by the end of the function, the
|
\lstc{fibo}, \lstc{pos} being optimized out) are allocated on the stack, which
|
||||||
local variables are discarded and \reg{rsp} is reset to its value from the
|
puts the CFA 32 bytes above \reg{rsp}, and the return address still 8 bytes
|
||||||
first row.
|
below the CFA\@. Yet, \prog{gcc} decided to allocate a total space of 48 bytes
|
||||||
|
for the stack frame for memory alignment reasons, which means subtracting 40
|
||||||
|
bytes from \reg{rsp} (address $\mhex{615}$ in the assembly). Then, by the end of
|
||||||
|
the function, the local variables are discarded and \reg{rsp} is reset to its
|
||||||
|
value from the first row.
|
||||||
|
|
||||||
However, DWARF data isn't actually stored as a table in the binary files, but
|
However, DWARF data isn't actually stored as a table in the binary files, but
|
||||||
is instead stored as in Listing~\ref{lst:ex1_dwraw}. The first row has the
|
is instead stored as in Listing~\ref{lst:ex1_dwraw}. The first row has the
|
||||||
|
@ -295,12 +338,12 @@ location of the first IP in the FDE, and must define at least its CFA\@. Then,
|
||||||
when all relevant registers are defined, it is possible to define a new row by
|
when all relevant registers are defined, it is possible to define a new row by
|
||||||
providing a location offset (\eg{} here $4$), and the new row is defined as a
|
providing a location offset (\eg{} here $4$), and the new row is defined as a
|
||||||
clone of the previous one, which can then be altered (\eg{} here by setting
|
clone of the previous one, which can then be altered (\eg{} here by setting
|
||||||
\lstc{CFA} to $\reg{rsp} + 80$). This means that every line is defined \wrt{}
|
\lstc{CFA} to $\reg{rsp} + 48$). This means that every line is defined \wrt{}
|
||||||
the previous one, and that the IPs of the successive rows cannot be determined
|
the previous one, and that the IPs of the successive rows cannot be determined
|
||||||
before evaluating every row before. Thus, unwinding a frame from an IP close to
|
without evaluating every row that comes before in the first place. Thus,
|
||||||
the end of the frame will require evaluating pretty much every DWARF row in the
|
unwinding a frame from an IP close to the end of the frame will require
|
||||||
table before reaching the relevant information, slowing down drastically the
|
evaluating pretty much every DWARF row in the table before reaching the
|
||||||
unwinding process.
|
relevant information, drastically slowing down the unwinding process.
|
||||||
|
|
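To make the cost of this encoding concrete, here is a minimal C sketch of a row lookup. It assumes a deliberately simplified instruction set: the hypothetical ADVANCE_LOC and DEF_CFA_OFFSET kinds stand in for the real DWARF CFA instructions, and only the CFA offset from rsp is tracked. The names cfa_op and cfa_offset_at are illustrative and do not belong to any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CFA instruction set: only what the example needs. */
enum op_kind { ADVANCE_LOC, DEF_CFA_OFFSET };

struct cfa_op {
    enum op_kind kind;
    uint64_t operand;   /* IP delta, or CFA offset from rsp */
};

/* Replay the instructions of one FDE until the row covering `ip` is reached.
 * Returns the CFA offset from rsp that is valid at `ip`. The cost is linear
 * in the number of instructions that precede the relevant row. */
uint64_t cfa_offset_at(uint64_t fde_start, const struct cfa_op *ops,
                       size_t n_ops, uint64_t ip)
{
    assert(ip >= fde_start);   /* the FDE must have authority over ip */

    uint64_t row_ip = fde_start;
    uint64_t cfa_offset = 0;
    for (size_t i = 0; i < n_ops; i++) {
        if (ops[i].kind == ADVANCE_LOC) {
            if (row_ip + ops[i].operand > ip)
                break;                    /* the current row already covers ip */
            row_ip += ops[i].operand;     /* new row, cloned from the previous one */
        } else {
            cfa_offset = ops[i].operand;  /* alter the current row */
        }
    }
    return cfa_offset;
}
```

With the instructions of the example above (CFA at rsp + 8 from 0x615, then an advance of 4 and CFA at rsp + 48), a lookup at 0x619 has to replay all three instructions before it can answer, whereas a lookup at 0x615 stops at the first advance.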
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{How big are FDEs?}
|
\subsection{How big are FDEs?}
|
||||||
|
@ -377,8 +420,8 @@ brevity and clarity. All these instructions are up to variants (most
|
||||||
instructions exist in multiple formats to handle various operands formatting,
|
instructions exist in multiple formats to handle various operands formatting,
|
||||||
to optimize space). Since we won't be talking about the underlying file format
|
to optimize space). Since we won't be talking about the underlying file format
|
||||||
here, those variations between \eg{} \dwcfa{advance\_loc1} and
|
here, those variations between \eg{} \dwcfa{advance\_loc1} and
|
||||||
\dwcfa{advance\_loc2} ---~which differ only on the number of bytes of their
|
\dwcfa{advance\_loc2} --~which differ only on the number of bytes of their
|
||||||
operand~--- are irrelevant and will be eluded.
|
operand~-- are irrelevant and will be elided.
|
||||||
|
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item{} \dwcfa{set\_loc(loc)}~:
|
\item{} \dwcfa{set\_loc(loc)}~:
|
||||||
|
@ -478,8 +521,8 @@ in the context of the program being unwound. In particular, it must be able to
|
||||||
dereference some pointer derived from DWARF instructions that will point to the
|
dereference some pointer derived from DWARF instructions that will point to the
|
||||||
execution stack, or even the heap.
|
execution stack, or even the heap.
|
||||||
|
|
||||||
This function takes as arguments an instruction pointer ---~supposedly
|
This function takes as arguments an instruction pointer --~supposedly
|
||||||
extracted from $\reg{rip}$~--- and an array of register values; and returns a
|
extracted from $\reg{rip}$~-- and an array of register values; and returns a
|
||||||
fresh array of register values after unwinding this call frame. The function is
|
fresh array of register values after unwinding this call frame. The function is
|
||||||
compositional\footnote{up to technicalities: the IP obtained after unwinding the
|
compositional\footnote{up to technicalities: the IP obtained after unwinding the
|
||||||
first frame might be handled in a different dynamically loaded object, and this
|
first frame might be handled in a different dynamically loaded object, and this
|
||||||
|
@ -641,25 +684,33 @@ machine code on the x86\_64 platform.
|
||||||
|
|
||||||
The rough idea of the compilation is to produce, out of the \ehframe{} section
|
The rough idea of the compilation is to produce, out of the \ehframe{} section
|
||||||
of a binary, C code that resembles the code shown in the DWARF semantics from
|
of a binary, C code that resembles the code shown in the DWARF semantics from
|
||||||
Section~\ref{sec:semantics} above. This C code is then compiled by GCC,
|
Section~\ref{sec:semantics} above. This C code is then compiled by GCC in
|
||||||
providing for free all the optimization passes of a modern compiler.
|
\lstbash{-O2} mode\footnote{Compiling in \lstbash{-O3} takes way too much
|
||||||
|
time.}, providing for free all the optimization passes of a modern compiler.
|
||||||
|
|
||||||
The generated code consists in a single monolithic function, taking as
|
The generated code consists of a single monolithic function, \lstc{_eh_elf},
|
||||||
arguments an instruction pointer and a memory context (\ie{} the value of the
|
taking as arguments an instruction pointer and a memory context (\ie{} the
|
||||||
various machine registers) as defined in Listing~\ref{lst:unw_ctx}. The
|
value of the various machine registers) as defined in
|
||||||
function will then return a fresh memory context, containing the values the
|
Listing~\ref{lst:unw_ctx}. The function will then return a fresh memory
|
||||||
registers hold after unwinding this frame.
|
context, containing the values the registers hold after unwinding this frame.
|
||||||
|
|
||||||
The body of the function itself is mostly a huge switch, taking advantage of
|
The body of the function itself is mostly a huge switch, taking advantage of
|
||||||
the non-standard ---~yet widely implemented in C compilers~--- syntax for range
|
the non-standard --~yet widely implemented in C compilers~-- syntax for range
|
||||||
switches, in which each \lstc{case} can refer to a range. All the FDEs are
|
switches, in which each \lstinline{case} can refer to a range. All the FDEs are
|
||||||
merged together into this switch, each row of a FDE being a switch case. The
|
merged together into this switch, each row of a FDE being a switch case.
|
||||||
cases then fill a context with unwound values, then return it.
|
Separating the various FDEs in the C code --~other than with comments~-- is,
|
||||||
|
unlike what is done in DWARF, pointless, since accessing a ``row'' has a linear
|
||||||
|
cost, and the C code is not meant to be read, except maybe for debugging
|
||||||
|
purposes. The switch case bodies then fill a context with unwound values, then
|
||||||
|
return it.
|
||||||
|
|
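As an illustration of the shape of such a function, here is a minimal C sketch of a range switch for the two rows of the earlier example. It is not the generated code: the context layout, the flag names, the function name eh_elf_sketch and the upper bound 0x65f of the second range are all assumptions made for the example, and it assumes 8-byte pointers as on x86_64. The case ranges rely on the GNU C extension mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical context layout, for illustration only. */
typedef struct {
    uint8_t flags;          /* which registers are valid, plus an error bit */
    uintptr_t rip, rsp;
} unwind_context_t;

enum { FLAG_RIP = 1 << 0, FLAG_RSP = 1 << 1, FLAG_ERROR = 1 << 7 };

unwind_context_t eh_elf_sketch(unwind_context_t ctx, uintptr_t pc)
{
    unwind_context_t out = {0};
    /* Sketch only: real code would rather report a missing rsp as an error. */
    assert(ctx.flags & FLAG_RSP);

    switch (pc) {
    case 0x615 ... 0x618:   /* first row: CFA = rsp + 8 */
        out.rsp = ctx.rsp + 8;
        out.rip = *(const uintptr_t *)(out.rsp - 8);   /* RA at CFA - 8 */
        out.flags = FLAG_RIP | FLAG_RSP;
        return out;
    case 0x619 ... 0x65f:   /* second row: CFA = rsp + 48 (end IP assumed) */
        out.rsp = ctx.rsp + 48;
        out.rip = *(const uintptr_t *)(out.rsp - 8);
        out.flags = FLAG_RIP | FLAG_RSP;
        return out;
    default:                /* no row covers this pc */
        out.flags = FLAG_ERROR;
        return out;
    }
}
```

Each case fills a fresh context and returns it, so finding the row for a given pc is left entirely to the code the C compiler emits for the switch.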
||||||
An optionally enabled parameter can be used to pass a function pointer to a
|
A setting of the compiler also optionally enables another parameter to the
|
||||||
dereferencing function, that conceptually does what the dereferencing \lstc{*}
|
\lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
|
||||||
operator does on a pointer, and is used to unwind a process that is not the
|
\lstc{deref} function, when enabled, replaces the dereferencing \lstc{*}
|
||||||
currently running process, and thus not sharing the same address space. A call
|
operator everywhere, and can be used to generate \ehelfs{} that will work on
|
||||||
|
remote address spaces (\ie{} whenever the unwinding is not done on the process
|
||||||
|
reading the \ehelf{} itself, but some other process, or even on a stack dump of
|
||||||
|
a long-terminated process).
|
||||||
|
|
||||||
Unlike in the \ehframe, and unlike what should be done in a release,
|
Unlike in the \ehframe, and unlike what should be done in a release,
|
||||||
real-world-proof version of the \ehelfs, the choice was made to keep this
|
real-world-proof version of the \ehelfs, the choice was made to keep this
|
||||||
|
@ -675,20 +726,24 @@ is not sufficient to analyze every stack frame as \prog{gdb} would do after a
|
||||||
|
|
||||||
In the unwind context from Listing~\ref{lst:unw_ctx}, the values of type
|
In the unwind context from Listing~\ref{lst:unw_ctx}, the values of type
|
||||||
\lstc{uintptr_t} are the values of the corresponding registers, and
|
\lstc{uintptr_t} are the values of the corresponding registers, and
|
||||||
\lstc{flags} is a 8-bytes value, indicating for each register whether it is
|
\lstc{flags} is an 8-bit value, indicating for each register whether it is
|
||||||
present or not in this context (\ie{} if the \lstc{rbx} bit is not set, the
|
present or not in this context (\ie{} if the \lstc{rbx} bit is not set, the
|
||||||
value of \lstc{rbx} in the structure isn't meaningful), plus an error bit,
|
value of \lstc{rbx} in the structure isn't meaningful), plus an error bit,
|
||||||
indicating whether an error occurred during unwinding.
|
indicating whether an error occurred during unwinding (which can be due \eg{}
|
||||||
|
to an unsupported operation in the original DWARF, thus compiled to an error).
|
||||||
|
|
||||||
This generated data is stored in separate shared object files, which we call
|
This generated data is stored in separate shared object files, which we call
|
||||||
\ehelfs. It would have been possible to alter the original ELF file to embed
|
\ehelfs. It would have been possible to alter the original ELF file to embed
|
||||||
this data as a new section, but it getting it to be executed just as any
|
this data as a new section, but getting it to be executed just as any
|
||||||
portion of the \lstc{.text} section would probably have been painful, and
|
portion of the \lstc{.text} section would probably have been painful, and
|
||||||
keeping it separated during the experimental phase is quite convenient. It is
|
keeping it separated during the experimental phase is quite convenient. It is
|
||||||
possible to have multiple versions of \ehelfs{} files in parallel, with various
|
possible to have multiple versions of \ehelfs{} files in parallel, with various
|
||||||
options turned on or off, and it doesn't require altering the base system by
|
options turned on or off, and it doesn't require altering the base system by
|
||||||
editing \eg{} \texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is
|
editing \eg{} \texttt{/usr/lib/libc-*.so}. Instead, when the \ehelf{} data is
|
||||||
required, those files can simply be \lstc{dlopen}'d.
|
required, those files can simply be \lstc{dlopen}'d. It is also possible to
|
||||||
|
imagine, in a future production environment, packaging \ehelfs{} files
|
||||||
|
separately, so that people interested in heavy computation can have the choice
|
||||||
|
to install them.
|
||||||
|
|
||||||
\medskip
|
\medskip
|
||||||
|
|
||||||
|
@ -705,15 +760,19 @@ generated for the C code in Listing~\ref{lst:ex1_c}.
|
||||||
Without any particular care to efficiency or compactness, it is already
|
Without any particular care to efficiency or compactness, it is already
|
||||||
possible to produce a compiled version very close to the one described in
|
possible to produce a compiled version very close to the one described in
|
||||||
Section~\ref{sec:semantics}. Although the unwinding speed cannot yet be
|
Section~\ref{sec:semantics}. Although the unwinding speed cannot yet be
|
||||||
actually benchmarked, it is already possible to write in a few hundreds of line
|
actually benchmarked, it is already possible to write in a few hundred lines of
|
||||||
of C a simple stack walker printing the functions traversed. It already works
|
C code a simple stack walker printing the functions traversed. It already works
|
||||||
without any problem on the easily tested cases, since corner cases are mostly
|
without any problem on the easily tested cases, since corner cases are mostly
|
||||||
found in standard and highly optimal libraries, and it is not that easy to get
|
found in standard and highly optimized libraries, and it is not that easy to get
|
||||||
the program to stop and print a stack trace from within a system library
|
the program to stop and print a stack trace from within a system library
|
||||||
without using a debugger.
|
without using a debugger.
|
||||||
|
|
||||||
The major drawback of this approach, without any particular care taken, is the
|
The major drawback of this approach, without any particular care taken, is the
|
||||||
space waste.
|
space waste. The space taken by those tentative \ehelfs{} is analyzed in
|
||||||
|
Table~\ref{table:basic_eh_elf_space} for \prog{hackbench}, a small program
|
||||||
|
introduced later in Section~\ref{ssec:bench_perf}, and the libraries on which
|
||||||
|
it depends.
|
||||||
|
|
||||||
|
|
||||||
\begin{table}[h]
|
\begin{table}[h]
|
||||||
\centering
|
\centering
|
||||||
|
@ -736,11 +795,6 @@ space waste.
|
||||||
\caption{Basic \ehelfs{} space usage}\label{table:basic_eh_elf_space}
|
\caption{Basic \ehelfs{} space usage}\label{table:basic_eh_elf_space}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
The space taken by those tentative \ehelfs{} is analyzed in
|
|
||||||
Table~\ref{table:basic_eh_elf_space} for \prog{hackbench}, a small program
|
|
||||||
introduced later in Section~\ref{ssec:bench_perf}, and the libraries on which
|
|
||||||
it depends.
|
|
||||||
|
|
||||||
The first column only includes the sizes of the ELF sections \lstc{.text} (the
|
The first column only includes the sizes of the ELF sections \lstc{.text} (the
|
||||||
program itself) and \lstc{.rodata}, the read-only data (such as static strings,
|
program itself) and \lstc{.rodata}, the read-only data (such as static strings,
|
||||||
etc.). Only the weight of the \lstc{.text} section of the generated \ehelfs{}
|
etc.). Only the weight of the \lstc{.text} section of the generated \ehelfs{}
|
||||||
|
@ -764,16 +818,17 @@ made in order to shrink the \ehelfs.
|
||||||
The major optimization that most reduced the output size was to use an if/else
|
The major optimization that most reduced the output size was to use an if/else
|
||||||
tree implementing a binary search on the program counter relevant intervals,
|
tree implementing a binary search on the program counter relevant intervals,
|
||||||
instead of a huge switch. In the process, we also \emph{outline} a lot of code,
|
instead of a huge switch. In the process, we also \emph{outline} a lot of code,
|
||||||
that is, find out identical code blocks, move them outside of the if/else tree,
|
that is, find identical ``switch case'' bodies (which are not switch cases
|
||||||
identify them by a label, and jump to them using a \lstc{goto}, which
|
anymore, but if bodies), move them outside of the if/else tree, identify them
|
||||||
de-duplicates a lot of code and contributes greatly to the shrinking. In the
|
by a label, and jump to them using a \lstc{goto}, which de-duplicates a lot of
|
||||||
process, we noticed that the vast majority of FDE rows are actually taken among
|
code and contributes greatly to the shrinking. In the process, we noticed that
|
||||||
very few ``common'' FDE rows.
|
the vast majority of FDE rows are actually taken among very few ``common'' FDE
|
||||||
|
rows.
|
||||||
|
|
||||||
This makes this optimization really efficient, as seen later in
|
This makes this optimization really efficient, as seen later in
|
||||||
Section~\ref{ssec:results_size}, but also makes it an interesting question ---
|
Section~\ref{ssec:results_size}, but also makes it an interesting question
|
||||||
not investigated during this internship --- to find out whether standard DWARF
|
--~not investigated during this internship~-- to find out whether standard
|
||||||
data could be efficiently compressed in this way.
|
DWARF data could be efficiently compressed in this way.
|
||||||
|
|
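The combination of a binary search and outlined bodies can be sketched in C as follows. This is only an illustration under stated assumptions: the address intervals are hypothetical (two functions sharing the same two row shapes), only the CFA offset is computed, and the function name cfa_offset_tree is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the CFA offset (relative to rsp) valid at `pc` into *offset.
 * Returns 0 on success, -1 if no row covers `pc`.
 * All address intervals below are hypothetical. */
int cfa_offset_tree(uintptr_t pc, unsigned *offset)
{
    assert(offset != NULL);

    /* if/else tree: binary search over the sorted interval bounds */
    if (pc < 0x640) {
        if (pc < 0x619) {
            if (pc < 0x615)
                goto no_row;
            goto cfa_rsp_8;      /* 0x615..0x618 */
        }
        goto cfa_rsp_48;         /* 0x619..0x63f */
    } else {
        if (pc < 0x644)
            goto cfa_rsp_8;      /* 0x640..0x643 */
        if (pc < 0x660)
            goto cfa_rsp_48;     /* 0x644..0x65f */
        goto no_row;
    }

    /* Outlined bodies: each distinct row shape appears exactly once. */
cfa_rsp_8:
    *offset = 8;
    return 0;
cfa_rsp_48:
    *offset = 48;
    return 0;
no_row:
    return -1;
}
```

Four intervals share two bodies here; the more rows coincide across FDEs, the larger the saving compared with duplicating the body in every branch.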
||||||
\begin{minipage}{0.45\textwidth}
|
\begin{minipage}{0.45\textwidth}
|
||||||
\lstinputlisting[language=C, caption={\ehelf{} for the previous example},
|
\lstinputlisting[language=C, caption={\ehelf{} for the previous example},
|
||||||
|
@ -806,15 +861,16 @@ However, unwinding over and over again from the same program point would have
|
||||||
had no interest at all, since \prog{libunwind} would have simply cached the
|
had no interest at all, since \prog{libunwind} would have simply cached the
|
||||||
relevant DWARF row. On the other hand, making sure that the various unwindings
|
relevant DWARF row. On the other hand, making sure that the various unwindings
|
||||||
are made from different locations is somewhat cheating, since it defeats
|
are made from different locations is somewhat cheating, since it defeats
|
||||||
\prog{libunwind}'s caching. All in all, the benchmarking method must have a
|
\prog{libunwind}'s caching and does not reproduce a ``real-world'' unwinding
|
||||||
``natural'' distribution of unwindings.
|
distribution. All in all, the benchmarking method must have a ``natural''
|
||||||
|
distribution of unwindings.
|
||||||
|
|
||||||
Another requirement is to also distribute quite evenly the unwinding points
|
Another requirement is to also distribute quite evenly the unwinding points
|
||||||
across the program: we would like to benchmark stack unwindings crossing some
|
across the program: we would like to benchmark stack unwindings crossing some
|
||||||
standard library functions, starting from inside them, etc.
|
standard library functions, starting from inside them, etc.
|
||||||
|
|
||||||
Finally, the unwound program must be interesting enough to enter and exit a lot
|
Finally, the unwound program must be interesting enough to enter and exit a lot
|
||||||
of function, nest function calls, have FDEs that are not as simple as in
|
of functions, nest function calls, have FDEs that are not as simple as in
|
||||||
Listing~\ref{lst:ex1_dw}, etc.
|
Listing~\ref{lst:ex1_dw}, etc.
|
||||||
|
|
||||||
|
|
||||||
|
@ -864,19 +920,23 @@ system and process as much as possible, to be able to unwind in any context.
|
||||||
This very restricted information lacked a memory map (a table indicating which
|
This very restricted information lacked a memory map (a table indicating which
|
||||||
shared object is mapped at which address in memory) in order to use \ehelfs.
|
shared object is mapped at which address in memory) in order to use \ehelfs.
|
||||||
Apart from this, the modified version of \prog{libunwind} produced is entirely
|
Apart from this, the modified version of \prog{libunwind} produced is entirely
|
||||||
compatible with the vanilla version.
|
compatible with the vanilla version, meaning that the only modifications
|
||||||
|
required to use \ehelfs{} within any project using \prog{libunwind} should be
|
||||||
|
modifying one line of code (this function call, which is a setup function) and
|
||||||
|
linking against the modified version of \prog{libunwind} instead of the system
|
||||||
|
version.
|
||||||
|
|
||||||
Once this was done, plugging it in \prog{perf} was the matter of a few lines of
|
Once this was done, plugging it in \prog{perf} was the matter of a few lines of
|
||||||
code only. The major problem encountered was to understand how \prog{perf}
|
code only, apart from the benchmarking code. The major problem encountered was
|
||||||
works. In order to avoid perturbing the traced program, \prog{perf} does not
|
to understand how \prog{perf} works. In order to avoid perturbing the traced
|
||||||
unwind at runtime, but rather records at regular interval the program's stack,
|
program, \prog{perf} does not unwind at runtime, but rather records at regular
|
||||||
and all the auxiliary information that is needed to unwind later. This is done
|
intervals the program's stack, and all the auxiliary information that is needed
|
||||||
when running \lstbash{perf record}. Then, \lstbash{perf report} unwinds the
|
to unwind later. This is done when running \lstbash{perf record}. Then,
|
||||||
stack to analyze it; but at this point of time, the traced process is long
|
\lstbash{perf report} unwinds the stack to analyze it; but at this point of
|
||||||
dead, thus any PID-based approach, or any approach using \texttt{/proc}
|
time, the traced process is long dead, thus any PID-based approach, or any
|
||||||
information will fail. However, as this was the easiest method, this approach
|
approach using \texttt{/proc} information will fail. However, as this was the
|
||||||
was chosen when implementing the first version of \ehelfs; thus requiring some
|
easiest method, the first version of \ehelfs{} used those mechanisms; thus
|
||||||
code rewriting.
|
requiring some code rewriting.
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Other explored methods}
|
\subsection{Other explored methods}
|
||||||
|
@ -884,15 +944,15 @@ code rewriting.
|
||||||
The first approach tried to benchmark was trying to create some specific C code
|
The first approach tried to benchmark was trying to create some specific C code
|
||||||
that would meet the requirements from Section~\ref{ssec:bench_req}, while
|
that would meet the requirements from Section~\ref{ssec:bench_req}, while
|
||||||
calling itself a benchmarking procedure from time to time. This was abandoned
|
calling itself a benchmarking procedure from time to time. This was abandoned
|
||||||
quite fast, because generating C code interesting enough to be unwound turned
|
quite quickly, because generating C code interesting enough to be unwound
|
||||||
out hard, and the generated FDEs invariably ended out uninteresting. It would
|
turned out to be hard, and the generated FDEs invariably ended up uninteresting. It
|
||||||
also never have met the requirement of unwinding from fairly distributed
|
would also never have met the requirement of unwinding from fairly distributed
|
||||||
locations anyway.
|
locations anyway.
|
||||||
|
|
||||||
Another attempt was made using CSmith~\cite{csmith}, a random C code generator
|
Another attempt was made using CSmith~\cite{csmith}, a random C code generator
|
||||||
initially made for random testing of C compilers. The idea was still to craft an
|
initially made for random testing of C compilers. The idea was still to craft an
|
||||||
interesting C program that would unwind on its own frequently, but to integrate
|
interesting C program that would unwind on its own frequently, but to integrate
|
||||||
randomly generated C code with CSmith to integrate interesting C snippets that
|
CSmith-randomly generated C code within hand-written C snippets that
|
||||||
would generate large enough FDEs and nested calls. This was abandoned as well
|
would generate large enough FDEs and nested calls. This was abandoned as well
|
||||||
as the call graph of a CSmith-generated code is often far too small, and the
|
as the call graph of a CSmith-generated code is often far too small, and the
|
||||||
CSmith code is notoriously hard to understand and edit.
|
CSmith code is notoriously hard to understand and edit.
|
||||||
|
|
|
@ -7,3 +7,6 @@
|
||||||
\newcommand{\set}[1]{\left\{ #1 \right\}}
|
\newcommand{\set}[1]{\left\{ #1 \right\}}
|
||||||
\newcommand{\card}[1]{\left\vert{} #1 \right\vert}
|
\newcommand{\card}[1]{\left\vert{} #1 \right\vert}
|
||||||
\newcommand{\abs}[1]{\left\vert{} #1 \right\vert}
|
\newcommand{\abs}[1]{\left\vert{} #1 \right\vert}
|
||||||
|
|
||||||
|
\newcommand{\tnhead}[2]{\multicolumn{1}{#1}{#2}} % Table neutral head
|
||||||
|
\newcommand{\spaced}[2]{\hspace{#1} #2 \hspace{#1}}
|
||||||
|
|
|
@ -1,5 +1,8 @@
|
||||||
%% Specific commands for this project
|
%% Specific commands for this project
|
||||||
|
|
||||||
|
\newcommand{\stackfhead}[1]
|
||||||
|
{\tnhead{l}{\hspace{-5ex}$\reg{rsp} #1$ \hspace{2em}}}
|
||||||
|
|
||||||
\newcommand{\prog}[1]{\texttt{#1}}
|
\newcommand{\prog}[1]{\texttt{#1}}
|
||||||
\newcommand{\ehelf}{\texttt{eh\_elf}}
|
\newcommand{\ehelf}{\texttt{eh\_elf}}
|
||||||
\newcommand{\ehelfs}{\texttt{eh\_elfs}}
|
\newcommand{\ehelfs}{\texttt{eh\_elfs}}
|
||||||
|
|