Eliminate future tense

Parent: d031d8ec49. Commit: 887027f0f3. 1 changed file with 22 additions and 23 deletions.
@ -125,7 +125,7 @@ from it. For instance, when running a debugger, a frequent usage is to obtain a
IP\@. This actually observes the stack to find the different stack frames, and
decodes them to identify the function names, parameter values, etc.

This operation is far from trivial. Often, a stack frame only makes sense
when the machine registers hold the right values. These values, however,
must themselves be restored from the previous stack frame, where they are
stored. This requires us to \emph{walk} the stack, reading the frames one after
@ -140,8 +140,8 @@ frame, and thus be able to decode the next frame recursively, is called
Let us consider a stack with x86\_64 calling conventions, as shown in
Figure~\ref{fig:call_stack}. Assuming the compiler decided here \emph{not} to
use \reg{rbp}, and assuming the function allocates \eg{} a buffer of 8
integers, the area allocated for local variables is at least $32$ bytes
long (for 4-byte integers), and \reg{rsp} points below this area.
Short of analyzing the generated assembly code, there is no way to find where
the return address is stored, relative to \reg{rsp}, at some arbitrary point
of the function. Even when \reg{rbp} is used, there is no easy way to guess
@ -347,9 +347,9 @@ clone of the previous one, which can then be altered (\eg{} here by setting
\lstc{CFA} to $\reg{rsp} + 48$). This means that every row is defined \wrt{}
the previous one, and that the IPs of the successive rows cannot be determined
without first evaluating every row that comes before. Thus,
unwinding a frame from an IP close to the end of the function requires
evaluating nearly every DWARF row in the table before reaching the relevant
information, drastically slowing down the unwinding process.
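
To make this cost concrete, the C sketch below replays a hypothetical, heavily
simplified FDE that only uses \dwcfa{advance\_loc} and \dwcfa{def\_cfa\_offset},
with the CFA always computed from \reg{rsp}. The types and the
\lstc{cfa_offset_at} function are illustrative assumptions, not part of any real
unwinder; a real implementation must also handle register rules and every other
instruction.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of DWARF CFA instructions: the CFA is always
 * rsp + offset, and only two instructions are modelled. */
enum cfa_op { ADVANCE_LOC, DEF_CFA_OFFSET };

struct cfa_instr {
    enum cfa_op op;
    uint64_t operand; /* IP delta for ADVANCE_LOC, new offset otherwise */
};

/* Returns the CFA offset (relative to rsp) in effect at target_ip, and
 * stores in *evaluated how many instructions had to be processed.
 * Each row is an altered clone of the previous one, so the instruction
 * stream is replayed from the start of the FDE until the row containing
 * target_ip is reached. */
uint64_t cfa_offset_at(const struct cfa_instr *instrs, size_t n,
                       uint64_t fde_start, uint64_t initial_offset,
                       uint64_t target_ip, size_t *evaluated)
{
    uint64_t loc = fde_start;         /* start address of the current row */
    uint64_t offset = initial_offset; /* CFA offset of the current row */
    size_t i;

    for (i = 0; i < n; ++i) {
        if (instrs[i].op == ADVANCE_LOC) {
            /* The next row starts at loc + delta; if target_ip lies
             * before it, the current row is the one we need. */
            if (loc + instrs[i].operand > target_ip)
                break;
            loc += instrs[i].operand;
        } else {
            offset = instrs[i].operand; /* alter the cloned row */
        }
    }
    *evaluated = i;
    return offset;
}
\end{lstlisting}

For instance, with the instruction stream ``advance by $1$, set offset $16$,
advance by $4$, set offset $48$, advance by $20$, set offset $8$'', a lookup at
the entry point of the function processes no instruction, whereas a lookup in
the last row, starting $25$ bytes after the entry point, processes all six.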

\FloatBarrier{}

@ -397,19 +397,18 @@ parse the relevant FDE from its start, until it finds the row it was seeking.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{DWARF semantics}\label{sec:semantics}

We now define semantics covering the operations used for FDEs described in the
DWARF standard~\cite{dwarf5std}, as seen in Listing~\ref{lst:ex1_dwraw},
with the exception of DWARF expressions. These are not treated here, because
they form a rich language that would take a lot of time and space to formalize,
while being seldom used --~see Section~\ref{ssec:instr_cov}.

These semantics are defined \wrt{} the well-formalized C language, going
through an intermediary language. The DWARF language can read the
whole memory, as well as registers, and is always executed for some instruction
pointer. The C function representing it thus takes as parameters an array
of the registers' values as well as an IP, and returns another array of
register values, which represents the evaluated DWARF row.
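
As an illustration only, the C sketch below gives such a function a possible
shape for a toy FDE made of two rows: a function entry, followed by the code
after \reg{rbp} has been pushed. Since C cannot return a bare array, the
register array is wrapped in a struct; the names \lstc{regs} and
\lstc{unwind_frame}, the register indices and the IP ranges are hypothetical,
and the offsets assume x86\_64 with 8-byte stack slots. Memory is read through
plain pointer dereferences, so this function only works when run in the address
space of the unwound program.

\begin{lstlisting}[language=C]
#include <stdint.h>

/* Hypothetical register indices; a real implementation tracks more. */
enum { REG_RIP, REG_RSP, REG_RBP, NB_REGS };

/* C cannot return a bare array, hence the wrapping struct. */
struct regs {
    uintptr_t r[NB_REGS];
};

/* Toy semantics of one FDE covering [0x1000, 0x1040), x86_64 layout:
 *   [0x1000, 0x1001): CFA = rsp + 8  (just after the call)
 *   [0x1001, 0x1040): CFA = rsp + 16 (after pushing rbp), rbp at CFA - 16
 * In both rows the return address is at CFA - 8, and the unwound rsp is
 * the CFA. An IP outside the FDE yields rip = 0 to signal failure. */
struct regs unwind_frame(uintptr_t ip, struct regs in)
{
    struct regs out = in;
    uintptr_t cfa;

    if (ip >= 0x1000 && ip < 0x1001) {
        cfa = in.r[REG_RSP] + 8;
    } else if (ip >= 0x1001 && ip < 0x1040) {
        cfa = in.r[REG_RSP] + 16;
        out.r[REG_RBP] = *(const uintptr_t *)(cfa - 16);
    } else {
        out.r[REG_RIP] = 0;
        return out;
    }
    out.r[REG_RIP] = *(const uintptr_t *)(cfa - 8);
    out.r[REG_RSP] = cfa;
    return out;
}
\end{lstlisting}

Each branch plays the role of one evaluated DWARF row: it selects a CFA rule
from the IP, then computes the new register values from the old ones.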

\subsection{Concerning correctness}\label{ssec:sem_correctness}

@ -429,7 +428,7 @@ instructions are up to variants --~most instructions exist in multiple formats
to handle various operand encodings for space optimisation. Since we do not
discuss the underlying file format here, the variations between \eg{}
\dwcfa{advance\_loc1} and \dwcfa{advance\_loc2} --~which differ only in the
number of bytes of their operand~-- are irrelevant and are omitted.
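
The following C sketch illustrates why these variants do not matter
semantically: the four encodings of \dwcfa{advance\_loc} all decode to the same
abstract operation, ``advance the location by $\delta$ bytes'', where $\delta$
is the operand multiplied by the code alignment factor of the CIE. The opcode
values are those of the DWARF standard; the function \lstc{decode_advance_loc}
is a hypothetical helper, and multi-byte operands are read as little-endian,
assuming an x86\_64 target.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* Opcode values from the DWARF standard. */
#define DW_CFA_advance_loc  0x40 /* top two bits 01, delta in low 6 bits */
#define DW_CFA_advance_loc1 0x02 /* 1-byte delta operand */
#define DW_CFA_advance_loc2 0x03 /* 2-byte delta operand */
#define DW_CFA_advance_loc4 0x04 /* 4-byte delta operand */

/* Decodes any advance_loc variant at buf into one abstract operation,
 * "advance the location by *delta bytes" (already multiplied by the CIE's
 * code alignment factor). Returns the number of bytes consumed, or 0 if
 * buf does not start with an advance_loc variant. Multi-byte operands are
 * read as little-endian (assumption: x86_64 target). */
size_t decode_advance_loc(const uint8_t *buf, uint64_t code_align,
                          uint64_t *delta)
{
    uint64_t raw;
    size_t len;

    if ((buf[0] & 0xc0) == DW_CFA_advance_loc) {
        raw = buf[0] & 0x3f;
        len = 1;
    } else if (buf[0] == DW_CFA_advance_loc1) {
        raw = buf[1];
        len = 2;
    } else if (buf[0] == DW_CFA_advance_loc2) {
        raw = (uint64_t)buf[1] | (uint64_t)buf[2] << 8;
        len = 3;
    } else if (buf[0] == DW_CFA_advance_loc4) {
        raw = (uint64_t)buf[1] | (uint64_t)buf[2] << 8
            | (uint64_t)buf[3] << 16 | (uint64_t)buf[4] << 24;
        len = 5;
    } else {
        return 0;
    }
    *delta = raw * code_align;
    return len;
}
\end{lstlisting}

With a code alignment factor of $1$, the byte sequences \lstc{0x45},
\lstc{0x02 0x05} and \lstc{0x03 0x05 0x00} all yield $\delta = 5$.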

As stated before, we also omit here references to DWARF expressions, as they are
complex and are mostly not implemented in the actual compiler anyway --~left
@ -487,7 +486,7 @@ a language.

\subsection{Intermediary language $\intermedlang$}

A first pass translates DWARF instructions into this intermediary language
$\intermedlang$. It is designed to be more mathematical: it represents the same
information, but abstracts away all the data compression of the DWARF format, so
that we can better reason about it and transform it into C code.
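
One way to picture this abstraction is a flattened row in which everything is
explicit: its IP range, its CFA rule and its register rules, with no delta
encoding and no dependency on the previous row. The C sketch below is only in
the spirit of $\intermedlang$, not its actual definition; all names are
hypothetical, and the binary search over sorted, non-overlapping rows is an
illustrative choice showing that, once rows are explicit, finding the row for an
IP no longer requires replaying the rows before it.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule kinds for a saved register. */
enum reg_rule_kind { RULE_UNDEFINED, RULE_SAME_VALUE, RULE_OFFSET };

struct reg_rule {
    enum reg_rule_kind kind;
    int64_t offset; /* for RULE_OFFSET: value saved at address CFA + offset */
};

enum { NB_TRACKED_REGS = 3 }; /* e.g. rip, rsp, rbp (assumption) */

/* A fully explicit row: nothing refers to the previous row. */
struct flat_row {
    uint64_t ip_start, ip_end; /* row applies to [ip_start, ip_end) */
    int cfa_reg;               /* register the CFA is computed from */
    int64_t cfa_offset;        /* CFA = value(cfa_reg) + cfa_offset */
    struct reg_rule rules[NB_TRACKED_REGS];
};

/* Binary search over rows sorted by ip_start and non-overlapping.
 * Returns the row covering ip, or NULL if there is none. */
const struct flat_row *find_row(const struct flat_row *rows, size_t n,
                                uint64_t ip)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ip < rows[mid].ip_start)
            hi = mid;
        else if (ip >= rows[mid].ip_end)
            lo = mid + 1;
        else
            return &rows[mid];
    }
    return NULL;
}
\end{lstlisting}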
@ -535,7 +534,7 @@ here.
The target language of these semantics is a C function, to be interpreted
\wrt{} the C11 standard~\cite{c11std}. The function is meant to be run
in the context of the program being unwound. In particular, it must be able to
dereference pointers, derived from DWARF instructions, that point to the
execution stack, or even to the heap.

This function takes as arguments an instruction pointer --~supposedly
@ -544,7 +543,7 @@ fresh array of register values after unwinding this call frame. The function is
compositional: it can be called twice in a row to unwind two stack frames,
unless the IP obtained after the first unwinding comes from another shared
object file, for instance \prog{libc}. In this case, unwinding the
second frame requires loading the corresponding DWARF information.

The function is the following:

@ -558,7 +557,7 @@ duly defined elsewhere, unwinding multiple frames would then look like this:

\lstinputlisting[language=C]{src/dw_semantics/stack_walker.c}

Thus, if we assume that the IP remains in the same memory segment
--~\ie{} binary file~-- for two frames, we can safely unwind two frames this
way:

@ -578,7 +577,7 @@ interpreted. We then define the interpretation function $\llbracket t
having the knowledge of $H$, the currently interpreted row.

But we also need to keep track of the state-saving stack used by DWARF, which
is written as a subscript.
Thus, we define $\semI{\bullet}{s}(\bullet): \DWARF \times \FDE \to \FDE$, for
$s$ a stack of $\dwrow$, that is,

@ -756,7 +755,7 @@ switch cases bodies then fill a context with unwound values before return it.
A compiler setting also optionally enables another parameter to the
\lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
\lstc{deref} function, when present, replaces every use of the dereferencing
\lstc{*} operator, and can be used to generate \ehelfs{} that work on
remote address spaces, that is, whenever the unwinding is not done on the
process reading the \ehelf{} itself, but on some other process, or even on a
stack dump of a long-terminated process.
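
The C sketch below illustrates the idea. The signature of \lstc{deref}, the
helper names and the stack dump layout are assumptions made for illustration,
not the actual interface of the generated \ehelfs{}: the same lookup of a
return address reads memory either directly in the current address space, or
from a copy of another process's stack.

\begin{lstlisting}[language=C]
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type of the deref parameter: returns the machine word
 * stored at addr in the address space being unwound. */
typedef uintptr_t (*deref_fn)(uintptr_t addr);

/* Local unwinding: addr is valid in our own address space. */
uintptr_t deref_local(uintptr_t addr)
{
    return *(const uintptr_t *)addr;
}

/* Remote unwinding from a stack dump (assumption: one contiguous copy of
 * the stack, which started at address dump_base in the original process). */
static const unsigned char *dump_data;
static uintptr_t dump_base;
static size_t dump_size;

void set_dump(const void *data, uintptr_t base, size_t size)
{
    dump_data = data;
    dump_base = base;
    dump_size = size;
}

uintptr_t deref_dump(uintptr_t addr)
{
    uintptr_t value = 0;
    if (addr >= dump_base && dump_size >= sizeof value
        && addr - dump_base <= dump_size - sizeof value)
        memcpy(&value, dump_data + (addr - dump_base), sizeof value);
    return value; /* 0 when addr lies outside the dump */
}

/* Where generated code would read the word just below the CFA with the
 * * operator, it calls deref instead; the address computation is unchanged. */
uintptr_t return_address(uintptr_t cfa, deref_fn deref)
{
    return deref(cfa - sizeof(uintptr_t));
}
\end{lstlisting}

Only the \lstc{deref} argument changes between the two cases; the code
computing addresses from the CFA stays identical.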
@ -1022,7 +1021,7 @@ the program's stack, and all the auxiliary information that is needed to unwind
later. This is done when running \lstbash{perf record}. Then, a subsequent call
to \lstbash{perf report} unwinds the stack to analyze it; but at this point in
time, the traced process is long dead. Thus, any PID-based approach, or any
approach using \texttt{/proc} information, fails. However, as this was the
easiest method, the first version of \ehelfs{} used those mechanisms; it took
some code rewriting to move to a PID- and \texttt{/proc}-agnostic
implementation.