Eliminate future tense

Théophile Bastian 2018-08-19 18:55:16 +02:00
parent d031d8ec49
commit 887027f0f3


@ -125,7 +125,7 @@ from it. For instance, when running a debugger, a frequent usage is to obtain a
IP\@. This actually observes the stack to find the different stack frames, and IP\@. This actually observes the stack to find the different stack frames, and
decode them to identify the function names, parameter values, etc. decode them to identify the function names, parameter values, etc.
This operation is far from trivial. Often, a stack frame will only make sense This operation is far from trivial. Often, a stack frame only makes sense
when the machine registers hold the right values. These values, when the machine registers hold the right values. These values,
however, are to be restored from the previous stack frame, where they are however, are to be restored from the previous stack frame, where they are
stored. This imposes to \emph{walk} the stack, reading the frames one after stored. This imposes to \emph{walk} the stack, reading the frames one after
@ -140,8 +140,8 @@ frame, and thus be able to decode the next frame recursively, is called
Let us consider a stack with x86\_64 calling conventions, such as shown in Let us consider a stack with x86\_64 calling conventions, such as shown in
Figure~\ref{fig:call_stack}. Assuming the compiler decided here \emph{not} to Figure~\ref{fig:call_stack}. Assuming the compiler decided here \emph{not} to
use \reg{rbp}, and assuming the function allocates \eg{} a buffer of 8 use \reg{rbp}, and assuming the function allocates \eg{} a buffer of 8
integers, the area allocated for local variables should be at least $32$ bytes integers, the area allocated for local variables is at least $32$ bytes
long (for 4-bytes integers), and \reg{rsp} will be pointing below this area. long (for 4-bytes integers), and \reg{rsp} points below this area.
Left apart analyzing the assembly code produced, there is no way to find where Left apart analyzing the assembly code produced, there is no way to find where
the return address is stored, relatively to \reg{rsp}, at some arbitrary point the return address is stored, relatively to \reg{rsp}, at some arbitrary point
of the function. Even when \reg{rbp} is used, there is no easy way to guess of the function. Even when \reg{rbp} is used, there is no easy way to guess
@ -347,9 +347,9 @@ clone of the previous one, which can then be altered (\eg{} here by setting
\lstc{CFA} to $\reg{rsp} + 48$). This means that every line is defined \wrt{} \lstc{CFA} to $\reg{rsp} + 48$). This means that every line is defined \wrt{}
the previous one, and that the IPs of the successive rows cannot be determined the previous one, and that the IPs of the successive rows cannot be determined
without evaluating every row that comes before in the first place. Thus, without evaluating every row that comes before in the first place. Thus,
unwinding a frame from an IP close to the end of the frame will require unwinding a frame from an IP close to the end of the frame requires evaluating
evaluating pretty much every DWARF row in the table before reaching the pretty much every DWARF row in the table before reaching the relevant
relevant information, slowing down drastically the unwinding process. information, slowing down drastically the unwinding process.
\FloatBarrier{} \FloatBarrier{}
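The cost described in this hunk can be illustrated with a minimal sketch. It assumes a heavily simplified table in which each row only advances the location and sets a new \lstc{CFA} offset relative to \reg{rsp}; the names \lstc{row_delta} and \lstc{cfa_offset_at} are hypothetical and do not come from the report or from any DWARF library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified row: "advance the location by loc_delta bytes,
 * then set the CFA to rsp + cfa_offset". Real DWARF rows carry much more. */
struct row_delta {
    uint64_t loc_delta;
    int64_t cfa_offset;
};

/* Returns the CFA offset (relative to rsp) that applies at target_ip.
 * Each row only stores a delta with respect to the previous one, so the
 * table has to be replayed from the start of the FDE. The number of rows
 * that had to be evaluated is stored in *evaluated. */
int64_t cfa_offset_at(const struct row_delta *rows, size_t n_rows,
                      uint64_t fde_start, int64_t initial_offset,
                      uint64_t target_ip, size_t *evaluated)
{
    uint64_t ip = fde_start;
    int64_t offset = initial_offset;
    size_t i;

    for (i = 0; i < n_rows; ++i) {
        uint64_t next_ip = ip + rows[i].loc_delta;
        if (next_ip > target_ip)
            break;              /* this row starts after the IP we seek */
        ip = next_ip;
        offset = rows[i].cfa_offset;
    }
    *evaluated = i;
    return offset;
}
```

With such a layout, a lookup for an IP near the end of the function walks almost the whole table, whereas a lookup near its start stops after a row or two.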
@ -397,19 +397,18 @@ parse the relevant FDE from its start, until it finds the row it was seeking.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{DWARF semantics}\label{sec:semantics} \section{DWARF semantics}\label{sec:semantics}
We will now define semantics covering the operations used for FDEs described in We now define semantics covering the operations used for FDEs described in the
the DWARF standard~\cite{dwarf5std}, such as seen in DWARF standard~\cite{dwarf5std}, such as seen in Listing~\ref{lst:ex1_dwraw},
Listing~\ref{lst:ex1_dwraw}, with the exception of DWARF expressions. These are with the exception of DWARF expressions. These are not treated here, because
not treated here, because they form a rich language and would take a lot of they form a rich language and would take a lot of time and space to formalize,
time and space to formalize, while in the mean time being seldom used --~see while in the mean time being seldom used --~see Section~\ref{ssec:instr_cov}.
Section~\ref{ssec:instr_cov}.
These semantics are defined \wrt{} the well-formalized C language, and These semantics are defined \wrt{} the well-formalized C language, and
are passing through an intermediary language. The DWARF language can read the are passing through an intermediary language. The DWARF language can read the
whole memory, as well as registers, and is always executed for some instruction whole memory, as well as registers, and is always executed for some instruction
pointer. The C function representing it will thus take as parameters an array pointer. The C function representing it thus takes as parameters an array
of the registers' values as well as an IP, and will return another array of of the registers' values as well as an IP, and returns another array of
registers values, which will represent the evaluated DWARF row. registers values, which represents the evaluated DWARF row.
\subsection{Concerning correctness}\label{ssec:sem_correctness} \subsection{Concerning correctness}\label{ssec:sem_correctness}
@ -429,7 +428,7 @@ instructions are up to variants --~most instructions exist in multiple formats
to handle various operands formatting for space optimisation. Since we won't be to handle various operands formatting for space optimisation. Since we won't be
talking about the underlying file format here, those variations between \eg{} talking about the underlying file format here, those variations between \eg{}
\dwcfa{advance\_loc1} and \dwcfa{advance\_loc2} --~which differ only on the \dwcfa{advance\_loc1} and \dwcfa{advance\_loc2} --~which differ only on the
number of bytes of their operand~-- are irrelevant and will be eluded. number of bytes of their operand~-- are irrelevant and are eluded.
As said before, we also elude here references to DWARF expressions, as they are As said before, we also elude here references to DWARF expressions, as they are
complex and are mostly not implemented in the actual compiler anyway --~left complex and are mostly not implemented in the actual compiler anyway --~left
@ -487,7 +486,7 @@ a language.
\subsection{Intermediary language $\intermedlang$} \subsection{Intermediary language $\intermedlang$}
A first pass will translate DWARF instructions into this intermediary language A first pass translates DWARF instructions into this intermediary language
$\intermedlang$. It is designed to be more mathematical, representing the same $\intermedlang$. It is designed to be more mathematical, representing the same
thing, but abstracting all the data compression of the DWARF format away, so thing, but abstracting all the data compression of the DWARF format away, so
that we can better reason on it and transform it into C code. that we can better reason on it and transform it into C code.
@ -535,7 +534,7 @@ here.
The target language of these semantics is a C function, to be interpreted The target language of these semantics is a C function, to be interpreted
\wrt{} the C11 standard~\cite{c11std}. The function is supposed to be run \wrt{} the C11 standard~\cite{c11std}. The function is supposed to be run
in the context of the program being unwound. In particular, it must be able to in the context of the program being unwound. In particular, it must be able to
dereference some pointer derived from DWARF instructions that will point to the dereference some pointer derived from DWARF instructions that points to the
execution stack, or even the heap. execution stack, or even the heap.
This function takes as arguments an instruction pointer --~supposedly This function takes as arguments an instruction pointer --~supposedly
@ -544,7 +543,7 @@ fresh array of register values after unwinding this call frame. The function is
compositional: it can be called twice in a row to unwind two stack frames, compositional: it can be called twice in a row to unwind two stack frames,
unless the IP obtained after the first unwinding comes from another shared unless the IP obtained after the first unwinding comes from another shared
object file, for instance a call to \prog{libc}. In this case, unwinding the object file, for instance a call to \prog{libc}. In this case, unwinding the
second frame will require loading the corresponding DWARF information. second frame requires loading the corresponding DWARF information.
The function is the following: The function is the following:
@ -558,7 +557,7 @@ duly defined elsewhere, unwinding multiple frames would then look like this:
\lstinputlisting[language=C]{src/dw_semantics/stack_walker.c} \lstinputlisting[language=C]{src/dw_semantics/stack_walker.c}
Thus, if we hold for true that the IP will remain in the same memory segment Thus, if we hold for true that the IP remains in the same memory segment
--~\ie{} binary file~-- for two frames, we can safely unwind two frames this --~\ie{} binary file~-- for two frames, we can safely unwind two frames this
way: way:
@ -578,7 +577,7 @@ interpreted. We then define the interpretation function $\llbracket t
having the knowledge of $H$, the current interpreted row. having the knowledge of $H$, the current interpreted row.
But we also need to keep track of this state-saving stack DWARF uses, which But we also need to keep track of this state-saving stack DWARF uses, which
will be kept in subscript. is kept in subscript.
Thus, we define $\semI{\bullet}{s}(\bullet): \DWARF \times \FDE \to \FDE$, for Thus, we define $\semI{\bullet}{s}(\bullet): \DWARF \times \FDE \to \FDE$, for
$s$ a stack of $\dwrow$, that is, $s$ a stack of $\dwrow$, that is,
@ -756,7 +755,7 @@ switch cases bodies then fill a context with unwound values before return it.
A setting of the compiler also optionally enables another parameter to the A setting of the compiler also optionally enables another parameter to the
\lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This \lstc{_eh_elf} function, \lstc{deref}, which is a function pointer. This
\lstc{deref} function, when present, replaces everywhere the dereferencing \lstc{deref} function, when present, replaces everywhere the dereferencing
\lstc{*} operator, and can be used to generate \ehelfs{} that will work on \lstc{*} operator, and can be used to generate \ehelfs{} that works on
remote address spaces, that is, whenever the unwinding is not done on the remote address spaces, that is, whenever the unwinding is not done on the
process reading the \ehelf{} itself, but some other process, or even on a stack process reading the \ehelf{} itself, but some other process, or even on a stack
dump of a long-terminated process. dump of a long-terminated process.
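As an illustration of the idea behind \lstc{deref}, here is a minimal sketch of a callback that reads from a saved stack dump instead of live memory. The callback signature, the \lstc{stack_dump} structure and the helper \lstc{read_return_address} are assumptions made for this example; they are not the actual \ehelf{} interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: read one 64-bit word at address addr
 * in the address space of the process being unwound. */
typedef uint64_t (*deref_fn)(uint64_t addr);

/* Hypothetical description of a stack dump of a terminated process. */
struct stack_dump {
    uint64_t base;          /* address of the first dumped byte */
    const uint8_t *bytes;   /* local copy of the dumped memory */
    size_t size;            /* number of dumped bytes */
};

struct stack_dump g_dump;

/* deref implementation backed by the dump; returns 0 when the requested
 * word is not fully contained in the dump. */
uint64_t deref_dump(uint64_t addr)
{
    uint64_t value = 0;

    if (g_dump.size < sizeof(value) || addr < g_dump.base ||
        addr - g_dump.base > g_dump.size - sizeof(value))
        return 0;
    memcpy(&value, g_dump.bytes + (addr - g_dump.base), sizeof(value));
    return value;
}

/* With the usual x86_64 convention, the return address is stored
 * at CFA - 8. Going through deref instead of the * operator makes the
 * same code work on a live process or on a dump. */
uint64_t read_return_address(uint64_t cfa, deref_fn deref)
{
    return deref(cfa - 8);
}
```

Swapping \lstc{deref_dump} for a callback that reads another process's memory would leave \lstc{read_return_address} unchanged, which is the point of routing every memory access through a function pointer.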
@ -1022,7 +1021,7 @@ the program's stack, and all the auxiliary information that is needed to unwind
later. This is done when running \lstbash{perf record}. Then, a subsequent call later. This is done when running \lstbash{perf record}. Then, a subsequent call
to \lstbash{perf report} unwinds the stack to analyze it; but at this point of to \lstbash{perf report} unwinds the stack to analyze it; but at this point of
time, the traced process is long dead. Thus, any PID-based approach, or any time, the traced process is long dead. Thus, any PID-based approach, or any
approach using \texttt{/proc} information will fail. However, as this was the approach using \texttt{/proc} information fails. However, as this was the
easiest method, the first version of \ehelfs{} used those mechanisms; it took easiest method, the first version of \ehelfs{} used those mechanisms; it took
some code rewriting to move to a PID- and \texttt{/proc}-agnostic some code rewriting to move to a PID- and \texttt{/proc}-agnostic
implementation. implementation.