Write conclusion, move correctness §
This commit is contained in:
parent
c651fb151e
commit
df7252238e
3 changed files with 69 additions and 30 deletions
|
@ -73,8 +73,8 @@ of compiled DWARF into existing projects have been made easy by implementing an
|
|||
alternative version of the \textit{de facto} standard library for this purpose,
|
||||
\prog{libunwind}.
|
||||
|
||||
Multiple approaches have been tried and evaluated to determine which
|
||||
compilation process leads to the best time/space trade-off.
|
||||
We explored and evaluated multiple approaches to determine which compilation
|
||||
process leads to the best time/space trade-off.
|
||||
|
||||
Unexpectedly, the part that proved hardest of the project was finding and
|
||||
implementing a benchmarking protocol that was both relevant and reliable.
|
||||
|
@ -128,9 +128,9 @@ the reference implementation. Indeed, corner cases occur often, and on a 27000
|
|||
samples test, 885 failures were observed for \prog{libunwind}, against 1099 for
|
||||
the compiled DWARF version (see Section~\ref{ssec:timeperf}).
|
||||
|
||||
The implementation, however, is not production-ready: it only supports the
|
||||
The implementation, however, is not yet production-ready: it only supports the
|
||||
x86\_64 architecture, and relies to some extent on the Linux operating system.
|
||||
None of those are real problems in practice. Supporting other processor
|
||||
None of these pose a fundamental problem. Supporting other processor
|
||||
architectures and ABIs are only a matter of engineering,. The operating system
|
||||
dependency is only present in the libraries developed in order to interact with
|
||||
the compiled unwinding data, which can be developed for virtually any operating
|
||||
|
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
\author{Th\'eophile Bastian\\
|
||||
Under supervision of Francesco Zappa Nardelli, March -- August 2018\\
|
||||
{\textsc{parkas}, \'Ecole Normale Sup\'erieure de Paris}}
|
||||
{\textsc{parkas}, \textsc{inria}}}
|
||||
|
||||
%\date{March -- August 2018\\August 20, 2018}
|
||||
\date{\vspace{-2em}}
|
||||
|
@ -397,27 +397,21 @@ parse the relevant FDE from its start, until it finds the row it was seeking.
|
|||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\section{DWARF semantics}\label{sec:semantics}
|
||||
|
||||
We now define semantics covering the operations used for FDEs described in the
|
||||
DWARF standard~\cite{dwarf5std}, such as seen in Listing~\ref{lst:ex1_dwraw},
|
||||
with the exception of DWARF expressions. These are not treated here, because
|
||||
they form a rich language and would take a lot of time and space to formalize,
|
||||
while in the mean time being seldom used --~see Section~\ref{ssec:instr_cov}.
|
||||
The DWARF 5 standard~\cite{dwarf5std} is written in English prose, and our
|
||||
first task is to formalize it. Thus, in this section, we first recall the
|
||||
informal behaviour of DWARF instructions as provided by the standard; and then
|
||||
we formalize their semantics by mapping them to well-defined C code. We omit
|
||||
the translation of DWARF expressions, because they form a rich language and
|
||||
would take a lot of time and space to formalize, while in the mean time being
|
||||
seldom used --~see Section~\ref{ssec:instr_cov}.
|
||||
|
||||
These semantics are defined \wrt{} the well-formalized C language, and
|
||||
are passing through an intermediary language. The DWARF language can read the
|
||||
are passing through an intermediate language. The DWARF language can read the
|
||||
whole memory, as well as registers, and is always executed for some instruction
|
||||
pointer. The C function representing it thus takes as parameters an array
|
||||
of the registers' values as well as an IP, and returns another array of
|
||||
registers values, which represents the evaluated DWARF row.
|
||||
|
||||
\subsection{Concerning correctness}\label{ssec:sem_correctness}
|
||||
|
||||
The semantics described in this section are designed in a concern of
|
||||
\emph{formalization} of the original standard. This standard, sadly, only
|
||||
describes in plain English each instruction's action and result. This basis
|
||||
cannot be used to \emph{prove} anything correct without relying on informal
|
||||
interpretations.
|
||||
|
||||
\subsection{Original language: DWARF instructions}
|
||||
|
||||
These are the DWARF instructions used for CFI description, that is, the
|
||||
|
@ -486,7 +480,7 @@ a language.
|
|||
|
||||
\subsection{Intermediary language $\intermedlang$}
|
||||
|
||||
A first pass translates DWARF instructions into this intermediary language
|
||||
A first pass translates DWARF instructions into this intermediate language
|
||||
$\intermedlang$. It is designed to be more mathematical, representing the same
|
||||
thing, but abstracting all the data compression of the DWARF format away, so
|
||||
that we can better reason on it and transform it into C code.
|
||||
|
@ -503,7 +497,7 @@ Its grammar is as follows:
|
|||
\values &::= \bot & \text{Values: undefined,}\\
|
||||
&\quad\vert~\valaddr{\spexpr} & \text{at address $x$},\\
|
||||
&\quad\vert~\valval{\spexpr} & \text{of value $x$} \\
|
||||
&\quad\vert~\valexpr{??} & \text{of expression $x$, see in text} \\
|
||||
&\quad\vert~\valexpr{} & \text{of expression $x$, see in text} \\
|
||||
\spexpr &::= \regs \times \mathbb{Z}
|
||||
& \text{A ``simple'' expression $\reg{reg} + \textit{offset}$} \\
|
||||
\end{align*}
|
||||
|
@ -614,7 +608,7 @@ $f$. If we consider the fictive following fictive row $R_0$,
|
|||
\end{array}\right.
|
||||
\]
|
||||
|
||||
then, we would have
|
||||
\noindent{}then, we would have
|
||||
|
||||
\[
|
||||
R \insarrow{\reg{rbx}} \left(\valaddr{\reg{rip - 24}}\right)
|
||||
|
@ -701,7 +695,7 @@ if(ip >= $loc$) {
|
|||
} \end{lstlisting}
|
||||
\end{itemize}
|
||||
|
||||
while $\semR{\bullet}$ is defined as
|
||||
\noindent{}\noindent{}while $\semR{\bullet}$ is defined as
|
||||
\begin{align*}
|
||||
\semR{\bot} &\eqspace{}
|
||||
\text{\lstc{ERROR_VALUE}} \\
|
||||
|
@ -711,6 +705,16 @@ while $\semR{\bullet}$ is defined as
|
|||
\text{\lstc{(old_ctx[reg] + offset)}} \\
|
||||
\end{align*}
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Concerning correctness}\label{ssec:sem_correctness}
|
||||
|
||||
The semantics described in this section are designed in a concern of
|
||||
\emph{formalization} of the original standard. This standard, sadly, only
|
||||
describes in plain English each instruction's action and result. This basis
|
||||
cannot be used to \emph{prove} anything correct without relying on informal
|
||||
interpretations.
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\section{Stack unwinding data compilation}
|
||||
|
@ -721,12 +725,12 @@ actual C implementation.
|
|||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Code availability}\label{ssec:code_avail}
|
||||
|
||||
All the code produced during this internship is available on the various
|
||||
repositories from \url{https://git.tobast.fr/m2-internship/}. The repositories
|
||||
contain \texttt{README} files describing them; a summary and global description
|
||||
can be found in the \texttt{abstract} repository. This should be detailed
|
||||
enough to run the project. The source code is entirely under free software
|
||||
licenses.
|
||||
All the code produced during the course of this internship is available on the
|
||||
various repositories from \url{https://git.tobast.fr/m2-internship/}. The
|
||||
repositories contain \texttt{README} files describing them; a summary and
|
||||
global description can be found in the \texttt{abstract} repository. This
|
||||
should be detailed enough to run the project. The source code is entirely under
|
||||
free software licenses.
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Compilation: \ehelfs}\label{ssec:ehelfs}
|
||||
|
@ -1292,6 +1296,41 @@ It is also worth noting that among all of the 4000 analyzed files, all the
|
|||
unsupported expressions are clustered in only 12 of them, and only 24 contained
|
||||
unsupported instructions at all.
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\section*{Conclusion}
|
||||
|
||||
From this data, we can deduce that
|
||||
|
||||
\begin{itemize}[itemsep=3pt, parsep=0pt]
|
||||
|
||||
\item compilation of the DWARF unwinding data is effective to speed up
|
||||
drastically unwinding procedures: speedup of $\times 25.9$;
|
||||
|
||||
\item code outlining is effective to reduce the produced binary size: from
|
||||
$1\ \text{MiB}$ to $370\ \text{KiB}$, from a growth factor of $7$
|
||||
compared to DWARF unwinding data to a growth factor of $2.45$;
|
||||
|
||||
\item unwinding relies on small subset of DWARF instructions and
|
||||
expressions, while most instructions are not used at all in DWARF code
|
||||
produced by compilers.
|
||||
|
||||
\end{itemize}
|
||||
|
||||
The overall size of the project is
|
||||
|
||||
\begin{itemize}[itemsep=3pt, parsep=0pt]
|
||||
\item compiler: 1628 lines,
|
||||
\item \prog{libunwind}: 810 lines,
|
||||
\item \prog{perf}: 222 lines
|
||||
\end{itemize}
|
||||
|
||||
\noindent{} for a total of 2660 lines of code on the main project. The various
|
||||
statistics, benchmarking, testing and analyzing code modules add up to around
|
||||
1500 more lines.
|
||||
|
||||
\pagebreak{}
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
|
|
|
@ -24,7 +24,7 @@
|
|||
|
||||
\newcommand{\valaddr}[1]{\operatorname{Addr}\left(#1\right)}
|
||||
\newcommand{\valval}[1]{\operatorname{Val}\left(#1\right)}
|
||||
\newcommand{\valexpr}[1]{\operatorname{Expr}\left(#1\right)}
|
||||
\newcommand{\valexpr}{\operatorname{Expr}}
|
||||
|
||||
\newcommand{\intermedlang}{\mathcal{I}}
|
||||
|
||||
|
|
Loading…
Reference in a new issue