Write conclusion, move correctness §

This commit is contained in:
Théophile Bastian 2018-08-20 00:30:00 +02:00
parent c651fb151e
commit df7252238e
3 changed files with 69 additions and 30 deletions

View file

@ -73,8 +73,8 @@ of compiled DWARF into existing projects have been made easy by implementing an
alternative version of the \textit{de facto} standard library for this purpose, alternative version of the \textit{de facto} standard library for this purpose,
\prog{libunwind}. \prog{libunwind}.
Multiple approaches have been tried and evaluated to determine which We explored and evaluated multiple approaches to determine which compilation
compilation process leads to the best time/space trade-off. process leads to the best time/space trade-off.
Unexpectedly, the part that proved hardest of the project was finding and Unexpectedly, the part that proved hardest of the project was finding and
implementing a benchmarking protocol that was both relevant and reliable. implementing a benchmarking protocol that was both relevant and reliable.
@ -128,9 +128,9 @@ the reference implementation. Indeed, corner cases occur often, and on a 27000
samples test, 885 failures were observed for \prog{libunwind}, against 1099 for samples test, 885 failures were observed for \prog{libunwind}, against 1099 for
the compiled DWARF version (see Section~\ref{ssec:timeperf}). the compiled DWARF version (see Section~\ref{ssec:timeperf}).
The implementation, however, is not production-ready: it only supports the The implementation, however, is not yet production-ready: it only supports the
x86\_64 architecture, and relies to some extent on the Linux operating system. x86\_64 architecture, and relies to some extent on the Linux operating system.
None of those are real problems in practice. Supporting other processor None of these pose a fundamental problem. Supporting other processor
architectures and ABIs are only a matter of engineering,. The operating system architectures and ABIs are only a matter of engineering,. The operating system
dependency is only present in the libraries developed in order to interact with dependency is only present in the libraries developed in order to interact with
the compiled unwinding data, which can be developed for virtually any operating the compiled unwinding data, which can be developed for virtually any operating

View file

@ -2,7 +2,7 @@
\author{Th\'eophile Bastian\\ \author{Th\'eophile Bastian\\
Under supervision of Francesco Zappa Nardelli, March -- August 2018\\ Under supervision of Francesco Zappa Nardelli, March -- August 2018\\
{\textsc{parkas}, \'Ecole Normale Sup\'erieure de Paris}} {\textsc{parkas}, \textsc{inria}}}
%\date{March -- August 2018\\August 20, 2018} %\date{March -- August 2018\\August 20, 2018}
\date{\vspace{-2em}} \date{\vspace{-2em}}
@ -397,27 +397,21 @@ parse the relevant FDE from its start, until it finds the row it was seeking.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{DWARF semantics}\label{sec:semantics} \section{DWARF semantics}\label{sec:semantics}
We now define semantics covering the operations used for FDEs described in the The DWARF 5 standard~\cite{dwarf5std} is written in English prose, and our
DWARF standard~\cite{dwarf5std}, such as seen in Listing~\ref{lst:ex1_dwraw}, first task is to formalize it. Thus, in this section, we first recall the
with the exception of DWARF expressions. These are not treated here, because informal behaviour of DWARF instructions as provided by the standard; and then
they form a rich language and would take a lot of time and space to formalize, we formalize their semantics by mapping them to well-defined C code. We omit
while in the mean time being seldom used --~see Section~\ref{ssec:instr_cov}. the translation of DWARF expressions, because they form a rich language and
would take a lot of time and space to formalize, while in the mean time being
seldom used --~see Section~\ref{ssec:instr_cov}.
These semantics are defined \wrt{} the well-formalized C language, and These semantics are defined \wrt{} the well-formalized C language, and
are passing through an intermediary language. The DWARF language can read the are passing through an intermediate language. The DWARF language can read the
whole memory, as well as registers, and is always executed for some instruction whole memory, as well as registers, and is always executed for some instruction
pointer. The C function representing it thus takes as parameters an array pointer. The C function representing it thus takes as parameters an array
of the registers' values as well as an IP, and returns another array of of the registers' values as well as an IP, and returns another array of
registers values, which represents the evaluated DWARF row. registers values, which represents the evaluated DWARF row.
\subsection{Concerning correctness}\label{ssec:sem_correctness}
The semantics described in this section are designed in a concern of
\emph{formalization} of the original standard. This standard, sadly, only
describes in plain English each instruction's action and result. This basis
cannot be used to \emph{prove} anything correct without relying on informal
interpretations.
\subsection{Original language: DWARF instructions} \subsection{Original language: DWARF instructions}
These are the DWARF instructions used for CFI description, that is, the These are the DWARF instructions used for CFI description, that is, the
@ -486,7 +480,7 @@ a language.
\subsection{Intermediary language $\intermedlang$} \subsection{Intermediary language $\intermedlang$}
A first pass translates DWARF instructions into this intermediary language A first pass translates DWARF instructions into this intermediate language
$\intermedlang$. It is designed to be more mathematical, representing the same $\intermedlang$. It is designed to be more mathematical, representing the same
thing, but abstracting all the data compression of the DWARF format away, so thing, but abstracting all the data compression of the DWARF format away, so
that we can better reason on it and transform it into C code. that we can better reason on it and transform it into C code.
@ -503,7 +497,7 @@ Its grammar is as follows:
\values &::= \bot & \text{Values: undefined,}\\ \values &::= \bot & \text{Values: undefined,}\\
&\quad\vert~\valaddr{\spexpr} & \text{at address $x$},\\ &\quad\vert~\valaddr{\spexpr} & \text{at address $x$},\\
&\quad\vert~\valval{\spexpr} & \text{of value $x$} \\ &\quad\vert~\valval{\spexpr} & \text{of value $x$} \\
&\quad\vert~\valexpr{??} & \text{of expression $x$, see in text} \\ &\quad\vert~\valexpr{} & \text{of expression $x$, see in text} \\
\spexpr &::= \regs \times \mathbb{Z} \spexpr &::= \regs \times \mathbb{Z}
& \text{A ``simple'' expression $\reg{reg} + \textit{offset}$} \\ & \text{A ``simple'' expression $\reg{reg} + \textit{offset}$} \\
\end{align*} \end{align*}
@ -614,7 +608,7 @@ $f$. If we consider the fictive following fictive row $R_0$,
\end{array}\right. \end{array}\right.
\] \]
then, we would have \noindent{}then, we would have
\[ \[
R \insarrow{\reg{rbx}} \left(\valaddr{\reg{rip - 24}}\right) R \insarrow{\reg{rbx}} \left(\valaddr{\reg{rip - 24}}\right)
@ -701,7 +695,7 @@ if(ip >= $loc$) {
} \end{lstlisting} } \end{lstlisting}
\end{itemize} \end{itemize}
while $\semR{\bullet}$ is defined as \noindent{}\noindent{}while $\semR{\bullet}$ is defined as
\begin{align*} \begin{align*}
\semR{\bot} &\eqspace{} \semR{\bot} &\eqspace{}
\text{\lstc{ERROR_VALUE}} \\ \text{\lstc{ERROR_VALUE}} \\
@ -711,6 +705,16 @@ while $\semR{\bullet}$ is defined as
\text{\lstc{(old_ctx[reg] + offset)}} \\ \text{\lstc{(old_ctx[reg] + offset)}} \\
\end{align*} \end{align*}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Concerning correctness}\label{ssec:sem_correctness}
The semantics described in this section are designed in a concern of
\emph{formalization} of the original standard. This standard, sadly, only
describes in plain English each instruction's action and result. This basis
cannot be used to \emph{prove} anything correct without relying on informal
interpretations.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Stack unwinding data compilation} \section{Stack unwinding data compilation}
@ -721,12 +725,12 @@ actual C implementation.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Code availability}\label{ssec:code_avail} \subsection{Code availability}\label{ssec:code_avail}
All the code produced during this internship is available on the various All the code produced during the course of this internship is available on the
repositories from \url{https://git.tobast.fr/m2-internship/}. The repositories various repositories from \url{https://git.tobast.fr/m2-internship/}. The
contain \texttt{README} files describing them; a summary and global description repositories contain \texttt{README} files describing them; a summary and
can be found in the \texttt{abstract} repository. This should be detailed global description can be found in the \texttt{abstract} repository. This
enough to run the project. The source code is entirely under free software should be detailed enough to run the project. The source code is entirely under
licenses. free software licenses.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Compilation: \ehelfs}\label{ssec:ehelfs} \subsection{Compilation: \ehelfs}\label{ssec:ehelfs}
@ -1292,6 +1296,41 @@ It is also worth noting that among all of the 4000 analyzed files, all the
unsupported expressions are clustered in only 12 of them, and only 24 contained unsupported expressions are clustered in only 12 of them, and only 24 contained
unsupported instructions at all. unsupported instructions at all.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Conclusion}
From this data, we can deduce that
\begin{itemize}[itemsep=3pt, parsep=0pt]
\item compilation of the DWARF unwinding data is effective to speed up
drastically unwinding procedures: speedup of $\times 25.9$;
\item code outlining is effective to reduce the produced binary size: from
$1\ \text{MiB}$ to $370\ \text{KiB}$, from a growth factor of $7$
compared to DWARF unwinding data to a growth factor of $2.45$;
\item unwinding relies on small subset of DWARF instructions and
expressions, while most instructions are not used at all in DWARF code
produced by compilers.
\end{itemize}
The overall size of the project is
\begin{itemize}[itemsep=3pt, parsep=0pt]
\item compiler: 1628 lines,
\item \prog{libunwind}: 810 lines,
\item \prog{perf}: 222 lines
\end{itemize}
\noindent{} for a total of 2660 lines of code on the main project. The various
statistics, benchmarking, testing and analyzing code modules add up to around
1500 more lines.
\pagebreak{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

View file

@ -24,7 +24,7 @@
\newcommand{\valaddr}[1]{\operatorname{Addr}\left(#1\right)} \newcommand{\valaddr}[1]{\operatorname{Addr}\left(#1\right)}
\newcommand{\valval}[1]{\operatorname{Val}\left(#1\right)} \newcommand{\valval}[1]{\operatorname{Val}\left(#1\right)}
\newcommand{\valexpr}[1]{\operatorname{Expr}\left(#1\right)} \newcommand{\valexpr}{\operatorname{Expr}}
\newcommand{\intermedlang}{\mathcal{I}} \newcommand{\intermedlang}{\mathcal{I}}