Write conclusion, move correctness §

2018-08-20 00:30:00 +02:00 · 2018-08-20 00:30:00 +02:00 · df7252238e
commit df7252238e
parent c651fb151e
3 changed files with 69 additions and 30 deletions
--- a/report/fiche_synthese.tex
+++ b/report/fiche_synthese.tex
@ -73,8 +73,8 @@ of compiled DWARF into existing projects have been made easy by implementing an
 alternative version of the \textit{de facto} standard library for this purpose,
 \prog{libunwind}.

-Multiple approaches have been tried and evaluated to determine which
-compilation process leads to the best time/space trade-off.
+We explored and evaluated multiple approaches to determine which compilation
+process leads to the best time/space trade-off.

 Unexpectedly, the part that proved hardest of the project was finding and
 implementing a benchmarking protocol that was both relevant and reliable.
@ -128,9 +128,9 @@ the reference implementation.  Indeed, corner cases occur often, and on a 27000
 samples test, 885 failures were observed for \prog{libunwind}, against 1099 for
 the compiled DWARF version (see Section~\ref{ssec:timeperf}).

-The implementation, however, is not production-ready: it only supports the
+The implementation, however, is not yet production-ready: it only supports the
 x86\_64 architecture, and relies to some extent on the Linux operating system.
-None of those are real problems in practice. Supporting other processor
+None of these pose a fundamental problem. Supporting other processor
 architectures and ABIs are only a matter of engineering,. The operating system
 dependency is only present in the libraries developed in order to interact with
 the compiled unwinding data, which can be developed for virtually any operating
--- a/report/report.tex
+++ b/report/report.tex
@ -2,7 +2,7 @@

 \author{Th\'eophile Bastian\\
 Under supervision of Francesco Zappa Nardelli, March -- August 2018\\
-{\textsc{parkas}, \'Ecole Normale Sup\'erieure de Paris}}
+{\textsc{parkas}, \textsc{inria}}}

 %\date{March -- August 2018\\August 20, 2018}
 \date{\vspace{-2em}}
@ -397,27 +397,21 @@ parse the relevant FDE from its start, until it finds the row it was seeking.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{DWARF semantics}\label{sec:semantics}

-We now define semantics covering the operations used for FDEs described in the
-DWARF standard~\cite{dwarf5std}, such as seen in Listing~\ref{lst:ex1_dwraw},
-with the exception of DWARF expressions. These are not treated here, because
-they form a rich language and would take a lot of time and space to formalize,
-while in the mean time being seldom used --~see Section~\ref{ssec:instr_cov}.
+The DWARF 5 standard~\cite{dwarf5std} is written in English prose, and our
+first task is to formalize it. Thus, in this section, we first recall the
+informal behaviour of DWARF instructions as provided by the standard; and then
+we formalize their semantics by mapping them to well-defined C code. We omit
+the translation of DWARF expressions, because they form a rich language and
+would take a lot of time and space to formalize, while in the mean time being
+seldom used --~see Section~\ref{ssec:instr_cov}.

 These semantics are defined \wrt{} the well-formalized C language, and
-are passing through an intermediary language. The DWARF language can read the
+are passing through an intermediate language. The DWARF language can read the
 whole memory, as well as registers, and is always executed for some instruction
 pointer. The C function representing it thus takes as parameters an array
 of the registers' values as well as an IP, and returns another array of
 registers values, which represents the evaluated DWARF row.

-\subsection{Concerning correctness}\label{ssec:sem_correctness}
-
-The semantics described in this section are designed in a concern of
-\emph{formalization} of the original standard. This standard, sadly, only
-describes in plain English each instruction's action and result. This basis
-cannot be used to \emph{prove} anything correct without relying on informal
-interpretations.
-
 \subsection{Original language: DWARF instructions}

 These are the DWARF instructions used for CFI description, that is, the
@ -486,7 +480,7 @@ a language.

 \subsection{Intermediary language $\intermedlang$}

-A first pass translates DWARF instructions into this intermediary language
+A first pass translates DWARF instructions into this intermediate language
 $\intermedlang$. It is designed to be more mathematical, representing the same
 thing, but abstracting all the data compression of the DWARF format away, so
 that we can better reason on it and transform it into C code.
@ -503,7 +497,7 @@ Its grammar is as follows:
    \values &::= \bot & \text{Values: undefined,}\\
        &\quad\vert~\valaddr{\spexpr} & \text{at address $x$},\\
        &\quad\vert~\valval{\spexpr} & \text{of value $x$} \\
-        &\quad\vert~\valexpr{??} & \text{of expression $x$, see in text} \\
+        &\quad\vert~\valexpr{} & \text{of expression $x$, see in text} \\
    \spexpr &::= \regs \times \mathbb{Z}
        & \text{A ``simple'' expression $\reg{reg} + \textit{offset}$} \\
 \end{align*}
@ -614,7 +608,7 @@ $f$. If we consider the fictive following fictive row $R_0$,
    \end{array}\right.
 \]

-then, we would have
+\noindent{}then, we would have

 \[
    R \insarrow{\reg{rbx}} \left(\valaddr{\reg{rip - 24}}\right)
@ -701,7 +695,7 @@ if(ip >= $loc$) {
 } \end{lstlisting}
 \end{itemize}

-while $\semR{\bullet}$ is defined as
+\noindent{}\noindent{}while $\semR{\bullet}$ is defined as
 \begin{align*}
    \semR{\bot} &\eqspace{}
        \text{\lstc{ERROR_VALUE}} \\
@ -711,6 +705,16 @@ while $\semR{\bullet}$ is defined as
        \text{\lstc{(old_ctx[reg] + offset)}} \\
 \end{align*}

+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsection{Concerning correctness}\label{ssec:sem_correctness}
+
+The semantics described in this section are designed in a concern of
+\emph{formalization} of the original standard. This standard, sadly, only
+describes in plain English each instruction's action and result. This basis
+cannot be used to \emph{prove} anything correct without relying on informal
+interpretations.
+
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{Stack unwinding data compilation}
@ -721,12 +725,12 @@ actual C implementation.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Code availability}\label{ssec:code_avail}

-All the code produced during this internship is available on the various
-repositories from \url{https://git.tobast.fr/m2-internship/}. The repositories
-contain \texttt{README} files describing them; a summary and global description
-can be found in the \texttt{abstract} repository. This should be detailed
-enough to run the project. The source code is entirely under free software
-licenses.
+All the code produced during the course of this internship is available on the
+various repositories from \url{https://git.tobast.fr/m2-internship/}. The
+repositories contain \texttt{README} files describing them; a summary and
+global description can be found in the \texttt{abstract} repository. This
+should be detailed enough to run the project. The source code is entirely under
+free software licenses.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Compilation: \ehelfs}\label{ssec:ehelfs}
@ -1292,6 +1296,41 @@ It is also worth noting that among all of the 4000 analyzed files, all the
 unsupported expressions are clustered in only 12 of them, and only 24 contained
 unsupported instructions at all.

+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\section*{Conclusion}
+
+From this data, we can deduce that
+
+\begin{itemize}[itemsep=3pt, parsep=0pt]
+
+    \item compilation of the DWARF unwinding data is effective to speed up
+        drastically unwinding procedures: speedup of $\times 25.9$;
+
+    \item code outlining is effective to reduce the produced binary size: from
+        $1\ \text{MiB}$ to $370\ \text{KiB}$, from a growth factor of $7$
+        compared to DWARF unwinding data to a growth factor of $2.45$;
+
+    \item unwinding relies on small subset of DWARF instructions and
+        expressions, while most instructions are not used at all in DWARF code
+        produced by compilers.
+
+\end{itemize}
+
+The overall size of the project is
+
+\begin{itemize}[itemsep=3pt, parsep=0pt]
+    \item compiler: 1628 lines,
+    \item \prog{libunwind}: 810 lines,
+    \item \prog{perf}: 222 lines
+\end{itemize}
+
+\noindent{} for a total of 2660 lines of code on the main project. The various
+statistics, benchmarking, testing and analyzing code modules add up to around
+1500 more lines.
+
+\pagebreak{}
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
--- a/shared/specific.sty
+++ b/shared/specific.sty
@ -24,7 +24,7 @@

 \newcommand{\valaddr}[1]{\operatorname{Addr}\left(#1\right)}
 \newcommand{\valval}[1]{\operatorname{Val}\left(#1\right)}
-\newcommand{\valexpr}[1]{\operatorname{Expr}\left(#1\right)}
+\newcommand{\valexpr}{\operatorname{Expr}}

 \newcommand{\intermedlang}{\mathcal{I}}