diff --git a/report/fiche_synthese.tex b/report/fiche_synthese.tex index 67a2898..2952b0b 100644 --- a/report/fiche_synthese.tex +++ b/report/fiche_synthese.tex @@ -73,8 +73,8 @@ of compiled DWARF into existing projects have been made easy by implementing an alternative version of the \textit{de facto} standard library for this purpose, \prog{libunwind}. -Multiple approaches have been tried and evaluated to determine which -compilation process leads to the best time/space trade-off. +We explored and evaluated multiple approaches to determine which compilation +process leads to the best time/space trade-off. Unexpectedly, the part that proved hardest of the project was finding and implementing a benchmarking protocol that was both relevant and reliable. @@ -128,9 +128,9 @@ the reference implementation. Indeed, corner cases occur often, and on a 27000 samples test, 885 failures were observed for \prog{libunwind}, against 1099 for the compiled DWARF version (see Section~\ref{ssec:timeperf}). -The implementation, however, is not production-ready: it only supports the +The implementation, however, is not yet production-ready: it only supports the x86\_64 architecture, and relies to some extent on the Linux operating system. -None of those are real problems in practice. Supporting other processor +None of these pose a fundamental problem. Supporting other processor architectures and ABIs are only a matter of engineering,. The operating system dependency is only present in the libraries developed in order to interact with the compiled unwinding data, which can be developed for virtually any operating diff --git a/report/report.tex b/report/report.tex index 862776e..33a5a16 100644 --- a/report/report.tex +++ b/report/report.tex @@ -2,7 +2,7 @@ \author{Th\'eophile Bastian\\ Under supervision of Francesco Zappa Nardelli, March -- August 2018\\ -{\textsc{parkas}, \'Ecole Normale Sup\'erieure de Paris}} +{\textsc{parkas}, \textsc{inria}}} %\date{March -- August 2018\\August 20, 2018} \date{\vspace{-2em}} @@ -397,27 +397,21 @@ parse the relevant FDE from its start, until it finds the row it was seeking. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{DWARF semantics}\label{sec:semantics} -We now define semantics covering the operations used for FDEs described in the -DWARF standard~\cite{dwarf5std}, such as seen in Listing~\ref{lst:ex1_dwraw}, -with the exception of DWARF expressions. These are not treated here, because -they form a rich language and would take a lot of time and space to formalize, -while in the mean time being seldom used --~see Section~\ref{ssec:instr_cov}. +The DWARF 5 standard~\cite{dwarf5std} is written in English prose, and our +first task is to formalize it. Thus, in this section, we first recall the +informal behaviour of DWARF instructions as provided by the standard; and then +we formalize their semantics by mapping them to well-defined C code. We omit +the translation of DWARF expressions, because they form a rich language and +would take a lot of time and space to formalize, while in the mean time being +seldom used --~see Section~\ref{ssec:instr_cov}. These semantics are defined \wrt{} the well-formalized C language, and -are passing through an intermediary language. The DWARF language can read the +are passing through an intermediate language. The DWARF language can read the whole memory, as well as registers, and is always executed for some instruction pointer. The C function representing it thus takes as parameters an array of the registers' values as well as an IP, and returns another array of registers values, which represents the evaluated DWARF row. -\subsection{Concerning correctness}\label{ssec:sem_correctness} - -The semantics described in this section are designed in a concern of -\emph{formalization} of the original standard. This standard, sadly, only -describes in plain English each instruction's action and result. This basis -cannot be used to \emph{prove} anything correct without relying on informal -interpretations. - \subsection{Original language: DWARF instructions} These are the DWARF instructions used for CFI description, that is, the @@ -486,7 +480,7 @@ a language. \subsection{Intermediary language $\intermedlang$} -A first pass translates DWARF instructions into this intermediary language +A first pass translates DWARF instructions into this intermediate language $\intermedlang$. It is designed to be more mathematical, representing the same thing, but abstracting all the data compression of the DWARF format away, so that we can better reason on it and transform it into C code. @@ -503,7 +497,7 @@ Its grammar is as follows: \values &::= \bot & \text{Values: undefined,}\\ &\quad\vert~\valaddr{\spexpr} & \text{at address $x$},\\ &\quad\vert~\valval{\spexpr} & \text{of value $x$} \\ - &\quad\vert~\valexpr{??} & \text{of expression $x$, see in text} \\ + &\quad\vert~\valexpr{} & \text{of expression $x$, see in text} \\ \spexpr &::= \regs \times \mathbb{Z} & \text{A ``simple'' expression $\reg{reg} + \textit{offset}$} \\ \end{align*} @@ -614,7 +608,7 @@ $f$. If we consider the fictive following fictive row $R_0$, \end{array}\right. \] -then, we would have +\noindent{}then, we would have \[ R \insarrow{\reg{rbx}} \left(\valaddr{\reg{rip - 24}}\right) @@ -701,7 +695,7 @@ if(ip >= $loc$) { } \end{lstlisting} \end{itemize} -while $\semR{\bullet}$ is defined as +\noindent{}\noindent{}while $\semR{\bullet}$ is defined as \begin{align*} \semR{\bot} &\eqspace{} \text{\lstc{ERROR_VALUE}} \\ @@ -711,6 +705,16 @@ while $\semR{\bullet}$ is defined as \text{\lstc{(old_ctx[reg] + offset)}} \\ \end{align*} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\subsection{Concerning correctness}\label{ssec:sem_correctness} + +The semantics described in this section are designed in a concern of +\emph{formalization} of the original standard. This standard, sadly, only +describes in plain English each instruction's action and result. This basis +cannot be used to \emph{prove} anything correct without relying on informal +interpretations. + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Stack unwinding data compilation} @@ -721,12 +725,12 @@ actual C implementation. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Code availability}\label{ssec:code_avail} -All the code produced during this internship is available on the various -repositories from \url{https://git.tobast.fr/m2-internship/}. The repositories -contain \texttt{README} files describing them; a summary and global description -can be found in the \texttt{abstract} repository. This should be detailed -enough to run the project. The source code is entirely under free software -licenses. +All the code produced during the course of this internship is available on the +various repositories from \url{https://git.tobast.fr/m2-internship/}. The +repositories contain \texttt{README} files describing them; a summary and +global description can be found in the \texttt{abstract} repository. This +should be detailed enough to run the project. The source code is entirely under +free software licenses. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Compilation: \ehelfs}\label{ssec:ehelfs} @@ -1292,6 +1296,41 @@ It is also worth noting that among all of the 4000 analyzed files, all the unsupported expressions are clustered in only 12 of them, and only 24 contained unsupported instructions at all. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\section*{Conclusion} + +From this data, we can deduce that + +\begin{itemize}[itemsep=3pt, parsep=0pt] + + \item compilation of the DWARF unwinding data is effective to speed up + drastically unwinding procedures: speedup of $\times 25.9$; + + \item code outlining is effective to reduce the produced binary size: from + $1\ \text{MiB}$ to $370\ \text{KiB}$, from a growth factor of $7$ + compared to DWARF unwinding data to a growth factor of $2.45$; + + \item unwinding relies on small subset of DWARF instructions and + expressions, while most instructions are not used at all in DWARF code + produced by compilers. + +\end{itemize} + +The overall size of the project is + +\begin{itemize}[itemsep=3pt, parsep=0pt] + \item compiler: 1628 lines, + \item \prog{libunwind}: 810 lines, + \item \prog{perf}: 222 lines +\end{itemize} + +\noindent{} for a total of 2660 lines of code on the main project. The various +statistics, benchmarking, testing and analyzing code modules add up to around +1500 more lines. + +\pagebreak{} + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/shared/specific.sty b/shared/specific.sty index e30690c..cfada16 100644 --- a/shared/specific.sty +++ b/shared/specific.sty @@ -24,7 +24,7 @@ \newcommand{\valaddr}[1]{\operatorname{Addr}\left(#1\right)} \newcommand{\valval}[1]{\operatorname{Val}\left(#1\right)} -\newcommand{\valexpr}[1]{\operatorname{Expr}\left(#1\right)} +\newcommand{\valexpr}{\operatorname{Expr}} \newcommand{\intermedlang}{\mathcal{I}}