Write conclusion, move correctness §
parent
c651fb151e
commit
df7252238e
3 changed files with 69 additions and 30 deletions
|
@ -73,8 +73,8 @@ of compiled DWARF into existing projects have been made easy by implementing an
|
||||||
alternative version of the \textit{de facto} standard library for this purpose,
|
alternative version of the \textit{de facto} standard library for this purpose,
|
||||||
\prog{libunwind}.
|
\prog{libunwind}.
|
||||||
|
|
||||||
Multiple approaches have been tried and evaluated to determine which
|
We explored and evaluated multiple approaches to determine which compilation
|
||||||
compilation process leads to the best time/space trade-off.
|
process leads to the best time/space trade-off.
|
||||||
|
|
||||||
Unexpectedly, the hardest part of the project proved to be finding and
|
Unexpectedly, the hardest part of the project proved to be finding and
|
||||||
implementing a benchmarking protocol that was both relevant and reliable.
|
implementing a benchmarking protocol that was both relevant and reliable.
|
||||||
|
@ -128,9 +128,9 @@ the reference implementation. Indeed, corner cases occur often, and on a 27000
|
||||||
samples test, 885 failures were observed for \prog{libunwind}, against 1099 for
|
samples test, 885 failures were observed for \prog{libunwind}, against 1099 for
|
||||||
the compiled DWARF version (see Section~\ref{ssec:timeperf}).
|
the compiled DWARF version (see Section~\ref{ssec:timeperf}).
|
||||||
|
|
||||||
The implementation, however, is not production-ready: it only supports the
|
The implementation, however, is not yet production-ready: it only supports the
|
||||||
x86\_64 architecture, and relies to some extent on the Linux operating system.
|
x86\_64 architecture, and relies to some extent on the Linux operating system.
|
||||||
None of those are real problems in practice. Supporting other processor
|
None of these pose a fundamental problem. Supporting other processor
|
||||||
architectures and ABIs is only a matter of engineering. The operating system
|
architectures and ABIs is only a matter of engineering. The operating system
|
||||||
dependency is only present in the libraries developed in order to interact with
|
dependency is only present in the libraries developed in order to interact with
|
||||||
the compiled unwinding data, which can be developed for virtually any operating
|
the compiled unwinding data, which can be developed for virtually any operating
|
||||||
|
|
|
@ -2,7 +2,7 @@
|
||||||
|
|
||||||
\author{Th\'eophile Bastian\\
|
\author{Th\'eophile Bastian\\
|
||||||
Under supervision of Francesco Zappa Nardelli, March -- August 2018\\
|
Under supervision of Francesco Zappa Nardelli, March -- August 2018\\
|
||||||
{\textsc{parkas}, \'Ecole Normale Sup\'erieure de Paris}}
|
{\textsc{parkas}, \textsc{inria}}}
|
||||||
|
|
||||||
%\date{March -- August 2018\\August 20, 2018}
|
%\date{March -- August 2018\\August 20, 2018}
|
||||||
\date{\vspace{-2em}}
|
\date{\vspace{-2em}}
|
||||||
|
@ -397,27 +397,21 @@ parse the relevant FDE from its start, until it finds the row it was seeking.
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\section{DWARF semantics}\label{sec:semantics}
|
\section{DWARF semantics}\label{sec:semantics}
|
||||||
|
|
||||||
We now define semantics covering the operations used for FDEs described in the
|
The DWARF 5 standard~\cite{dwarf5std} is written in English prose, and our
|
||||||
DWARF standard~\cite{dwarf5std}, such as seen in Listing~\ref{lst:ex1_dwraw},
|
first task is to formalize it. Thus, in this section, we first recall the
|
||||||
with the exception of DWARF expressions. These are not treated here, because
|
informal behaviour of DWARF instructions as provided by the standard; and then
|
||||||
they form a rich language and would take a lot of time and space to formalize,
|
we formalize their semantics by mapping them to well-defined C code. We omit
|
||||||
while in the mean time being seldom used --~see Section~\ref{ssec:instr_cov}.
|
the translation of DWARF expressions, because they form a rich language and
|
||||||
|
would take a lot of time and space to formalize, while at the same time being
|
||||||
|
seldom used --~see Section~\ref{ssec:instr_cov}.
|
||||||
|
|
||||||
These semantics are defined \wrt{} the well-formalized C language, and
|
These semantics are defined \wrt{} the well-formalized C language, and
|
||||||
are passing through an intermediary language. The DWARF language can read the
|
pass through an intermediate language. The DWARF language can read the
|
||||||
whole memory, as well as registers, and is always executed for some instruction
|
whole memory, as well as registers, and is always executed for some instruction
|
||||||
pointer. The C function representing it thus takes as parameters an array
|
pointer. The C function representing it thus takes as parameters an array
|
||||||
of the registers' values as well as an IP, and returns another array of
|
of the registers' values as well as an IP, and returns another array of
|
||||||
register values, which represents the evaluated DWARF row.
|
register values, which represents the evaluated DWARF row.
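
As a rough illustration of the shape such a C function might take, the sketch below evaluates a single hypothetical row (CFA one machine word above \lstc{rsp}, return address stored just below the CFA). The names \lstc{reg_ctx_t}, \lstc{eval_row} and \lstc{NB_REGS} are assumptions made for this example and are not the project's actual interface; only the x86\_64 DWARF register numbers (7 for \lstc{rsp}, 16 for the return address) are taken from the System V ABI.

```c
#include <stdint.h>

/* Assumption: x86_64 DWARF register numbering (System V ABI). */
enum { REG_RSP = 7, REG_RA = 16, NB_REGS = 17 };

/* Hypothetical container for the register values of one frame. */
typedef struct {
    uintptr_t regs[NB_REGS];
} reg_ctx_t;

/* Evaluates one illustrative row: CFA = rsp + one word (8 bytes on x86_64),
 * return address stored at CFA - one word.
 * A real generated function would first select the row matching `ip`. */
reg_ctx_t eval_row(reg_ctx_t old_ctx, uintptr_t ip) {
    const uintptr_t word = sizeof(uintptr_t);
    reg_ctx_t new_ctx = old_ctx;
    (void)ip; /* single-row example: the IP is not inspected */

    uintptr_t cfa = old_ctx.regs[REG_RSP] + word;
    /* The DWARF row may read memory: fetch the saved return address. */
    new_ctx.regs[REG_RA] = *(const uintptr_t *)(cfa - word);
    /* The caller's stack pointer is the CFA. */
    new_ctx.regs[REG_RSP] = cfa;
    return new_ctx;
}
```

Passing and returning the register array by value keeps the sketch free of aliasing between the old and new contexts; an actual implementation may well prefer pointers.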
|
||||||
|
|
||||||
\subsection{Concerning correctness}\label{ssec:sem_correctness}
|
|
||||||
|
|
||||||
The semantics described in this section are designed in a concern of
|
|
||||||
\emph{formalization} of the original standard. This standard, sadly, only
|
|
||||||
describes in plain English each instruction's action and result. This basis
|
|
||||||
cannot be used to \emph{prove} anything correct without relying on informal
|
|
||||||
interpretations.
|
|
||||||
|
|
||||||
\subsection{Original language: DWARF instructions}
|
\subsection{Original language: DWARF instructions}
|
||||||
|
|
||||||
These are the DWARF instructions used for CFI description, that is, the
|
These are the DWARF instructions used for CFI description, that is, the
|
||||||
|
@ -486,7 +480,7 @@ a language.
|
||||||
|
|
||||||
\subsection{Intermediary language $\intermedlang$}
|
\subsection{Intermediary language $\intermedlang$}
|
||||||
|
|
||||||
A first pass translates DWARF instructions into this intermediary language
|
A first pass translates DWARF instructions into this intermediate language
|
||||||
$\intermedlang$. It is designed to be more mathematical, representing the same
|
$\intermedlang$. It is designed to be more mathematical, representing the same
|
||||||
thing, but abstracting all the data compression of the DWARF format away, so
|
thing, but abstracting all the data compression of the DWARF format away, so
|
||||||
that we can better reason on it and transform it into C code.
|
that we can better reason on it and transform it into C code.
|
||||||
|
@ -503,7 +497,7 @@ Its grammar is as follows:
|
||||||
\values &::= \bot & \text{Values: undefined,}\\
|
\values &::= \bot & \text{Values: undefined,}\\
|
||||||
&\quad\vert~\valaddr{\spexpr} & \text{at address $x$},\\
|
&\quad\vert~\valaddr{\spexpr} & \text{at address $x$},\\
|
||||||
&\quad\vert~\valval{\spexpr} & \text{of value $x$} \\
|
&\quad\vert~\valval{\spexpr} & \text{of value $x$} \\
|
||||||
&\quad\vert~\valexpr{??} & \text{of expression $x$, see in text} \\
|
&\quad\vert~\valexpr{} & \text{of expression $x$, see in text} \\
|
||||||
\spexpr &::= \regs \times \mathbb{Z}
|
\spexpr &::= \regs \times \mathbb{Z}
|
||||||
& \text{A ``simple'' expression $\reg{reg} + \textit{offset}$} \\
|
& \text{A ``simple'' expression $\reg{reg} + \textit{offset}$} \\
|
||||||
\end{align*}
|
\end{align*}
|
||||||
|
@ -614,7 +608,7 @@ $f$. If we consider the following fictive row $R_0$,
|
||||||
\end{array}\right.
|
\end{array}\right.
|
||||||
\]
|
\]
|
||||||
|
|
||||||
then, we would have
|
\noindent{}then, we would have
|
||||||
|
|
||||||
\[
|
\[
|
||||||
R \insarrow{\reg{rbx}} \left(\valaddr{\reg{rip - 24}}\right)
|
R \insarrow{\reg{rbx}} \left(\valaddr{\reg{rip - 24}}\right)
|
||||||
|
@ -701,7 +695,7 @@ if(ip >= $loc$) {
|
||||||
} \end{lstlisting}
|
} \end{lstlisting}
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
while $\semR{\bullet}$ is defined as
|
\noindent{}while $\semR{\bullet}$ is defined as
|
||||||
\begin{align*}
|
\begin{align*}
|
||||||
\semR{\bot} &\eqspace{}
|
\semR{\bot} &\eqspace{}
|
||||||
\text{\lstc{ERROR_VALUE}} \\
|
\text{\lstc{ERROR_VALUE}} \\
|
||||||
|
@ -711,6 +705,16 @@ while $\semR{\bullet}$ is defined as
|
||||||
\text{\lstc{(old_ctx[reg] + offset)}} \\
|
\text{\lstc{(old_ctx[reg] + offset)}} \\
|
||||||
\end{align*}
|
\end{align*}
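
The value translation above can be sketched as a small C function. This is only an illustration under stated assumptions: the type \lstc{value_t}, the function \lstc{sem_value} and the concrete value chosen for \lstc{ERROR_VALUE} are hypothetical, and the dereference used for the address case is our reading of ``at address $x$'' in the grammar.

```c
#include <stdint.h>

/* Assumption: some sentinel; the concrete value is chosen for the example. */
#define ERROR_VALUE ((uintptr_t)-1)

/* Hypothetical encoding of the values of the intermediate language. */
typedef enum { VAL_UNDEF, VAL_ADDR, VAL_VAL } val_kind_t;

typedef struct {
    val_kind_t kind;
    int reg;          /* DWARF register number */
    intptr_t offset;  /* signed offset of the simple expression */
} value_t;

/* Maps one value to the C value it denotes in the context `old_ctx`. */
uintptr_t sem_value(value_t v, const uintptr_t *old_ctx) {
    switch (v.kind) {
    case VAL_ADDR: /* stored in memory at address reg + offset */
        return *(const uintptr_t *)(old_ctx[v.reg] + v.offset);
    case VAL_VAL:  /* the value reg + offset itself */
        return old_ctx[v.reg] + v.offset;
    default:       /* undefined value */
        return ERROR_VALUE;
    }
}
```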
|
||||||
|
|
||||||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
\subsection{Concerning correctness}\label{ssec:sem_correctness}
|
||||||
|
|
||||||
|
The semantics described in this section are designed as a
|
||||||
|
\emph{formalization} of the original standard. This standard, sadly, only
|
||||||
|
describes in plain English each instruction's action and result. This basis
|
||||||
|
cannot be used to \emph{prove} anything correct without relying on informal
|
||||||
|
interpretations.
|
||||||
|
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\section{Stack unwinding data compilation}
|
\section{Stack unwinding data compilation}
|
||||||
|
@ -721,12 +725,12 @@ actual C implementation.
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Code availability}\label{ssec:code_avail}
|
\subsection{Code availability}\label{ssec:code_avail}
|
||||||
|
|
||||||
All the code produced during this internship is available on the various
|
All the code produced during the course of this internship is available on the
|
||||||
repositories from \url{https://git.tobast.fr/m2-internship/}. The repositories
|
various repositories from \url{https://git.tobast.fr/m2-internship/}. The
|
||||||
contain \texttt{README} files describing them; a summary and global description
|
repositories contain \texttt{README} files describing them; a summary and
|
||||||
can be found in the \texttt{abstract} repository. This should be detailed
|
global description can be found in the \texttt{abstract} repository. This
|
||||||
enough to run the project. The source code is entirely under free software
|
should be detailed enough to run the project. The source code is entirely under
|
||||||
licenses.
|
free software licenses.
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
\subsection{Compilation: \ehelfs}\label{ssec:ehelfs}
|
\subsection{Compilation: \ehelfs}\label{ssec:ehelfs}
|
||||||
|
@ -1292,6 +1296,41 @@ It is also worth noting that among all of the 4000 analyzed files, all the
|
||||||
unsupported expressions are clustered in only 12 of them, and only 24 contained
|
unsupported expressions are clustered in only 12 of them, and only 24 contained
|
||||||
unsupported instructions at all.
|
unsupported instructions at all.
|
||||||
|
|
||||||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
\section*{Conclusion}
|
||||||
|
|
||||||
|
From this data, we can deduce that
|
||||||
|
|
||||||
|
\begin{itemize}[itemsep=3pt, parsep=0pt]
|
||||||
|
|
||||||
|
\item compilation of the DWARF unwinding data is effective at drastically
|
||||||
|
speeding up unwinding procedures: a speedup of $\times 25.9$;
|
||||||
|
|
||||||
|
\item code outlining is effective at reducing the produced binary size: from
|
||||||
|
$1\ \text{MiB}$ to $370\ \text{KiB}$, from a growth factor of $7$
|
||||||
|
compared to DWARF unwinding data to a growth factor of $2.45$;
|
||||||
|
|
||||||
|
\item unwinding relies on a small subset of DWARF instructions and
|
||||||
|
expressions, while most instructions are not used at all in DWARF code
|
||||||
|
produced by compilers.
|
||||||
|
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
The overall size of the project is
|
||||||
|
|
||||||
|
\begin{itemize}[itemsep=3pt, parsep=0pt]
|
||||||
|
\item compiler: 1628 lines,
|
||||||
|
\item \prog{libunwind}: 810 lines,
|
||||||
|
\item \prog{perf}: 222 lines
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
\noindent{} for a total of 2660 lines of code on the main project. The various
|
||||||
|
statistics, benchmarking, testing and analyzing code modules add up to around
|
||||||
|
1500 more lines.
|
||||||
|
|
||||||
|
\pagebreak{}
|
||||||
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||||
|
|
|
@ -24,7 +24,7 @@
|
||||||
|
|
||||||
\newcommand{\valaddr}[1]{\operatorname{Addr}\left(#1\right)}
|
\newcommand{\valaddr}[1]{\operatorname{Addr}\left(#1\right)}
|
||||||
\newcommand{\valval}[1]{\operatorname{Val}\left(#1\right)}
|
\newcommand{\valval}[1]{\operatorname{Val}\left(#1\right)}
|
||||||
\newcommand{\valexpr}[1]{\operatorname{Expr}\left(#1\right)}
|
\newcommand{\valexpr}{\operatorname{Expr}}
|
||||||
|
|
||||||
\newcommand{\intermedlang}{\mathcal{I}}
|
\newcommand{\intermedlang}{\mathcal{I}}
|
||||||
|
|
||||||
|
|