Add instructions coverage statistics

Théophile Bastian 2018-08-07 11:09:33 +02:00
parent 74b8142d34
commit bff2158059
3 changed files with 142 additions and 12 deletions


@ -0,0 +1,25 @@
(54666427,
(1607, 67587841),
(1154, 13869),
{'UNDEFINED': 1698,
'SAME_VALUE': 0,
'OFFSET': 54666405,
'VAL_OFFSET': 0,
'REGISTER': 22,
'EXPRESSION': 12367,
'VAL_EXPRESSION': 0,
'ARCHITECTURAL': 0},
{'seen': 12922916, 'expr': 1502, 'offset': 12921414})
(30038269,
(1603, 42959683),
(1114, 5977),
{'UNDEFINED': 1698,
'SAME_VALUE': 0,
'OFFSET': 30038255,
'VAL_OFFSET': 0,
'REGISTER': 14,
'EXPRESSION': 4475,
'VAL_EXPRESSION': 0,
'ARCHITECTURAL': 0},
{'seen': 12922916, 'expr': 1502, 'offset': 12921414})


@ -342,18 +342,14 @@ quite efficient, most of its optimization comes from fine-tuned code and good
caching mechanisms. While parsing DWARF, \prog{libunwind} is forced to parse
the relevant FDE from its start, until it finds the row it was seeking.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{General statistics}
\todo{}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{DWARF semantics}\label{sec:semantics}
We will now define semantics covering most of the operations used for
CFI\todo{To be defined elsewhere in the report} described in the DWARF
standard~\cite{dwarf5std}, with the exception of DWARF expressions. These are
We will now define semantics covering most of the operations used for FDEs
described in the DWARF standard~\cite{dwarf5std}, such as seen in
Listing~\ref{lst:ex1_dwraw}, with the exception of DWARF expressions. These are
not exhaustively treated because they are quite rich and would take a lot of
time and space to formalize, and are moreover only seldom used (see the
DWARF statistics regarding this).
@ -634,7 +630,7 @@ a reasonable space loss was to compile directly the \ehframe{} into native
machine code on the x86\_64 platform.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Compilation: \ehelfs}
\subsection{Compilation: \ehelfs}\label{ssec:ehelfs}
The rough idea of the compilation is to produce, out of the \ehframe{} section
of a binary, C code that resembles the code shown in the DWARF semantics from
@ -1006,7 +1002,116 @@ lot.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Instructions coverage}
\todo{}
In order to determine which proportion of real-world ELF instructions are
covered by our compiler and \ehelfs{}, we gathered statistics on a sample of
binaries.
The method chosen was to randomly select 4000 ELF files among those present on
a basic ArchLinux system setup, in the directories \texttt{/bin},
\texttt{/lib}, \texttt{/usr/bin}, \texttt{/usr/lib} and their subdirectories,
making sure those files were ELF64 files, and then to gather statistics on
those files.
\begin{table}[h]
\centering
\begin{tabular}{r r r r r r r}
\toprule
\thead{} & \thead{Unsupported \\ register rule}
& \thead{Register \\ rules seen}
& \thead{\% \\ supp.}
& \thead{Unsupported \\ expression}
& \thead{Expressions \\ seen}
& \thead{\% \\ supp.}
\\
\midrule
\makecell{Only supp. \\ columns} &
1603 & 42959683 & 99.996\,\% &
1114 & 5977 & 81.4\,\%
\\
All columns &
1607 & 67587841 & 99.998\,\% &
1154 & 13869 & 91.7\,\%
\\
\bottomrule
\end{tabular}
\caption{Instructions coverage statistics}\label{table:instr_cov}
\end{table}
\begin{table}[h]
\centering
\begin{tabular}{r r r r r r}
\toprule
\thead{}
& \thead{\texttt{Undefined}}
& \thead{\texttt{Same\_value}}
& \thead{\texttt{Offset}}
& \thead{\texttt{Val\_offset}}
& \thead{\texttt{Register}}
\\
\midrule
\makecell{Only supp. \\ columns}
& 1698 (0.006\,\%)
& 0
& 30038255 (99.9\,\%)
& 0
& 14 (0\,\%)
\\
All columns
& 1698 (0.003\,\%)
& 0
& 54666405 (99.9\,\%)
& 0
& 22 (0\,\%)
\\
\bottomrule
\toprule
\thead{}
& \thead{\texttt{Expression}}
& \thead{\texttt{Val\_expression}}
& \thead{\texttt{Architectural}}
& & \thead{Total}
\\
\midrule
\makecell{Only supp. \\ columns}
& 4475 (0.015\,\%)
& 0
& 0
& & 30044442
\\
All columns
& 12367 (0.02\,\%)
& 0
& 0
& & 54680492
\\
\bottomrule
\end{tabular}
\caption{Instruction type statistics}\label{table:instr_types}
\end{table}
Table~\ref{table:instr_cov} gives statistics about the proportion of
instructions encountered that were not supported by \ehelfs. The first row
only considers the columns CFA, \reg{rip}, \reg{rsp}, \reg{rbp} and
\reg{rbx} (the supported registers, see Section~\ref{ssec:ehelfs}). The
second row analyzes all the columns that were encountered, whether
supported or not.
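As a worked example of how the percentages in Table~\ref{table:instr_cov} can
be read (assuming each one is computed as one minus the ratio of unsupported
items to items seen, which matches the reported figures), the first row gives
\[
  1 - \frac{1603}{42959683} \approx 99.996\,\%
  \qquad\mbox{and}\qquad
  1 - \frac{1114}{5977} \approx 81.4\,\%
\]
for register rules and expressions respectively.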
Table~\ref{table:instr_types} analyzes the proportion of each command
(\ie\ the formal way a register is set) for non-CFA columns in the sampled
data. Briefly, \texttt{Offset} means that the value is stored at an offset
from the CFA, \texttt{Register} means that the value is held in a machine
register, \texttt{Expression} means that the value is stored at the address
computed by an expression, and the \texttt{Val\_} prefix means that the
result must not be dereferenced. Overall, it can be seen that
supporting \texttt{Offset} already means supporting the vast majority of
register rules. The data gathered (not reproduced here) also suggests that
supporting a few common expressions is enough to support most of them.
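As a sanity check on Table~\ref{table:instr_types}, the total of its first row
is the sum of the non-zero categories,
\[
  1698 + 30038255 + 14 + 4475 = 30044442,
\]
so that the share of \texttt{Offset} in that row is
\[
  \frac{30038255}{30044442} \approx 99.98\,\%,
\]
which the table reports as 99.9\,\%.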
It is also worth noting that, among the 4000 analyzed files, all the
unsupported expressions seen were found in only 12 files, and only 24 files
contained any unsupported instruction at all.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -1020,8 +1125,8 @@ lot.
\hfill \begin{minipage}{0.7\textwidth}
\begin{flushright}
\itshape{} \small{}
Unless otherwise explicitly stated, any image or source code snippet
from the present document can be reused freely by anyone.
Unless otherwise explicitly stated, any image, source code snippet or
table from the present document can be reused freely by anyone.
\end{flushright}
\end{minipage}


@ -33,7 +33,7 @@
}
@article{dinechin2000exn,
title={C++ exception handling},
title={C++ exception handling \qtodo{CHECK}},
author={De Dinechin, Christophe},
journal={IEEE Concurrency},
volume={8},