diff --git a/report/data/instruction_coverage b/report/data/instruction_coverage new file mode 100644 index 0000000..1e62219 --- /dev/null +++ b/report/data/instruction_coverage @@ -0,0 +1,25 @@ +(54666427, + (1607, 67587841), + (1154, 13869), + {'UNDEFINED': 1698, + 'SAME_VALUE': 0, + 'OFFSET': 54666405, + 'VAL_OFFSET': 0, + 'REGISTER': 22, + 'EXPRESSION': 12367, + 'VAL_EXPRESSION': 0, + 'ARCHITECTURAL': 0}, + {'seen': 12922916, 'expr': 1502, 'offset': 12921414}) + +(30038269, + (1603, 42959683), + (1114, 5977), + {'UNDEFINED': 1698, + 'SAME_VALUE': 0, + 'OFFSET': 30038255, + 'VAL_OFFSET': 0, + 'REGISTER': 14, + 'EXPRESSION': 4475, + 'VAL_EXPRESSION': 0, + 'ARCHITECTURAL': 0}, + {'seen': 12922916, 'expr': 1502, 'offset': 12921414}) diff --git a/report/report.tex b/report/report.tex index f996079..894ba6e 100644 --- a/report/report.tex +++ b/report/report.tex @@ -342,18 +342,14 @@ quite efficient, most of its optimization comes from fine-tuned code and good caching mechanisms. While parsing DWARF, \prog{libunwind} is forced to parse the relevant FDE from its start, until it finds the row it was seeking. -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsection{General statistics} -\todo{} - %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{DWARF semantics}\label{sec:semantics} -We will now define semantics covering most of the operations used for -CFI\todo{To be defined elsewhere in the report} described in the DWARF -standard~\cite{dwarf5std}, with the exception of DWARF expressions. These are +We will now define semantics covering most of the operations used for FDEs +described in the DWARF standard~\cite{dwarf5std}, such as seen in +Listing~\ref{lst:ex1_dwraw}, with the exception of DWARF expressions. 
These are not exhaustively treated because they are quite rich and would take a lot of time and space to formalize, and in the meantime are only seldom used (see the DWARF statistics regarding this). @@ -634,7 +630,7 @@ a reasonable space loss was to compile directly the \ehframe{} into native machine code on the x86\_64 platform. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsection{Compilation: \ehelfs} +\subsection{Compilation: \ehelfs}\label{ssec:ehelfs} The rough idea of the compilation is to produce, out of the \ehframe{} section of a binary, C code that resembles the code shown in the DWARF semantics from @@ -1006,7 +1002,116 @@ lot. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Instructions coverage} -\todo{} + +This section evaluates which proportion of real-world ELF instructions is +covered by our compiler and \ehelfs. + +The method chosen was to randomly select 4000 ELFs among those present on a +basic Arch Linux system setup, in the directories \texttt{/bin}, \texttt{/lib}, +\texttt{/usr/bin}, \texttt{/usr/lib} and their subdirectories, making sure +those files were ELF64 files, and then to gather statistics on them. + +\begin{table}[h] + \centering + \begin{tabular}{r r r r r r r} + \toprule + \thead{} & \thead{Unsupported \\ register rule} + & \thead{Register \\ rules seen} + & \thead{\% \\ supp.} + & \thead{Unsupported \\ expression} + & \thead{Expressions \\ seen} + & \thead{\% \\ supp.} + \\ + \midrule + \makecell{Only supp. 
\\ columns} & + 1603 & 42959683 & 99.996\,\% & + 1114 & 5977 & 81.4\,\% + \\ + All columns & + 1607 & 67587841 & 99.998\,\% & + 1154 & 13869 & 91.7\,\% + \\ + \bottomrule + \end{tabular} + + \caption{Instructions coverage statistics}\label{table:instr_cov} +\end{table} + +\begin{table}[h] + \centering + \begin{tabular}{r r r r r r} + \toprule + \thead{} + & \thead{\texttt{Undefined}} + & \thead{\texttt{Same\_value}} + & \thead{\texttt{Offset}} + & \thead{\texttt{Val\_offset}} + & \thead{\texttt{Register}} + \\ + \midrule + \makecell{Only supp. \\ columns} + & 1698 (0.006\,\%) + & 0 + & 30038255 (99.98\,\%) + & 0 + & 14 ($<$0.001\,\%) + \\ + All columns + & 1698 (0.003\,\%) + & 0 + & 54666405 (99.97\,\%) + & 0 + & 22 ($<$0.001\,\%) + \\ + \bottomrule + \toprule + \thead{} + & \thead{\texttt{Expression}} + & \thead{\texttt{Val\_expression}} + & \thead{\texttt{Architectural}} + & & \thead{Total} + \\ + \midrule + \makecell{Only supp. \\ columns} + & 4475 (0.015\,\%) + & 0 + & 0 + & & 30044442 + \\ + All columns + & 12367 (0.023\,\%) + & 0 + & 0 + & & 54680492 + \\ + + \bottomrule + \end{tabular} + + \caption{Instruction type statistics}\label{table:instr_types} +\end{table} + +Table~\ref{table:instr_cov} gives statistics about the proportion of +instructions encountered that were not supported by \ehelfs. The first row +only considers the columns CFA, \reg{rip}, \reg{rsp}, \reg{rbp} and +\reg{rbx} (the supported registers --~see Section~\ref{ssec:ehelfs}). The +second row analyzes all the columns that were encountered, whether +supported or not. + +Table~\ref{table:instr_types} breaks down the proportion of each command +(\ie\ the formal way a register is set) for non-CFA columns in the sampled +data. 
For a brief explanation, \texttt{Offset} means the value is stored at an offset from the CFA, +\texttt{Register} means the value is held in a machine register, \texttt{Expression} +means it is stored at the address computed by an expression, and the \texttt{Val\_} prefix means +that the result is the value itself and must not be dereferenced. Overall, +supporting \texttt{Offset} alone already covers the vast majority of +register rules. The data gathered (not reproduced here) also suggests that +supporting a few common expressions is enough to support most of them. + +It is also worth noting that, of the 4000 analyzed files, only 12 +account for all the unsupported expressions seen, and only 24 contain +any unsupported instruction at all. + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -1020,8 +1125,8 @@ lot. \hfill \begin{minipage}{0.7\textwidth} \begin{flushright} \itshape{} \small{} - Unless otherwise explicitly stated, any image or source code snippet - from the present document can be reused freely by anyone. + Unless otherwise explicitly stated, any image, source code snippet or + table from the present document can be reused freely by anyone. \end{flushright} \end{minipage} diff --git a/shared/report.bib b/shared/report.bib index 59e0431..e0ffabd 100644 --- a/shared/report.bib +++ b/shared/report.bib @@ -33,7 +33,7 @@ } @article{dinechin2000exn, - title={C++ exception handling}, + title={C++ exception handling \qtodo{CHECK}}, author={De Dinechin, Christophe}, journal={IEEE Concurrency}, volume={8},