diff --git a/report/fiche_synthese.tex b/report/fiche_synthese.tex index ac5b144..10c5dfa 100644 --- a/report/fiche_synthese.tex +++ b/report/fiche_synthese.tex @@ -32,7 +32,7 @@ computation~\cite{oakley2011exploiting}. \subsection*{The research problem} -As debugging data can easily get heavy beyond reasonable if stored carelessly, +As debugging data can easily take up an unreasonable amount of space if stored carelessly, the DWARF standard pays a great attention to data compactness and compression, and succeeds particularly well at it. But this, as always, is at the expense of efficiency: accessing stack unwinding data for a particular program point @@ -118,7 +118,7 @@ below. Yet, it supports the vast majority --~around $99.9$\ \%~-- of the cases seen in the wild, and is decently robust compared to \prog{libunwind}, the reference implementation. Indeed, corner cases occur often, and on a 27000 samples test, 885 failures were observed for \prog{libunwind}, against 1099 for -the compiled DWARF version. +the compiled DWARF version (see Section~\ref{ssec:timeperf}). The implementation, however, as a few other limitations. It only supports the x86\_64 architecture, and relies to some extent on the Linux operating system. @@ -132,7 +132,7 @@ virtually any operating system. In most cases of everyday's life, a slow stack unwinding is not a problem, or even an annoyance. Yet, having a 26 times speed-up on stack unwinding-heavy -tasks, such as profiling, can be really useful to profile heavy programs, +tasks, such as profiling, can be really useful to profile large programs, particularly if one wants to profile many times in order to analyze the impact of multiple changes. It can also be useful for exception-heavy programs. 
Thus, it might be interesting to implement a more stable version, and try to diff --git a/report/report.tex b/report/report.tex index ef5c672..7d18773 100644 --- a/report/report.tex +++ b/report/report.tex @@ -179,9 +179,9 @@ traced program at regular, short intervals, inspect their stack, and determine which function is currently being run. They also often perform a stack unwinding to determine the call path to this function, to determine which function indirectly takes time: \eg, a function \lstc{fct_a} can call both -\lstc{fct_b} and \lstc{fct_c}, which are quite heavy; spend practically no time -directly in \lstc{fct_a}, but spend a lot of time in calls to the other two -functions that were made from \lstc{fct_a}. +\lstc{fct_b} and \lstc{fct_c}, which take a lot of time; spend practically no +time directly in \lstc{fct_a}, but spend a lot of time in calls to the other +two functions that were made from \lstc{fct_a}. Exception handling also requires a stack unwinding mechanism in most languages. Indeed, an exception is completely different from a \lstinline{return}: while the @@ -375,13 +375,6 @@ distribution of FDE rows count. The histogram in Figure~\ref{fig:fde_line_density} was generated on a random sample of around 2000 ELF files present on an ArchLinux system. -Most of the FDEs seem to be quite small, which only reflects that most -functions found in the wild are relatively small and do not particularly -allocate many times on the stack. Yet, the median value is at $8$ rows per FDE, -and the average is at $9.7$, which is already not that fast to unwind. Values -up to $50$ are not that uncommon, given some commonly used functions have such -large FDEs, and often end up in the call stack. - %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Unwinding state-of-the-art} @@ -531,7 +524,7 @@ second frame will require loading the corresponding DWARF information. 
The function is the following: -\lstinputlisting[language=C]{src/dw_semantics/c_context.c} +\lstinputlisting[language=C]{src/dw_semantics/c_context.c}\label{lst:sem_c_ctx} The translation of $\intermedlang$ as produced by the later-defined function are then to be inserted in this context, where the comment states so. @@ -743,7 +736,10 @@ In the unwind context from Listing~\ref{lst:unw_ctx}, the values of type \lstc{flags} is a 8-bits value, indicating for each register whether it is present or not in this context, plus an error bit, indicating whether an error occurred during unwinding. Such errors can be due \eg{} to an unsupported -operation in the original DWARF\@. +operation in the original DWARF\@. This context differs from the one presented +in Section~\ref{lst:sem_c_ctx}, since the previous one was only an array of +values, and the one from the real implementation is more robust, in particular +by including an error flag for lack of a $\bot$ value. This generated data is stored in separate shared object files, which we call \ehelfs. It would have been possible to alter the original ELF file to embed @@ -997,7 +993,7 @@ computer has 32\,GB of RAM, and care was taken never to fill it and start swapping. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsection{Measured time performance} +\subsection{Measured time performance}\label{ssec:timeperf} A benchmarking of \ehelfs{} against the vanilla \prog{libunwind} was made using the exact same methodology as in Section~\ref{ssec:bench_perf}, only linking @@ -1057,6 +1053,14 @@ without using multiple cores to compile, the various shared objects needed to run \prog{hackbench} --~that is, \prog{hackbench}, \prog{libc}, \prog{ld} and \prog{libpthread}~-- are compiled in an overall time of $25.28$ seconds. +The unwinding errors observed are hard to investigate, but are most probably +due to truncated stack records. 
Indeed, since \prog{perf} dumps the last $n$ +bytes of the call stack (for a given $n$), and only keeps those for later +unwinding, large stacks lead to lost information when analyzing the results. +The difference between \ehelfs{} and the vanilla library could be due either to +unsupported DWARF instructions or registers, \prog{libdwarfpp} bugs or bugs in +the custom \prog{libunwind} implementation that were not spotted. + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \subsection{Measured compactness}\label{ssec:results_size} @@ -1218,7 +1222,6 @@ It is also worth noting that of all the 4000 analyzed files, there are only 12 that contained all the unsupported expressions seen, and only 24 that contained some unsupported instruction at all. - %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% End main text content %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%