\section*{Conclusion}

In this chapter, we studied data dependencies within assembly kernels and, more
specifically, data dependencies occurring through memory accesses, which we
call \emph{memory-carried dependencies}. \cesasme{}'s analysis showed in
\autoref{chap:CesASMe} that this kind of dependency was responsible for a
significant portion of state-of-the-art analyzers' prediction errors.

We introduce \staticdeps{}, a heuristic approach based on random values as
representatives of abstract values. This approach is able to find data
dependencies, including memory-carried ones, loop-carried or not, leveraging
semantics of the assembly code provided by \valgrind{}'s \vex. It is, however,
still unable to find aliasing addresses whose source of aliasing is outside of
the studied block's scope ---~and, as such, suffers from the \emph{lack of
context} pointed out in the previous chapter.

\medskip{}

Our evaluation of \staticdeps{} against a dynamic analysis baseline,
\depsim{}, shows that it finds only about 60\,\% of the existing dependencies.
However, when we enrich \uica{} with \staticdeps{}, it performs on the
full \cesasme{} dataset as well as \uica{} alone does on the pruned \cesasme{}
dataset, from which memory-carried bottlenecks were removed. From this, we conclude that
\staticdeps{} is very successful at finding the data dependencies through
memory that actually matter from a performance analysis perspective. We also
find that, despite being written in pure Python, \staticdeps{} is at least
30$\times$ faster than its C dynamic counterpart, \depsim; as such, we expect
a compiled and optimized implementation of \staticdeps{} to be two to three
orders of magnitude faster than \depsim{}.