Conclusion: mostly written, maybe lacks a conclusive paragraph
parent e2e1993d37
commit ceea025b65
1 changed file with 56 additions and 13 deletions
@@ -12,21 +12,21 @@ analyzing the low-level performance of a microkernel:
 prevent the backend from being saturated; the latter is stalled
 awaiting previous results (\autoref{chap:staticdeps}).
 \end{itemize}
-We also conduced in \autoref{chap:CesASMe} a systematic comparative study of a
+We also conducted in \autoref{chap:CesASMe} a systematic comparative study of a
 variety of state-of-the-art code analyzers.

 \bigskip{}

 State-of-the-art code analyzers such as \llvmmca{} or \uica{} already
-boast a good accuracy. Both of these models ---~and most of the others also~---
+boast good accuracy. Both of these tools ---~and most of the others~---
 are, however, based on models obtained by various degrees of manual
-investigation, and are unable to scale without further manual effort to future
+investigation, and cannot be adapted without further manual effort to future
 or uncharted microprocessors.

 The field of microarchitectural models for code
 analysis emerged with fundamentally manual methods, such as Agner Fog's tables.
 Such tables, however, may now be produced in a more automated way using
-\uopsinfo{} ---~at least for certain microarchitectures~---; \pmevo{} pushes
+\uopsinfo{}, at least for certain microarchitectures; \pmevo{} pushes
 further in this direction by automatically computing a frontend model from
 benchmarks ---~but still has trouble scaling to a full instruction set. In its
 own way, \ithemal{}, a machine-learning-based approach, could also be
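For concreteness, the interplay between the three bottlenecks discussed in this hunk (frontend issue, backend port saturation, and stalls on previous results) is often summarized as the maximum of three per-iteration lower bounds. The sketch below is only an illustration of that principle, not any tool's actual model; the kernel, port mapping, latencies, and issue width are all made up:

```python
from fractions import Fraction

# Hypothetical three-instruction kernel: per instruction, the ports each
# uop may execute on, and its latency. All numbers are made up.
kernel = [
    {"name": "load", "uops": [("2", "3")], "latency": 4},
    {"name": "add",  "uops": [("0", "1", "5")], "latency": 1},
    {"name": "mul",  "uops": [("0", "1")], "latency": 3},
]
ISSUE_WIDTH = 4  # uops issued per cycle by the frontend (assumed)

def frontend_bound(kernel, issue_width):
    """Cycles per iteration needed just to issue every uop."""
    return Fraction(sum(len(i["uops"]) for i in kernel), issue_width)

def backend_bound(kernel):
    """Load of the most used port, spreading each uop evenly over its
    candidate ports (a crude stand-in for an optimal schedule)."""
    load = {}
    for ins in kernel:
        for ports in ins["uops"]:
            for p in ports:
                load[p] = load.get(p, Fraction(0)) + Fraction(1, len(ports))
    return max(load.values())

def dependency_bound(kernel, chain):
    """Latency of a known loop-carried dependency chain, given as
    instruction indices (e.g. as reported by a dependency model)."""
    return sum(kernel[i]["latency"] for i in chain)

bounds = {
    "frontend": frontend_bound(kernel, ISSUE_WIDTH),
    "backend": backend_bound(kernel),
    "dependencies": dependency_bound(kernel, chain=[0, 2]),
}
# The slowest resource dictates the throughput; here the dependency
# chain (load -> mul) stalls the backend, as described above.
print(bounds, "bottleneck:", max(bounds, key=bounds.get))
```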
@@ -38,28 +38,28 @@ supercomputer area.

 \medskip{}

-We investigate this direction by exploring the three major bottlenecks
+We investigated this direction by exploring the three major bottlenecks
 mentioned earlier, with the aim of providing fully-automated,
 benchmark-based models for each of them. Ideally, these models should be
 generated by simply executing a program on a machine implementing the
 targeted microarchitecture.

 \begin{itemize}
-\item We contribute to \palmed{}, a framework able to extract a
+\item We contributed to \palmed{}, a framework able to extract a
 port-mapping of a processor, serving as a backend model.
-\item We manually extract a frontend model for the Cortex A72 processor. We
-believe that the foundation of our methodology works on most
+\item We manually extracted a frontend model for the Cortex A72 processor.
+We believe that the foundation of our methodology works on most
 processors. The main characteristics of a frontend, apart from its
 instructions' \uops{} decomposition and its issue width, must, however,
 still be investigated, and their relative importance evaluated.
-\item We provide with \staticdeps{} a method to to extract data
+\item We provided with \staticdeps{} a method to extract data
 dependencies between instructions. It is able to detect
 \textit{loop-carried} dependencies (dependencies that span across
 multiple loop iterations), as well as \textit{memory-carried}
 dependencies (dependencies based on reading at a memory address written
 by another instruction). While the former is widely implemented, the
 latter is, to the best of our knowledge, an original contribution. We
-bundle this method in a processor-independent tool, based on semantics
+bundled this method in a processor-independent tool, based on semantics
 of the ISA provided by \valgrind{}, which supports a variety of ISAs.
 \end{itemize}

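The memory-carried dependencies of the last item above can be pictured by tracking, across loop iterations, which instruction last wrote each address, and flagging reads of an address written in an earlier iteration. The sketch below is a dynamic, trace-based rendition of that idea, whereas \staticdeps{} works statically from \valgrind{}'s ISA semantics; the trace, instruction names, and addresses are hypothetical:

```python
# Trace-based rendition of memory-carried dependency detection: record
# which instruction last wrote each address, and flag reads of an
# address written during an earlier loop iteration. This is only the
# concept, not staticdeps' actual (static) algorithm.

def memory_carried_deps(trace):
    """trace: (iteration, instruction, kind, address) events in
    program order, kind being 'read' or 'write'."""
    last_write = {}  # address -> (iteration, instruction)
    deps = set()
    for it, instr, kind, addr in trace:
        if kind == "read" and addr in last_write:
            w_it, w_instr = last_write[addr]
            if w_it < it:  # written by a previous iteration:
                deps.add((w_instr, instr))  # loop- and memory-carried
        elif kind == "write":
            last_write[addr] = (it, instr)
    return deps

# Hypothetical trace of two iterations of `a[i+1] = a[i] + 1`:
trace = [
    (0, "load",  "read",  0x1000),
    (0, "store", "write", 0x1008),
    (1, "load",  "read",  0x1008),  # reads what iteration 0 wrote
    (1, "store", "write", 0x1010),
]
print(memory_carried_deps(trace))  # {('store', 'load')}
```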
@@ -76,11 +76,12 @@ together ---~or, even better, when each of them could be combined with any
 other model of the other parts. To the best of our knowledge, however, no such
 modular tool exists; nor is there any standardized approach to interact with
 such models. The usual approach in the field to try a new idea is, instead, to
-create a full analyzer implementing this idea, such as we did with \palmed{}
-for backend models, or such as \uica{}'s implementation.
+create a full analyzer implementing this idea, as we did with \palmed{}
+for backend models, or as \uica{}'s authors did for frontend
+analysis.

 In hindsight, we advocate for the emergence of such a modular code analyzer.
-It would maybe not be as convenient or well-packaged as ``production-ready''
+It might not be as convenient or well-integrated as ``production-ready''
 code analyzers such as \llvmmca{}, which is packaged for Debian. It could,
 however, greatly simplify the academic process of trying a new idea on any of
 the three main models, by decoupling them. It would also ease the
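To make the idea of a modular analyzer concrete, one possible shape for such a standardized interface is sketched below. This design is entirely hypothetical; none of the tools mentioned expose such an API, and the interface names are invented for illustration:

```python
from typing import Protocol, Sequence

# Hypothetical plug-in interface for a modular code analyzer: each
# model answers one question about a kernel, and a thin driver
# combines them, so any backend model can be paired with any frontend
# or dependency model. No existing tool exposes this API.

class BackendModel(Protocol):
    def port_pressure(self, kernel: Sequence[str]) -> float:
        """Cycles/iteration imposed by the most loaded execution port."""
        ...

class FrontendModel(Protocol):
    def issue_time(self, kernel: Sequence[str]) -> float:
        """Cycles/iteration needed to decode and issue the kernel."""
        ...

class DependencyModel(Protocol):
    def critical_path(self, kernel: Sequence[str]) -> float:
        """Latency of the longest loop-carried dependency chain."""
        ...

def predict_cycles(kernel: Sequence[str], backend: BackendModel,
                   frontend: FrontendModel, deps: DependencyModel) -> float:
    # Steady-state estimate: the slowest of the three resources
    # dictates the cycles per loop iteration.
    return max(backend.port_pressure(kernel),
               frontend.issue_time(kernel),
               deps.critical_path(kernel))
```

Swapping in a new backend model would then amount to implementing a single method, leaving the frontend and dependency models untouched.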
@@ -98,4 +99,46 @@ comparative experiments with \cesasme{}.

+\smallskip{}
+
+First, none of the state-of-the-art tools has good support for dependencies
+across memory. Such dependencies were present in about a third of \cesasme{}'s
+benchmark set. While we built this benchmark set aiming for representative
+data, there is no clear evidence that these dependencies are as strongly
+present in the code analyzed in real use cases. We believe, however, that such
+cases occur regularly, and we also saw that the accuracy of code analyzers
+drops sharply in their presence.
+
+\smallskip{}
+
+We also found the bottleneck prediction offered by some code analyzers still
+uncertain. In our experiments, the tools disagreed more often than not on the
+presence or absence of a bottleneck, with no tool clearly standing out; we are
+thus unable to conclude on the relative performance of the tools on this
+point. On the other hand, sensitivity analysis, as implemented \eg{} by
+\gus{}, seems a theoretically sound way to evaluate the presence or absence
+of a bottleneck in a microkernel; it is, however, prohibitively slow for many
+use cases. In this respect, a study of code analyzers' predictions against
+results from sensitivity analysis would certainly bring more conclusive
+results.
+
+\smallskip{}
+
+Finally, we observed on \bhive{}'s results the effects of a \emph{lack of
+context} for an analysis. \bhive{} measures a real execution, on real hardware,
+of a kernel; as such, it yields excellent accuracy in many cases, with a median
+error of about 8\%. Yet, it still lacks accuracy in many other cases, with
+its third quartile (23\%) above \uica{}'s or \iaca{}'s median result (about
+18\%), and far-reaching outliers bringing its mean error on par with \uica{}'s.
+Indeed, both what precedes a loop nest and the actual values present in
+registers impact the performance of the loop nest. The effects can be fairly
+high-level, such as pointer aliasing, leading to false positives or negatives
+in dependency detection. They can also be microarchitectural, such as the
+observable performance loss of memory accesses ---~even on cache hits~---
+when memory reads cross a cache line boundary.
+
+This lack of context incurs a significant loss of accuracy for
+static analyzers: we saw in \autoref{ssec:bhive_errors} that the same
+instruction, depending on its register values, can be twice as slow even
+without aliasing, or 19 times slower upon aliasing. With \cesasme{}, we
+sketched the embryo of a solution: a simple and fast dynamic analysis pass,
+based on instrumentation, gathering data for a subsequent static analysis
+pass. Such a method might help recreate the context needed for an accurate
+analysis.
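The principle of sensitivity analysis mentioned above lends itself to a compact sketch: grant one resource at a time extra capacity, and call it a bottleneck if the predicted execution time improves. This is only the principle, in the spirit of \gus{}, not its implementation; the cost model and the per-resource loads are made up:

```python
# Principle of sensitivity analysis (in the spirit of Gus, not its
# implementation): double one resource's capacity at a time; if the
# predicted time improves, that resource is a bottleneck.

def predicted_cycles(loads):
    """Toy cost model: the most loaded resource dictates the time."""
    return max(loads.values())

def bottlenecks(loads, boost=2.0, eps=1e-9):
    base = predicted_cycles(loads)
    found = []
    for res in loads:
        relaxed = dict(loads)
        relaxed[res] = loads[res] / boost  # grant it twice the capacity
        if base - predicted_cycles(relaxed) > eps:
            found.append(res)  # relaxing it sped the kernel up
    return found

# Made-up per-iteration loads for a hypothetical kernel:
loads = {"port0": 1.5, "port1": 1.0, "frontend": 1.2, "dep_chain": 0.9}
print(bottlenecks(loads))  # ['port0']
```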
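The cache-line effect in the last hunk also shows why register values matter: assuming 64-byte cache lines (the usual size, an assumption here), whether a load straddles a line boundary depends only on its runtime address, which a context-free static analyzer does not know. A minimal check, with arbitrary addresses:

```python
LINE = 64  # cache line size in bytes (assumed; typical on x86 and ARM)

def crosses_line(addr: int, size: int, line: int = LINE) -> bool:
    """True if the access [addr, addr + size) straddles a cache line
    boundary, in which case a load pays a penalty even on a hit."""
    return addr % line + size > line

# The same 8-byte load, two different (arbitrary) pointer values:
print(crosses_line(0x1038, 8))  # False: bytes 56..63 of one line
print(crosses_line(0x103C, 8))  # True: spills into the next line
```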