\chapter*{Conclusion}
\addcontentsline{toc}{chapter}{Conclusion}

Throughout this manuscript, we explored the main bottlenecks that arise when
analyzing the low-level performance of a microkernel:
\begin{itemize}
    \item frontend bottlenecks ---~the processor's frontend is unable to
        saturate the backend with instructions (\autoref{chap:frontend});
    \item backend bottlenecks ---~the backend is saturated with instructions
        and processes them as fast as possible (\autoref{chap:palmed});
    \item dependency bottlenecks ---~data dependencies between instructions
        prevent the backend from being saturated; the latter is stalled
        awaiting previous results (\autoref{chap:staticdeps}).
\end{itemize}
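In a simplified analytical model, these three bottlenecks combine as a maximum
of lower bounds: the steady-state cycles per iteration of a kernel are capped by
whichever resource saturates first. The sketch below is purely illustrative;
the function, its parameters and all figures are hypothetical, not taken from
any of the tools discussed here.

```python
# Simplified bottleneck model: the cycles per kernel iteration are bounded
# below by the frontend, the backend, and the dependency chains; the largest
# bound dominates. All numbers here are hypothetical.

def cycles_per_iteration(n_uops, issue_width, port_load, critical_path):
    """Combine the three bottleneck bounds for one kernel iteration.

    n_uops        -- micro-operations issued per iteration
    issue_width   -- uops the frontend can issue per cycle
    port_load     -- dict mapping each execution port to the number of
                     uops it receives per iteration
    critical_path -- latency (in cycles) of the longest dependency chain
                     crossing one iteration
    """
    frontend_bound = n_uops / issue_width      # frontend bottleneck
    backend_bound = max(port_load.values())    # most loaded port
    dependency_bound = critical_path           # loop-carried chain
    return max(frontend_bound, backend_bound, dependency_bound)

# Hypothetical kernel: 6 uops on a 4-wide frontend, ports P0/P1/P5 receiving
# 2/2/1 uops, and a 3-cycle loop-carried chain: the dependency bound wins.
print(cycles_per_iteration(6, 4, {"P0": 2, "P1": 2, "P5": 1}, 3))  # 3
```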
We also conducted in \autoref{chap:CesASMe} a systematic comparative study of a
variety of state-of-the-art code analyzers.

\bigskip{}

State-of-the-art code analyzers such as \llvmmca{} or \uica{} already boast
good accuracy. Both of these tools ---~and most others~---, however, rest on
models obtained through various degrees of manual investigation, and cannot
scale to future or uncharted microprocessors without further manual effort.

The field of microarchitectural models for code analysis emerged with
fundamentally manual methods, such as Agner Fog's tables. Such tables may now
be produced in a more automated way using \uopsinfo{} ---~at least for certain
microarchitectures~---; \pmevo{} pushes further in this direction by
automatically computing a port mapping from benchmarks ---~but still has
trouble scaling to a full instruction set. In its own way, \ithemal{}, a
machine-learning-based approach, could also be considered automated ---~yet it
still requires a large training set for the intended processor, which must be
at least partially crafted manually. This trend towards model automation seems
only natural as new microarchitectures keep appearing and ISAs such as ARM
reach the supercomputer arena.

\medskip{}

We investigate this direction by revisiting the three major bottlenecks
mentioned earlier, with the goal of providing fully automated, benchmark-based
models for each of them. Ideally, such a model should be obtained by simply
executing a program on a machine based on the targeted microarchitecture.

\begin{itemize}
    \item We contribute to \palmed{}, a framework able to extract a
        port mapping of a processor, serving as a backend model.
    \item We manually extract a frontend model for the Cortex A72 processor.
        We believe that the foundations of our methodology apply to most
        processors. The main characteristics of a frontend, apart from its
        issue width and the decomposition of instructions into \uops{}, must
        however still be investigated, and their relative importance
        evaluated.
    \item We provide with \staticdeps{} a method to extract data
        dependencies between instructions. It is able to detect
        \textit{loop-carried} dependencies (dependencies that span multiple
        loop iterations), as well as \textit{memory-carried} dependencies
        (dependencies arising when an instruction reads a memory address
        written by another). While the former is widely implemented, the
        latter is, to the best of our knowledge, an original contribution. We
        bundle this method in a processor-independent tool, based on the ISA
        semantics provided by \valgrind{}, which supports a variety of ISAs.
\end{itemize}
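To illustrate the notion of a memory-carried dependency, the toy detector
below replays a trace of memory accesses and links each load to the last store
on the same address. This is a deliberately simplified, dynamic illustration of
the concept ---~not the \staticdeps{} algorithm itself~---; all instruction
identifiers and addresses are made up.

```python
# Toy detector for memory-carried dependencies: replay a trace of memory
# accesses and link each load to the last store touching the same address.
# This is a simplified illustration of the concept, not staticdeps itself.

def memory_carried_deps(trace):
    """trace: list of (instr_id, op, address) with op in {'load', 'store'}.
    Returns (reader, writer) pairs: reader loads what writer last stored."""
    last_writer = {}   # address -> instr_id of the last store to it
    deps = []
    for instr_id, op, addr in trace:
        if op == "load" and addr in last_writer:
            deps.append((instr_id, last_writer[addr]))
        elif op == "store":
            last_writer[addr] = instr_id
    return deps

# Two iterations of a hypothetical loop: instruction 1 stores to a[i], and
# instruction 0 of the *next* iteration reloads that address ---
# a dependency that is both loop-carried and memory-carried.
trace = [
    (0, "load", 0x1000), (1, "store", 0x1008),   # iteration 0
    (0, "load", 0x1008), (1, "store", 0x1010),   # iteration 1
]
print(memory_carried_deps(trace))  # [(0, 1)]
```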
\bigskip{}

We evaluated these three models independently, each yielding satisfactory
results: \palmed{} is competitive with the state of the art, with the advantage
of being automatic; our frontend model significantly improves a backend model's
accuracy; and our dependency model significantly improves \uica{}'s results,
while remaining consistent with a dynamic dependency analysis.

These models, however, only become truly meaningful when combined ---~or,
better yet, when each of them can be combined with any model of the other
parts. To the best of our knowledge, no such modular tool exists, nor is there
any standardized way to interact with such models. Instead, the usual approach
in the field when trying a new idea is to build a full analyzer implementing
it, as we did with \palmed{} for backend models, or as \uica{}'s authors did.

In hindsight, we advocate for the emergence of such a modular code analyzer.
It might not be as convenient or well-packaged as ``production-ready'' code
analyzers, such as \llvmmca{} ---~which is packaged for Debian. It could,
however, greatly simplify the academic process of trying a new idea on any of
the three main models, by decoupling them. It would also ease the comparative
evaluation of such ideas, eliminating many of the discrepancies between
experimental setups that make actual comparisons difficult ---~the very reason
that prompted us to build \cesasme{} in \autoref{chap:CesASMe}. Indeed, with
such a modular tool, it would be easy to run the same experiment, under the
same conditions, while changing \eg{} only the frontend model and keeping a
well-tried backend model.
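Such a modular analyzer could, for instance, hide each model behind a common
interface, so that a frontend, backend, or dependency model from any source
could be swapped in freely. The sketch below is purely hypothetical ---~all
class and method names are invented~---; it only suggests what such a
standardized interface might look like, not how any existing tool works.

```python
# Hypothetical interface for a modular code analyzer: the three bottleneck
# models share one protocol, and the analyzer reports each model's bound
# together with their maximum. Purely illustrative; no such tool exists yet.
from typing import Protocol


class BottleneckModel(Protocol):
    def cycles_per_iteration(self, kernel: list[str]) -> float: ...


class ModularAnalyzer:
    def __init__(self, frontend: BottleneckModel, backend: BottleneckModel,
                 dependencies: BottleneckModel):
        self.models = {"frontend": frontend, "backend": backend,
                       "dependencies": dependencies}

    def analyze(self, kernel: list[str]) -> dict[str, float]:
        """Return each model's bound; the overall estimate is their maximum."""
        bounds = {name: model.cycles_per_iteration(kernel)
                  for name, model in self.models.items()}
        bounds["estimate"] = max(bounds.values())
        return bounds


class ConstantModel:
    """Trivial stand-in model returning a fixed bound, for demonstration."""
    def __init__(self, bound: float):
        self.bound = bound

    def cycles_per_iteration(self, kernel: list[str]) -> float:
        return self.bound


# Swapping a model amounts to passing a different object; nothing else changes.
analyzer = ModularAnalyzer(ConstantModel(1.5), ConstantModel(2.0),
                           ConstantModel(3.0))
```

Under such an interface, replacing \eg{} the frontend model with a competing
implementation would amount to changing a single constructor argument ---~the
kind of controlled comparison that \cesasme{} had to emulate across
heterogeneous tools.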
\bigskip{}

We also identified, through our comparative experiments with \cesasme{},
multiple weaknesses in the current state of the art.

\smallskip{}