\section*{Conclusion and future work}
In this chapter, we have presented a fully-tooled approach that enables:

\begin{itemize}
    \item the generation of a wide variety of microbenchmarks, reflecting
        both the expertise contained in an initial benchmark suite and the
        diversity of code transformations that stress different aspects of a
        performance model ---~or even of a measurement environment, \eg{}
        \bhive; and
    \item the comparability of the various measurements and analyses applied
        to each of these microbenchmarks.
\end{itemize}
Thanks to this tooling, we were able to show the limits and strengths of
various performance models in relation to the expertise contained in the
Polybench suite. We discuss throughput results in
Section~\ref{ssec:overall_results} and bottleneck prediction in
Section~\ref{ssec:bottleneck_pred_analysis}.
We were also able to demonstrate the difficulties of reasoning at the level of
a basic block isolated from its context. We specifically study these
difficulties in the case of \bhive{} in Section~\ref{ssec:bhive_errors}.
Indeed, the actual values ---~both in registers and in memory~--- involved in a
basic block's computation determine not only its functional properties
(\ie{} the result of the calculation), but also some of its non-functional
properties (\eg{} latency, throughput).
We were also able to show in Section~\ref{ssec:memlatbound} that
state-of-the-art static analyzers struggle to account for memory-carried
dependencies, a weakness that significantly impacts their overall results on
our benchmarks. We believe that detecting and accounting for these
dependencies is an important direction for future work.
Moreover, we present this work in the form of a modular software package,
each component of which exposes numerous adjustable parameters. Each
component can also be replaced by another fulfilling the same abstract
function: another initial benchmark suite in place of Polybench, other loop
nest optimizers in place of PLUTO and PoCC, other code analyzers, and so on.
This software modularity reflects the fact that our contribution lies in the
interfacing of, and communication between, distinct concerns.
\medskip
Furthermore, we believe that the contributions made in the course of this
work may eventually be used to address different, yet neighbouring, issues.
These perspectives can also be seen as future work:
\smallskip
\paragraph{Program optimization.} The full program-processing pipeline we
have designed can be used not only to evaluate the performance model
underlying a static analyzer, but also to guide program optimization itself.
From this perspective, we would generate different versions of the same
program using the transformations discussed in Section~\ref{sec:bench_gen}
and colored blue in Figure~\ref{fig:contrib}. These versions would then feed
the execution and measurement environment outlined in
Section~\ref{sec:bench_harness} and colored orange in
Figure~\ref{fig:contrib}. Indeed, thanks to our previous work, we know that
the results of these comparable analyses and measurements would make it
possible to identify which version is the most efficient, and even to
reconstruct information indicating why (which bottlenecks, etc.).
However, this approach would require these different versions of the same
program to be functionally equivalent, \ie{} to compute the same result from
the same inputs; yet we saw in Section~\ref{sec:bench_harness} that, as it
stands, the transformations we apply are not concerned with preserving the
semantics of the input codes. To recover this semantic-preservation property,
it suffices to abandon the kernelification pass we have presented; this,
however, would require controlling L1-residence by other means.
\smallskip
\paragraph{Dataset building.} Our microbenchmark generation phase outputs a
large, diverse and representative dataset of microkernels. Beyond our own
harness, we believe that such a dataset could be used to improve existing
data-dependent solutions.
%the measurement and execution environment we
%propose is not the only type of tool whose function is to process a large
%dataset (\ie{} the microbenchmarks generated earlier) to automatically
%abstract its characteristics. We can also think of:
Inductive methods, as in \anica, strive to preserve the properties of a basic
block through successive abstractions of the instructions it contains, so as
to draw the most general conclusions possible from a particular experiment.
Currently, \anica{} starts off from randomly generated basic blocks. This
approach guarantees a certain variety, and avoids an over-specialization that
would prevent it from finding interesting cases too far from an initial
dataset. However, it may well lead to the samples under consideration being
systematically outside the relevant area of the search space ---~\ie{} having
no relation to real-life programs, or to those in the user's field.
On the other hand, machine-learning methods based on neural networks, as in
\ithemal, seek to correlate the result of a function with the characteristics
of its input ---~in this case, to correlate a throughput prediction with the
instructions making up a basic block~--- by backpropagating the gradient of a
cost function. \ithemal{} is trained on benchmarks originating from a data
suite. As opposed to random generation, this approach offers representative
samples, but comes with a risk of lacking variety and of over-specialization.
Comparatively, our microbenchmark generation method is natively meant to
produce a representative, varied and large dataset. We believe that enriching
the datasets of the above-mentioned methods with our benchmarks might extend
their results and reach.