\section{Future work: benchmark-based automatic frontend model generation}
While this chapter was solely centered on the Cortex A72, we believe that this
study paves the way for an automated frontend model synthesis akin to
\palmed{}. Such a synthesis should stem solely from benchmarking data and a
description of the ISA, and should avoid the use of any specific hardware
counter.

As a scaffold for such future work, we propose the parametric model in
\autoref{fig:parametric_model}. Some of its parameters should be obtainable
with the methods used in this chapter, while for others, new methods must be
devised.

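To give a concrete idea of the shape such a parametric model could take, the following Python sketch encodes the per-frontend and per-instruction parameters of \autoref{fig:parametric_model} as plain data. All names and fields here are hypothetical illustrations, not the actual model's parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FrontendParams:
    """Global frontend parameters (to be discovered per architecture)."""
    dispatch_width: int  # muops dispatched per cycle
    n_decoders: int      # instruction decoders available per cycle


@dataclass
class InsnParams:
    """Per-instruction parameters (to be discovered per instruction)."""
    muop_count: int      # muops this instruction is decoded into


@dataclass
class ParametricModel:
    frontend: FrontendParams
    insns: Dict[str, InsnParams] = field(default_factory=dict)

    def frontend_cycles(self, kernel: List[str]) -> float:
        """Frontend-only lower bound on cycles per kernel iteration:
        total muops to dispatch, divided by the dispatch width."""
        total_muops = sum(self.insns[i].muop_count for i in kernel)
        return total_muops / self.frontend.dispatch_width
```

A backend model would provide a second, independent bound per kernel; the predicted throughput would then be the maximum of the two.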
\begin{figure}
    \begin{subfigure}{\textwidth}
        \centering
        \includegraphics[width=0.9\textwidth]{parametric_model_sketch-frontend}
        \caption{Frontend model}\label{fig:parametric_model:front}
    \end{subfigure}
    \begin{subfigure}{\textwidth}
        \centering
        \includegraphics[width=0.9\textwidth]{parametric_model_sketch-insn}
        \caption{Instruction model}\label{fig:parametric_model:insn}
    \end{subfigure}
    \caption{A generic parametric model of a processor's frontend. In red, the
    parameters which must be discovered for each
    architecture.}\label{fig:parametric_model}
\end{figure}

\bigskip{}

A part of this model does not appear \emph{at all} in the present chapter, as
it is absent from the Cortex A72: the complexity of decoding. As AArch64
instructions are of fixed bit size, each instruction is equally difficult to
decode, and no ``complex'' decoder is needed ---~contrary to \eg{} x86-64.
Decoding complexity seems, however, to be crucial to accurate modeling on
other architectures~\cite{uica}. For much the same reason, the present chapter
does not distinguish between the \emph{number of decoders} and the \emph{number
of \uops{} dispatched per cycle}. Indeed, as there is no variability in
instruction decoding, designing a processor with different values for these two
microarchitectural parameters would have been inefficient.

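To illustrate the kind of additional bound that decoding complexity introduces, the sketch below models a hypothetical x86-like frontend with one ``complex'' decoder and several ``simple'' ones. The decoder counts and the dispatch rule are illustrative assumptions only, not measured characteristics of any processor.

```python
def decode_cycles(muop_counts, n_simple=4, n_complex=1):
    """Lower bound on decode cycles per kernel iteration, given the
    per-instruction muop counts of the kernel. Illustrative rule: simple
    decoders only handle single-muop instructions, while multi-muop
    instructions must go through a complex decoder."""
    complex_insns = sum(1 for m in muop_counts if m > 1)
    total_insns = len(muop_counts)
    return max(
        complex_insns / n_complex,             # complex decoders saturate...
        total_insns / (n_simple + n_complex),  # ...or raw decoder throughput does
    )
```

On a fixed-size ISA such as AArch64, every instruction would count as ``simple'', and this bound would collapse into the plain decoder-count bound, which is why the distinction is invisible on the A72.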
\todo{}

The core of the model presented in this chapter is the discovery, for each
instruction, of its \uop{} count. Assuming that a model of the backend is known
--~by taking for instance a model generated by \palmed{} or \uopsinfo{}~--, the
method described in \autoref{ssec:a72_insn_muop_count} should be generic enough
to be used on any processor. The basic instructions may be easily selected
using the backend model --~we assume their existence in most
microarchitectures, as pragmatic concerns guide the design of the ports.
Counting the \uops{} of an instruction then follows, using only elapsed-cycles
counters, assuming that $\cycF{\kerK}$ bottlenecks on a global dispatch queue
for $\kerK$. This assumption does not hold in general, but can be made to hold
by selecting well-chosen kernels~--- for instance, on the A72, care must be
taken to interleave instructions corresponding to diverse enough dispatch
pipelines.

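Once a kernel is known to bottleneck on dispatch, the counting step itself reduces to a short computation. The sketch below, with hypothetical names, recovers the \uop{} count of a tested instruction from the elapsed-cycles counter alone, assuming the padding basic instructions each decode to a single \uop{}.

```python
def muop_count(cycles_per_iter, dispatch_width, n_basic):
    """Recover the muop count of one tested instruction from a kernel made of
    that instruction plus n_basic basic (assumed single-muop) instructions.
    If the kernel bottlenecks on a dispatch queue of the given width, then
        cycles_per_iter = (muops_tested + n_basic) / dispatch_width,
    which can be solved for muops_tested. Names are illustrative."""
    total_muops = round(cycles_per_iter * dispatch_width)
    return total_muops - n_basic
```

For example, a kernel of one tested instruction and four basic instructions measured at 2 cycles per iteration on a 3-wide dispatch queue yields a 2-\uop{} instruction.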
In order to generalize this method to arbitrary microarchitectures, it is
first necessary to obtain a global view of the common design choices that vary
between processors' frontends. A comparative study of their respective
importance for accurate frontend modeling, along with ways to circumvent their
impact on the measurement of $\cycF{\kerK}$ when counting \uops{} per
instruction, would also be needed.

Such fully-automated methods would probably be unable to account for
``unusual'' frontend bottlenecks ---~at least not at the level of detail that,
\eg{}, the \uica{} authors gather for Intel frontends~\cite{uica}. This level
of detail, however, is possible precisely because the authors restricted their
scope to microarchitectures that share many similarities, as they come from
the same manufacturer. Assessing the extent of the loss of precision of an
automatically-generated model, and its gain of precision \wrt{} a model without
a frontend, remains to be done.