
Common reverse-engineering methods for CPU models make abundant use of hardware
counters, and legitimately so: they are the natural and accurate way to obtain
insight into the internals of a CPU\@. Such methods underlie, among others, the
optimisation guides by Agner Fog~\cite{AgnerFog}, as well as the approaches of
\uopsinfo{}~\cite{uopsinfo} and \uica{}~\cite{uica} to model, respectively, the
CPU's backend and frontend. In \autoref{chap:palmed}, we introduced \palmed{},
whose main goal is to automatically produce port-mappings of CPUs without
assuming the presence of specific hardware counters.

\smallskip{}

ARM architectures occupy a growing share of the global computing ecosystem.
They are already pervasive in embedded and mobile devices, with most mobile
phones featuring an ARM CPU~\cite{arm_mobile}. ARM-based processors are also
emerging in datacenters and supercomputers: the Fugaku supercomputer, which has
topped the TOP500 ranking~\cite{fugaku_top500}, runs on ARM-based
CPUs~\cite{fugaku_arm}, and the MareNostrum 4 supercomputer includes an
ARM-based cluster~\cite{marenostrum4_arm}.

Yet, the ARM ecosystem still lacks performance debugging tools. While
\llvmmca{} supports ARM, it is one of the few tools that do: \iaca{}, made by
Intel, does not support ARM (and never will, as it has reached end-of-life);
\uica{} targets Intel architectures and cannot easily be ported, as it relies
heavily on Intel-specific reverse engineering enabled by dedicated hardware
counters; and Intel \texttt{VTune}, a commonly used performance profiler, only
supports x86-64.

\smallskip{}

In this context, modelling an ARM CPU, the Cortex A72, with \palmed{} seemed a
worthwhile goal, all the more meaningful as this particular CPU exposes very few
hardware counters. However, this attempt yielded only mixed results, as we will
see in \autoref{sec:a40_eval}.

\bigskip{}

In this chapter, we show that a major cause of imprecision in these results is
that \palmed{} lacks a frontend model. We manually model the Cortex A72
frontend, and compare a raw \palmed{}-generated model to the same model naively
augmented with this frontend model.

While this chapter only documents a manual approach, we view it as preliminary
work towards automatically synthesising a frontend model from benchmark data, in
the same way that \palmed{} synthesises a backend model. In this direction, we
propose in \autoref{sec:frontend_parametric_model} a generic, parametric
frontend model which, we expect, could yield good results on many
architectures. We also offer methodologies that, we expect, could automatically
determine some of this model's parameters for an arbitrary architecture.