
Reverse-engineering methods for CPU models usually make abundant use
of hardware counters ---~and legitimately so, as they are the natural and
accurate way to gain insight into the internals of a CPU\@. Such methods
include, among others, the optimisation guides from Agner Fog~\cite{AgnerFog},
as well as \uopsinfo{}~\cite{uopsinfo} and \uica{}'s~\cite{uica} approach to
respectively model the CPU's back- and front-end.  In \autoref{chap:palmed}, we
introduced \palmed{}, whose main goal is to automatically produce port-mappings of
CPUs without assuming the presence of specific hardware counters.

\smallskip{}

ARM architectures occupy a growing space in the global computing ecosystem.
They are already pervasive among embedded and mobile devices, with most
mobile phones featuring an ARM CPU~\cite{arm_mobile}. Processors based on ARM
are also emerging in datacenters and supercomputers: the Fugaku supercomputer
---~considered the fastest supercomputer in the world by the TOP500
ranking~\cite{fugaku_top500}~--- runs on ARM-based CPUs~\cite{fugaku_arm}, and
the MareNostrum 4 supercomputer has an ARM-based
cluster~\cite{marenostrum4_arm}.

Yet, the ARM ecosystem is still lacking in performance debugging tooling. While
\llvmmca{} supports ARM, it is one of the few tools that do. \iaca{}, made by
Intel, does not support ARM ---~and never will, as it is end-of-life. \uica{}
is focused on Intel architectures, and cannot be easily ported, as it heavily
relies on Intel-specific reverse engineering, itself enabled by specific
hardware counters. Intel \texttt{VTune}, a commonly used profiling and
performance analysis tool, supports only x86-64.

\smallskip{}

In this context, modelling an ARM CPU ---~the Cortex A72~--- with \palmed{}
seemed to be an important goal, all the more meaningful as this particular CPU
has very few hardware counters. However, it yielded only mixed results, as
shown in \autoref{sec:palmed_results}.

\bigskip{}

In this chapter, we show that a major cause of imprecision in these results is
the absence of a frontend model. We manually model the Cortex A72 frontend in
order to compare a raw \palmed{}-generated model to one naively augmented with
a frontend model.

While this chapter only documents a manual approach, we view it as
preliminary work towards automating the synthesis of such a model from
benchmark data, in the same way that \palmed{} synthesises a backend
model.