Reverse-engineering methods for CPU models usually make abundant use of
hardware counters ---~and legitimately so, as they are the natural and
accurate way to gain insight into the internals of a CPU\@. Such methods
include, among others, the optimisation guides from Agner Fog~\cite{AgnerFog},
as well as \uopsinfo{}~\cite{uopsinfo} and \uica{}~\cite{uica}, which model the
CPU's back-end and front-end respectively. In \autoref{chap:palmed}, we
introduced \palmed{}, whose main goal is to automatically produce port-mappings
of CPUs without assuming the presence of specific hardware counters.

\smallskip{}

The ARM architectures occupy a growing space in the global computing ecosystem.
They are already pervasive among embedded and mobile devices, with most
mobile phones featuring an ARM CPU~\cite{arm_mobile}. Processors based on ARM
are also emerging in datacenters and supercomputers: the Fugaku supercomputer
---~ranked the fastest supercomputer in the world by the TOP500 list between
2020 and 2021~\cite{fugaku_top500}~--- runs on ARM-based
CPUs~\cite{fugaku_arm}, and the MareNostrum 4 supercomputer has an ARM-based
cluster~\cite{marenostrum4_arm}.

Yet, the ARM ecosystem is still lacking in performance debugging tooling. While
\llvmmca{} supports ARM, it is one of the few tools that do: \iaca{}, made by
Intel, does not support it ---~and never will, as it has reached
end-of-life~---; \uica{} is focused on Intel architectures, and cannot easily
be ported, as it heavily relies on reverse engineering that is specific to
Intel and enabled by specific hardware counters; Intel \texttt{VTune}, a
commonly used profiling and performance analysis tool, supports only x86-64.

\smallskip{}

In this context, modelling an ARM CPU ---~the Cortex A72~--- with \palmed{} seemed
to be an important goal, especially meaningful as this particular CPU has only
very few hardware counters. However, it yielded only mixed results, as shown in
\autoref{sec:palmed_results}.

\bigskip{}

In this chapter, we show that a major cause of imprecision in these results is
the absence of a frontend model. We manually model the Cortex A72 frontend to
compare a raw \palmed{}-generated model to one naively augmented with a
frontend model. \todo{discuss automated future work}