\section{A frontend model for the Cortex A72}

\begin{frame}{The Cortex A72}
  \begin{itemize}
    \item{} Low-power ARM CPU
    \item{} CPU of the Raspberry Pi 4: easily available
    \item{} AArch64, NEON SIMD
    \medskip{}
    \item{} ARM CPUs are rarely modeled!
    \item{} Backend modeled by \palmed{}
  \end{itemize}
\end{frame}

\begin{frame}
  \centering
  \includegraphics[width=0.9\textwidth]{A72_pipeline_diagram.svg}
\end{frame}

\begin{frame}{Manual model}
  \begin{itemize}
    \item Goal: manually craft a frontend model
    \item Follow methods that could later be automated
    \item Propose a parametric model for future work, leaving question marks on some parts
  \end{itemize}
\end{frame}

\begin{frame}{Counting \uops{}}
  For an instruction $i$, let \alert{$\mucount{i}$} denote its number of \uops{}.

  \begin{itemize}
    \item{} For $k \in \nat$, construct (if possible) a kernel $\kerK_k$:
      \begin{itemize}
        \item instruction $i$ + $k$ ``simple'' instructions (one \uop{} each)
        \item frontend-bound:
          \[ \cyc{\kerK_k} = \dfrac{k + \mucount{i}}{3} \]
      \end{itemize}
    \item{} For a well-chosen $k_0$, we should have
      \[ \cyc{\kerK_{k_0}} + \sfrac{1}{3} = \cyc{\kerK_{k_0+1}} \]
    \item{} Measure to verify
    \bigskip
    \item{} If so,
      {\boldmath \[ \mucount{i} = 3 \cyc{\kerK_{k_0}} - k_0 \]}
  \end{itemize}
\end{frame}
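
\begin{frame}{Counting \uops{}: illustrative example}
  A sketch of the procedure with \emph{hypothetical} measurements
  (assumed values for illustration, not actual A72 data), for some instruction $i$:

  \begin{itemize}
    % The two cycle counts below are made-up values chosen to be consistent
    % with the frontend-bound formula (k + \mucount{i}) / 3.
    \item{} Suppose we measure
      \[ \cyc{\kerK_4} = 2 \qquad \text{and} \qquad \cyc{\kerK_5} = \sfrac{7}{3} \]
    \item{} The step is $\sfrac{7}{3} - 2 = \sfrac{1}{3}$,
      as expected if both kernels are frontend-bound: take $k_0 = 4$
    \item{} Then
      \[ \mucount{i} = 3 \times 2 - 4 = 2 \]
    \item{} Sanity check: $\dfrac{5 + 2}{3} = \sfrac{7}{3}$ matches $\cyc{\kerK_5}$
  \end{itemize}
\end{frame}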