\section{State of the art}\label{sec:sota}

Performance models for CPUs have previously been studied and applied to static code performance analysis.

\subsection{Manufacturer-sourced data}

CPU manufacturers are expected to provide optimisation data for software compiled for their processors. This data may be used by compiler authors, within highly-optimised libraries or in the optimisation process of critical sections of programs that require very high performance.

\medskip

Intel provides its \textit{Intel® 64 and IA-32 Architectures Optimization Reference Manual}~\cite{ref:intel64_architectures_optim_reference_vol1}, regularly updated, whose nearly 1,000 pages give relevant details of Intel's microarchitectures, such as block diagrams, pipelines, available ports, etc. It further gives data tables with throughputs and latencies for some instructions. While the manual provides a huge collection of important insights ---~from the optimisation perspective~--- into their microarchitectures, it lacks exhaustive and (conveniently) machine-parsable data tables and does not detail the port usage of each instruction.

ARM typically releases far more complete optimisation manuals for its microarchitectures, such as the Cortex A72 optimisation manual~\cite{ref:a72_optim}.

AMD has, since 2020, released lengthy and complete optimisation manuals for its microarchitectures. For instance, the Zen4 optimisation manual~\cite{ref:amd_zen4_optim_manual} contains both detailed insights into the processor's workflow and ports, and a spreadsheet of about 3,400 x86 instructions ---~with operand variants broken down~--- along with their port usage, throughput and latencies. Such an effort, which certainly translates to a non-negligible financial cost for the company, showcases the importance of, and the recent expectations placed on, such documents.
\medskip{}

As part of its EXEgesis project~\cite{tool:google_exegesis}, Google made an effort to parse Intel's microarchitecture manuals, resulting in a machine-usable data source of instruction details. The extracted data has since been contributed to the \llvm{} compiler's data model. The project, however, is no longer developed.

\subsection{Third-party instruction data}

The lack, for many microarchitectures, of reliable, exhaustive and machine-usable data on individual instructions has driven academics to obtain this data independently through an experimental approach.

\medskip

Since 1996, Agner Fog has been maintaining tables of values useful for optimisation purposes for x86 instructions~\cite{AgnerFog}. These tables, still maintained and updated today, are often considered very accurate. They are the result of benchmarking scripts developed by the author, subject to manual ---~and thus tedious, given the size of microarchitectures~--- analysis, and are mainly conducted through hardware counter measurements. The main issue, however, is that these tables are generated using hand-picked instructions and benchmarks, which depend on specific hardware counters and on features specific to some CPU manufacturers. As such, while these tables are very helpful for the supported x86 CPUs, the method does not scale to the abundance of CPUs on which such tables may be useful ---~for instance, ARM processors, embedded platforms, etc.

\medskip

Following the work of Agner Fog, Andreas Abel and Jan Reineke designed the \uopsinfo{} framework~\cite{uopsinfo}, striving to automate the previous methodology. Their work, providing data tables for the vast majority of instructions on many recent Intel microarchitectures, has recently been enhanced to also support AMD architectures.
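Both Agner Fog's scripts and \uopsinfo{} rest on the same microbenchmarking principle: the shape of the measured kernel selects the quantity being measured. A chain of dependent instructions executes serially and exposes latency, while independent instructions may overlap and expose reciprocal throughput. The following sketch, with hypothetical helper names and AT\&T-syntax x86 as an illustration, generates both kernel shapes for a register-register \texttt{addq}:

```python
def latency_kernel(n: int) -> str:
    """Dependency chain: every addq reads the result of the previous one,
    so the n instructions execute serially in about n * latency cycles."""
    return "\n".join("addq %rbx, %rax" for _ in range(n))


def throughput_kernel(n: int) -> str:
    """Independent accumulators: the out-of-order core may overlap them,
    so n instructions take about n * reciprocal_throughput cycles."""
    regs = ("%rax", "%rcx", "%rdx", "%rsi")
    return "\n".join(f"addq %rbx, {regs[i % len(regs)]}" for i in range(n))
```

Timing each kernel (through a cycle counter or hardware counters) and dividing the elapsed cycles by the number of instructions then yields, respectively, an estimate of the latency and of the reciprocal throughput of \texttt{addq}.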
The \uopsinfo{} approach, detailed in their article, consists in finding so-called \textit{blocking instructions} for each port which, used in combination with the instruction to be benchmarked and with port-specific hardware counters, yield a detailed analysis of the port usage of each instruction ---~and even its breakdown into \uops{}. This makes for an accurate and robust approach, but also limits it to microarchitectures offering such counters, and requires a manual analysis of each microarchitecture to be supported in order to find a fitting set of blocking instructions. Although there is no theoretical guarantee that such instructions exist, this should never be a problem in practice, as any pragmatic microarchitecture design will lead to their existence.

\subsection{Code analyzers and their models}

Going further than data extraction at the individual instruction level, academic and industrial actors interested in this domain now mostly work on code analyzers, as described in \autoref{ssec:code_analyzers} above. Each such tool embeds a model ---~or collection of models~--- on which its inference is based, and whose definition, embedded data and construction method vary from tool to tool. These tools often use, to some extent, the data on individual instructions obtained either from the manufacturer or from the third-party efforts mentioned above.

\medskip{}

The Intel Architecture Code Analyzer (\iaca)~\cite{iaca}, released by Intel, is a fully closed-source analyzer able to analyze assembly code for Intel microarchitectures only. It draws on Intel's own knowledge of their microarchitectures to make accurate predictions. This accuracy made it very helpful to experts performing performance debugging on supported microarchitectures. Yet, being closed-source and relying on data that is partially unavailable to the public, the model is not totally satisfactory to academics or engineers trying to understand specific performance results.
Its closed-source nature also makes it vulnerable to deprecation, as the community is unable to \textit{fork} the project ---~and indeed, \iaca{} was discontinued by Intel in 2019. Thus, \iaca{} does not support recent microarchitectures, and its binary was recently removed from official download pages.

\medskip{}

In the meantime, the LLVM Machine Code Analyzer ---~or \llvmmca{}~--- was developed as an internal tool at Sony, and was proposed for inclusion in \llvm{} in 2018~\cite{llvm_mca_rfc}. This code analyzer is based on the data tables that \llvm{} ---~a compiler~--- has to maintain for each microarchitecture in order to produce optimized code. The project has since evolved to be fairly accurate, as seen in the experiments presented later in this manuscript. It is the alternative that Intel has offered to \iaca{}'s users since its deprecation.

\medskip{}

Another model, \osaca{}, was developed by Jan Laukemann \textit{et al.} starting in 2017~\cite{osaca1,osaca2}. Its development stemmed from the lack (at the time) of an open-source ---~and thus, open-model~--- alternative to \iaca{}. As a data source, \osaca{} makes use of Agner Fog's data tables or \uopsinfo{}. It still lacks, however, a good model of the frontend and of data dependencies, making it less accurate than other code analyzers in our experiments later in this manuscript.

\medskip{}

Taking another approach entirely, \ithemal{} is a machine-learning-based code analyzer striving to predict the reciprocal throughput of a given kernel. The necessity of training it resulted in the development of \bhive{}, a benchmark suite of kernels extracted from real-life programs and libraries, along with a profiler measuring the runtime, in CPU cycles, of a basic block isolated from its context. This approach, in our experiments, was significantly less accurate than those not based on machine learning.
In our opinion, its main issue, however, is that it is a \textit{black-box model}: given a kernel, it is only able to predict its reciprocal throughput. Doing so, even with perfect accuracy, does not explain the source of a performance problem: the model is unable to help detect which resource is the performance bottleneck of a kernel; in other words, it quantifies a potential issue, but does not help in \emph{explaining} it ---~or debugging it.

\medskip{}

In yet another approach, \pmevo{}~\cite{PMEvo} uses genetic algorithms to infer, from scratch and in a benchmark-oriented approach, a port mapping of the processor it runs on. It is, to the best of our knowledge, the first tool striving to compute a port-mapping model in a fully automated way, as \palmed{} does (see \autoref{chap:palmed} later), although through a completely different methodology. As detailed in \palmed{}'s article~\cite{palmed}, it however suffers from a lack of scalability: as generating a port mapping for the few thousand x86-64 instructions would be extremely time-consuming with this approach, the authors limit the evaluation of their tool to around 300 of the most common instructions.

\medskip{}

Abel and Reineke, the authors of \uopsinfo{}, recently released \uica{}~\cite{uica}, a code analyzer for Intel microarchitectures based on the one hand on the \uopsinfo{} tables as a port model, and on the other hand on manual reverse-engineering through hardware counters to model the frontend and pipeline. We found this tool to be very accurate (see the experiments later in this manuscript), with results comparable to those of \llvmmca{}. Its source code ---~under a free software license~--- is self-contained and reasonably concise (about 2,000 lines of Python for the main part), making it a good basis and baseline for experiments. It is, however, closely tied by design to Intel microarchitectures, or to microarchitectures very similar to Intel's.
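The port-mapping inference problem that \pmevo{} tackles can be illustrated on a toy two-port machine, small enough that every candidate mapping can be enumerated exhaustively; at realistic scales, an evolutionary search replaces this enumeration. The instruction names, benchmarks and greedy throughput model below are illustrative assumptions of this sketch, not \pmevo{}'s actual cost model:

```python
import itertools

PORTS = (0, 1)
INSTS = ("A", "B", "C")
# Every non-empty port set an instruction could map to on this toy machine.
PORT_SETS = [s for n in (1, 2) for s in itertools.combinations(PORTS, n)]


def kernel_cycles(kernel, mapping):
    """Toy throughput model: each instruction is a single uop, greedily
    issued to its least-loaded allowed port; the busiest port then bounds
    the kernel's steady-state cycles per iteration."""
    load = {p: 0 for p in PORTS}
    for inst in kernel:
        port = min(mapping[inst], key=lambda p: load[p])
        load[port] += 1
    return max(load.values())


# Ground-truth mapping, used only to synthesise the "measured" throughputs.
TRUE_MAPPING = {"A": (0,), "B": (1,), "C": (0, 1)}
BENCHMARKS = [("A",), ("B",), ("C",), ("A", "C"), ("B", "C"), ("A", "B", "C")]
MEASURED = [kernel_cycles(k, TRUE_MAPPING) for k in BENCHMARKS]


def fitness(mapping):
    """Total prediction error of a candidate mapping on the benchmarks."""
    return sum(abs(kernel_cycles(k, mapping) - m)
               for k, m in zip(BENCHMARKS, MEASURED))


# Exhaustive search over the 27 candidate mappings of this toy machine.
candidates = [dict(zip(INSTS, sets))
              for sets in itertools.product(PORT_SETS, repeat=len(INSTS))]
best = min(candidates, key=fitness)
```

On this toy machine, the best candidate reproduces every measured value exactly; with thousands of instructions and larger port sets, however, the candidate space explodes combinatorially, which is precisely the scalability limit discussed above.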