\section{Necessity to go beyond ports} The resource models produced by \palmed{} are mainly concerned with the backend of the modeled CPUs. However, the frontend's contribution to the accuracy of a model's predictions cannot be ignored. Its effect is clearly visible in the evaluation heatmaps of various code analyzers in \autoref{fig:palmed_heatmaps}. Each heatmap has a clear-cut limit on the horizontal axis: independently of the benchmark's content, it is impossible to exceed a given number of instructions per cycle on a given processor ---~4 instructions for the \texttt{SKL-SP}, 5 for the \texttt{ZEN1}. This limit is imposed by the frontend. Some analyzers, such as \palmed{} and \iaca{}, model this limit: their heatmaps show that the predicted IPC never surpasses it. The other three analyzers studied, however, do not model it; for instance, \uopsinfo{} has a high density of benchmarks predicted at 8 instructions per cycle on SPEC2017 on the \texttt{SKL-SP} CPU, while the native measurement yielded only 4 instructions per cycle. The same effect is visible on the \pmevo{} and \llvmmca{} heatmaps. \begin{example}[High back-end throughput on \texttt{SKL-SP}] On the \texttt{SKL-SP} microarchitecture, assuming an infinitely large frontend, more than 4 instructions per cycle is easy to reach. According to \uopsinfo{} data, a 64-bit integer \lstxasm{addq} is processed as a single \uop{}, dispatched on port 0, 1, 5 or 6. Meanwhile, a simple 64-bit register store to an address held directly in a register ---~\eg{} a \lstxasm{movq \%rax, (\%rbx)}~--- is also processed as a single \uop{}, dispatched on port 2 or 3. Thus, backend-wise, the kernel $4\times \texttt{addq} + 2\times \texttt{mov}$ has a throughput of 6 instructions per cycle.
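This backend bound can be made explicit (a sketch with hypothetical notation, splitting each \uop{} uniformly across its admissible ports; the uniform split is exact here, as the two port sets are disjoint): each of ports 0, 1, 5 and 6 receives a load of $4 \cdot \frac{1}{4} = 1$ cycle per kernel iteration, and each of ports 2 and 3 a load of $2 \cdot \frac{1}{2} = 1$, hence
\[
\text{backend throughput} = \frac{|K|}{\max_p \mathrm{load}_p(K)} = \frac{6}{1} = 6 \text{ instructions per cycle.}
\]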
However, in reality, this kernel is frontend-bound, with a theoretical maximum throughput of 4 instructions per cycle ---~in fact, a \pipedream{} measurement yields only 3 instructions per cycle. \end{example} \bigskip{} To account for this, \palmed{} tries to detect an additional resource, apart from the backend ports and combined ports, on which every \uop{} incurs a load. This allows \palmed{} to avoid large errors on frontend-bound kernels. The approach is, however, far from perfect. The clearest reason is that the frontend, on both x86-64 and ARM architectures, works in-order, while \palmed{} inherently models kernels as multisets of instructions, thus completely ignoring ordering. This resource model is purely linear: an instruction incurs a load on the frontend resource in a fully commutative way, independently of the instructions previously executed in the same cycle and of many other effects. The article introducing \uica{}~\cite{uica} explores this question in detail for Intel x86-64 architectures. The authors, who previously developed \uopsinfo{}, discuss the importance of correctly modelling the frontend to accurately predict throughput. Their approach, based on the exploration and reverse-engineering of the crucial parts of the frontend, showcases many important and non-trivial aspects of frontends that are usually neglected, such as the switching between the decoders and the \uop{}-cache as the source of instructions ---~which cannot be modelled linearly.
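Such a linear resource model can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical illustration ---~not \palmed{}'s actual model computation~---: every instruction is a single \uop{}, each \uop{} is split uniformly across its admissible ports, and the frontend is one extra resource loaded by $1/4$ of a cycle per \uop{} (the \texttt{SKL-SP} issue width). The port data mirrors the \lstxasm{addq}/\lstxasm{movq} example above.

```python
from collections import Counter

# Simplified, hypothetical port tables in the style of uops.info for SKL-SP:
# each instruction is one uop, dispatchable on any of the listed ports.
PORTS = {
    "addq": ["p0", "p1", "p5", "p6"],  # 64-bit integer add
    "movq_store": ["p2", "p3"],        # register store to a register-held address
}

FRONTEND_WIDTH = 4  # at most 4 uops issued per cycle on SKL-SP

def throughput(kernel: Counter) -> float:
    """Steady-state throughput (instructions/cycle) of a kernel,
    modeled as a multiset of instructions (ordering is ignored).

    Each port is a resource; an instruction dispatchable on k ports
    loads each of them by 1/k (uniform split -- optimistic in general,
    exact here since the two port sets are disjoint).  The frontend is
    one additional resource loaded by 1/FRONTEND_WIDTH per uop.
    """
    load = Counter()
    n_instructions = sum(kernel.values())
    for insn, count in kernel.items():
        ports = PORTS[insn]
        for p in ports:
            load[p] += count / len(ports)
    # Linear frontend model: every uop incurs the same commutative load.
    load["frontend"] = n_instructions / FRONTEND_WIDTH
    cycles_per_iteration = max(load.values())
    return n_instructions / cycles_per_iteration

kernel = Counter({"addq": 4, "movq_store": 2})
print(throughput(kernel))  # → 4.0: the frontend resource caps the 6-IPC backend bound
```

Without the \texttt{"frontend"} entry, every port load is $1$ and the model would predict 6 instructions per cycle; the frontend resource, loaded at $6/4 = 1.5$ cycles per iteration, dominates and caps the prediction at 4 ---~matching the theoretical limit discussed in the example, though not the measured 3 IPC, which involves non-linear effects.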