\section{Related works}

\paragraph{Another comparative study: \anica{}.} The \anica{} framework~\cite{anica} also comparatively evaluates throughput predictors, by searching for examples on which they are inaccurate. \anica{} starts with randomly generated assembly snippets, which it feeds to the code analyzers under study. Once it finds a snippet on which some analyzers yield unsatisfying results, it generalizes this snippet through a process derived from abstract interpretation, reaching a broader category of inputs on which the analyzers disagree, \eg{} ``a load to a SIMD register followed by a SIMD arithmetic operation''. A sketch of this generalization loop is given at the end of this section.

\paragraph{A dynamic code analyzer: \gus{}.} So far, this manuscript has mostly been concerned with static code analyzers; throughput prediction tools, however, are not all static. \gus{} is a dynamic tool, first introduced in \fgruber{}'s PhD thesis~\cite{phd:gruber}. It leverages \qemu{}'s instrumentation capabilities to dynamically predict the throughput of user-defined regions of interest in a whole program. In these regions, it instruments every instruction, memory access, \ldots{} in order to retrieve the exact events occurring throughout the program's execution. \gus{} then leverages throughput, latency and microarchitectural models to analyze resource usage and produce an accurate theoretical prediction of the elapsed cycles.

Its main strength, however, resides in its \emph{sensitivity analysis} capabilities: by applying an arbitrary scaling factor to some parts of the model (\eg{} latencies, arithmetic ports, \ldots{}), it is possible to investigate the impact of a specific resource on the final execution time of a region of interest. In particular, \gus{} can accurately determine whether a resource is actually a bottleneck for a region, \ie{} whether increasing this resource's capabilities would reduce the execution time; a toy version of this analysis is sketched at the end of this section. The output of \gus{} on a region of interest provides very detailed insight into each instruction's resource consumption and its contribution to the final execution time. As a dynamic analysis tool, it is also able to extract the dependencies an instruction exhibits on a real run.

The main downside of \gus{}, however, is its slowness. Like most dynamic tools, it suffers from a heavy slowdown compared to a native execution of the binary, oftentimes about $100\times$ slower. While it remains a precious tool for a user willing to deeply optimize an execution kernel, this overhead makes \gus{} highly impractical to run on a large collection of execution kernels.

\paragraph{An isolated basic-block profiler: \bhive{}.} In \autoref{sec:redefine_exec_time} above, we advocated for measuring a basic block's execution time \emph{in-context}. The \bhive{} profiler~\cite{bhive}, initially written by \ithemal{}'s authors~\cite{ithemal} to provide their model with sufficient ---~and sufficiently accurate~--- training data, takes an orthogonal approach to basic-block throughput measurement. By mapping memory at any address accessed by a basic block, it can effectively run and measure arbitrary code out of context, often ---~but not always, as we discuss later~--- yielding good results. A sketch of this memory-mapping trick concludes this section.
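
To give a concrete flavour of \anica{}'s approach, the following C sketch implements a greedy variant of such a generalization loop. It is only a toy reconstruction: the flattened snippet representation, the abstract value \texttt{ANY} and the hard-coded disagreement oracle are all invented for illustration, and bear no relation to \anica{}'s actual abstract domains.

\begin{verbatim}
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NFEAT 4

/* A two-instruction snippet, flattened into features:
 * {opcode 1, operand 1, opcode 2, operand 2}. */
static const char *concrete[NFEAT] = {"vmovupd", "ymm0", "vaddpd", "ymm1"};

/* Hypothetical stand-in for actually running the analyzers under study:
 * here, they are assumed to diverge on every snippet made of a SIMD load
 * followed by a SIMD addition, whatever the registers involved. */
static bool analyzers_disagree(const char *snip[NFEAT]) {
    return strcmp(snip[0], "vmovupd") == 0 && strcmp(snip[2], "vaddpd") == 0;
}

int main(void) {
    const char *gen[NFEAT];
    memcpy(gen, concrete, sizeof gen);

    /* Greedy generalization: lift each feature to the abstract value ANY,
     * keeping the lift only if the analyzers still disagree. */
    for (int i = 0; i < NFEAT; i++) {
        const char *saved = gen[i];
        gen[i] = "ANY";
        if (!analyzers_disagree(gen))
            gen[i] = saved;            /* over-generalized: roll back */
    }

    printf("generalized snippet:");
    for (int i = 0; i < NFEAT; i++)
        printf(" %s", gen[i]);
    printf("\n");   /* prints: vmovupd ANY vaddpd ANY */
    return 0;
}
\end{verbatim}

Starting from the concrete snippet, the loop lifts exactly the two register operands, yielding the general pattern ``a SIMD load followed by a SIMD addition, whatever the operands''.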
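
The sensitivity analysis of \gus{} mentioned above can likewise be illustrated on a toy resource model. In the sketch below, the resource names and per-iteration pressure values are made up; the predicted cycles per iteration is simply the highest pressure on any resource, and a resource is flagged as a bottleneck whenever doubling its capacity lowers this prediction. \gus{}'s actual models are, of course, far more detailed.

\begin{verbatim}
#include <stdio.h>

#define NRES 3
static const char *res_name[NRES] = {"FP ports", "load ports", "dep. chain"};

/* Hypothetical per-iteration pressure (in cycles) that some kernel puts
 * on each abstract resource; the numbers are invented for illustration. */
static const double pressure[NRES] = {6.0, 8.0, 5.0};

/* Toy throughput model: in steady state, the most pressured resource
 * dictates the number of cycles per iteration. */
static double predict(const double capacity[NRES]) {
    double cycles = 0.0;
    for (int r = 0; r < NRES; r++) {
        double c = pressure[r] / capacity[r];
        if (c > cycles) cycles = c;
    }
    return cycles;
}

int main(void) {
    const double unit[NRES] = {1.0, 1.0, 1.0};
    double base = predict(unit);
    printf("baseline: %.2f cycles/iteration\n", base);

    /* Sensitivity analysis: double each resource's capacity in turn and
     * observe the effect; only a bottleneck makes the prediction drop. */
    for (int r = 0; r < NRES; r++) {
        double cap[NRES] = {1.0, 1.0, 1.0};
        cap[r] = 2.0;
        double t = predict(cap);
        printf("%-10s x2 -> %.2f cycles/iteration (%s)\n", res_name[r], t,
               t < base ? "bottleneck" : "not a bottleneck");
    }
    return 0;
}
\end{verbatim}

With these numbers, only doubling the load ports lowers the prediction (from $8$ to $6$ cycles per iteration), so the load ports are the sole bottleneck of this hypothetical kernel.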
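
Finally, the memory-mapping trick at the heart of \bhive{} can be sketched as follows, assuming an x86-64 Linux host: a \texttt{SIGSEGV} handler maps a fresh page at whatever address the measured code faults on, then lets the faulting instruction be retried. The real \bhive{} is considerably more careful ---~notably, it runs the block in a separate, dedicated process~--- so this sketch only shows the principle.

\begin{verbatim}
#define _GNU_SOURCE   /* for MAP_ANONYMOUS */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static long page_size;

/* On a segmentation fault, map a zero-filled page at the faulting
 * address and return: the kernel retries the interrupted load/store,
 * which now succeeds. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    void *base = (void *)((uintptr_t)info->si_addr
                          & ~(uintptr_t)(page_size - 1));
    if (mmap(base, (size_t)page_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
        _exit(1);   /* address not mappable: reject this basic block */
}

int main(void) {
    page_size = sysconf(_SC_PAGESIZE);

    struct sigaction sa = {0};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* Stand-in for an arbitrary basic block run out of context: it
     * dereferences a pointer holding whatever value the registers
     * happened to contain -- here, an arbitrary unmapped address. */
    volatile int *p = (volatile int *)0x700000000000ull;
    *p = 42;        /* faults once, then succeeds after the mmap */
    printf("survived out-of-context store, read back %d\n", *p);
    return 0;
}
\end{verbatim}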