In the previous chapters, we focused on two of the main bottleneck factors for computation kernels: \autoref{chap:palmed} investigated the backend aspect of throughput prediction, while \autoref{chap:frontend} delved into its frontend aspects. Throughout those two chapters, we entirely left out another crucial factor: dependencies, and the latency they induce between instructions. We could afford to do so because our baseline for native execution consisted of \pipedream{} measurements, which are \emph{designed} to suppress any dependency. However, state-of-the-art tools strive to provide an estimation of the execution time $\cyc{\kerK}$ of a given kernel $\kerK$ that is \emph{as precise as possible}, and as such, cannot neglect this third major bottleneck.

An exact throughput prediction would require a cycle-accurate simulator of the processor, based on microarchitectural data that is most often not publicly available, and would in any case be prohibitively slow. Each of these tools thus solves, in its own way, the challenge of modeling complex CPUs while remaining simple enough to yield a prediction in a reasonable time, ending up with different models. For instance, on the following x86-64 basic block computing a general matrix multiplication,

\begin{minipage}{0.95\linewidth}
\begin{lstlisting}[language={[x86masm]Assembler}]
movsd (%rcx, %rax), %xmm0
mulsd %xmm1, %xmm0
addsd (%rdx, %rax), %xmm0
movsd %xmm0, (%rdx, %rax)
addq $8, %rax
cmpq $0x2260, %rax
jne 0x16e0
\end{lstlisting}
\end{minipage}

\noindent\llvmmca{} predicts 1.5 cycles, \iaca{} and \ithemal{} predict 2 cycles, while \uica{} predicts 3 cycles. One may wonder which tool is correct.

In this chapter, we take a step back from our previous contributions and assess more generally the landscape of code analyzers. What are the key bottlenecks to account for if one aims to correctly predict the execution time of a kernel? Are some of these poorly accounted for by state-of-the-art code analyzers? By conducting a broad experimental analysis of these tools, this chapter strives to answer these questions.

\input{overview}

\bigskip{}

In \autoref{sec:redefine_exec_time}, we investigate how a kernel's execution time may be measured if we want to correctly account for its dependencies. We advocate measuring the total execution time of a computation kernel in its original context, coupled with a precise count of its number of iterations to normalize this measurement.

We then present a fully-tooled solution to evaluate and compare the diversity of static throughput predictors. Our tool, \cesasme, solves two main issues in this direction. In Section~\ref{sec:bench_gen}, we describe how \cesasme{} generates a wide variety of computation kernels stressing different parameters of the architecture, and thus of the predictors' models, while staying close to representative workloads. To achieve this, we use Polybench~\cite{bench:polybench}, a C-level benchmark suite representative of scientific computation workloads, which we combine with a variety of optimizations, including polyhedral loop transformations. In Section~\ref{sec:bench_harness}, we describe how \cesasme{} evaluates throughput predictors on this set of benchmarks by lifting their predictions to a total number of cycles that can be compared to a hardware-counter-based measurement. A high-level view of \cesasme{} is shown in Figure~\ref{fig:contrib}. In Section~\ref{sec:exp_setup}, we detail our experimental setup and assess our methodology.
In Section~\ref{sec:results_analysis}, we compare the predictors and analyze the results obtained with \cesasme{}. In addition to statistical studies, we use \cesasme's results to investigate the analyzers' flaws. We show that code analyzers do not always correctly model data dependencies through memory accesses, which substantially impacts their precision.
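To make the normalization and lifting steps mentioned above concrete, the following sketch illustrates the underlying relation; the notation $C_{\text{measured}}$, $n$ and $\hat{c}_{\kerK}$ is introduced here for illustration only and does not necessarily match the exact quantities manipulated by \cesasme{}:
\[
    \cyc{\kerK} \approx \frac{C_{\text{measured}}}{n}
    \qquad\text{and, conversely,}\qquad
    \hat{C}_{\text{total}} = n \cdot \hat{c}_{\kerK},
\]
where $C_{\text{measured}}$ is the total cycle count reported by hardware counters for the kernel in its original context, $n$ its measured number of iterations, and $\hat{c}_{\kerK}$ the per-iteration cycle count predicted by a code analyzer. Both sides can thus be brought to a common scale, either per iteration or as a total cycle count, before being compared.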