diff --git a/manuscrit/90_wrapping_up/main.tex b/manuscrit/90_wrapping_up/main.tex
index ae1b6de..5da57a3 100644
--- a/manuscrit/90_wrapping_up/main.tex
+++ b/manuscrit/90_wrapping_up/main.tex
@@ -14,16 +14,16 @@ described below.
 
 \medskip{}
 
-To conclude this manuscript, we loosely combine those three models into a
-predictor, that we call \acombined{}, by taking the maximal prediction among
-the three models.
+To conclude this manuscript, we take a minimalist first approach to combining
+those three models into a predictor, which we call \acombined{}, by taking the
+maximal prediction among the three models.
 
 This method is clearly less precise than \eg{} \uica{} or \llvmmca{}'s
 methods, which simulate iterations of the kernel while accounting for each
 model. It however allows us to quickly and easily evaluate an \emph{upper
 bound} of the quality of our models: a more refined tool using our models
-should obtain results at least as good as this method ---~and hopefully
-significantly better.
+should obtain results at least as good as this method ---~but we would expect
+it to perform significantly better.
 
 \section{Critical path model}
 
@@ -38,10 +38,12 @@ by \osaca{}~\cite{osaca2}.
 In our case, we use instructions' latencies inferred by \palmed{} and its
 backend \pipedream{} on the A72.
 
-However, this method fails to account for out-of-orderness: the latency of an
-instruction is hidden by other computations, independent of the former one's
-result. This instruction-level parallelism is limited by the reorder buffer's
-size.
+\medskip{}
+
+So far, however, this method would fail to account for out-of-order
+execution: the latency of an instruction can be hidden by other computations
+that are independent of its result. This instruction-level parallelism is
+limited by the reorder buffer's size.
 
 We thus unroll the kernel as many times as fits in the reorder buffer
 ---~accounting for each instruction's \uop{} count, as we have a frontend model
@@ -76,7 +78,7 @@ Osaca (crit. path) & 1773 & 3 & (0.17\,\%) & 84.02\,\% & 70.39\,\% & 40.37\,\% &
 model}\label{fig:a72_combined_stats_boxplot}
 \end{figure}
 
-We evaluate \acombined{} using \cesasme{} on the Raspberry Pi's Cortex A72,
+We evaluate \acombined{} with \cesasme{} on the Raspberry Pi's Cortex A72,
 using the same set of benchmarks as in \autoref{chap:CesASMe} recompiled for
 AArch64. As most of the code analyzers we studied are unable to run on the A72,
 we are only able to compare \acombined{} to the baseline \perf{} measure,
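
As an illustration of the combination rule introduced in the first hunk above, the following minimal Python sketch takes the maximal predicted cycle count among the three models. The model names and the returned bottleneck label are hypothetical, not the manuscript's actual interface.

    def combined_prediction(predictions):
        """Combine per-model predictions by keeping the most pessimistic one.

        `predictions` maps a model name to its predicted cycles per kernel
        iteration, e.g. {"frontend": ..., "backend": ..., "critical_path": ...}.
        The largest prediction is assumed to be the bottleneck that determines
        the kernel's throughput.
        """
        bottleneck = max(predictions, key=predictions.get)
        return predictions[bottleneck], bottleneck

    # Example: the critical path dominates here, so it is reported.
    print(combined_prediction({"frontend": 1.5, "backend": 2.0, "critical_path": 3.0}))
    # -> (3.0, 'critical_path')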
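
Similarly, the reorder-buffer-aware critical-path estimate described in the second hunk can be sketched as follows. This is only a simplified reading of the method, assuming a RAW-dependency-only critical path, a hypothetical `Insn` record standing in for the \palmed{}/\pipedream{}-derived latency and \uop{} data, and a placeholder reorder-buffer size.

    from dataclasses import dataclass

    @dataclass
    class Insn:
        """Simplified instruction description (illustrative only):
        destination/source registers, latency in cycles, uop count."""
        dests: tuple
        srcs: tuple
        latency: int
        uops: int = 1

    def critical_path_cycles(kernel, rob_size=128):
        """Per-iteration length of the longest register (RAW) dependency chain,
        with the kernel unrolled until the reorder buffer, counted in uops,
        is full. rob_size=128 is a placeholder, not a measured A72 parameter."""
        uops_per_iter = sum(insn.uops for insn in kernel)
        unroll = max(1, rob_size // uops_per_iter)
        unrolled = kernel * unroll

        ready_at = {}   # register -> cycle at which its latest value is ready
        finish = 0      # end cycle of the longest chain seen so far
        for insn in unrolled:
            start = max((ready_at.get(reg, 0) for reg in insn.srcs), default=0)
            end = start + insn.latency
            for reg in insn.dests:
                ready_at[reg] = end
            finish = max(finish, end)
        return finish / unroll

    # Example: a loop-carried chain through x0 cannot be hidden by unrolling,
    # so the estimate stays at 3 cycles per iteration.
    print(critical_path_cycles([Insn(dests=("x0",), srcs=("x0",), latency=3)]))
    # -> 3.0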