Wrapping up: minor rewordings

Théophile Bastian 2024-10-04 18:46:51 +02:00
parent ad4d34bf1c
commit da9d606325


@@ -14,16 +14,16 @@ described below.
 \medskip{}
-To conclude this manuscript, we loosely combine those three models into a
-predictor, that we call \acombined{}, by taking the maximal prediction among
-the three models.
+To conclude this manuscript, we take a minimalist first approach at combining
+those three models into a predictor, that we call \acombined{}, by taking the
+maximal prediction among the three models.
 This method is clearly less precise than \eg{} \uica{} or \llvmmca{}'s
 methods, which simulate iterations of the kernel while accounting for each
 model. It however allows us to quickly and easily evaluate an \emph{upper
 bound} of the quality of our models: a more refined tool using our models
-should obtain results at least as good as this method ---~and hopefully
-significantly better.
+should obtain results at least as good as this method ---~but we could expect
+it to perform significantly better.
 \section{Critical path model}
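
The combination rule this hunk rewords is simply a maximum over the three per-kernel predictions: a kernel cannot run faster than its most constraining bound. A minimal Python sketch of that rule, assuming each underlying model already yields a cycles-per-iteration estimate (the argument names below are illustrative, not taken from the actual tooling):

    def combined_prediction(backend_cycles: float,
                            frontend_cycles: float,
                            critical_path_cycles: float) -> float:
        """Max-combination: the prediction is the largest of the three
        per-model cycles-per-iteration estimates."""
        return max(backend_cycles, frontend_cycles, critical_path_cycles)

    # Example: a kernel bound by its dependency chains.
    print(combined_prediction(2.0, 1.5, 3.25))  # -> 3.25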
@@ -38,10 +38,12 @@ by \osaca{}~\cite{osaca2}.
 In our case, we use instructions' latencies inferred by \palmed{} and its
 backend \pipedream{} on the A72.
-However, this method fails to account for out-of-orderness: the latency of an
-instruction is hidden by other computations, independent of the former one's
-result. This instruction-level parallelism is limited by the reorder buffer's
-size.
+\medskip{}
+
+So far, however, this method would fail to account for out-of-orderness: the
+latency of an instruction is hidden by other computations, independent of the
+former one's result. This instruction-level parallelism is limited by the
+reorder buffer's size.
 We thus unroll the kernel as many times as fits in the reorder buffer
 ---~accounting for each instruction's \uop{} count, as we have a frontend model
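
The refinement discussed in this hunk unrolls the kernel until the reorder buffer is full (counting each instruction's \uop{}s) and then takes the longest dependency chain under the inferred latencies. A rough Python sketch under those assumptions, with made-up instruction data and a placeholder reorder-buffer size, taken neither from \palmed{} nor from the A72 documentation:

    def unroll_factor(uop_counts, rob_size):
        """Number of kernel iterations whose uops fit in the reorder buffer."""
        return max(1, rob_size // sum(uop_counts))

    def critical_path_cycles(latencies, deps):
        """Longest dependency chain, in cycles, of a straight-line sequence;
        deps[i] lists the indices j < i that instruction i depends on."""
        finish = []
        for i, lat in enumerate(latencies):
            start = max((finish[j] for j in deps[i]), default=0.0)
            finish.append(start + lat)
        return max(finish, default=0.0)

    # Toy 3-instruction kernel: latencies, uop counts, intra-iteration deps,
    # and one loop-carried dependency (instr 0 reads last iteration's instr 2).
    lat, uops = [2.0, 3.0, 4.0], [1, 1, 2]
    intra = {0: [], 1: [], 2: [0, 1]}
    carried = {0: [2]}

    k = unroll_factor(uops, rob_size=128)   # 128 is a placeholder ROB size
    n = len(lat)
    latencies, deps = [], []
    for it in range(k):
        for i in range(n):
            latencies.append(lat[i])
            d = [it * n + j for j in intra[i]]
            if it > 0:
                d += [(it - 1) * n + j for j in carried.get(i, [])]
            deps.append(d)

    print(critical_path_cycles(latencies, deps) / k)  # ~cycles per iteration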
@@ -76,7 +78,7 @@ Osaca (crit. path) & 1773 & 3 & (0.17\,\%) & 84.02\,\% & 70.39\,\% & 40.37\,\% &
 model}\label{fig:a72_combined_stats_boxplot}
 \end{figure}
-We evaluate \acombined{} using \cesasme{} on the Raspberry Pi's Cortex A72,
+We evaluate \acombined{} with \cesasme{} on the Raspberry Pi's Cortex A72,
 using the same set of benchmarks as in \autoref{chap:CesASMe} recompiled for
 AArch64. As most of the code analyzers we studied are unable to run on the A72,
 we are only able to compare \acombined{} to the baseline \perf{} measure,
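
Since the only available reference on the A72 is the \perf{} measurement, the comparison amounts to per-benchmark relative errors against that baseline, aggregated for instance as a mean absolute percentage error. A tiny illustrative helper, not \cesasme{}'s actual interface:

    def mape(predicted, measured):
        """Mean absolute percentage error of predicted cycle counts against
        the measured (perf) baseline, one value per benchmark."""
        errs = [abs(p - m) / m for p, m in zip(predicted, measured)]
        return 100.0 * sum(errs) / len(errs)

    print(mape([193.0, 88.0], [180.0, 95.0]))  # toy data, roughly 7.3 %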