Wrapping up: minor rewordings

This commit is contained in:
Théophile Bastian 2024-10-04 18:46:51 +02:00
parent ad4d34bf1c
commit da9d606325


@@ -14,16 +14,16 @@ described below.
 \medskip{}
-To conclude this manuscript, we loosely combine those three models into a
-predictor, that we call \acombined{}, by taking the maximal prediction among
-the three models.
+To conclude this manuscript, we take a minimalist first approach at combining
+those three models into a predictor, that we call \acombined{}, by taking the
+maximal prediction among the three models.
 This method is clearly less precise than \eg{} \uica{} or \llvmmca{}'s
 methods, which simulate iterations of the kernel while accounting for each
 model. It however allows us to quickly and easily evaluate an \emph{upper
 bound} of the quality of our models: a more refined tool using our models
-should obtain results at least as good as this method ---~and hopefully
-significantly better.
+should obtain results at least as good as this method ---~but we could expect
+it to perform significantly better.
 \section{Critical path model}
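The combination described in this hunk amounts to taking, for each kernel, the maximum of the three models' throughput predictions. A minimal sketch of that idea, with illustrative names that are not from the manuscript's codebase:

```python
def combined_prediction(frontend_cycles: float,
                        backend_cycles: float,
                        critpath_cycles: float) -> float:
    """Hypothetical sketch of the max-of-models combination: the
    predicted cycle count for a kernel is the largest (i.e. most
    constraining) of the frontend, backend and critical-path
    predictions."""
    return max(frontend_cycles, backend_cycles, critpath_cycles)
```

Taking the maximum reflects that each model is a lower bound on execution time: the kernel cannot run faster than its most constraining resource.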
@@ -38,10 +38,12 @@ by \osaca{}~\cite{osaca2}.
 In our case, we use instructions' latencies inferred by \palmed{} and its
 backend \pipedream{} on the A72.
-However, this method fails to account for out-of-orderness: the latency of an
-instruction is hidden by other computations, independent of the former one's
-result. This instruction-level parallelism is limited by the reorder buffer's
-size.
+\medskip{}
+
+So far, however, this method would fail to account for out-of-orderness: the
+latency of an instruction is hidden by other computations, independent of the
+former one's result. This instruction-level parallelism is limited by the
+reorder buffer's size.
 We thus unroll the kernel as many times as fits in the reorder buffer
 ---~accounting for each instruction's \uop{} count, as we have a frontend model
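The unrolling step above can be sketched as a small computation: given the total \uop{} count of one kernel iteration, how many iterations fit in the reorder buffer. The function name and interface are assumptions for illustration only:

```python
def unroll_factor(kernel_uops: int, rob_size: int) -> int:
    """Hypothetical sketch: number of kernel iterations that fit in the
    reorder buffer, given the kernel's total micro-op count per
    iteration. At least one iteration is always considered, even if a
    single iteration exceeds the ROB size."""
    return max(1, rob_size // kernel_uops)
```

For instance, a kernel of 8 \uop{}s against a 64-entry reorder buffer would be unrolled 8 times before the critical path is computed on the unrolled body.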
@@ -76,7 +78,7 @@ Osaca (crit. path) & 1773 & 3 & (0.17\,\%) & 84.02\,\% & 70.39\,\% & 40.37\,\% &
 model}\label{fig:a72_combined_stats_boxplot}
 \end{figure}
-We evaluate \acombined{} using \cesasme{} on the Raspberry Pi's Cortex A72,
+We evaluate \acombined{} with \cesasme{} on the Raspberry Pi's Cortex A72,
 using the same set of benchmarks as in \autoref{chap:CesASMe} recompiled for
 AArch64. As most of the code analyzers we studied are unable to run on the A72,
 we are only able to compare \acombined{} to the baseline \perf{} measure,