Wrapping up: minor rewordings
This commit is contained in:
parent
ad4d34bf1c
commit
da9d606325
1 changed file with 12 additions and 10 deletions
@@ -14,16 +14,16 @@ described below.
 \medskip{}
 
-To conclude this manuscript, we loosely combine those three models into a
-predictor, that we call \acombined{}, by taking the maximal prediction among
-the three models.
+To conclude this manuscript, we take a minimalist first approach at combining
+those three models into a predictor, that we call \acombined{}, by taking the
+maximal prediction among the three models.
 
 This method is clearly less precise than \eg{} \uica{} or \llvmmca{}'s
 methods, which simulate iterations of the kernel while accounting for each
 model. It however allows us to quickly and easily evaluate an \emph{upper
 bound} of the quality of our models: a more refined tool using our models
-should obtain results at least as good as this method ---~and hopefully
-significantly better.
+should obtain results at least as good as this method ---~but we could expect
+it to perform significantly better.
 
 \section{Critical path model}
 
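The hunk above describes the \acombined{} predictor as keeping the maximal per-iteration prediction among the three models, since the tightest bottleneck dictates the kernel's throughput. A minimal Python sketch of that combination rule (function and parameter names are illustrative, not taken from the manuscript):

```python
def combined_prediction(backend_cycles: float,
                        frontend_cycles: float,
                        critical_path_cycles: float) -> float:
    """Combine three per-iteration cycle predictions by taking the maximum.

    The slowest (maximal) prediction wins: whichever resource is the
    bottleneck determines the kernel's steady-state throughput.
    """
    return max(backend_cycles, frontend_cycles, critical_path_cycles)
```

As the surrounding text notes, this is only an upper bound on achievable quality: a simulator-style tool interleaving the models per iteration could be more precise.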
@@ -38,10 +38,12 @@ by \osaca{}~\cite{osaca2}.
 In our case, we use instructions' latencies inferred by \palmed{} and its
 backend \pipedream{} on the A72.
 
-However, this method fails to account for out-of-orderness: the latency of an
-instruction is hidden by other computations, independent of the former one's
-result. This instruction-level parallelism is limited by the reorder buffer's
-size.
+\medskip{}
+
+So far, however, this method would fail to account for out-of-orderness: the
+latency of an instruction is hidden by other computations, independent of the
+former one's result. This instruction-level parallelism is limited by the
+reorder buffer's size.
 
 We thus unroll the kernel as many times as fits in the reorder buffer
 ---~accounting for each instruction's \uop{} count, as we have a frontend model
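The hunk above sketches two steps: unroll the kernel as many times as its \uop{} count allows within the reorder buffer, then follow latencies through the dependency chains. A hypothetical Python sketch of both steps (ROB size, latencies and dependency edges below are illustrative placeholders, not A72 data):

```python
from typing import List, Tuple


def unroll_factor(uops_per_iter: int, rob_size: int) -> int:
    """Number of kernel iterations whose uops fit in the reorder buffer.

    Always at least 1, even if a single iteration overflows the buffer.
    """
    return max(1, rob_size // uops_per_iter)


def critical_path(latencies: List[int],
                  deps: List[Tuple[int, int]]) -> int:
    """Longest latency path through a dependency DAG.

    Instructions are given in topological order; deps holds
    (producer, consumer) index pairs with producer < consumer.
    """
    finish = [0] * len(latencies)
    for i, lat in enumerate(latencies):
        # An instruction starts once its slowest producer has finished.
        start = max((finish[p] for p, c in deps if c == i), default=0)
        finish[i] = start + lat
    return max(finish, default=0)
```

Under this sketch, the critical-path prediction for the unrolled kernel would be `critical_path(...)` divided by the unroll factor, giving cycles per original iteration.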
@@ -76,7 +78,7 @@ Osaca (crit. path) & 1773 & 3 & (0.17\,\%) & 84.02\,\% & 70.39\,\% & 40.37\,\% &
 model}\label{fig:a72_combined_stats_boxplot}
 \end{figure}
 
-We evaluate \acombined{} using \cesasme{} on the Raspberry Pi's Cortex A72,
+We evaluate \acombined{} with \cesasme{} on the Raspberry Pi's Cortex A72,
 using the same set of benchmarks as in \autoref{chap:CesASMe} recompiled for
 AArch64. As most of the code analyzers we studied are unable to run on the A72,
 we are only able to compare \acombined{} to the baseline \perf{} measure,