Wrapping up: minor rewordings
parent ad4d34bf1c
commit da9d606325

1 changed file with 12 additions and 10 deletions
@@ -14,16 +14,16 @@ described below.
 
 \medskip{}
 
-To conclude this manuscript, we loosely combine those three models into a
-predictor, that we call \acombined{}, by taking the maximal prediction among
-the three models.
+To conclude this manuscript, we take a minimalist first approach to combining
+those three models into a predictor, which we call \acombined{}, by taking the
+maximal prediction among the three models.
 
 This method is clearly less precise than \eg{} \uica{} or \llvmmca{}'s
 methods, which simulate iterations of the kernel while accounting for each
 model. It however allows us to quickly and easily evaluate an \emph{upper
 bound} of the quality of our models: a more refined tool using our models
-should obtain results at least as good as this method ---~and hopefully
-significantly better.
+should obtain results at least as good as this method ---~but we could expect
+it to perform significantly better.
 
 \section{Critical path model}
 
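As a rough illustration of the combination rule in this hunk, the sketch below keeps the maximal per-kernel prediction among the three models: each model's estimate can be read as a lower bound on the execution time, so the maximum is the tightest bound available. The predict_cycles interface and the model names are hypothetical, not taken from the actual tool.

# Minimal sketch of the combined predictor described above (assumption:
# each model exposes a hypothetical predict_cycles(kernel) method).
def combined_prediction(kernel, models):
    """Keep the maximal per-kernel cycle prediction among the models."""
    return max(model.predict_cycles(kernel) for model in models)

# Hypothetical usage:
#   combined_prediction(kernel, [backend_model, frontend_model, critical_path_model])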
@@ -38,10 +38,12 @@ by \osaca{}~\cite{osaca2}.
 In our case, we use instructions' latencies inferred by \palmed{} and its
 backend \pipedream{} on the A72.
 
-However, this method fails to account for out-of-orderness: the latency of an
-instruction is hidden by other computations, independent of the former one's
-result. This instruction-level parallelism is limited by the reorder buffer's
-size.
+\medskip{}
+
+So far, however, this method would fail to account for out-of-orderness: the
+latency of an instruction is hidden by other computations, independent of the
+former one's result. This instruction-level parallelism is limited by the
+reorder buffer's size.
 
 We thus unroll the kernel as many times as fits in the reorder buffer
 ---~accounting for each instruction's \uop{} count, as we have a frontend model
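The sketch below illustrates the critical-path idea this hunk describes: unroll the kernel as many times as the reorder buffer can hold, counting each instruction's \uop{}s, then take the longest latency path through the dependency graph of the unrolled code. It is only an illustration under stated assumptions, not the manuscript's implementation; the Inst fields and the dependency encoding are invented for the example.

from dataclasses import dataclass, field

@dataclass
class Inst:
    latency: int          # latency in cycles (e.g. as measured by Pipedream)
    uops: int             # micro-op count, as given by a frontend model
    deps: list = field(default_factory=list)     # producer indices, same iteration
    carried: list = field(default_factory=list)  # producer indices, previous iteration

def critical_path_cycles(kernel, rob_size):
    """Longest latency path through the kernel, unrolled to fill the ROB."""
    uops_per_iter = sum(i.uops for i in kernel)
    unroll = max(1, rob_size // uops_per_iter)

    finish = {}  # (iteration, instruction index) -> completion cycle
    for it in range(unroll):
        for idx, inst in enumerate(kernel):
            # Dependencies point to earlier instructions (program order),
            # so a single forward pass computes the longest path.
            ready = [finish[(it, d)] for d in inst.deps]
            if it > 0:
                ready += [finish[(it - 1, d)] for d in inst.carried]
            finish[(it, idx)] = max(ready, default=0) + inst.latency

    return max(finish.values()) / unroll  # cycles per original iteration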
@@ -76,7 +78,7 @@ Osaca (crit. path) & 1773 & 3 & (0.17\,\%) & 84.02\,\% & 70.39\,\% & 40.37\,\% &
 model}\label{fig:a72_combined_stats_boxplot}
 \end{figure}
 
-We evaluate \acombined{} using \cesasme{} on the Raspberry Pi's Cortex A72,
+We evaluate \acombined{} with \cesasme{} on the Raspberry Pi's Cortex A72,
 using the same set of benchmarks as in \autoref{chap:CesASMe} recompiled for
 AArch64. As most of the code analyzers we studied are unable to run on the A72,
 we are only able to compare \acombined{} to the baseline \perf{} measure,
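For reference, the comparison against the \perf{} baseline mentioned in this hunk reduces to an error statistic over the benchmark set; the sketch below uses mean absolute percentage error, which is one common choice and may differ from the exact metrics reported in the manuscript.

# Illustrative error metric for comparing predicted cycle counts to perf
# measurements (assumption: MAPE; the manuscript's statistics may differ).
def mape(predicted, measured):
    """Mean absolute percentage error of predicted vs. measured cycles."""
    return 100 * sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)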