\section{Evaluation on Palmed}
To evaluate the gain brought by each frontend model, we plug each of them in
turn on top of the \palmed{} backend model. The number of cycles for a kernel
$\kerK$ is then predicted as the maximum of the backend-predicted time and
the frontend-predicted time.

We evaluate four models: \palmed{}'s backend alone; \palmed{} with a purely
linear frontend, based on our modeled number of \uops{} for each instruction;
\palmed{} with the no-cross frontend; and finally \palmed{} with the
dispatch-queues frontend. The results of each model are reported in
\autoref{table:a72_frontend_err}, to which we add \llvmmca{}'s results as a
basis for comparison with the state of the art.
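
As a compact restatement of the combination rule described above, and using
notation that is only assumed here for illustration
($\hat{C}_{\mathrm{back}}$ and $\hat{C}_{\mathrm{front}}$ for the cycle counts
predicted by the backend and frontend models respectively, $\hat{C}$ for the
combined prediction), the three frontend-augmented models compute
\[
    \hat{C}(\kerK) = \max\left(\hat{C}_{\mathrm{back}}(\kerK),\,
        \hat{C}_{\mathrm{front}}(\kerK)\right),
\]
whereas the backend-only model amounts to taking
$\hat{C}(\kerK) = \hat{C}_{\mathrm{back}}(\kerK)$. Since the maximum can only
raise the backend prediction, a frontend model changes the result only for
kernels whose frontend-predicted time exceeds their backend-predicted time.
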
\begin{table}
    \centering
    \begin{tabular}{l l c r r r r r}
        \toprule
        & & & \multirow{2}{*}{\llvmmca{}} & \multicolumn{4}{c}{\palmed{} with
        frontend\ldots} \\
        & & & & none & linear & no-cross & disp.\ queues \\
        \midrule
        \multirow{3}{*}{SPEC} & \covrow{} & 100.0 & \na{} & 97.21 & 97.21 & 97.16 \\
        & \errrow{} & 9.0 & 20.1 & 6.2 & 6.3 & 4.6 \\
        & \taurow{} & 0.83 & 0.88 & 0.91 & 0.91 & 0.93 \\
        \midrule
        \multirow{3}{*}{Polybench} & \covrow{} & 100.00 & \na{} & 99.33 & 99.33 & 99.33 \\
        & \errrow{} & 13.9 & 12.6 & 8.1 & 8.1 & 8.0 \\
        & \taurow{} & 0.47 & 0.82 & 0.88 & 0.88 & 0.90 \\
        \bottomrule
    \end{tabular}
    \caption{Comparative accuracy of IPC predictions with different frontend
    models on the Cortex A72}\label{table:a72_frontend_err}
\end{table}

As expected, adding any reasonable frontend model greatly reduces the error,
especially on the SPEC benchmark suite. The dispatch-queues model, which
models the frontend more accurately, further reduces the error on SPEC by a
significant 1.6 points, while the $\tau_K$ coefficient increases only
slightly. On Polybench, however, the gain brought by the dispatch-queues
model is very modest: only 0.1 point.

In all cases, \palmed{} with a frontend model performs significantly better
than \llvmmca{} on the Cortex A72, in terms of both error and $\tau_K$.