\section{Evaluation on Palmed}
To evaluate the gain brought by each frontend model, we plug each of them in
turn on top of the \palmed{} backend model. The number of cycles for a kernel
$\kerK$ is then predicted as the maximum of the backend-predicted time and
the frontend-predicted time.
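Written as a formula, where $C_{\mathrm{back}}$ and $C_{\mathrm{front}}$ are
notations assumed here only for illustration, denoting the cycle counts
predicted by the backend and frontend models respectively, this combination
reads
\[
  C(\kerK) = \max\bigl(C_{\mathrm{back}}(\kerK),\; C_{\mathrm{front}}(\kerK)\bigr),
\]
so that adding a frontend model can only increase the predicted number of
cycles with respect to the backend alone.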

We evaluate four models: \palmed{}'s backend alone; \palmed{} with a purely
linear frontend, based on our modeled number of \uops{} for each instruction;
\palmed{} with the no-cross frontend; and finally \palmed{} with the
dispatch-queues frontend. The results of each model are reported in
\autoref{table:a72_frontend_err}, to which we add \llvmmca{}'s results as a
basis for comparison with the state of the art.
\begin{table}
\centering
\begin{tabular}{l l c r r r r r}
\toprule
& & & \multirow{2}{*}{\llvmmca{}} & \multicolumn{4}{c}{\palmed{} with
frontend\ldots} \\
& & & & none & linear & no-cross & disp.\ queues \\
\midrule{}
\multirow{3}{*}{SPEC} & \covrow{} & 100.0 & \na{} & 97.21 & 97.21 & 97.16 \\
& \errrow{} & 9.0 & 20.1 & 6.2 & 6.3 & 4.6 \\
& \taurow{} & 0.83 & 0.88 & 0.91 & 0.91 & 0.93 \\
\midrule
\multirow{3}{*}{Polybench} & \covrow{} & 100.00 & \na{} & 99.33 & 99.33 & 99.33 \\
& \errrow{} & 13.9 & 12.6 & 8.1 & 8.1 & 8.0 \\
& \taurow{} & 0.47 & 0.82 & 0.88 & 0.88 & 0.90 \\
\bottomrule
\end{tabular}
\caption{Comparative accuracy of IPC predictions with different frontend
models on the Cortex A72}\label{table:a72_frontend_err}
\end{table}

As expected, the error is greatly reduced by the addition of any reasonable
frontend model, especially on the SPEC benchmark suite. Using the
dispatch-queues model, which models the frontend more accurately, further
reduces the error on SPEC by 1.6 points (from 6.2 to 4.6), while the $\tau_K$
coefficient increases only slightly (from 0.91 to 0.93). On Polybench,
however, the gains brought by the dispatch-queues model are very modest: only
0.1 point (from 8.1 to 8.0). In all cases, \palmed{} with a frontend model
achieves both a lower error and a higher $\tau_K$ coefficient than
\llvmmca{} on the Cortex A72.
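
To put these absolute gains in perspective, the relative error reduction of
the dispatch-queues frontend over the linear one can be derived from the
values of \autoref{table:a72_frontend_err}. This ratio is only an
illustrative computation, not a metric used elsewhere in this evaluation:
\[
  \frac{6.2 - 4.6}{6.2} \approx 25.8\,\% \quad \mbox{(SPEC)},
  \qquad
  \frac{8.1 - 8.0}{8.1} \approx 1.2\,\% \quad \mbox{(Polybench)}.
\]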