\section{Evaluation on Palmed}

To evaluate the gain brought by each frontend model, we successively plug each
of them on top of the \palmed{} backend model. The number of cycles predicted
for a kernel $\kerK$ is then the maximum of the backend-predicted time and the
frontend-predicted time.
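
Written as a formula, with $\mathrm{cyc}_{\mathrm{back}}(\kerK)$ and
$\mathrm{cyc}_{\mathrm{front}}(\kerK)$ used here as shorthand (not notation
defined elsewhere) for the cycle counts predicted by the backend and frontend
models respectively, the combined prediction reads:
\[
  \mathrm{cyc}(\kerK) = \max\bigl(\mathrm{cyc}_{\mathrm{back}}(\kerK),\
  \mathrm{cyc}_{\mathrm{front}}(\kerK)\bigr)
\]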

We evaluate four models: \palmed{}'s backend alone; \palmed{} with a purely
linear frontend, based on our modeled number of \uops{} for each instruction;
\palmed{} with the no-cross frontend; and finally \palmed{} with the
dispatch-queues frontend. The results of each model are reported in
\autoref{table:a72_frontend_err}, alongside \llvmmca{}'s results as a
state-of-the-art baseline for comparison.
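
As an illustration only, assume that the linear frontend charges every
instruction $i$ of $\kerK$ (counted with multiplicity) its modeled number of
\uops{} $\mu(i)$, and processes at most $W$ \uops{} per cycle; $\mu$ and $W$ are
names introduced here for the sake of the example, not the exact definition of
the model. Under these assumptions, its prediction is a linear function of the
instruction counts:
\[
  \mathrm{cyc}^{\mathrm{lin}}_{\mathrm{front}}(\kerK) = \frac{1}{W}
  \sum_{i \in \kerK} \mu(i)
\]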

\begin{table}
    \centering
    \begin{tabular}{l l r r r r r}
        \toprule
        & & \multirow{2}{*}{\llvmmca{}} & \multicolumn{4}{c}{\palmed{} with
        frontend\ldots} \\
        \cmidrule(lr){4-7}
        & & & none & linear & no-cross & disp.\ queues \\
        \midrule{}
        \multirow{3}{*}{SPEC} & \covrow{} & 100.0 & \na{} & 97.21 & 97.21 & 97.16 \\
                              & \errrow{} & 9.0 & 20.1 & 6.2 & 6.3 & 4.6 \\
                              & \taurow{} & 0.83 & 0.88 & 0.91 & 0.91 & 0.93 \\
        \midrule
        \multirow{3}{*}{Polybench} & \covrow{} & 100.00 & \na{} & 99.33 & 99.33 & 99.33 \\
                                   & \errrow{} & 13.9 & 12.6 & 8.1 & 8.1 & 8.0 \\
                                   & \taurow{} & 0.47 & 0.82 & 0.88 & 0.88 & 0.90 \\
        \bottomrule
    \end{tabular}
    \caption{Comparative accuracy of IPC predictions with different frontend
    models on the Cortex A72}\label{table:a72_frontend_err}
\end{table}

As expected, adding any reasonable frontend model greatly reduces the error,
especially on the SPEC benchmark suite, where it drops from 20.1 to at most 6.3.
Using the dispatch-queues model, which models the frontend more accurately,
further reduces the error on SPEC by 1.6 points compared to the linear model,
while also slightly increasing the $\tau_K$ coefficient, from 0.91 to 0.93. On
Polybench, however, the gain brought by the dispatch-queues model is very
modest: only 0.1 point.
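
The error reductions discussed above can be read off
\autoref{table:a72_frontend_err} as follows, relative reductions being rounded
to the nearest percent:
\begin{align*}
  \text{SPEC, none} \to \text{linear:} &\quad 20.1 - 6.2 = 13.9 \text{ points}, \\
  \text{SPEC, linear} \to \text{disp.\ queues:} &\quad 6.2 - 4.6 = 1.6
    \text{ points} \quad (\approx 26\,\%\text{ relative}), \\
  \text{Polybench, none} \to \text{linear:} &\quad 12.6 - 8.1 = 4.5 \text{ points}, \\
  \text{Polybench, linear} \to \text{disp.\ queues:} &\quad 8.1 - 8.0 = 0.1
    \text{ point} \quad (\approx 1\,\%\text{ relative}).
\end{align*}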

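In all cases, \palmed{} with any of the three frontend models performs
significantly better than \llvmmca{} on the Cortex A72, with both a lower error
and a higher $\tau_K$ coefficient, albeit at a slightly lower coverage.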
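
Concretely, comparing \llvmmca{} with \palmed{} equipped with the
dispatch-queues frontend, \autoref{table:a72_frontend_err} gives:
\begin{align*}
  \text{SPEC:} &\quad 9.0 - 4.6 = 4.4 \text{ error points}, \qquad
    \tau_K\colon 0.83 \to 0.93, \\
  \text{Polybench:} &\quad 13.9 - 8.0 = 5.9 \text{ error points}, \qquad
    \tau_K\colon 0.47 \to 0.90.
\end{align*}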