\section{Main contribution: evaluating \palmed{}}
The main contribution I made to \palmed{} is its evaluation harness and
procedure. \todo{}
\subsection{Basic blocks from benchmark suites}
Models generated by \palmed{} are meant to be used on basic blocks that are
computationally intensive ---~so that the backend is actually the relevant
resource to monitor, as opposed to \eg{} frontend- or input/output-bound
code~--- and that run in steady-state ---~that is, that form the body of a
loop long enough to be reasonably considered infinite for performance
modelling purposes. The basic blocks used to evaluate \palmed{} should thus
reasonably match these criteria.

Some tools, such as \pmevo{}~\cite{PMEvo}, use randomly-sampled basic blocks
for their evaluation. This approach, however, may yield basic blocks that do
not meet these criteria; furthermore, it may not be representative of the
real-life code on which users expect the tool to be accurate.
For this reason, we evaluate \palmed{} on basic blocks extracted from
two well-known benchmark suites instead: Polybench and SPEC CPU 2017.
\paragraph{Polybench} is a suite of benchmarks built out of 30 kernels of
numerical computation~\cite{bench:polybench}. Its benchmarks are
domain-specific and centered on scientific computation, mathematical
computation, image processing, etc. As the computation kernels are clearly
identifiable in the source code, extracting the relevant basic blocks is easy,
which suits our purpose well. It is written in C. Although it is not
distributed under a free/libre software license, it is free to use and
open-source.

We compile multiple versions of each benchmark (\texttt{-O2}, \texttt{-O3} and
tiled using the Pluto optimizer~\cite{tool:pluto}), then extract the basic
blocks corresponding to the benchmarks' kernels using \qemu~\cite{tool:qemu},
gathering translation blocks and occurrence statistics.
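As an illustration, the sketch below (in Python) runs a compiled benchmark
under \qemu{} user-mode emulation with \texttt{-d in\_asm} logging and parses
the translated blocks from the resulting log; the log layout assumed in the
comments varies across \qemu{} versions, and occurrence statistics
additionally require execution tracing (\eg{} \texttt{-d exec} or a TCG
plugin), omitted here, so this is only a simplified view of the actual
extraction.
\begin{verbatim}
import re
import subprocess

def collect_translation_blocks(binary, args=(), log="qemu.log"):
    """Run a benchmark under QEMU user-mode emulation and parse the
    translated blocks out of its '-d in_asm' log.  The log layout is
    version-dependent; this parser is illustrative only."""
    subprocess.run(["qemu-x86_64", "-d", "in_asm", "-D", log, binary, *args],
                   check=True)
    blocks, current = [], []
    with open(log) as f:
        for line in f:
            if line.startswith("IN:"):        # header of a translated block
                if current:
                    blocks.append(current)
                current = []
            elif re.match(r"0x[0-9a-f]+:", line.strip()):
                current.append(line.strip())  # one disassembled instruction
    if current:
        blocks.append(current)
    return blocks

# Usage (hypothetical binary name):
#   blocks = collect_translation_blocks("./gemm_O3")
\end{verbatim}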
\paragraph{SPEC CPU 2017} is a suite of benchmarks designed to be
CPU-intensive~\cite{bench:spec}. It is composed of both integer and
floating-point benchmarks, extracted from (mainly open-source) real-world
software, such as \texttt{gcc}, \texttt{imagemagick}, \ldots{} Its main
purpose is to obtain metrics and compare CPUs on a unified workload; it is
however commonly used throughout the literature to evaluate compilers,
optimizers, code analyzers, etc. It is split into four variants: integer and
floating-point, combined with speed ---~time to perform a single task~--- and
rate ---~throughput for performing a flow of tasks. Most benchmarks exist in
both speed and rate modes.

The SPEC suite is distributed under a paid license and cannot be
redistributed, which makes peer-review and replication of experiments
---~\eg{} for artifact review~--- complicated.

In the case of SPEC, there is no clearly identifiable kernel in each
benchmark, so extracting basic blocks to evaluate \palmed{} is not trivial. We
manually extract the relevant basic blocks using a profiling-based approach
with Linux \perf{}, as the \qemu{}-based solution used for Polybench would be
too costly for SPEC\@. We automate and describe this method in detail later in
\qtodo{ref}.
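As a rough illustration of this profiling-based approach, the sketch below
records a benchmark run with \perf{} and lists its hottest symbols, inside
which the dominant basic blocks are then located; the parsing of the textual
report is an assumption made for the sake of the example, as its exact layout
depends on the \perf{} version and event used.
\begin{verbatim}
import subprocess

def hottest_symbols(cmd, top=10):
    """Profile a benchmark with Linux perf and return its hottest symbols
    as (percentage, symbol) pairs.  The report parsing is illustrative:
    the exact column layout depends on the perf version."""
    subprocess.run(["perf", "record", "-e", "cycles:u", "-o", "perf.data",
                    "--"] + cmd, check=True)
    report = subprocess.run(["perf", "report", "--stdio", "-i", "perf.data"],
                            check=True, capture_output=True, text=True).stdout
    symbols = []
    for line in report.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip headers and comments
        fields = line.split()
        # Typical rows look like: "12.34%  bench  bench  [.] kernel_main"
        if fields[0].endswith("%"):
            symbols.append((float(fields[0].rstrip("%")), fields[-1]))
    return symbols[:top]

# Usage (hypothetical command line):
#   hottest_symbols(["./benchmark_binary", "input.file"])
\end{verbatim}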
\bigskip{}
Altogether, this method generates, for x86-64 processors, 13\,778 SPEC-based
and 2\,664 Polybench-based basic blocks.
\subsection{Evaluation harness}
We implement in \palmed{} an evaluation harness to evaluate it against both
native measurements and other code analyzers.

Using \pipedream{}, as we did previously, we first strip each gathered basic
block of its dependencies so that it falls into the use case of \palmed{}.
This yields assembly code that can be run and measured natively. The body of
the innermost loop can also be used as an assembly basic block for other code
analyzers.
However, as \pipedream{} does not support some instructions (control flow,
x86-64 divisions, \ldots), these are removed from the original kernel, which
might denature the original basic block.
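This filtering is performed inside \pipedream{} itself; the sketch below
merely illustrates the idea on textual x86-64 assembly, using a deliberately
incomplete, hypothetical list of unsupported mnemonics.
\begin{verbatim}
# Mnemonic prefixes that the harness cannot benchmark; deliberately
# incomplete and hypothetical, for illustration only.
UNSUPPORTED_PREFIXES = ("j", "call", "ret", "div", "idiv")

def strip_unsupported(block):
    """Remove control flow and other unsupported instructions from a
    basic block given as a list of assembly lines.  The resulting
    kernel may therefore differ from the original basic block."""
    kept = []
    for line in block:
        fields = line.split()
        mnemonic = fields[0].lower() if fields else ""
        if mnemonic.startswith(UNSUPPORTED_PREFIXES):
            continue                 # drop control flow, divisions, ...
        kept.append(line)
    return kept
\end{verbatim}
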
To evaluate \palmed{}, the same kernel is run:
\begin{enumerate}
\item{} natively on each CPU, using the \pipedream{} harness to measure its
execution time;
\item{} using the resource mapping \palmed{} produced on the evaluation machine;
\item{} using the \uopsinfo{}~\cite{uopsinfo} port mapping, converted to its
equivalent conjunctive resource mapping\footnote{When this evaluation was
made, \uica{}~\cite{uica} was not yet published. Since \palmed{} provides a
resource mapping, the comparison to \uopsinfo{} is fair.};
\item{} using \pmevo~\cite{PMEvo}, ignoring any instruction not supported by
its provided mapping;
\item{} using \iaca~\cite{iaca}, by inserting assembly markers around the
kernel and running the tool;
\item{} using \llvmmca~\cite{llvm-mca}, by inserting markers in the
\pipedream{}-generated assembly code and running the tool (the marker
insertion for this item and the previous one is sketched after this list).
\end{enumerate}
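The markers used by \iaca{} and \llvmmca{} are plain assembly fragments
surrounding the region to analyze. The sketch below shows how a kernel, given
as a list of assembly lines in AT\&T syntax, can be wrapped with either set of
markers; the helper name and region label are illustrative.
\begin{verbatim}
# IACA delimits the analyzed region with magic byte sequences.
IACA_START = ["movl $111, %ebx", ".byte 0x64, 0x67, 0x90"]
IACA_END   = ["movl $222, %ebx", ".byte 0x64, 0x67, 0x90"]

# llvm-mca only needs special comments around the region to analyze.
MCA_START = ["# LLVM-MCA-BEGIN kernel"]
MCA_END   = ["# LLVM-MCA-END"]

def wrap_kernel(kernel_lines, start, end):
    """Surround a kernel (list of assembly lines) with analysis markers."""
    return start + kernel_lines + end

# Usage:
#   iaca_asm = wrap_kernel(kernel, IACA_START, IACA_END)
#   mca_asm  = wrap_kernel(kernel, MCA_START, MCA_END)
\end{verbatim}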
% TODO: metrics extracted