Palmed: Start writing evaluation

2023-09-15 18:05:48 +02:00
commit 0683fefbef
parent 6e6581b322
9 changed files with 186 additions and 5 deletions
--- a/manuscrit/30_palmed/40_palmed_results.tex
+++ b/manuscrit/30_palmed/40_palmed_results.tex
@ -0,0 +1,102 @@
+\section{Main contribution: evaluating \palmed{}}
+
+The main contribution I made to \palmed{} is its evaluation harness and
+procedure. \todo{}
+
+\subsection{Basic blocks from benchmark suites}
+
+Models generated by \palmed{} are meant to be used on basic blocks that are
+computationally intensive, so that the backend is actually the relevant
+resource to monitor (as opposed to, \eg{}, frontend- or input/output-bound
+code), and that run in steady state: each of them forms the body of a loop
+executed enough times to be considered infinite for performance-modelling
+purposes. The basic blocks used to evaluate \palmed{} should thus meet these
+criteria as closely as possible.
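+
+One way to make the steady-state assumption precise, sketched here with
+notation introduced only for this illustration, is to write $C(K^n)$ for the
+number of cycles needed to execute $n$ back-to-back iterations of a basic
+block $K$, and to define the steady-state cost $\cyc{K}$ of $K$ as its
+asymptotic number of cycles per iteration, assuming this limit exists:
+\[
+  \cyc{K} = \lim_{n \to \infty} \frac{C(K^n)}{n}.
+\]
+Start-up effects such as pipeline warm-up only add a bounded number of cycles
+to $C(K^n)$, and therefore vanish from this limit.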
+
+Some tools, such as \pmevo{}~\cite{PMEvo}, are evaluated on randomly sampled
+basic blocks. This approach, however, may yield basic blocks that do not meet
+these criteria; furthermore, such blocks may not be representative of the
+real-life code on which users expect the tool to be accurate.
+
+For this reason, we evaluate \palmed{} on basic blocks extracted from
+two well-known benchmark suites instead: Polybench and SPEC CPU 2017.
+
+\paragraph{Polybench} is a suite of benchmarks built out of 30 numerical
+computation kernels~\cite{bench:polybench}, written in C. Its benchmarks are
+domain-specific and centered on scientific computation, mathematical
+computation, image processing, etc. As the computation kernels are clearly
+identifiable in the source code, the relevant basic blocks are easy to
+extract, which makes the suite well suited to our purpose. Although it is not
+distributed under a free/libre software license, it is free to use and its
+source code is openly available.
+
+We compile multiple versions of each benchmark (with \texttt{-O2}, with
+\texttt{-O3}, and tiled using the Pluto optimizer~\cite{tool:pluto}), then
+extract the basic blocks corresponding to each benchmark's kernel using
+\qemu{}~\cite{tool:qemu}, by gathering the executed translation blocks along
+with their occurrence statistics.
+
+\paragraph{SPEC CPU 2017} is a suite of benchmarks meant to be CPU
+intensive~\cite{bench:spec}. It is composed of both integer and floating-point
+benchmarks, extracted from (mostly open-source) real-world software such as
+\texttt{gcc} or \texttt{imagemagick}. Its main purpose is to obtain metrics
+and compare CPUs on a unified workload; it is, however, also commonly used
+throughout the literature to evaluate compilers, optimizers, code analyzers,
+etc. It is split into four variants, combining integer or floating-point
+workloads with two metrics: \emph{speed}, the time needed to perform a single
+task, and \emph{rate}, the throughput achieved on a flow of tasks. Most
+benchmarks exist in both speed and rate versions. The SPEC suite is
+distributed under a paid license and cannot be redistributed, which
+complicates peer review and the replication of experiments, \eg{} for
+artifact review.
+
+In the case of SPEC, no clearly identified kernel is available for each
+benchmark, so extracting basic blocks to evaluate \palmed{} is not trivial. We
+manually extract the relevant basic blocks using a profiling-based approach
+with Linux \perf{}, as the \qemu{}-based solution used for Polybench would be
+too costly for SPEC\@. We automate this method and describe it in detail
+later, in \qtodo{ref}.
+
+\bigskip{}
+
+Altogether, these methods yield, for x86-64 processors, 13\,778 SPEC-based
+and 2\,664 Polybench-based basic blocks.
+
+\subsection{Evaluation harness}
+
+We implement within \palmed{} an evaluation harness that compares its
+predictions both to native measurements and to other code analyzers.
+
+Using \pipedream{}, as we did previously, we first strip each gathered basic
+block of its dependencies, so that it falls within the use case of \palmed{}.
+This yields assembly code that can be run and measured natively; the body of
+its innermost loop can also be used as an assembly basic block by other code
+analyzers. However, as \pipedream{} does not support some instructions
+(control flow, x86-64 divisions, \ldots), those are stripped from the
+original kernel, which might alter the nature of the original basic block.
+
+To evaluate \palmed{}, the throughput of each kernel is then measured or
+predicted:
+
+\begin{enumerate}
+
+\item{} natively on each CPU, using the \pipedream{} harness to measure its
+    execution time;
+
+\item{} using the resource mapping \palmed{} produced on the evaluation machine;
+
+\item{} using the \uopsinfo{}~\cite{uopsinfo} port mapping, converted to its
+    equivalent conjunctive resource mapping\footnote{When this evaluation was
+    made, \uica{}~\cite{uica} was not yet published. Since \palmed{} only
+    provides a resource mapping, comparing it to the \uopsinfo{} port mapping,
+    rather than to a full simulator, is fair.};
+
+\item{} using \pmevo~\cite{PMEvo}, ignoring any instruction not supported by
+    its provided mapping;
+
+\item{} using \iaca~\cite{iaca}, by inserting assembly markers around the
+    kernel and running the tool;
+
+\item{} using \llvmmca~\cite{llvm-mca}, by inserting markers in the
+    \pipedream{}-generated assembly code and running the tool.
+\end{enumerate}
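+
+The conversion used in item~3 can be sketched as follows. This is one
+standard construction for port-mapping models, stated with notation introduced
+for this illustration rather than taken from \uopsinfo{}. Assume that each
+port executes at most one micro-operation per cycle, and that each
+micro-operation $u$ of an instruction $i$ may execute on any port of a set
+$P(u)$. Every non-empty set of ports $S$ then yields a conjunctive resource
+$r_S$, on which $i$ has load $\rho_{i,S} = n_S(i) / |S|$, where $n_S(i)$ is
+the number of micro-operations of $i$ whose port set is included in $S$. A
+kernel $K$ containing $\sigma_K(i)$ occurrences of each instruction $i$ is
+then predicted to take
+\[
+  \cyc{K} = \max_{S} \sum_{i \in K} \sigma_K(i)\, \rho_{i,S}
+\]
+cycles per iteration. By a max-flow/min-cut argument, this bound matches the
+optimal steady-state assignment of micro-operations to ports, which is what
+makes the two mappings equivalent; moreover, only the sets $S$ that are unions
+of port sets $P(u)$ need to be enumerated, since any other $S$ yields a
+smaller or equal load.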
+
+% TODO: metrics extracted
--- a/manuscrit/30_palmed/50_other_contributions.tex
+++ b/manuscrit/30_palmed/50_other_contributions.tex
--- a/manuscrit/30_palmed/main.tex
+++ b/manuscrit/30_palmed/main.tex
@ -4,3 +4,5 @@
 \input{10_resource_models.tex}
 \input{20_palmed_design.tex}
 \input{30_pipedream.tex}
+\input{40_palmed_results.tex}
+\input{50_other_contributions.tex}
--- a/manuscrit/biblio/bench_suites.bib
+++ b/manuscrit/biblio/bench_suites.bib
@ -4,3 +4,44 @@
    note={\url{http://polybench.sf.net}},
    year={2016}
 }
+
+@inproceedings{bench:spec,
+  author    = {James Bucek and
+               Klaus{-}Dieter Lange and
+               J{\'{o}}akim von Kistowski},
+  editor    = {Katinka Wolter and
+               William J. Knottenbelt and
+               Andr{\'{e}} van Hoorn and
+               Manoj Nambiar},
+  title     = {{SPEC CPU2017}: Next-Generation Compute Benchmark},
+  booktitle = {Companion of the 2018 {ACM/SPEC} International Conference on Performance
+               Engineering, {ICPE} 2018},
+  pages     = {41--42},
+  publisher = {{ACM}},
+  month     = {April},
+  year      = {2018},
+  location  = {Berlin, Germany},
+  url       = {https://doi.org/10.1145/3185768.3185771},
+  doi       = {10.1145/3185768.3185771},
+  timestamp = {Wed, 21 Nov 2018 12:44:17 +0100},
+  biburl    = {https://dblp.org/rec/conf/wosp/BucekLK18.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+
+@inproceedings{carmot,
+author = {Deiana, Enrico Armenio and Suchy, Brian and Wilkins, Michael and Homerding, Brian and McMichen, Tommy and Dunajewski, Katarzyna and Dinda, Peter and Hardavellas, Nikos and Campanoni, Simone},
+title = {Program State Element Characterization},
+year = {2023},
+isbn = {9798400701016},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+url = {https://doi.org/10.1145/3579990.3580011},
+doi = {10.1145/3579990.3580011},
+abstract = {Modern programming languages offer abstractions that simplify software development and allow hardware to reach its full potential. These abstractions range from the well-established OpenMP language extensions to newer C++ features like smart pointers. To properly use these abstractions in an existing codebase, programmers must determine how a given source code region interacts with Program State Elements (PSEs) (i.e., the program's variables and memory locations). We call this process Program State Element Characterization (PSEC). Without tool support for PSEC, a programmer's only option is to manually study the entire codebase. We propose a profile-based approach that automates PSEC and provides abstraction recommendations to programmers. Because a profile-based approach incurs an impractical overhead, we introduce the Compiler and Runtime Memory Observation Tool (CARMOT), a PSEC-specific compiler co-designed with a parallel runtime. CARMOT reduces the overhead of PSEC by two orders of magnitude, making PSEC practical. We show that CARMOT's recommendations achieve the same speedup as hand-tuned OpenMP directives and avoid memory leaks with C++ smart pointers. From this, we argue that PSEC tools, such as CARMOT, can provide support for the rich ecosystem of modern programming language abstractions.},
+booktitle = {Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization},
+pages = {199--211},
+numpages = {13},
+keywords = {program characterization, dynamic analysis, code optimization},
+location = {Montr\'{e}al, QC, Canada},
+series = {CGO 2023}
+}
--- a/manuscrit/biblio/code_analyzers.bib
+++ b/manuscrit/biblio/code_analyzers.bib
@ -138,3 +138,24 @@
    bibsource = {dblp computer science bibliography, https://dblp.org}
 }

+@inproceedings{PMEvo,
+  author    = {Fabian Ritter and
+               Sebastian Hack},
+  editor    = {Alastair F. Donaldson and
+               Emina Torlak},
+  title     = {PMEvo: portable inference of port mappings for out-of-order processors
+               by evolutionary optimization},
+  booktitle = {Proceedings of the 41st {ACM} {SIGPLAN} International Conference on
+               Programming Language Design and Implementation, {PLDI} 2020},
+  pages     = {608--622},
+  publisher = {{ACM}},
+  address   = {New York, USA},
+  location  = {London, UK},
+  month     = {June},
+  year      = {2020},
+  url       = {https://doi.org/10.1145/3385412.3385995},
+  doi       = {10.1145/3385412.3385995},
+  timestamp = {Tue, 09 Jun 2020 13:52:54 +0200},
+  biburl    = {https://dblp.org/rec/conf/pldi/0002H20.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
--- a/manuscrit/biblio/tools.bib
+++ b/manuscrit/biblio/tools.bib
@ -67,3 +67,9 @@
    year={1999}
 }

+@misc{tool:qemu,
+	title={{QEMU}: the {FAST!} processor emulator},
+	author={{QEMU}},
+	howpublished={\url{https://www.qemu.org}}
+}
+
--- a/manuscrit/include/biblio.tex
+++ b/manuscrit/include/biblio.tex
@ -1,4 +1,4 @@
-\usepackage[backend=biber,maxbibnames=10,style=numeric,sorting=none,defernumbers=true]{biblatex}
+\usepackage[backend=biber,maxbibnames=10,style=alphabetic,sorting=anyt,defernumbers=true]{biblatex}
 \addbibresource{biblio/bench_suites.bib}
 \addbibresource{biblio/code_analyzers.bib}
 \addbibresource{biblio/ecology.bib}
--- a/manuscrit/include/macros.tex
+++ b/manuscrit/include/macros.tex
@ -9,7 +9,6 @@

 \newcommand{\cyc}[1]{\overline{#1}}

-\newcommand{\uopsinfo}{\texttt{uops.info}}

 % Names
 \newcommand{\fgruber}{Fabian \textsc{Gruber}}
@ -18,5 +17,12 @@

 % Programs
 \newcommand{\papi}{\texttt{PAPI}}
-\newcommand{\pipedream}{Pipedream}
-\newcommand{\palmed}{Palmed}
+\newcommand{\perf}{\texttt{perf}}
+\newcommand{\qemu}{\texttt{QEMU}}
+\newcommand{\iaca}{\texttt{IACA}}
+\newcommand{\llvmmca}{\texttt{llvm-mca}}
+\newcommand{\uopsinfo}{\texttt{uops.info}}
+\newcommand{\uica}{\texttt{uiCA}}
+\newcommand{\pipedream}{\texttt{Pipedream}}
+\newcommand{\palmed}{\texttt{Palmed}}
+\newcommand{\pmevo}{\texttt{PMEvo}}
--- a/manuscrit/include/packages.tex
+++ b/manuscrit/include/packages.tex
@ -16,12 +16,15 @@
 \usepackage{listings}
 \usepackage{hyperref}
 %\usepackage{shorttoc}
-\usepackage{enumitem}
+\usepackage{enumerate}
 \usepackage{lmodern}
 \usepackage{graphicx}
 \usepackage{pdfpages}
 \usepackage{import}
 \usepackage[bottom]{footmisc}  % footnotes are below floats
+\usepackage[final]{microtype}
+
+\emergencystretch=1em

 % Local sty files
 \usepackage{include/my_listings}