First version of Palmed chapter
parent
d4e0272e9e
commit
276e1d50c2
1 changed files with 62 additions and 1 deletions
|
@ -165,6 +165,8 @@ instructions found in the original basic blocks.
|
||||||
|
|
||||||
\subsection{Results}
|
\subsection{Results}
|
||||||
|
|
||||||
|
\input{40-1_results_fig.tex}
|
||||||
|
|
||||||
We run the evaluation harness on three different machines:
|
We run the evaluation harness on three different machines:
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item{} an x86-64 Intel \texttt{SKL-SP}-based machine, with two Intel Xeon Silver
|
\item{} an x86-64 Intel \texttt{SKL-SP}-based machine, with two Intel Xeon Silver
|
||||||
|
@ -190,4 +192,63 @@ closer a prediction is to the red horizontal line, the more accurate it is.
|
||||||
|
|
||||||
These results are analyzed in the full article~\cite{palmed}.
|
These results are analyzed in the full article~\cite{palmed}.
|
||||||
|
|
||||||
\input{40-1_results_fig.tex}
|
\section{Other contributions}
|
||||||
|
|
||||||
|
\paragraph{Using a database to enhance reproducibility and usability.}
|
||||||
|
\palmed{}'s method is driven by a large number of \pipedream{} benchmarks. For
|
||||||
|
instance, generating a mapping for an x86-64 machine requires the execution of
|
||||||
|
about $10^6$ benchmarks on the CPU\@.
|
||||||
|
|
||||||
|
Each of these measures takes time: the multiset of instructions must be
|
||||||
|
transformed into assembly code, including the register mapping phase; this
|
||||||
|
assembly must be assembled and linked into an ELF file; and finally, the
|
||||||
|
benchmark must be actually executed, with multiple warm-up rounds and multiple
|
||||||
|
measures. On average, on the \texttt{SKL-SP} CPU, each benchmark requires half
|
||||||
|
to two-thirds of a second on a single core. The whole benchmarking phase, on
|
||||||
|
the \texttt{SKL-SP} processor, took roughly eight hours.
|
||||||
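As a rough sanity check on these figures (a back-of-the-envelope estimate,
assuming exactly $10^6$ benchmarks and that the per-benchmark cost given above
is the only cost), a purely sequential run would take between
\[
  10^6 \times 0.5\,\mathrm{s} = 5 \times 10^5\,\mathrm{s} \approx 139\,\mathrm{h}
  \quad\text{and}\quad
  10^6 \times \tfrac{2}{3}\,\mathrm{s} \approx 6.7 \times 10^5\,\mathrm{s} \approx 185\,\mathrm{h}.
\]
An eight-hour wall-clock time is thus only consistent with roughly
$139 / 8 \approx 17$ to $185 / 8 \approx 23$ benchmarks running in parallel
on separate cores, or with fewer benchmarks actually being executed.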
|
|
||||||
|
\medskip{}
|
||||||
|
|
||||||
|
As \palmed{} relies on the Gurobi optimizer, which is itself non-deterministic,
|
||||||
|
\palmed{} cannot be made truly reproducible. Moreover, the slight fluctuations
|
||||||
|
in measured cycles between two executions of a benchmark are also a source of
|
||||||
|
non-determinism in the execution of \palmed{}.
|
||||||
|
|
||||||
|
\medskip{}
|
||||||
|
|
||||||
|
For both these reasons, we added to \palmed{} a database-backed storage of
|
||||||
|
measurements. Whenever \palmed{} needs to measure a kernel, it will first try
|
||||||
|
to find a corresponding measure in the database; if the measure does not exist
|
||||||
|
yet, it will be run, then stored in the database.
|
||||||
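The lookup-then-measure logic described above can be sketched as follows. This
is a minimal illustration and not \palmed{}'s actual code: the use of SQLite,
the table layout, and the names \texttt{open\_store}, \texttt{kernel\_key},
\texttt{measure\_cached} and \texttt{run\_benchmark} are all assumptions made
for the example.

```python
import sqlite3
from typing import Callable, Dict

Kernel = Dict[str, int]  # hypothetical: instruction name -> multiplicity


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the measurement store. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measures ("
        " kernel TEXT PRIMARY KEY,"
        " cycles REAL NOT NULL)"
    )
    return conn


def kernel_key(kernel: Kernel) -> str:
    """Canonical key for a multiset of instructions, independent of dict order."""
    return ";".join(f"{name}*{count}" for name, count in sorted(kernel.items()))


def measure_cached(
    conn: sqlite3.Connection,
    kernel: Kernel,
    run_benchmark: Callable[[Kernel], float],
) -> float:
    """Return the stored measure if present; otherwise run it and store it."""
    key = kernel_key(kernel)
    row = conn.execute(
        "SELECT cycles FROM measures WHERE kernel = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: no benchmark is executed
    cycles = run_benchmark(kernel)  # cache miss: actually measure
    conn.execute(
        "INSERT INTO measures (kernel, cycles) VALUES (?, ?)", (key, cycles)
    )
    conn.commit()
    return cycles
```

With such a store, a second run asking for the same kernel reuses the stored
value instead of executing the benchmark again, which is what makes repeated
runs both faster and consistent with one another.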
|
|
||||||
|
For each measure, we further store for context:
|
||||||
|
the time and date at which the measure was made;
|
||||||
|
the machine on which the measure was made;
|
||||||
|
how many times the measure was repeated;
|
||||||
|
how many warm-up rounds were performed;
|
||||||
|
how many instructions were in the unrolled loop;
|
||||||
|
how many instructions were executed per repetition in total;
|
||||||
|
the parameters for \pipedream{}'s assembly generation procedure;
|
||||||
|
how the final result was aggregated from the repeated measures;
|
||||||
|
the variance of the set of measures;
|
||||||
|
how many CPU cores were active when the measure was made;
|
||||||
|
which CPU core was used for this measure;
|
||||||
|
whether the kernel's scheduler was set to FIFO mode.
|
||||||
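As an illustration, the context listed above could be grouped into a single
record attached to each measure. The sketch below is only one possible layout:
the class name, field names and types are hypothetical, and the choice of
aggregators and of the population variance in \texttt{summarize} is an
assumption, not a description of what \palmed{} computes.

```python
import statistics
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MeasureContext:
    """One field per item of context stored alongside a measure."""
    measured_at: datetime             # time and date of the measure
    machine: str                      # machine on which it was made
    repetitions: int                  # how many times it was repeated
    warmup_rounds: int                # how many warm-up rounds were performed
    unrolled_instructions: int        # instructions in the unrolled loop
    instructions_per_repetition: int  # instructions executed per repetition
    pipedream_params: str             # parameters of the assembly generation
    aggregation: str                  # how repeated measures were aggregated
    variance: float                   # variance of the set of measures
    active_cores: int                 # CPU cores active during the measure
    core_id: int                      # CPU core used for this measure
    fifo_scheduler: bool              # whether the scheduler was in FIFO mode


def summarize(
    samples: Sequence[float], aggregation: str = "median"
) -> Tuple[float, float]:
    """Aggregate repeated measures; return (aggregated value, variance)."""
    aggregators = {
        "median": statistics.median,
        "mean": statistics.fmean,
        "min": min,
    }
    value = aggregators[aggregation](samples)
    # Population variance is an arbitrary choice made for this sketch.
    return value, statistics.pvariance(samples)
```

Storing the aggregation method and the variance next to the value makes it
possible to tell afterwards whether two stored measures are comparable.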
|
|
||||||
|
\bigskip{}
|
||||||
|
|
||||||
|
We believe that, as a whole, the use of a database increases the usability of
|
||||||
|
\palmed{}: it is faster if some measures were already made in the past and
|
||||||
|
recovers better upon error.
|
||||||
|
|
||||||
|
This also gives us greater confidence in our results: we can easily
|
||||||
|
archive and back up our experimental data, and we can easily trace the origin of
|
||||||
|
a measure if needed. We can also reuse the exact same measures between two runs
|
||||||
|
of \palmed{}, to ensure that the results are as consistent as possible.
|
||||||
|
|
||||||
|
|
||||||
|
\paragraph{General engineering contributions.} Apart from purely scientific
|
||||||
|
contributions, we worked on improving \palmed{} as a whole, from the
|
||||||
|
engineering point of view: code quality; reliable parallel measurements;
|
||||||
|
recovery upon error; logging; \ldots{} These improvements amount to about a
|
||||||
|
hundred merge requests between \nderumig{} and myself.
|
||||||
|
|