First version of Palmed chapter
@@ -165,6 +165,8 @@ instructions found in the original basic blocks.
\subsection{Results}

\input{40-1_results_fig.tex}

We run the evaluation harness on three different machines:
\begin{itemize}
\item{} an x86-64 Intel \texttt{SKL-SP}-based machine, with two Intel Xeon Silver
@@ -190,4 +192,63 @@ closer a prediction is to the red horizontal line, the more accurate it is.
These results are analyzed in the full article~\cite{palmed}.
\section{Other contributions}

\paragraph{Using a database to enhance reproducibility and usability.}
\palmed{}'s method is driven by a large number of \pipedream{} benchmarks. For
instance, generating a mapping for an x86-64 machine requires the execution of
about $10^6$ benchmarks on the CPU\@.

Each of these measurements takes time: the multiset of instructions must be
translated into assembly code, including the register-mapping phase; this
assembly must be assembled and linked into an ELF file; and finally, the
benchmark must actually be executed, with multiple warm-up rounds and multiple
measurements. On average, on the \texttt{SKL-SP} CPU, each benchmark takes half
to two-thirds of a second on a single core. The whole benchmarking phase took
roughly eight hours on the \texttt{SKL-SP} processor.
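
As an illustration, here is a minimal Python sketch of such a per-benchmark
pipeline, under simplifying assumptions: the file names, the use of
\texttt{gcc} for assembling and linking, the default round counts, and the
wall-clock timing are all placeholders, not \palmed{}'s or \pipedream{}'s
actual implementation (which counts elapsed cycles).

\begin{verbatim}
import statistics
import subprocess
import time

def run_benchmark(asm_source: str, warmups: int = 3, repeats: int = 10) -> float:
    # 1. Write out the generated assembly (register mapping done upstream).
    with open("kernel.s", "w") as f:
        f.write(asm_source)
    # 2. Assemble and link the benchmark into an ELF executable.
    subprocess.run(["gcc", "-o", "kernel", "kernel.s"], check=True)
    # 3. Warm-up rounds, whose timings are discarded.
    for _ in range(warmups):
        subprocess.run(["./kernel"], check=True)
    # 4. Measured rounds: keep the median of the repeated timings.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(["./kernel"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
\end{verbatim}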
\medskip{}

As \palmed{} relies on the Gurobi optimizer, which is itself non-deterministic,
\palmed{} cannot be made truly reproducible. Moreover, the slight fluctuations
in measured cycles between two executions of a benchmark are another source of
non-determinism in the execution of \palmed{}.

\medskip{}

For both these reasons, we implemented in \palmed{} a database-backed store of
measurements. Whenever \palmed{} needs to measure a kernel, it first looks for
a corresponding measurement in the database; if no such measurement exists yet,
the benchmark is run and its result is stored in the database.
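
The following Python sketch illustrates this look-up-or-measure pattern with an
SQLite backend. The one-column schema and the \texttt{measure} callback are
hypothetical simplifications, not \palmed{}'s actual database layout.

\begin{verbatim}
import sqlite3

conn = sqlite3.connect("measures.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS measure (kernel TEXT PRIMARY KEY, cycles REAL)"
)

def cycles_for(kernel: str, measure) -> float:
    row = conn.execute(
        "SELECT cycles FROM measure WHERE kernel = ?", (kernel,)
    ).fetchone()
    if row is not None:       # reuse a measurement made in the past
        return row[0]
    cycles = measure(kernel)  # otherwise, actually run the benchmark...
    conn.execute("INSERT INTO measure VALUES (?, ?)", (kernel, cycles))
    conn.commit()             # ...and store the result for future runs
    return cycles
\end{verbatim}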

For each measurement, we also store the following context (a possible record
layout is sketched after this list):
\begin{itemize}
\item{} the time and date at which the measurement was made;
\item{} the machine on which the measurement was made;
\item{} how many times the measurement was repeated;
\item{} how many warm-up rounds were performed;
\item{} how many instructions were in the unrolled loop;
\item{} how many instructions were executed per repetition in total;
\item{} the parameters for \pipedream{}'s assembly generation procedure;
\item{} how the final result was aggregated from the repeated measurements;
\item{} the variance of the set of measurements;
\item{} how many CPU cores were active when the measurement was made;
\item{} which CPU core was used for this measurement;
\item{} whether the kernel's scheduler was set to FIFO mode.
\end{itemize}
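
As a sketch of what such a record could look like, the following Python
dataclass lists these fields. The field names and types are illustrative
assumptions, not \palmed{}'s actual schema.

\begin{verbatim}
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MeasureContext:
    made_at: datetime        # time and date of the measurement
    machine: str             # machine on which it was made
    repetitions: int         # how many times the measurement was repeated
    warmup_rounds: int       # warm-up rounds performed
    unrolled_length: int     # instructions in the unrolled loop
    instructions_total: int  # instructions executed per repetition, in total
    pipedream_params: dict   # parameters of the assembly generation procedure
    aggregation: str         # how the result was aggregated (e.g. "median")
    variance: float          # variance of the set of measurements
    active_cores: int        # CPU cores active during the measurement
    core_used: int           # which CPU core ran the benchmark
    sched_fifo: bool         # whether the scheduler was set to FIFO mode
\end{verbatim}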

\bigskip{}

We believe that, as a whole, the use of a database increases the usability of
\palmed{}: it runs faster when some measurements were already made in the past,
and it recovers better from errors.

This also gives us better confidence in our results: we can easily archive and
back up our experimental data, and we can easily trace the origin of a
measurement if needed. We can also reuse the exact same measurements between
two runs of \palmed{}, to ensure that the results are as consistent as
possible.

\paragraph{General engineering contributions.} Apart from purely scientific
contributions, we worked on improving \palmed{} as a whole from an engineering
point of view: code quality; reliable parallel measurements; recovery upon
error; logging; etc. These improvements amount to about a hundred merge
requests between \nderumig{} and myself.