\section{Other contributions}

\paragraph{Using a database to enhance reproducibility and usability.}
\palmed{}'s method is driven by a large number of \pipedream{} benchmarks. For
instance, generating a mapping for an x86-64 machine requires the execution of
about $10^6$ benchmarks on the CPU\@.

Each of these measurements takes time: the multiset of instructions must be
transformed into assembly code, including the register mapping phase; this
assembly must be assembled and linked into an ELF file; and finally, the
benchmark must actually be executed, with multiple warm-up rounds and multiple
measurements. On average, on the \texttt{SKL-SP} CPU, each benchmark requires half
to two-thirds of a second on a single core. The whole benchmarking phase, on
the \texttt{SKL-SP} processor, took roughly eight hours.
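
As a rough consistency check, and assuming that the figure of $10^6$
benchmarks applies as such to \texttt{SKL-SP}, the per-benchmark cost above
adds up to
\[
  10^6 \times 0.5\,\mathrm{s} \approx 139~\mbox{core-hours}
  \quad\mbox{to}\quad
  10^6 \times \tfrac{2}{3}\,\mathrm{s} \approx 185~\mbox{core-hours}.
\]
An eight-hour wall-clock run would then correspond to roughly
$139 / 8 \approx 17$ to $185 / 8 \approx 23$ benchmarks running concurrently
on separate cores. This degree of parallelism is only inferred from the
figures above; it is not a measured value.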

\medskip{}

As \palmed{} relies on the Gurobi optimizer, which is itself non-deterministic,
\palmed{} cannot be made truly reproducible. However, the slight fluctuations
in measured cycles between two executions of a benchmark are another major
source of non-determinism in the execution of \palmed{}, and one that can be
avoided by reusing measurements.

\medskip{}

For both these reasons, we implemented database-backed storage of
measurements in \palmed{}. Whenever \palmed{} needs to measure a kernel, it
first tries to find a corresponding measurement in the database; if the
measurement does not exist yet, it is run, then stored in the database.

For each measurement, we further store, for context:
the time and date at which the measurement was made;
the machine on which the measurement was made;
how many times the measurement was repeated;
how many warm-up rounds were performed;
how many instructions were in the unrolled loop;
how many instructions were executed per repetition in total;
the parameters for \pipedream{}'s assembly generation procedure;
how the final result was aggregated from the repeated measurements;
the variance of the set of measurements;
how many CPU cores were active when the measurement was made;
which CPU core was used for this measurement; and
whether the operating system kernel's scheduler was set to FIFO mode.
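
The lookup-then-measure logic described above can be illustrated with a
minimal sketch in Python using SQLite. This is not \palmed{}'s actual code:
the table layout, the function names (\texttt{open\_db}, \texttt{kernel\_key},
\texttt{measure\_cached}) and the \texttt{run\_benchmark} callback are all
hypothetical, and only a subset of the context fields listed above is stored.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Mapping, Tuple

# Hypothetical schema: one row per (kernel, machine), plus some context fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS measurement (
    kernel      TEXT NOT NULL,
    machine     TEXT NOT NULL,
    measured_at TEXT NOT NULL,
    repetitions INTEGER NOT NULL,
    warmup      INTEGER NOT NULL,
    cycles      REAL NOT NULL,
    variance    REAL NOT NULL,
    PRIMARY KEY (kernel, machine)
)
"""

# Assumed benchmark signature: (instructions, repetitions, warmup) -> (cycles, variance)
Benchmark = Callable[[Mapping[str, int], int, int], Tuple[float, float]]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the measurement database, creating the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def kernel_key(instructions: Mapping[str, int]) -> str:
    """Order-independent textual key for a multiset of instructions."""
    return ";".join(f"{name}*{count}" for name, count in sorted(instructions.items()))


def measure_cached(
    conn: sqlite3.Connection,
    machine: str,
    instructions: Mapping[str, int],
    run_benchmark: Benchmark,
    repetitions: int = 10,
    warmup: int = 3,
) -> float:
    """Return the stored cycle count if present; otherwise measure and store it."""
    key = kernel_key(instructions)
    row = conn.execute(
        "SELECT cycles FROM measurement WHERE kernel = ? AND machine = ?",
        (key, machine),
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: the benchmark is not run again

    cycles, variance = run_benchmark(instructions, repetitions, warmup)
    conn.execute(
        "INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?, ?)",
        (key, machine, datetime.now(timezone.utc).isoformat(),
         repetitions, warmup, cycles, variance),
    )
    conn.commit()
    return cycles
```

In this sketch, a second request for the same multiset of instructions on the
same machine is served from the table, so \texttt{run\_benchmark} is not called
again and both runs see exactly the same value.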

\bigskip{}

We believe that, as a whole, the use of a database increases the usability of
\palmed{}: it is faster when some measurements were already made in the past,
and it recovers better upon error.

This also gives us greater confidence in our results: we can easily
archive and back up our experimental data, and we can easily trace the origin of
a measurement if needed. We can also reuse the exact same measurements between
two runs of \palmed{}, to ensure that the results are as consistent as possible.

\paragraph{General engineering contributions.} Apart from purely scientific
contributions, we worked on improving \palmed{} as a whole from an
engineering point of view: code quality, reliable parallel measurements,
recovery upon error, logging, and so on. These improvements amount to about a
hundred merge requests between \nderumig{} and myself.