
\section{Other contributions}

\paragraph{Using a database to enhance reproducibility and usability.}
\palmed{}'s method is driven by a large number of \pipedream{} benchmarks. For
instance, generating a mapping for an x86-64 machine requires the execution of
about $10^6$ benchmarks on the CPU\@.

Each of these measurements takes time: the multiset of instructions must be
transformed into assembly code, including the register mapping phase; this
assembly must be assembled and linked into an ELF file; and finally, the
benchmark must actually be executed, with multiple warm-up rounds and multiple
measurements. On average, on the \texttt{SKL-SP} CPU, each benchmark requires
half to two-thirds of a second on a single core. The whole benchmarking phase,
on the \texttt{SKL-SP} processor, took roughly eight hours.
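
As a rough sanity check on these orders of magnitude, and \emph{assuming} that
the figure of about $10^6$ benchmarks (given above for x86-64 in general) also
holds for \texttt{SKL-SP}, the sequential cost would lie between
\[
    10^6 \times \tfrac{1}{2}\,\mathrm{s} \approx 139\,\mathrm{h}
    \qquad\text{and}\qquad
    10^6 \times \tfrac{2}{3}\,\mathrm{s} \approx 185\,\mathrm{h}
\]
of single-core time. An eight-hour benchmarking phase would then correspond to
roughly $139/8 \approx 17$ to $185/8 \approx 23$ benchmarks running
concurrently. This degree of parallelism is only an estimate derived from the
figures above, not a measured value.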

\medskip{}

As \palmed{} relies on the Gurobi optimizer, which is itself non-deterministic,
\palmed{} cannot be made truly reproducible. The optimizer is not the only
culprit, however: the slight fluctuations in measured cycles between two
executions of a benchmark are also a major source of non-determinism in the
execution of \palmed{}.

\medskip{}

For both these reasons, we implemented a database-backed storage of
measurements in \palmed{}. Whenever \palmed{} needs to measure a kernel, it
first tries to find a corresponding measurement in the database; if none exists
yet, the benchmark is run and its result is then stored in the database.
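
The lookup-then-measure behaviour described above can be sketched as follows.
This is only a minimal illustration, not \palmed{}'s actual implementation: the
use of SQLite, the table layout, and the names \texttt{open\_cache},
\texttt{kernel\_key} and \texttt{cached\_measure} are all assumptions made for
the example, and \texttt{run\_benchmark} stands in for the real \pipedream{}
benchmark run.

```python
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the measurement store. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measures ("
        " kernel TEXT NOT NULL,"
        " machine TEXT NOT NULL,"
        " cycles REAL NOT NULL,"
        " PRIMARY KEY (kernel, machine))"
    )
    return conn


def kernel_key(kernel: dict[str, int]) -> str:
    """Canonical text form of a multiset of instructions.

    Sorting makes the key independent of insertion order, so the same
    multiset always maps to the same database row.
    """
    return ";".join(f"{insn}*{count}" for insn, count in sorted(kernel.items()))


def cached_measure(
    conn: sqlite3.Connection,
    kernel: dict[str, int],
    machine: str,
    run_benchmark: Callable[[dict[str, int]], float],
) -> float:
    """Return the stored measurement if present; otherwise measure and store it."""
    key = kernel_key(kernel)
    row = conn.execute(
        "SELECT cycles FROM measures WHERE kernel = ? AND machine = ?",
        (key, machine),
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: the benchmark is not executed again

    cycles = run_benchmark(kernel)  # cache miss: run the (expensive) benchmark
    with conn:  # commit the new row, or roll back if the insert fails
        conn.execute(
            "INSERT INTO measures (kernel, machine, cycles) VALUES (?, ?, ?)",
            (key, machine, cycles),
        )
    return cycles
```

Keying the table on both the kernel and the machine reflects the fact that a
measurement is only meaningful for the CPU on which it was taken.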

For each measurement, we further store for context:
\begin{itemize}
    \item the time and date at which the measurement was made;
    \item the machine on which the measurement was made;
    \item how many times the measurement was repeated;
    \item how many warm-up rounds were performed;
    \item how many instructions were in the unrolled loop;
    \item how many instructions were executed per repetition in total;
    \item the parameters for \pipedream{}'s assembly generation procedure;
    \item how the final result was aggregated from the repeated measurements;
    \item the variance of the set of measurements;
    \item how many CPU cores were active when the measurement was made;
    \item which CPU core was used for this measurement;
    \item whether the kernel's scheduler was set to FIFO mode.
\end{itemize}
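
To make this context concrete, the stored fields could be grouped in a single
record, as in the sketch below. The class name \texttt{MeasureContext}, the
field names and their types are hypothetical, chosen to mirror the list above
one-to-one; the actual schema used by \palmed{} may differ.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class MeasureContext:
    """Context stored alongside one measurement (hypothetical field names)."""

    measured_at: datetime             # time and date of the measurement
    machine: str                      # machine on which it was made
    repetitions: int                  # how many times it was repeated
    warmup_rounds: int                # warm-up rounds performed
    unrolled_instructions: int        # instructions in the unrolled loop
    instructions_per_repetition: int  # instructions executed per repetition
    pipedream_params: str             # assembly generation parameters
    aggregation: str                  # how repeated measurements were aggregated
    variance: float                   # variance of the set of measurements
    active_cores: int                 # CPU cores active during the measurement
    core_id: int                      # CPU core used for this measurement
    fifo_scheduler: bool              # whether the scheduler was in FIFO mode


def context_to_row(ctx: MeasureContext) -> dict:
    """Flatten a context into database-friendly scalar values."""
    row = asdict(ctx)
    row["measured_at"] = ctx.measured_at.isoformat()  # store timestamp as text
    row["fifo_scheduler"] = int(ctx.fifo_scheduler)   # store boolean as 0/1
    return row
```

Making the record immutable (\texttt{frozen=True}) matches its role as an
archival trace: once a measurement is stored, its context should not change.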

\bigskip{}

We believe that, as a whole, the use of a database increases the usability of
\palmed{}: it runs faster when some measurements were already made in the past,
and it recovers better upon error.

This also gives us greater confidence in our results: we can easily archive
and back up our experimental data, and we can easily trace the origin of a
measurement if needed. We can also reuse the exact same measurements between
two runs of \palmed{}, to ensure that the results are as consistent as
possible.


\paragraph{General engineering contributions.} Apart from purely scientific
contributions, we worked on improving \palmed{} as a whole, from the
engineering point of view: code quality; reliable parallel measurements;
recovery upon error; logging; \ldots{} These improvements amount to about a
hundred merge requests between \nderumig{} and myself.