First version of Palmed chapter
@@ -165,6 +165,8 @@ instructions found in the original basic blocks.
\subsection{Results}

\input{40-1_results_fig.tex}

We run the evaluation harness on three different machines:
\begin{itemize}
\item{} an x86-64 Intel \texttt{SKL-SP}-based machine, with two Intel Xeon Silver
@@ -190,4 +192,63 @@ closer a prediction is to the red horizontal line, the more accurate it is.
These results are analyzed in the full article~\cite{palmed}.
\section{Other contributions}

\paragraph{Using a database to enhance reproducibility and usability.}
\palmed{}'s method is driven by a large number of \pipedream{} benchmarks. For
instance, generating a mapping for an x86-64 machine requires the execution of
about $10^6$ benchmarks on the CPU\@.

Each of these measurements takes time: the multiset of instructions must be
translated into assembly code, including the register-mapping phase; this
assembly must be assembled and linked into an ELF file; and finally, the
benchmark must actually be executed, with multiple warm-up rounds and multiple
measurements. On average, on the \texttt{SKL-SP} CPU, each benchmark takes half
to two-thirds of a second on a single core. The whole benchmarking phase took
roughly eight hours on the \texttt{SKL-SP} processor.
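
As an illustration, here is a minimal Python sketch of such a per-benchmark
pipeline, under simplifying assumptions: the file names, the use of
\texttt{gcc} for assembling and linking, the default round counts, and the
wall-clock timing are all placeholders, not \palmed{}'s or \pipedream{}'s
actual implementation (which counts elapsed cycles).

\begin{verbatim}
import statistics
import subprocess
import time

def run_benchmark(asm_source: str, warmups: int = 3, repeats: int = 10) -> float:
    # 1. Write out the generated assembly (register mapping done upstream).
    with open("kernel.s", "w") as f:
        f.write(asm_source)
    # 2. Assemble and link the benchmark into an ELF executable.
    subprocess.run(["gcc", "-o", "kernel", "kernel.s"], check=True)
    # 3. Warm-up rounds, whose timings are discarded.
    for _ in range(warmups):
        subprocess.run(["./kernel"], check=True)
    # 4. Measured rounds: keep the median of the repeated timings.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(["./kernel"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
\end{verbatim}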
\medskip{}

As \palmed{} relies on the Gurobi optimizer, which is itself non-deterministic,
\palmed{} cannot be made truly reproducible. Moreover, the slight fluctuations
in measured cycles between two executions of a benchmark are another source of
non-determinism in the execution of \palmed{}.

\medskip{}

For both these reasons, we implemented in \palmed{} a database-backed store of
measurements. Whenever \palmed{} needs to measure a kernel, it first looks for
a corresponding measurement in the database; if no such measurement exists yet,
the benchmark is run and its result is stored in the database.
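
The following Python sketch illustrates this look-up-or-measure pattern with an
SQLite backend. The one-column schema and the \texttt{measure} callback are
hypothetical simplifications, not \palmed{}'s actual database layout.

\begin{verbatim}
import sqlite3

conn = sqlite3.connect("measures.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS measure (kernel TEXT PRIMARY KEY, cycles REAL)"
)

def cycles_for(kernel: str, measure) -> float:
    row = conn.execute(
        "SELECT cycles FROM measure WHERE kernel = ?", (kernel,)
    ).fetchone()
    if row is not None:       # reuse a measurement made in the past
        return row[0]
    cycles = measure(kernel)  # otherwise, actually run the benchmark...
    conn.execute("INSERT INTO measure VALUES (?, ?)", (kernel, cycles))
    conn.commit()             # ...and store the result for future runs
    return cycles
\end{verbatim}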

For each measurement, we also store the following context (a possible record
layout is sketched after this list):
\begin{itemize}
\item{} the time and date at which the measurement was made;
\item{} the machine on which the measurement was made;
\item{} how many times the measurement was repeated;
\item{} how many warm-up rounds were performed;
\item{} how many instructions were in the unrolled loop;
\item{} how many instructions were executed per repetition in total;
\item{} the parameters for \pipedream{}'s assembly generation procedure;
\item{} how the final result was aggregated from the repeated measurements;
\item{} the variance of the set of measurements;
\item{} how many CPU cores were active when the measurement was made;
\item{} which CPU core was used for this measurement;
\item{} whether the kernel's scheduler was set to FIFO mode.
\end{itemize}
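
As a sketch of what such a record could look like, the following Python
dataclass lists these fields. The field names and types are illustrative
assumptions, not \palmed{}'s actual schema.

\begin{verbatim}
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MeasureContext:
    made_at: datetime        # time and date of the measurement
    machine: str             # machine on which it was made
    repetitions: int         # how many times the measurement was repeated
    warmup_rounds: int       # warm-up rounds performed
    unrolled_length: int     # instructions in the unrolled loop
    instructions_total: int  # instructions executed per repetition, in total
    pipedream_params: dict   # parameters of the assembly generation procedure
    aggregation: str         # how the result was aggregated (e.g. "median")
    variance: float          # variance of the set of measurements
    active_cores: int        # CPU cores active during the measurement
    core_used: int           # which CPU core ran the benchmark
    sched_fifo: bool         # whether the scheduler was set to FIFO mode
\end{verbatim}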

\bigskip{}

We believe that, as a whole, the use of a database increases the usability of
\palmed{}: it runs faster when some measurements were already made in the past,
and it recovers better from errors.

This also gives us better confidence in our results: we can easily archive and
back up our experimental data, and we can easily trace the origin of a
measurement if needed. We can also reuse the exact same measurements between
two runs of \palmed{}, to ensure that the results are as consistent as
possible.

\paragraph{General engineering contributions.} Apart from purely scientific
contributions, we worked on improving \palmed{} as a whole from an engineering
point of view: code quality; reliable parallel measurements; recovery upon
error; logging; etc. These improvements amount to about a hundred merge
requests between \nderumig{} and myself.