\section{Generating microbenchmarks}\label{sec:bench_gen}

Our framework aims to generate \emph{microbenchmarks} relevant to a specific domain. A microbenchmark is a piece of code simplified as much as possible while still exposing the behaviour under consideration. The computations it performs should be representative of the considered domain, while stressing the various aspects of the target architecture that code analyzers model. In practice, a microbenchmark's \textit{computational kernel} is a simple \texttt{for} loop whose body contains no nested loop and whose bounds are statically known. A \emph{measure} is a number of repetitions $n$ of this computational kernel, $n$ being a user-specified parameter; the measure may itself be repeated an arbitrary number of times to improve stability.

Furthermore, such a microbenchmark should be a function whose computation happens entirely within the L1 cache. This requirement keeps measurements and analyses undisturbed by memory accesses, but it is also a matter of comparability: most static analyzers assume that the code under consideration is L1-resident, and if it is not, their results are meaningless and cannot be compared with an actual measurement.

The generation of such microbenchmarks is achieved through four distinct components, whose parameter variations are specified in configuration files: a benchmark suite, C-to-C loop nest optimizers, a constraining utility and a C-to-binary compiler.

\subsection{Benchmark suite}\label{ssec:bench_suite}

Our first component is an initial set of benchmarks, which embodies the human expertise we intend to exploit to generate relevant code. The chosen suite must contain computation kernels delimited by ad-hoc \texttt{\#pragma}s, whose arrays are accessed directly (no indirections) and whose loops are affine. These constraints are necessary to ensure that the microkernelification phase, presented below, generates segfault-free code.

In this case, we use Polybench~\cite{bench:polybench}, a suite of 30 benchmarks for polyhedral compilation, of which we use 26. The \texttt{nussinov}, \texttt{ludcmp} and \texttt{deriche} benchmarks are removed because they are incompatible with PoCC (introduced below). The \texttt{lu} benchmark is left out because its execution alone takes longer than all the others together, making its dynamic analysis (\eg{} with \gus) impractical. Beyond its focus on linear algebra, an important feature of Polybench is that its kernels contain no conditional control flow (\eg{} \texttt{if-then-else}); they do, however, contain conditional data flow, through C's ternary conditional operator.

\subsection{C-to-C loop nest optimizers}\label{ssec:loop_nest_optimizer}

Loop nest optimizers transform an initial benchmark in different ways, generating different \textit{versions} of the same benchmark; each version varies the stress placed on the resources of the target architecture and, by extension, on the models on which the static analyzers are based. In this case, we chose the \textsc{Pluto}~\cite{tool:pluto} and PoCC~\cite{tool:pocc} polyhedral compilers, which give easy access to common loop nest optimizations: register tiling, tiling, skewing, vectorization/simdization, loop unrolling, loop permutation and loop fusion. These transformations are meant to maximize variety within the initial benchmark suite.

The generated benchmarks are expected to highlight the performance impact of the resulting behaviours. For instance, \textit{skewing} introduces non-trivial pointer arithmetic, increasing the pressure on address computation units; \textit{loop unrolling}, among other things, opens the way to register promotion, which exposes dependencies and relieves the load-store units; \textit{vectorization} stresses SIMD units and decreases pressure on the front-end; and so on. The two sketches below illustrate the first two of these effects.
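To give an intuition of the first effect, the following hand-written sketch (an illustration of the transformation, not actual \textsc{Pluto} output) shows a classic two-dimensional stencil after skewing by $t = i + j$; the skewed subscripts compile into precisely the kind of non-trivial address arithmetic mentioned above.

\begin{verbatim}
#include <stdio.h>

#define N 64
#define M 64
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

static double A[N][M];

int main(void) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      A[i][j] = i + j;

  /* Original nest:
       for (i = 1; i < N; i++)
         for (j = 1; j < M; j++)
           A[i][j] = A[i-1][j] + A[i][j-1];
     Skewed nest, with t = i + j: the inner loop walks a
     wavefront, and subscripts such as A[t-j][j] turn into
     non-trivial address computations. */
  for (int t = 2; t <= N + M - 2; t++)
    for (int j = MAX(1, t - N + 1); j <= MIN(M - 1, t - 1); j++)
      A[t - j][j] = A[t - j - 1][j] + A[t - j][j - 1];

  printf("%f\n", A[N - 1][M - 1]);
  return 0;
}
\end{verbatim}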
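Similarly, this second sketch (again illustrative rather than generated) combines loop unrolling with register promotion on a simple reduction: the accumulator is split into four scalar partial sums that the compiler can keep in registers, which relieves the load-store units and exposes four independent dependency chains to the backend.

\begin{verbatim}
#include <stdio.h>

#define SIZE 1024  /* a multiple of the unroll factor */

static double a[SIZE];

int main(void) {
  for (int i = 0; i < SIZE; i++)
    a[i] = 1.0;

  /* Reduction unrolled by four; the partial sums s0..s3 are
     promoted to registers, and the four accumulations form
     independent dependency chains. */
  double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
  for (int i = 0; i < SIZE; i += 4) {
    s0 += a[i];
    s1 += a[i + 1];
    s2 += a[i + 2];
    s3 += a[i + 3];
  }
  printf("%f\n", s0 + s1 + s2 + s3);
  return 0;
}
\end{verbatim}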
\subsection{Constraining utility}\label{ssec:kernelify}

A constraining utility transforms the code so that it respects an arbitrary number of non-functional properties. In this case, we apply a pass of \emph{microkernelification}: we extract a computational kernel from the arbitrarily deep and arbitrarily long loop nest generated by the previous component. The loop chosen to form the microkernel is the one considered the \textit{hottest}, where the \textit{hotness} of a loop is the number of arithmetic operations in its body multiplied by the number of times it is iterated. This metric prioritizes the parts of the code with the greatest impact on performance.

At this point, the resulting code may compute a different result than the initial code; for instance, composing tiling with kernelification reduces the number of loop iterations. Indeed, our framework is not meant to preserve the functional semantics of the benchmarks; our goal is only to generate code that is relevant from the standpoint of performance analysis.

\subsection{C-to-binary compiler}\label{ssec:compile}

A C-to-binary compiler varies binary optimization options by enabling or disabling auto-vectorization, extended instruction sets, \textit{etc}. We use \texttt{gcc}.

\bigskip

In the end, the relevance of the set of microbenchmarks generated with this approach derives not only from the initial benchmark suite and from the transformations chosen at each stage, but also from the combinatorial explosion produced by composing the four stages. In our experimental setup, this yields up to 144 microbenchmarks per benchmark of the original suite.
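To make the output of the pipeline concrete, the following sketch shows the typical shape of a generated microbenchmark: a microkernelified loop wrapped in a measure of $n$ repetitions. The kernel body, the array sizes and the repetition count are hypothetical placeholders; the arrays are sized so that they plausibly remain L1-resident.

\begin{verbatim}
#include <stdio.h>

#define SIZE 256      /* three arrays of 256 doubles: 6 KiB, L1-resident */
#define N_REPET 1000  /* n: the user-specified number of repetitions */

static double a[SIZE], b[SIZE], c[SIZE];

/* The computational kernel: a single for loop, no nested loop,
   statically known bounds. */
static void kernel(void) {
  for (int i = 0; i < SIZE; i++)
    c[i] += a[i] * b[i];
}

int main(void) {
  for (int i = 0; i < SIZE; i++) {
    a[i] = i;
    b[i] = 2.0 * i;
    c[i] = 0.0;
  }

  /* One measure: n back-to-back repetitions of the kernel. The
     whole measure may itself be repeated to improve stability. */
  for (int r = 0; r < N_REPET; r++)
    kernel();

  printf("%f\n", c[SIZE - 1]);
  return 0;
}
\end{verbatim}

Each such file is then compiled under the options varied by the last stage, for instance \texttt{gcc -O3 -march=native} with or without \texttt{-fno-tree-vectorize} (illustrative flags, not our exact configuration).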