\section{A dive into processors' microarchitecture}

A modern computer can roughly be broken down into a number of functional parts: a processor, a general-purpose computation unit; accelerators, such as GPUs, computation units specialized for specific tasks; memory, both volatile but fast (RAM) and persistent but slower (SSD, HDD); hardware specialized for interfacing, such as network cards or USB controllers; power supplies, responsible for providing smoothed, adequate electric power to the previous components.

This manuscript will largely focus on the processor. While some of the techniques described here might also apply to accelerators, we did not experiment in this direction, nor are we aware of any such efforts.

\subsection{High-level abstraction of processors}

A processor, in its coarsest view, is simply a piece of hardware that can be fed with a flow of instructions, which will, one after the other, modify the machine's internal state. The processor's state, the available instructions themselves and their effect on the state are defined by an \emph{Instruction Set Architecture}, or ISA\@. These instructions are represented, at a human-readable level, by \emph{assembly code}, but are transcribed, or \emph{assembled}, to a binary representation in order to be fed to the processor.

This state is generally composed of a small number of \emph{registers}, small pieces of memory on which the processor can directly operate, for instance to perform arithmetic operations or to index the main memory. It also comprises the whole memory hierarchy, including the persistent memory, the main memory (usually RAM) and the hierarchy of caches between the processor and the main memory. This state can also be extended to encompass external effects, such as network communication, peripherals, etc.

The way an ISA is implemented, in order for the instructions to alter the state as specified, is called a \emph{microarchitecture}.
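
As a minimal formal sketch of this abstraction (the notation is ours, introduced for illustration only, and is not part of any ISA specification), write $\Sigma$ for the set of machine states and $I$ for the set of instructions. An ISA can then be viewed as assigning to each instruction $i \in I$ a state transformer $\delta_i : \Sigma \to \Sigma$, so that running a sequence $i_1, \dots, i_n$ from an initial state $\sigma_0$ yields
\[
  \sigma_k = \delta_{i_k}(\sigma_{k-1}) \quad (1 \le k \le n),
  \qquad\mbox{that is,}\qquad
  \sigma_n = (\delta_{i_n} \circ \dots \circ \delta_{i_1})(\sigma_0).
\]
Under this simplified view, which assumes deterministic instructions and ignores external events such as interrupts, any hardware that produces the same final state $\sigma_n$ is a valid implementation of the ISA; the ISA generally does not specify how long the hardware takes to do so.
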
Many microarchitectures can implement the same ISA, as is the case for instance with the x86-64 ISA, implemented by both Intel and AMD, each across multiple generations and thus multiple microarchitectures.

\subsection{Microarchitectures}

While many different ISAs are available and used, and even more microarchitectures are industrially implemented and widely distributed, some generalities still hold for the vast majority of processors found in commercial or server-grade computers. Such a generic view is obviously an approximation and will miss many details and specificities; it should, however, be sufficient for the purposes of this manuscript.

A microarchitecture can be broken down into a few functional blocks, shown in \qtodo{fig}, roughly amounting to a \emph{frontend}, a \emph{backend}, a \emph{register file} and multiple \emph{data caches}.

\medskip{}

\paragraph{Frontend.} The frontend is responsible for fetching the flow of instruction bytes to be executed, decoding them into instructions, and dispatching them to execution units.

\paragraph{Backend.} The backend is composed of \emph{execution ports}, which act as gateways to the actual \emph{execution units}. Those units perform the computations carried out by the processor.