\section{A dive into processors' microarchitecture}
A modern computer can roughly be broken down into a number of functional parts:
a processor, a general-purpose computation unit; accelerators, such
as GPUs, computation units specialized for specific tasks; memory, both volatile
but fast (RAM) and persistent but slower (SSD, HDD); hardware specialized for
interfacing, such as network cards or USB controllers; and power supplies,
responsible for providing smoothed, adequate electric power to the previous
components.

This manuscript will largely focus on the processor. While some of the
techniques described here might also apply to accelerators, we did not
experiment in this direction, nor are we aware of any such efforts.

\subsection{High-level abstraction of processors}
A processor, in its coarsest view, is simply a piece of hardware that can be
fed with a flow of instructions, which will, one after the other, modify the
machine's internal state.

The processor's state, the available instructions themselves and their effect
on the state are defined by an \emph{Instruction Set Architecture}, or ISA,
such as x86-64 or A64 (ARM's 64-bit ISA). More generally, the ISA defines how
software will interact with a given processor, including the registers
available to the programmer, the instructions' semantics ---~broadly speaking,
as these are often informal~---, etc. These instructions are represented, at a
human-readable level, by \emph{assembly code}, such as
\lstxasm{add (\%rax), \%rbx} in x86-64. Assembly code is then transcribed, or
\emph{assembled}, to a binary representation in order to be fed to the
processor ---~for instance, \lstxasm{0x480318} for the previous instruction.
This instruction computes the sum of the value held at the memory address
stored in \reg{rax} and of the value held in \reg{rbx}, but it does not,
strictly speaking, \emph{return} or \emph{produce} a result: instead, it stores
the result of the computation in register \reg{rbx}, altering the machine's
state.

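
To make this encoding less opaque, its three bytes can be decoded by hand. The
breakdown below is only a sketch, assuming the standard x86-64 encoding scheme
(an optional REX prefix, an opcode, then a ModR/M byte); the field names are
those of the vendor manuals, not notation used elsewhere in this manuscript.
\[
  \begin{array}{lll}
    \mathtt{0x48} & = \mathtt{0100\,1000}_2
      & \mbox{REX prefix with $W = 1$: 64-bit operand size} \\
    \mathtt{0x03} & = \mathtt{0000\,0011}_2
      & \mbox{opcode: add a register or memory operand to a register} \\
    \mathtt{0x18} & = \mathtt{00\,011\,000}_2
      & \mbox{ModR/M: memory operand at $[\mathtt{rax}]$ (000),
        register $\mathtt{rbx}$ (011)}
  \end{array}
\]
Writing $\mathrm{Mem}_{64}[a]$ for the 64-bit value stored at address $a$ (a
notation assumed for this illustration only), the effect of the instruction on
the state can then be summarized as
\[
  \mathtt{rbx} \leftarrow \mathtt{rbx} + \mathrm{Mem}_{64}[\mathtt{rax}]
\]
The instruction also updates the status flags and advances the instruction
pointer, which this summary leaves out.
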

This state, generally, is composed of a small number of \emph{registers}, small
pieces of memory on which the processor can directly operate ---~to perform
arithmetic operations, index the main memory, etc. It also comprises the
whole memory hierarchy, including the persistent memory, the main memory
(usually RAM) and the hierarchy of caches between the processor and the main
memory. This state can also be extended to encompass external effects, such as
network communication, peripherals, etc.

The way an ISA is implemented, in order for the instructions to alter the state
as specified, is called a \emph{microarchitecture}. Many microarchitectures can
implement the same ISA, as is the case, for instance, with the x86-64 ISA,
implemented by both Intel and AMD, each across multiple generations, which
translates into multiple microarchitectures. It is thus common for ISAs to
have many extensions, which each microarchitecture may or may not implement.

\subsection{Microarchitectures}
\begin{figure}
    \centering
    \includegraphics[width=0.9\textwidth]{cpu_big_picture.svg}
    \caption{Simplified and generalized global representation of a CPU
    microarchitecture}\label{fig:cpu_big_picture}
\end{figure}

While many different ISAs are available and used, and even more
microarchitectures are industrially implemented and widely distributed, some
generalities still hold for the vast majority of processors found in consumer
or server-grade computers. Such a generic view is obviously an approximation
and will miss many details and specificities; it should, however, be sufficient
for the purposes of this manuscript.

A microarchitecture can be broken down into a few functional blocks, shown in
\autoref{fig:cpu_big_picture}, roughly amounting to a \emph{frontend}, a
\emph{backend}, a \emph{register file}, multiple \emph{data caches} and a
\emph{retire buffer}.

\medskip{}

\paragraph{Frontend.} The frontend is responsible for fetching the flow of
instruction bytes to be executed, breaking it down into operations executable
by the backend and issuing them to execution units.

\paragraph{Backend.} The backend is composed of \emph{execution ports}, which
act as gateways to the actual \emph{execution units}. Those units are
responsible for the actual computations made by the processor.

\paragraph{Register file.} The register file holds the processor's registers,
on which computations are made.

\paragraph{Data caches.} The cache hierarchy (usually L1, L2 and L3) caches
blocks of data (\emph{cache lines}) from the main memory, whose access latency
would slow computation down by orders of magnitude if it were accessed
directly. Usually, the L1 cache resides directly in the computation core. The
L2 cache is private to each core on many designs and shared within a cluster
of cores on others, while the L3 cache is typically shared between multiple
cores.
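
The benefit of such a hierarchy can be sketched with the usual textbook model
of average access time, which is not specific to this manuscript. The symbols
are assumptions introduced for this illustration only: $t_i$ is the access
latency of cache level $i$, $m_i$ is the fraction of accesses reaching level
$i$ that miss in it, and $t_{\mathrm{mem}}$ is the latency of the main memory.
Assuming each level is queried only after the previous one missed, the average
latency of a memory access is
\[
  t_{\mathrm{avg}} = t_1 + m_1 \Bigl( t_2 + m_2 \bigl( t_3 + m_3 \,
  t_{\mathrm{mem}} \bigr) \Bigr)
\]
As long as the miss ratios $m_i$ remain small, $t_{\mathrm{avg}}$ stays close
to $t_1$ even though $t_{\mathrm{mem}}$ is much larger.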