

\section{A dive into processors' microarchitecture}

A modern computer can roughly be broken down into a number of functional parts:
a processor, a general-purpose computation unit; accelerators, such
as GPUs, computation units specialized on specific tasks; memory, both volatile
but fast (RAM) and persistent but slower (SSD, HDD); hardware specialized for
interfacing, such as network cards or USB controllers; power supplies,
responsible for providing smoothed, adequate electric power to the previous
components.

This manuscript will largely focus on the processor. While some of the
techniques described here might be applicable to accelerators, we did not
experiment in this direction, nor are we aware of any such efforts.

\subsection{High-level abstraction of processors}

A processor, in its coarsest view, is simply a piece of hardware that can be
fed with a flow of instructions, which will, one after the other, modify the
machine's internal state.

The processor's state, the available instructions themselves and their effect
on the state are defined by an \emph{Instruction Set Architecture}, or ISA,
such as x86-64 or A64 (ARM's ISA). More generally, the ISA defines how software
will interact with a given processor, including the registers available to the
programmer, the instructions' semantics ---~broadly speaking, as these are
often informal~---, etc.  These instructions are represented, at a
human-readable level, by \emph{assembly code}, such as \lstxasm{add (\%rax),
\%rbx} in x86-64. Assembly code is then transcribed, or \emph{assembled}, to a
binary representation in order to be fed to the processor ---~for instance,
\lstxasm{0x480318} for the previous instruction. This instruction computes the
sum of the value held at memory address \reg{rax} and of the value \reg{rbx},
but it does not, strictly speaking, \emph{return} or \emph{produce} a result:
instead, it stores the result of the computation in register \reg{rbx},
altering the machine's state.
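
To make this distinction concrete, the following is a minimal Python sketch
of the semantics of \lstxasm{add (\%rax), \%rbx} on a toy machine state. The
names \texttt{MachineState} and \texttt{add\_mem\_reg} are hypothetical and
introduced for illustration only; the model assumes a sparse memory holding
one 64-bit value per address and ignores the flags register, which the actual
instruction also updates.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass, field

MASK64 = (1 << 64) - 1  # general-purpose registers are 64 bits wide in x86-64


@dataclass
class MachineState:
    """Toy machine state: two registers and a sparse memory."""
    regs: dict = field(default_factory=lambda: {"rax": 0, "rbx": 0})
    mem: dict = field(default_factory=dict)  # address -> 64-bit value


def add_mem_reg(state: MachineState, addr_reg: str, dst_reg: str) -> None:
    """Toy semantics of `add (%addr_reg), %dst_reg`, flags ignored."""
    loaded = state.mem.get(state.regs[addr_reg], 0)
    state.regs[dst_reg] = (state.regs[dst_reg] + loaded) & MASK64
    # Nothing is returned: the only effect is the updated state.


state = MachineState()
state.regs["rax"] = 0x1000
state.regs["rbx"] = 5
state.mem[0x1000] = 37
result = add_mem_reg(state, "rax", "rbx")
print(result, state.regs["rbx"])  # None 42
\end{lstlisting}

The function returns nothing; the sum is only observable through the new
value of \reg{rbx} in the state.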

This state, generally, is composed of a small number of \emph{registers}, small
pieces of memory on which the processor can directly operate ---~to perform
arithmetic operations, index the main memory, etc. It is also composed of the
whole memory hierarchy, including the persistent memory, the main memory
(usually RAM) and the hierarchy of caches between the processor and the main
memory. This state can also be extended to encompass external effects, such as
network communication, peripherals, etc.

The way an ISA is implemented, in order for the instructions to alter the state
as specified, is called a microarchitecture. Many microarchitectures can
implement the same ISA, as is the case for instance with the x86-64 ISA,
implemented by both Intel and AMD, each across multiple generations, and hence
multiple microarchitectures. It is thus frequent for ISAs to
have many extensions, which each microarchitecture may or may not implement.

\subsection{Microarchitectures}

\begin{figure}
    \centering
    \includegraphics[width=0.9\textwidth]{cpu_big_picture.svg}
    \caption{Simplified and generalized global representation of a CPU
    microarchitecture}\label{fig:cpu_big_picture}
\end{figure}

While many different ISAs are available and used, and even more
microarchitectures are industrially implemented and widely distributed, some
generalities still hold for the vast majority of processors found in commercial
or server-grade computers. Such a generic view is obviously an approximation
and will miss many details and specificities; it should, however, be sufficient
for the purposes of this manuscript.

A microarchitecture can be broken down into a few functional blocks, shown in
\autoref{fig:cpu_big_picture}, roughly amounting to a \emph{frontend}, a \emph{backend}, a
\emph{register file}, multiple \emph{data caches} and a \emph{retire buffer}.

\medskip{}

\paragraph{Frontend.} The frontend is responsible for fetching the flow of
instruction bytes to be executed, breaking it down into operations executable
by the backend and issuing them to execution units.

\paragraph{Backend.} The backend is composed of \emph{execution ports}, which
act as gateways to the actual \emph{execution units}. Those units are
responsible for the actual computations made by the processor.
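
The role of ports as gateways can be illustrated with a toy Python sketch.
Everything in it is an assumption made for illustration: the port names
(\texttt{p0}, \texttt{p1}, \texttt{p2}), the kinds of operations each port
accepts, the rule that a port accepts at most one operation per cycle, and
the greedy assignment policy. It ignores data dependencies and does not
describe any real microarchitecture.

\begin{lstlisting}[language=Python]
# Hypothetical port layout: port name -> kinds of operation it accepts.
PORTS = {
    "p0": {"alu", "mul"},
    "p1": {"alu"},
    "p2": {"load"},
}


def dispatch(uops):
    """Greedily assign operations to ports, cycle by cycle.

    Assumes each port accepts at most one operation per cycle and ignores
    data dependencies. Returns one dict per cycle, mapping each busy port
    to the kind of operation it received.
    """
    cycles = []
    pending = list(uops)
    while pending:
        busy = {}
        remaining = []
        for kind in pending:
            # First port that supports this kind and is still free.
            port = next((p for p, kinds in PORTS.items()
                         if kind in kinds and p not in busy), None)
            if port is None:
                remaining.append(kind)  # retry in the next cycle
            else:
                busy[port] = kind
        if not busy:
            raise ValueError(f"no port can execute {remaining[0]!r}")
        cycles.append(busy)
        pending = remaining
    return cycles


schedule = dispatch(["alu", "alu", "alu", "load", "mul"])
print(len(schedule))  # 3
\end{lstlisting}

With this greedy policy, the five operations take three cycles, because the
only port accepting \texttt{mul} is first taken by \texttt{alu} operations; a
better assignment would need only two. Even in this crude model, the mapping
of operations to ports affects throughput.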

\paragraph{Register file.} The register file holds the processor's registers,
on which computations are made.

\paragraph{Data caches.} The cache hierarchy (usually L1, L2 and L3) caches
data lines from the main memory, whose access latency would slow computation
down by one to two orders of magnitude if it were accessed directly. Usually,
the L1 cache resides directly in the computation core, while the L3 cache, and
on some designs the L2 cache, is shared between multiple cores.
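
As a rough illustration of how a cache behaves, the following Python sketch
simulates a single, direct-mapped cache level and counts hits and misses for a
sequence of addresses. The parameters \texttt{LINE\_SIZE} and
\texttt{NUM\_LINES} and the function \texttt{simulate} are hypothetical; real
caches are much larger, usually set-associative, and organized in several
levels.

\begin{lstlisting}[language=Python]
LINE_SIZE = 64  # bytes per cache line (assumed)
NUM_LINES = 4   # deliberately tiny, so that evictions show up


def simulate(addresses):
    """Return (hits, misses) for a direct-mapped cache, starting empty."""
    lines = [None] * NUM_LINES  # tag currently stored in each slot
    hits = misses = 0
    for addr in addresses:
        block = addr // LINE_SIZE
        index, tag = block % NUM_LINES, block // NUM_LINES
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag  # fetch the line, evicting the previous one
    return hits, misses


# Sequential walk over 256 bytes, 8 bytes at a time: one miss per line.
print(simulate(range(0, 256, 8)))    # (28, 4)
# Two addresses mapping to the same slot keep evicting each other.
print(simulate([0, 256, 0, 256]))    # (0, 4)
\end{lstlisting}

The sequential walk only misses on the first access to each line, whereas the
alternating pattern never hits: under these assumptions, the cost of a memory
access depends on the access pattern, not only on the number of accesses.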