\section{A dive into processors' microarchitecture}
A modern computer can roughly be broken down into a number of functional parts:
a processor, a general-purpose computation unit; accelerators, such
as GPUs, computation units specialized for specific tasks; memory, both volatile
but fast (RAM) and persistent but slower (SSD, HDD); hardware specialized for
interfacing, such as network cards or USB controllers; and power supplies,
responsible for providing smoothed, adequate electric power to the previous
components.
This manuscript will largely focus on the processor. While some of the
techniques described here might also apply to accelerators, we did not
experiment in this direction, nor are we aware of such efforts.
\subsection{High-level abstraction of processors}
A processor, in its coarsest view, is simply a piece of hardware that can be
fed with a flow of instructions, which will, one after the other, modify the
machine's internal state.
The processor's state, the available instructions themselves and their effect
on the state are defined by an \emph{Instruction Set Architecture}, or ISA,
such as x86-64 or A64 (ARM's ISA). More generally, the ISA defines how software
will interact with a given processor, including the registers available to the
programmer, the instructions' semantics ---~broadly speaking, as these are
often informal~---, etc. These instructions are represented, at a
human-readable level, by \emph{assembly code}, such as \lstxasm{add (\%rax),
\%rbx} in x86-64. Assembly code is then transcribed, or \emph{assembled}, to a
binary representation in order to be fed to the processor ---~for instance,
\lstxasm{0x480318} for the previous instruction. This instruction computes the
sum of the value stored at the memory address held in \reg{rax} and of the
value held in \reg{rbx}, but it does not, strictly speaking, \emph{return} or
\emph{produce} a result: instead, it stores the result of the computation in
register \reg{rbx},
altering the machine's state.
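As a sketch of how this binary representation is laid out, the three bytes of
\lstxasm{0x480318} can be read as follows, following the usual x86-64 encoding
scheme of an optional prefix, an opcode and a byte describing the operands:

\begin{center}
\begin{tabular}{lll}
  Byte & Field & Meaning \\
  \hline
  \texttt{0x48} & REX prefix, \texttt{0100\,1000} & 64-bit operand size (W bit set) \\
  \texttt{0x03} & opcode & \texttt{add}, from register or memory into a register \\
  \texttt{0x18} & ModR/M, \texttt{00\,011\,000} & destination \reg{rbx}, source at the address held in \reg{rax} \\
\end{tabular}
\end{center}

Its effect on the machine's state can likewise be sketched as an update of
\reg{rbx}, where $\mathrm{Mem}_{64}[a]$ is a notation introduced here for the
64-bit value stored at address $a$, and where the status flags that the
instruction also updates are left out:
\[
  \mathtt{rbx} \;\leftarrow\; \left(\mathtt{rbx} + \mathrm{Mem}_{64}[\mathtt{rax}]\right) \bmod 2^{64}
\]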
This state, generally, is composed of a small number of \emph{registers}, small
pieces of memory on which the processor can directly operate ---~to perform
arithmetic operations, index the main memory, etc. It is also composed of the
whole memory hierarchy, including the persistent memory, the main memory
(usually RAM) and the hierarchy of caches between the processor and the main
memory. This state can also be extended to encompass external effects, such as
network communication, peripherals, etc.
The way an ISA is implemented, in order for the instructions to alter the state
as specified, is called a microarchitecture. Many microarchitectures can
implement the same ISA, as is the case for instance with the x86-64 ISA,
implemented both by Intel and AMD, each with multiple generations, which
translates into multiple microarchitectures. ISAs thus frequently have
many extensions, which each microarchitecture may or may not implement.
\subsection{Microarchitectures}
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{cpu_big_picture.svg}
\caption{Simplified and generalized global representation of a CPU
microarchitecture}\label{fig:cpu_big_picture}
\end{figure}
While many different ISAs are available and used, and even more
microarchitectures are industrially implemented and widely distributed, some
generalities still hold for the vast majority of processors found in commercial
or server-grade computers. Such a generic view is obviously an approximation
and will miss many details and specificities; it should, however, be sufficient
for the purposes of this manuscript.
A microarchitecture can be broken down into a few functional blocks, shown in
\autoref{fig:cpu_big_picture}, roughly amounting to a \emph{frontend}, a \emph{backend}, a
\emph{register file}, multiple \emph{data caches} and a \emph{retire buffer}.
\medskip{}
\paragraph{Frontend.} The frontend is responsible for fetching the flow of
instruction bytes to be executed, breaking it down into operations executable
by the backend and issuing them to execution units.
\paragraph{Backend.} The backend is composed of \emph{execution ports}, which
act as gateways to the actual \emph{execution units}. Those units are
responsible for the actual computations made by the processor.
\paragraph{Register file.} The register file holds the processor's registers,
on which computations are made.
\paragraph{Data caches.} The cache hierarchy (usually L1, L2 and L3) caches
lines of data from the main memory, whose access latency would slow computation
down by roughly two orders of magnitude if it were accessed directly. Usually,
the L1 cache resides directly in the computation core, while the L3 cache, and
on some designs the L2 cache, is shared between multiple cores.
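
The benefit of such a hierarchy can be sketched with the classical average
memory access time estimate for a single cache level, where $t_{\mathrm{hit}}$
is the latency of a cache hit, $m$ the miss rate and $t_{\mathrm{miss}}$ the
additional penalty paid on a miss. The figures below are purely illustrative
assumptions, not measurements of any given processor:
\[
  t_{\mathrm{avg}} = t_{\mathrm{hit}} + m \cdot t_{\mathrm{miss}}
  \qquad\text{e.g.}\qquad
  4 + 0.05 \times 200 = 14 \text{ cycles}
\]
Under these assumed values, an average access costs 14 cycles, against 200
cycles if every access had to reach the main memory.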