Notations: introduce references

Théophile Bastian 2024-03-28 16:11:56 +01:00
parent d3fe719105
commit 5914a5a165
4 changed files with 49 additions and 32 deletions

@ -1,31 +1,47 @@
 \chapter*{Notations}
 \addcontentsline{toc}{chapter}{Notations}
-Throughout this whole document, the following notations are used.
+Throughout this whole document, the following non-standard notations are used.
 \begin{center}
 \begin{tabular}{c p{0.65\textwidth} p{0.15\textwidth}}
 \toprule
 \textbf{Notation} & \textbf{Meaning} & \textbf{(See also)} \\
 \midrule
-$\cyc{\kerK}$ &
-Reciprocal throughput of $\kerK$, in cycles per occurrence of $\kerK$.
-& §\ref{def:cyc_kerK} \\
-$\cycB{\kerK}$ &
-Reciprocal throughput of $\kerK$ if it was only limited by the
+$\cyc{\kerK}$
+& Reciprocal throughput of $\kerK$, in cycles per occurrence of
+$\kerK$.
+& §\ref{def:cyc_kerK} \\
+$\cycmes{\kerK}{n}$
+& Measured reciprocal throughput of $\kerK$, over $n$ iterations of
+$\kerK$. When there is no ambiguity and $n$ is sufficiently large,
+we often write $\cyc{\kerK}$ instead.
+& §\ref{def:cycmes_kerK} \\
+$\cycB{\kerK}$
+& Reciprocal throughput of $\kerK$ if it was only limited by the
 CPU's backend.
-& \qtodo{ref} \\
-$\cycF{\kerK}$ &
-Reciprocal throughput of $\kerK$ if it was only limited by the
+& §\ref{def:cycB} \\
+$\cycF{\kerK}$
+& Reciprocal throughput of $\kerK$ if it was only limited by the
 CPU's frontend.
-& \qtodo{ref} \\
-$\kerK^n$ &
-$\kerK$ repeated $n$ times.
-& §\ref{not:kerK_N} \\
-$\mucount{}i$ &
-Number of \uops{} the instruction $i$ is decoded into. This can be
-extended to a kernel: $\mucount{}\kerK$.
-& \qtodo{ref} \\
+& §\ref{def:cycF} \\
+$C(\kerK)$
+& Number of cycles of a kernel $\kerK$.
+& §\ref{def:ker_cycles} \\
+$\kerK^n$
+& $\kerK$ repeated $n$ times.
+& §\ref{not:kerK_N} \\
+$\operatorname{IPC}(\kerK)$
+& Instructions Per Cycle in the execution of the kernel $\kerK$, in
+steady state, averaged.
+& §\ref{def:ipc} \\
+$\mucount{}i$
+& Number of \uops{} the instruction $i$ is decoded into. This can
+be extended to a kernel: $\mucount{}\kerK$.
+& §\ref{def:mucount} \\
+$\tau_K$
+& Kendall's $\tau$ coefficient of correlation.
+& §\ref{ssec:palmed_eval_metrics}, \cite{kendalltau} \\
 \bottomrule
 \end{tabular}
 \end{center}
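The new $\tau_K$ entry refers to Kendall's rank correlation, which compares how two series order the same items, for instance measured and predicted IPC values. The following minimal Python sketch computes the tau-a variant, which counts concordant and discordant pairs and applies no tie correction. `kendall_tau_a` and the sample values are hypothetical. The evaluation in §ssec:palmed_eval_metrics may use another variant: SciPy's `scipy.stats.kendalltau`, for example, computes tau-b, which corrects for ties, by default.

```python
from itertools import combinations


def kendall_tau_a(xs: list[float], ys: list[float]) -> float:
    """Kendall's tau-a between two equally long sequences.

    Every pair of indices (i, j) is concordant if xs and ys order it the
    same way, and discordant if they order it the opposite way. Pairs tied
    in either sequence count as neither, and the total number of pairs is
    used as the denominator (tau-a, no tie correction).
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical IPC values: measured vs. predicted by some analyzer.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.5, 3.2]
print(kendall_tau_a(measured, predicted))  # prints 0.6666666666666666
```

Here 5 of the 6 pairs are concordant and the last two kernels are swapped, which gives (5 - 1) / 6. Because the sketch uses only ranks, it does not reward a predictor for getting IPC magnitudes right, only for getting their order right.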

@ -297,7 +297,7 @@ define this notion here more formally.
 of $\kerK$ concatenated $n$ times.
 \end{notation}
-\begin{definition}[$C(\kerK)$]
+\begin{definition}[$C(\kerK)$]\label{def:ker_cycles}
 The \emph{number of cycles} of a kernel $\kerK$ is defined, \emph{in
 steady-state}, as the number of elapsed cycles from the moment the first
 instruction of $\kerK$ starts to be decoded to the moment the last
@ -474,7 +474,7 @@ stead.
 \medskip
-\begin{definition}[Throughput of a kernel]
+\begin{definition}[Throughput of a kernel]\label{def:ipc}
 The \emph{throughput} of a kernel $\kerK$, measured in \emph{instructions
 per cycle}, or IPC, is defined as the number of instructions in $\kerK$, divided
 by the steady-state execution time of $\kerK$:
@ -486,7 +486,7 @@ stead.
 In the literature or in analyzers' reports, the throughput of a kernel is often
 referred to as its \emph{IPC} (its unit).
-\begin{notation}[Experimental measure of $\cyc{\kerK}$]
+\begin{notation}[Experimental measure of $\cyc{\kerK}$]\label{def:cycmes_kerK}
 We note $\cycmes{\kerK}{n}$ the experimental measure of $\kerK$, realized
 by:
 \begin{itemize}

@ -48,7 +48,7 @@ To evaluate \palmed{}, the same kernel is run:
 The raw results are saved (as a Python \pymodule{pickle} file) for reuse and
 archival.
-\subsection{Metrics extracted}
+\subsection{Metrics extracted}\label{ssec:palmed_eval_metrics}
 As \palmed{} internally works with Instructions Per Cycle (IPC) metrics, and as
 all these tools are also able to provide results in IPC, the most natural

@ -66,17 +66,18 @@ distinction.
 For each of these ports, we note $\basic{p}$ the basic instruction for
 port \texttt{p}; \eg{}, $\basic{Int01}$ is \lstarmasm{ADC_RD_X_RN_X_RM_X}.
-\paragraph{Counting the micro-ops of an instruction.} There are three main
-sources of bottleneck for a kernel $\kerK$: backend, frontend and dependencies.
-When measuring the execution time with \pipedream{}, we eliminate (as far as
-possible) the dependencies, leaving us with only backend and frontend. We note
-$\cycF{\kerK}$ the execution time of $\kerK$ if it was only limited by its
-frontend, and $\cycB{\kerK}$ the execution time of $\kerK$ if it was only
-limited by its backend. If we consider a kernel $\kerK$ that is simple enough
-to exhibit a purely linear frontend behaviour ---~that is, the frontend's
-throughput is a linear function of the number of \uops{} in the kernel~---, we
-then know that either $\cyc{\kerK} = \cycF{\kerK}$ or $\cyc{\kerK} =
-\cycB{\kerK}$.
+\paragraph{Counting the micro-ops of an
+instruction.}\label{def:cycB}\label{def:cycF}\label{def:mucount} There are
+three main sources of bottleneck for a kernel $\kerK$: backend, frontend and
+dependencies. When measuring the execution time with \pipedream{}, we
+eliminate (as far as possible) the dependencies, leaving us with only backend
+and frontend. We note $\cycF{\kerK}$ the execution time of $\kerK$ if it was
+only limited by its frontend, and $\cycB{\kerK}$ the execution time of $\kerK$
+if it was only limited by its backend. If we consider a kernel $\kerK$ that is
+simple enough to exhibit a purely linear frontend behaviour ---~that is, the
+frontend's throughput is a linear function of the number of \uops{} in the
+kernel~---, we then know that either $\cyc{\kerK} = \cycF{\kerK}$ or
+$\cyc{\kerK} = \cycB{\kerK}$.
 For a given instruction $i$ and for a certain $k \in \nat$, we then construct a
 kernel $\kerK_k$
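The bottleneck argument in the last hunk can be illustrated numerically. The Python sketch below rests on several assumptions:

- Dependencies are fully eliminated.
- The frontend is proportional in the number of micro-ops, a special case of the linear behaviour described, with a hypothetical `frontend_width` in micro-ops per cycle.
- The slower of frontend and backend sets the pace, so cyc(K) = max(cycF(K), cycB(K)).
- IPC(K) is the instruction count of K divided by cyc(K), taken as its steady-state execution time.

`KernelModel`, its fields and all numbers are illustrative and do not come from the thesis.

```python
from dataclasses import dataclass


@dataclass
class KernelModel:
    """Hypothetical model of a dependency-free kernel K."""

    n_instructions: int  # number of instructions in K
    n_uops: int          # mucount(K): micro-ops K is decoded into
    cyc_backend: float   # cycB(K), assumed known (e.g. from a port model)


def cyc_frontend(kernel: KernelModel, frontend_width: float) -> float:
    """cycF(K), assuming the frontend issues `frontend_width` uops per cycle."""
    return kernel.n_uops / frontend_width


def cyc(kernel: KernelModel, frontend_width: float) -> float:
    """cyc(K), assuming the slower of frontend and backend sets the pace."""
    return max(cyc_frontend(kernel, frontend_width), kernel.cyc_backend)


def ipc(kernel: KernelModel, frontend_width: float) -> float:
    """IPC(K): instructions of K divided by its steady-state cycle count."""
    return kernel.n_instructions / cyc(kernel, frontend_width)


k = KernelModel(n_instructions=6, n_uops=8, cyc_backend=1.5)
print(cyc_frontend(k, frontend_width=4))  # prints 2.0 (frontend-bound)
print(cyc(k, frontend_width=4))           # prints 2.0
print(ipc(k, frontend_width=4))           # prints 3.0
```

With a backend cost above 2.0 cycles, the same kernel would instead satisfy cyc(K) = cycB(K). Either way, cyc(K) equals one of the two bounds, as the paragraph states.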