From 103e6a06874f1100b691db67801152482db5df0b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Th=C3=A9ophile=20Bastian?= <contact@tobast.fr> Date: Sun, 1 Sep 2024 16:38:12 +0200 Subject: [PATCH] Minor fixes --- manuscrit/60_staticdeps/10_types_of_deps.tex | 4 +++- manuscrit/60_staticdeps/30_static_principle.tex | 2 +- manuscrit/60_staticdeps/35_rob_proof.tex | 8 +++++--- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/manuscrit/60_staticdeps/10_types_of_deps.tex b/manuscrit/60_staticdeps/10_types_of_deps.tex index 7897b6c..10630e1 100644 --- a/manuscrit/60_staticdeps/10_types_of_deps.tex +++ b/manuscrit/60_staticdeps/10_types_of_deps.tex @@ -37,7 +37,9 @@ physical registers than what is exposed to the user through the ISA; a renaming phrase will allocate those registers to avoid the effects of WaR and WaW dependencies as much as possible. -\smallskip{} For the present chapter, \textit{we only consider read-after-write +\smallskip{} + +For the present chapter, \textit{we only consider read-after-write dependencies}; however, all the techniques we present are applicable to other types of dependencies if the considered architecture requires to take them into account. diff --git a/manuscrit/60_staticdeps/30_static_principle.tex b/manuscrit/60_staticdeps/30_static_principle.tex index 862cf30..eaa4e55 100644 --- a/manuscrit/60_staticdeps/30_static_principle.tex +++ b/manuscrit/60_staticdeps/30_static_principle.tex @@ -11,7 +11,7 @@ detected by keeping track of which instruction last wrote each register in a Loop-carried dependencies can, to some extent, be detected the same way. As the basic block is always assumed to be the body of an infinite loop, a straight-line analysis can be performed on a duplicated kernel. This strategy -is \eg{} adopted by \osaca{}~\cite{osaca2} (§II.D). When dealing only with +is \eg{} adopted by \osaca{}~\cite[§II.D]{osaca2}. When dealing only with register accesses, this strategy is always sufficient: as each iteration always executes the same basic block, it is not possible for an instruction to depend on another instruction two iterations earlier or more. diff --git a/manuscrit/60_staticdeps/35_rob_proof.tex b/manuscrit/60_staticdeps/35_rob_proof.tex index 7e0f4bf..8ad85a2 100644 --- a/manuscrit/60_staticdeps/35_rob_proof.tex +++ b/manuscrit/60_staticdeps/35_rob_proof.tex @@ -20,10 +20,11 @@ performance}\label{ssec:staticdeps:rob_proof} instructions $\left(I_{p+kn}, I_{q+kn}\right)_{k\in\nat}$. \end{theorem} -To prove this assertion, we require a few postulates describing the functioning -of a CPU and, in particular, how \uops{} transit in (decoded) and out (retired) -the reorder buffer. +To prove this assertion, we require a few postulates to model the functioning +of a CPU and, in particular, how \uops{} transit in (decoded) and out +(retired) the reorder buffer. +\needspace{10em} \begin{postulate}[Reorder buffer as a circular buffer] The reorder buffer is a circular buffer of size $R \in \nat^+$. It contains only decoded \uops{}. @@ -90,6 +91,7 @@ Consequently, if both $i_q$ and $i_{q'}$ are in-flight then $|q'-q|<R$, and thus $\distance{I_p}{I_{p'}} < R - 1$. \end{proof} +\needspace{15em} \begin{postulate}[Issue delay] Reasons why the issue of a \uop{} $i$ is delayed can be: \begin{enumerate}